
GraphQL-Aware Healing in Service-Oriented Architectures via Multi-Signal Learning

  • Nariman Mani
  • Salma Attaranasl
  • Sen He

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces an adaptive test and runtime healing approach that delivers resolver-level resilience for GraphQL service-oriented architectures by unifying three telemetry streams: semantic log embeddings obtained from large language models, structural dependencies encoded via graph neural networks, and statistically grounded operational metrics. These signals are fused into a single reinforcement learning state vector, enabling a deep Q-network to learn context-aware recovery actions including selective retry, safe skip, dependency reordering, and escalation without obscuring root causes. The approach is evaluated in a production-grade case study involving a real-world lifestyle coaching platform used by thousands of active users. The application's asynchronous, cloud-native architecture with complex resolver interactions and AI-powered personalization provides a realistic and challenging environment for assessing the system's robustness. Across more than one thousand simulated failure episodes that inject realistic cloud uncertainty, the approach improves test and runtime success rates from 68.7% to 92%, reduces mean-time-to-recovery from 687 ms to 203 ms, and trims CI compute time by 61% using a KL-stability early-stop rule. It also preserves tail-latency accuracy within a 5% error bound while incurring only 11.8 ms median inference overhead per healed request. These results demonstrate that statistically principled, reinforcement-learning-driven healing offers a practical, fine-grained self-recovery solution for service-oriented systems deployed in modern, real-world cloud applications.
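
To make the fusion-and-selection idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the three signals are fused by simple concatenation, and it replaces the deep Q-network with a hypothetical linear scorer (`q_values`) so the example stays self-contained. The function names, vector sizes, and weights are illustrative assumptions; only the four action names come from the abstract.

```python
import random
from typing import List, Sequence

# Recovery actions named in the abstract.
ACTIONS = ["selective_retry", "safe_skip", "dependency_reorder", "escalate"]


def fuse_state(log_embedding: Sequence[float],
               graph_embedding: Sequence[float],
               metrics: Sequence[float]) -> List[float]:
    """Concatenate the three signal vectors into one state vector.

    Concatenation is an assumption; the abstract does not say how fusion is done.
    """
    return [*log_embedding, *graph_embedding, *metrics]


def q_values(state: Sequence[float],
             weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear stand-in for a deep Q-network: one weight row per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def select_action(state: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  epsilon: float = 0.0,
                  rng=random) -> str:
    """Epsilon-greedy choice over the recovery actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    q = q_values(state, weights)
    best = max(range(len(ACTIONS)), key=q.__getitem__)
    return ACTIONS[best]  # exploit the highest-valued action


# Illustrative inputs only: tiny made-up vectors, not real telemetry.
state = fuse_state([0.2, 0.9], [0.5], [0.1, 0.7])
weights = [[0.0] * 5, [1.0] * 5, [0.5] * 5, [-1.0] * 5]
chosen = select_action(state, weights)
```

With `epsilon` left at 0 the choice is purely greedy; a training loop would typically start with a larger `epsilon` and decay it as the value estimates improve.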

Original language: English (US)
Title of host publication: Proceedings - 19th IEEE International Conference on Service-Oriented System Engineering, SOSE 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 140-150
Number of pages: 11
ISBN (Electronic): 9798331589110
State: Published - 2025
Externally published: Yes
Event: 19th IEEE International Conference on Service-Oriented System Engineering, SOSE 2025 - Tucson, United States
Duration: Jul 21 2025 → Jul 24 2025

Publication series

Name: Proceedings - 19th IEEE International Conference on Service-Oriented System Engineering, SOSE 2025

Conference

Conference: 19th IEEE International Conference on Service-Oriented System Engineering, SOSE 2025
Country/Territory: United States
City: Tucson
Period: 7/21/25 → 7/24/25

Keywords

  • Adaptive Test Healing
  • Flaky Tests
  • Graph Neural Networks (GNN)
  • Large Language Models (LLMs)
  • Reinforcement Learning (RL)

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Control and Optimization
