Event-Driven Remediation with Azure Service Bus and Ansible - Solution Guide

Overview

Organizations running workloads on Microsoft Azure use Azure Service Bus as their enterprise messaging backbone – routing events, alerts, and telemetry between services, monitoring tools, and operational systems. When Azure Monitor, Defender for Cloud, or a custom application publishes a critical event to a Service Bus queue or topic, the response today is manual: an engineer reads the message, investigates the affected resource, and remediates by hand.

This guide demonstrates how to connect Azure Service Bus Queues directly to Event-Driven Ansible (EDA), creating a real-time pipeline that consumes Azure events, enriches them with AI-driven root cause analysis, and triggers automated remediation – turning Azure’s messaging layer into the trigger for a closed-loop AIOps workflow.

This guide builds on the AIOps reference architecture.

For the full end-to-end AIOps pipeline – including AI inference, Lightspeed playbook generation, and the Crawl/Walk/Run maturity model – see AIOps automation with Ansible. This guide focuses specifically on using Azure Service Bus as the event transport layer.

Overview
Background
Solution
- Who Benefits
Prerequisites
Azure Service Bus to Ansible Workflow
- Operational Impact per Stage
- Workflow Diagram
Solution Walkthrough
Validation
- Troubleshooting
Maturity Path
Related Guides
Summary

Background

Azure Service Bus is a fully managed enterprise messaging service from Microsoft that supports message queuing and publish-subscribe patterns. It decouples event producers (monitoring tools, applications, Azure services) from consumers (automation systems, dashboards, alerting tools), providing reliable message delivery with features like dead-letter queues, sessions, and scheduled delivery.

Azure Service Bus Messaging – microsoft.com

In an AIOps context, Azure Service Bus acts as the event transport layer – sitting between Azure’s monitoring and alerting tools (Azure Monitor, Azure Alerts, Defender for Cloud) and Ansible’s automation engine. Events are published to a Service Bus queue or topic, and Event-Driven Ansible subscribes to consume them in real time.

This architecture is particularly valuable for organizations that already use Azure Service Bus as their event backbone. Instead of building custom integrations for each alert type, you route all events through Service Bus and let a single EDA rulebook handle the dispatch to enrichment and remediation workflows.

What is AIOps? – redhat.com

Solution

What makes up the solution?

Azure Service Bus as the managed messaging layer for event transport [Link]
Event-Driven Ansible (EDA) to subscribe to Service Bus queues and trigger automation [Link]
Red Hat AI for AI-driven root cause analysis of the event context [Link]
Ansible Automation Platform (AAP) workflows for orchestrating enrichment and remediation [Link]
Ansible Lightspeed to generate remediation playbooks from AI-enriched context [Link]

EDA is part of Ansible Automation Platform.

EDA uses rulebooks to monitor events, then executes specified job templates or workflows based on the event. Think of it simply as inputs and outputs. EDA is an automatic way for inputs into Ansible Automation Platform, where Ansible Automation Platform is the output (running a job template or workflow).

Who Benefits

Persona	Challenge	What They Gain
IT Ops Engineer / SRE	Monitoring Azure Service Bus queues for critical events and manually investigating each one across hybrid Azure and on-prem infrastructure	Azure events automatically trigger enrichment and remediation workflows – the fix starts before the engineer even sees the alert
Automation Architect	Integrating Azure’s messaging layer with Ansible requires understanding Service Bus authentication, EDA event source plugins, and credential management	A reference architecture with production-ready EDA rulebooks, Azure Service Bus configuration, and tested collection usage
IT Manager / Director	Azure event volume is growing but response times aren’t improving; hybrid cloud operations span Azure and on-prem with no unified automation trigger	A single event pipeline that bridges Azure monitoring with Ansible remediation across hybrid environments – measurable MTTR reduction

Prerequisites

Ansible Automation Platform

Ansible Automation Platform 2.5+ – Required for enterprise Event-Driven Ansible support and the Azure Service Bus event source plugin.

Featured Ansible Content Collections

Collection	Type	Purpose
ansible.eda	Certified	EDA event sources and filters – includes the `azure_service_bus` event source plugin
azure.azcollection	Certified	Manage Azure resources (VMs, networking, storage, etc.) for remediation tasks
ansible.controller	Certified	AAP configuration as code (job templates, workflows, surveys)
redhat.ai	Certified	AI model inference using the OpenAI-compatible API via InstructLab
ansible.scm	Certified	Git operations (commit and push generated playbooks)

External Systems

System	Required	Notes
Azure Service Bus namespace	Yes	With at least one queue or topic/subscription configured
Azure subscription	Yes	With permissions to create Service Bus resources and configure Shared Access Policies
AI inference endpoint	Yes	Red Hat AI (RHEL AI + InstructLab) or any OpenAI-compatible API
Ansible Lightspeed	Recommended	For dynamic playbook generation at the Run maturity level
Git repository	Yes	GitHub, GitLab, or Gitea for storing generated playbooks
Chat or ITSM tool	Recommended	Slack, Mattermost, or ServiceNow for human-in-the-loop notifications

Azure Service Bus to Ansible Workflow

The workflow has four stages, matching the AIOps reference architecture:

Azure Event → Service Bus → EDA – Azure Monitor, Defender for Cloud, or a custom application publishes an event to a Service Bus queue. EDA subscribes to the queue and consumes the message.
Enrichment Workflow – AAP gathers additional context from the affected Azure resource or on-prem host, sends the enriched data to Red Hat AI for root cause analysis, and notifies the operations team.
Remediation Workflow – Ansible Lightspeed generates a remediation playbook from the AI analysis, commits it to Git, and creates a Job Template.
Execute Remediation – The generated playbook runs against the affected infrastructure (Azure VMs, on-prem hosts, or network devices), resolving the issue.

Operational Impact per Stage

Stage	Operational Impact	Why
1. Azure Event → EDA	None	Read-only – EDA reads a message from the Service Bus queue. No changes to systems.
2. Enrichment Workflow	Low	Collects logs and system info, calls an AI API, posts to chat/ITSM. No infrastructure changes.
3. Remediation Workflow	Low	Generates a playbook, commits to Git, creates a Job Template. Prepares the fix but does not touch production.
4. Execute Remediation	High	Modifies production infrastructure. Should go through a change window or approval gate.

Workflow Diagram

Azure Monitor/Alert → Service Bus Queue → EDA Rulebook → Enrichment Workflow → AI Analysis → Remediation Workflow → Execute Fix

Azure Service Bus is the event transport layer.

In the AIOps reference architecture, Kafka or a direct webhook handles event transport. When using Azure Service Bus, it replaces that layer – Azure services publish events to a queue, and EDA subscribes directly using the built-in ansible.eda.azure_service_bus plugin.

Solution Walkthrough

1. Configure Azure Service Bus Queue

Operational Impact: None (Azure configuration only)

Create a Service Bus namespace and queue in Azure, then configure a Shared Access Policy that EDA will use to authenticate.

Using the Azure CLI:

az servicebus namespace create \
  --name aiops-events \
  --resource-group rg-aiops \
  --location eastus \
  --sku Standard

az servicebus queue create \
  --name infrastructure-alerts \
  --namespace-name aiops-events \
  --resource-group rg-aiops

az servicebus namespace authorization-rule create \
  --name eda-listener \
  --namespace-name aiops-events \
  --resource-group rg-aiops \
  --rights Listen

Or automate the same setup with the azure.azcollection:

- name: Create Azure Service Bus queue for AIOps events
  azure.azcollection.azure_rm_servicebusqueue:
    resource_group: rg-aiops
    namespace: aiops-events
    name: infrastructure-alerts
    max_size_in_mb: 1024
    default_message_time_to_live: "PT1H"
  delegate_to: localhost

Route Azure Monitor alerts to Service Bus.

In the Azure Portal, navigate to Monitor → Alerts → Action Groups and create an action group with a Service Bus Queue action type. This routes Azure Monitor alerts directly into your queue where EDA can consume them.

2. EDA Rulebook for Azure Service Bus Events

Operational Impact: None

The EDA rulebook subscribes to the Azure Service Bus queue using the built-in ansible.eda.azure_service_bus event source plugin. When a message arrives, the rulebook evaluates conditions and triggers the enrichment workflow.

---
- name: Azure Service Bus AIOps response
  hosts: all
  sources:
    - ansible.eda.azure_service_bus:
        conn_str: "{{ azure_service_bus_connection_string }}"
        queue_name: infrastructure-alerts
        logging_enable: true

  rules:
    - name: Azure infrastructure alert detected
      condition: event.body is defined
      action:
        run_workflow_template:
          organization: "Default"
          name: "Azure Alert Enrichment Workflow"
          extra_vars:
            alert_name: "{{ event.body.data.essentials.alertRule | default('Unknown Alert') }}"
            severity: "{{ event.body.data.essentials.severity | default('Sev3') }}"
            affected_resource: "{{ event.body.data.essentials.alertTargetIDs[0] | default('') }}"
            description: "{{ event.body.data.essentials.description | default('') }}"
            fired_time: "{{ event.body.data.essentials.firedDateTime | default('') }}"

The rulebook extracts key fields from the Azure Monitor alert schema – the alert rule name, severity, affected Azure resource ID, description, and timestamp – and passes them to the enrichment workflow.

Azure Monitor Common Alert Schema.

Azure Monitor alerts follow a Common Alert Schema with essentials (severity, alert rule, target resource) and alertContext (metric/log details). Your rulebook conditions can filter on any of these fields – for example, only triggering automation on Sev0 or Sev1 alerts.

3. Enrichment Workflow – Gather Context and Analyze with AI

Operational Impact: Low

Once EDA triggers the enrichment workflow, Ansible gathers additional context from the affected Azure resource and sends everything to Red Hat AI for root cause analysis. This is the same pattern described in the AIOps reference architecture – Log Enrichment and Prompt Generation Workflow.

Step 3a: Gather context from the affected Azure VM

- name: Gather Azure VM context for alert
  hosts: localhost
  tasks:
    - name: Get Azure VM details
      azure.azcollection.azure_rm_virtualmachine_info:
        resource_group: "{{ resource_group }}"
        name: "{{ vm_name }}"
      register: vm_info

    - name: Get recent Azure activity log entries
      azure.azcollection.azure_rm_resource_info:
        url: "/subscriptions/{{ subscription_id }}/providers/Microsoft.Insights/eventtypes/management/values"
        api_version: "2015-04-01"
      register: activity_log

Step 3b: Gather diagnostics from the host itself

- name: Gather host diagnostics
  hosts: "{{ affected_host }}"
  become: true
  tasks:
    - name: Check for failed systemd services
      ansible.builtin.shell:
        cmd: "systemctl list-units --state=failed --no-pager"
      register: failed_services

    - name: Collect recent error logs
      ansible.builtin.command:
        cmd: "journalctl --since '2 hours ago' --priority=err --no-pager"
      register: recent_errors

    - name: Check disk and memory usage
      ansible.builtin.setup:
        filter:
          - ansible_memtotal_mb
          - ansible_memfree_mb
          - ansible_mounts

Step 3c: Analyze with Red Hat AI

- name: Analyze Azure alert with Red Hat AI
  hosts: localhost
  tasks:
    - name: Build enriched prompt
      ansible.builtin.set_fact:
        enriched_prompt: |
          An Azure Monitor alert fired: {{ alert_name }}
          Severity: {{ severity }}
          Affected resource: {{ affected_resource }}
          Description: {{ description }}
          VM status: {{ vm_info.vms[0].power_state | default('unknown') }}
          Failed services: {{ hostvars[affected_host].failed_services.stdout | default('none') }}
          Recent errors: {{ hostvars[affected_host].recent_errors.stdout | default('none') | truncate(2000) }}

          Diagnose the root cause and recommend a remediation step.

    - name: Send to Red Hat AI for analysis
      redhat.ai.completion:
        base_url: "http://{{ rhelai_server }}:{{ rhelai_port }}"
        token: "{{ rhelai_token }}"
        prompt: "{{ enriched_prompt }}"
        model_path: "/root/.cache/instructlab/models/granite-8b-lab-v1"
      delegate_to: localhost
      register: ai_response

4. Notify and Remediate

Operational Impact: Low (notification) → High (remediation execution)

The enrichment workflow posts the AI analysis to your team’s communication channel and – depending on your maturity level – either queues a remediation playbook for human approval or executes it automatically.

    - name: Post AI diagnosis to Slack
      ansible.builtin.uri:
        url: "{{ slack_webhook_url }}"
        method: POST
        body_format: json
        body:
          text: |
            :rotating_light: *Azure Alert:* {{ alert_name }}
            *Severity:* {{ severity }}
            *Resource:* {{ affected_resource }}
            *AI Diagnosis:* {{ ai_response.choice_0_text }}

    - name: Create ServiceNow incident with enriched context
      servicenow.itsm.incident:
        instance:
          host: "{{ snow_instance }}"
          username: "{{ snow_username }}"
          password: "{{ snow_password }}"
        short_description: "Azure Alert: {{ alert_name }}"
        description: "AI Analysis: {{ ai_response.choice_0_text }}"
        priority: "{{ '1' if severity == 'Sev0' else '2' if severity == 'Sev1' else '3' }}"
      when: snow_instance is defined

From here, the remediation follows the same pattern as the AIOps Remediation Workflow – Lightspeed generates a playbook, it gets committed to Git, and a Job Template is created for execution.

Validation

Stage	What to Verify	Success Indicator
1. Azure Event → EDA	Message arrives in queue and EDA consumes it	AAP shows the rulebook activation as Running; event log shows the Azure alert payload
2. Enrichment Workflow	AI analyzed the alert and notifications were sent	Workflow Visualizer shows all nodes green; Slack/ITSM received the AI diagnosis
3. Remediation Workflow	Playbook was generated and committed	New playbook file exists in the Git repository; Job Template was created
4. Execute Remediation	The fix was applied	Azure resource returns to healthy state; Azure Monitor alert auto-resolves

Quick validation test – Send a test message to the Service Bus queue using the Azure CLI:

az servicebus queue send \
  --namespace-name aiops-events \
  --queue-name infrastructure-alerts \
  --resource-group rg-aiops \
  --body '{
    "data": {
      "essentials": {
        "alertRule": "Test High CPU Alert",
        "severity": "Sev2",
        "alertTargetIDs": ["/subscriptions/xxx/resourceGroups/rg-aiops/providers/Microsoft.Compute/virtualMachines/web01"],
        "description": "CPU usage exceeded 95% for 5 minutes",
        "firedDateTime": "2026-02-18T14:30:00Z"
      }
    }
  }'

Troubleshooting

Symptom	Likely Cause	Fix
EDA rulebook is active but no events are received	Connection string is wrong or queue name doesn’t match	Verify `conn_str` has `Listen` permissions; check queue name matches exactly
EDA receives event but condition doesn’t match	Azure alert payload structure differs from expected schema	Use `debug` action in the rulebook to print `event.body` and compare against your conditions
Enrichment workflow runs but AI response is empty	Prompt is too vague or AI endpoint is unreachable	Verify the Red Hat AI server is running; test with a simple prompt first
Azure VM info retrieval fails	Azure credentials in AAP are expired or lack permissions	Verify the Azure Service Principal credential; test with `azure.azcollection.azure_rm_virtualmachine_info`
Messages pile up in the dead-letter queue	EDA consumer is crashing or message format is unexpected	Check EDA logs in AAP; inspect dead-letter messages in Azure Portal for error details

Maturity Path

Maturity	Approach	How It Works	AI Role
Crawl	Alert Enrichment	Azure alert → Service Bus → EDA → AI diagnoses → enriched context posted to Slack/ITSM → human investigates and remediates	Read-only: AI interprets, humans act
Walk	Curated Remediation	Azure alert → Service Bus → EDA → AI diagnoses → AI selects a pre-approved playbook → human approves → playbook executes	AI selects from existing automation
Run	Auto-Remediation	Azure alert → Service Bus → EDA → AI diagnoses → Lightspeed generates a remediation playbook → policy engine validates → playbook executes	AI generates new automation within policy boundaries

Start with Crawl.

Route Azure Monitor alerts to Service Bus and deploy the EDA rulebook and enrichment workflow first. Let your team see AI-enriched Azure alerts in Slack for a few weeks before adding automated remediation. This builds confidence in the pipeline and catches edge cases early.

AIOps reference architecture: See AIOps automation with Ansible for the full end-to-end pipeline, including Lightspeed playbook generation, policy enforcement, and the broader AIOps maturity journey.
Using Splunk as the trigger? See Triggering Automated Remediation from Splunk Alerts for connecting Splunk alerts to the same AIOps pipeline.
Looking for ServiceNow integration? See Unlock AIOps with ServiceNow LEAP and Ansible MCP server for LEAP/MCP-driven remediation and related ITSM patterns (see also KB 7127603).
Need to deploy the AI backend? See AI Infrastructure automation with Ansible for automating Red Hat AI provisioning with the infra.ai and redhat.ai collections.
New to Event-Driven Ansible? See Get started with EDA (Ansible Rulebook) for the fundamentals of rulebooks, event sources, and actions.

Summary

With Azure Service Bus connected to Event-Driven Ansible, your Azure monitoring alerts become the trigger for automated enrichment and remediation. Instead of engineers manually triaging every Azure Monitor alert, events flow through Service Bus into an AIOps pipeline that diagnoses root cause with AI and remediates with Ansible – reducing MTTR and bridging the gap between Azure’s cloud monitoring and your operational automation, whether the affected systems are in Azure, on-prem, or hybrid.