Observability tools like IBM Instana detect thousands of infrastructure and application anomalies daily – but detection alone does not fix the problem. Most organizations still route alerts to human on-call engineers for manual triage and remediation, creating alert fatigue, inconsistent responses, and mean time to resolution (MTTR) measured in hours instead of minutes.
Most organizations already have proven Ansible automation for service restarts, database maintenance, deployment rollbacks, and dozens of other operational runbooks – built and refined by their own automation teams. Red Hat Ansible Automation Platform turns these existing, trusted playbooks into a governed remediation layer that observability tools can trigger directly. This guide demonstrates how to connect IBM Instana Observability to Ansible Automation Platform so that every high-signal Incident triggers the right remediation from your existing automation catalog – automatically, with full RBAC control and audit trail.
Business value: Reduced MTTR from hours (manual triage and remediation) to minutes (automated response). Reduced alert fatigue through automated triage – only novel or unresolvable Incidents escalate to humans. Compliance-ready audit trail for every remediation action: who triggered it, what changed, pass or fail. For mission-critical workloads – payments, trading, customer-facing services – automated remediation directly reduces downtime impact on revenue and SLA compliance.
Technical value: Governed remediation with RBAC-scoped job templates – only authorized teams can trigger remediation within their scope. Credential isolation – secrets stored in automation controller and injected at runtime, never exposed in playbooks or logs. Bidirectional observability-automation feedback loop via Host Agent REST API annotations, linking remediation actions to Incidents on the Instana timeline.
Modern observability platforms excel at detecting anomalies, identifying root causes, and correlating events. But without an automation layer to act on those signals, organizations are left with dashboards and alert noise. The value of observability is only realized when it connects to remediation – and that remediation must be governed, auditable, and consistent across teams and shifts.
Most enterprises already invest heavily in Ansible automation – service restart playbooks, database maintenance runbooks, deployment rollback procedures, compliance hardening, and more. These playbooks are written by teams who understand the infrastructure, tested in staging, reviewed through pull requests, and promoted through change management. They represent institutional knowledge codified as automation.
The challenge is not writing more playbooks. It is connecting the right playbook to the right signal at the right time – without requiring a human to interpret the alert, find the runbook, and execute it manually.
Red Hat Ansible Automation Platform solves this by turning your existing automation library into a governed remediation catalog that observability tools can trigger directly:
IBM Instana Observability is a full-stack observability platform that provides the intelligent detection feeding AAP’s automation layer:
Beyond detection, Instana includes its own AI-powered capabilities that complement AAP’s governed execution:
Instana’s native AI generates investigation guidance and suggested scripts. AAP provides the governed execution layer for trusted, existing automation. The two are complementary – Instana identifies what to do, AAP ensures it is done safely and consistently.
IBM owns both Instana and Red Hat, which means tighter integration than third-party observability tools. The ibm.instana Ansible Content Collection is available on Red Hat automation hub, and Instana monitors Ansible natively via a callback plugin – creating a bidirectional feedback loop where the automation layer is itself observed.
Ansible Automation Platform – the automation layer:
ibm.instana Ansible Content Collection – dedicated instana_webhook EDA source plugin for parsing Instana webhook payloads
IBM Instana – the detection layer:
Optional:
| Persona | Challenge | What They Gain |
|---|---|---|
| IT Ops Engineer / SRE | Alert fatigue from observability tools; manual triage even when the remediation playbook already exists and has been run dozens of times | Instana Incidents trigger the same tested playbooks they already trust – no new automation to write, just a faster path to execution |
| Automation Architect | Existing automation library is disconnected from observability; teams have built reliable playbooks but still rely on manual triage to invoke them | Two production-ready integration patterns that connect existing job templates to Instana signals with proper RBAC, credential management, and approval workflows |
| IT Manager / Director | MTTR measured in hours despite having automation in place; no audit trail linking observability events to remediation actions; difficulty connecting automation outcomes to business impact | Existing automation investment delivers more value – playbooks that were run manually now execute in minutes with the same governance and a complete audit trail. Automated remediation of mission-critical services produces measurable improvements in uptime and SLA compliance that map directly to business outcomes |
Ansible Automation Platform 2.5+ – required for Event-Driven Ansible controller (GA) and the ansible.eda collection.
| Collection | Source | Purpose |
|---|---|---|
| ibm.instana | Red Hat automation hub / Ansible Galaxy | Dedicated instana_webhook EDA source plugin |
| ansible.eda | Ansible Certified Content (bundled) | Fallback webhook source, event filters |
| community.mysql | Community | Database remediation tasks (Use Case 2) |
| kubernetes.core | Ansible Certified Content | Kubernetes rollback tasks (Use Case 3) |
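The collections in the table can be declared in a project-level requirements file so that execution and decision environments resolve the same content. The following is a minimal sketch, assuming a `collections/requirements.yml` at the root of the project repository. Version constraints are left out on purpose and should be pinned to whatever your team has validated.

```yaml
---
# collections/requirements.yml (assumed location in the project repository)
collections:
  - name: ibm.instana       # instana_webhook EDA source plugin
  - name: ansible.eda       # fallback webhook source, event filters
  - name: community.mysql   # database remediation tasks (Use Case 2)
  - name: kubernetes.core   # Kubernetes rollback tasks (Use Case 3)
```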
| System | Required | Notes |
|---|---|---|
| IBM Instana (SaaS or self-hosted) | Yes | API token with “Configuration of alert channels” permission |
| Automation controller | Yes | Job template execute permissions for EDA service account |
| Event-Driven Ansible controller | Yes (Path A) | Must be reachable from Instana SaaS for webhook delivery |
| AI inference endpoint | Optional | For LLM enrichment step only |
Operational Impact: Medium – remediation playbooks modify production services. Validate all automation in a non-production environment before enabling auto-trigger.
This guide covers two production-ready integration paths. Both are GA and can be used independently or together.
The end-to-end flow starts the same way in both paths: an Instana Smart Alert fires when a metric – such as service response time or error rate – crosses an adaptive threshold tuned to the workload’s daily and weekly patterns.
In Path A, Instana delivers the alert as an HTTP POST to an Event-Driven Ansible webhook. An EDA rulebook evaluates the incoming payload – matching on severity, entity type, and alert text – and triggers the appropriate remediation job template on automation controller.
In Path B, the alert stays inside Instana. An automation policy evaluates the trigger conditions and selects an action from the action catalog. The Automation Action Ansible sensor on the Instana host agent forwards that action to automation controller for execution – no Event-Driven Ansible infrastructure required.
From this point, both paths converge. Automation controller executes the remediation playbook with full RBAC scoping and credential injection. The playbook performs the fix – restart a service, recycle database connections, or roll back a deployment – and posts an annotation back to Instana via the Host Agent REST API. Instana displays this annotation as a Change event on the Incident timeline, closing the feedback loop so operators can see exactly what automation did and when.
| Stage | Impact | Why |
|---|---|---|
| Shared setup (API token, credential type) | None | Configuration only – no changes to running systems |
| Path A setup (webhook channel, alert config, rulebook) | Low | Configures event routing – no production changes |
| Path B setup (sensor config, automation policy) | Low | Configures event routing – no production changes |
| Use Case 1: Service restart | Medium | Restarts a running service; validated by health check |
| Use Case 2: DB connection recycle | Medium | Kills idle database connections; validated by connection count |
| Use Case 3: Deployment rollback | High | Reverts production code; use approval gates until validated |
| AI enrichment (optional) | Low | Read-only API call to inference endpoint; no infrastructure changes |
Instana Smart Alert fires
-> Instana webhook alert channel sends HTTP POST
-> Event-Driven Ansible controller receives event via ibm.instana.instana_webhook source
-> Rulebook condition matches on event.payload.issue.severity, .text, .type
-> run_job_template action triggers automation controller job template
-> Automation controller executes remediation playbook (RBAC-scoped, credential-injected)
-> Playbook posts annotation back to Instana via Host Agent REST API
-> Instana timeline shows remediation Change event alongside the Incident
Instana Smart Alert or event fires
-> Instana automation policy evaluates trigger conditions
-> Policy triggers action automatically (or operator triggers manually)
-> Instana AI recommends best action from action catalog (confidence score)
-> Automation Action Ansible sensor connects to automation controller
-> Automation controller executes remediation job template (same RBAC, same audit trail)
-> Action output reported back to Instana Incident timeline
In both paths, Instana’s built-in alert channels can simultaneously notify collaboration and ITSM platforms – Slack, PagerDuty, ServiceNow, Microsoft Teams, OpsGenie, and more – ensuring the right teams are informed the moment an incident is detected. Path A adds another layer: AAP can enrich those notifications with AI-driven remediation recommendations, so teams receive not just what happened but what to do about it and why.
| Event-Driven Automation (Path A) | Native Instana Integration (Path B) |
|---|---|
| Alerts trigger automated workflows with intelligent remediation selection | Remediation actions available directly within the Instana console |
| Multi-step orchestration with approval gates and notifications | Streamlined, single-action execution for fast response |
| Coordinates across multiple teams, tools, and platforms | Empowers operators to remediate without leaving the observability view |
| Integrates into broader enterprise automation workflows | Leverages Instana’s application intelligence for context-aware actions |
| Full audit trail across observability, automation, and collaboration systems | Unified detection and remediation with audit trail in both platforms |
| Instana notifies ITSM and collaboration tools with detection context, while AAP enriches notifications with AI-recommended remediation, confidence level, and reasoning | Instana notifies ITSM and collaboration tools with full detection context in parallel with remediation execution |
| Best for: Incidents requiring orchestration, coordination, or intelligent decision-making | Best for: Known remediations where speed and simplicity are priorities |
Together: Both paths share the same automation content – build your remediation once, execute through either path based on operational needs.
Tip: Both paths converge on automation controller for execution. The RBAC policies, credential types, audit trail, and approval workflows are identical regardless of which path triggers the job template.
Operational Impact: None
Operational Impact: None
Only needed if your playbooks call the Instana backend API (e.g., POST /api/releases for deployment markers). The remediation playbooks in this guide use the Host Agent REST API on localhost:42699, which requires no authentication.
Input configuration:
```yaml
fields:
  - id: instana_api_token
    type: string
    label: Instana API Token
    secret: true
  - id: instana_base_url
    type: string
    label: Instana Base URL
required:
  - instana_api_token
  - instana_base_url
```
Injector configuration:
```yaml
extra_vars:
  instana_api_token: !unsafe "{{ instana_api_token }}"
  instana_base_url: !unsafe "{{ instana_base_url }}"
```
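As an illustration of how a playbook might consume the injected variables, the sketch below posts a deployment marker to the Instana backend API. It assumes the `apiToken` authorization scheme and a request body with `name` and `start` (epoch milliseconds); check both against the API reference for your Instana tenant. `release_name` and `release_start_ms` are hypothetical variables introduced only for this example.

```yaml
- name: Create a deployment marker in Instana (sketch)
  ansible.builtin.uri:
    url: "{{ instana_base_url }}/api/releases"
    method: POST
    headers:
      # Assumed header scheme for the Instana backend API
      Authorization: "apiToken {{ instana_api_token }}"
    body_format: json
    # Built as a single expression so that start is sent as an integer
    body: "{{ {'name': release_name, 'start': release_start_ms | int} }}"
    status_code: [200, 201]
  no_log: true  # keep the injected token out of job output
```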
Operational Impact: None
Navigate to Settings > Events & Alerts > Alert Channels > Add Alert Channel > Generic Webhook in Instana.
| Field | Value |
|---|---|
| Name | EDA Webhook - Remediation |
| Webhook URL | https://eda.example.com:5000/instana |
| Custom HTTP Headers | (optional: X-EDA-Token: <bearer-token> for auth) |
Note: Instana states “The Instana Webhook format is not compatible with third-party tools expecting Incoming Webhooks in their format.” This is expected – the ibm.instana.instana_webhook source plugin handles parsing.
Use the Test Channel button to verify delivery before proceeding.
When an alert bound to the EDA Webhook - Remediation alert channel fires, Instana sends an HTTP POST with the following default payload:
```json
{
  "issue": {
    "id": "abc123-def456",
    "type": "issue",
    "state": "OPEN",
    "start": 1709500000000,
    "severity": 10,
    "text": "Erroneous call rate is too high",
    "suggestion": "Check application logs for errors",
    "link": "https://instana.example.com/#/?snapshotId=abc123",
    "zone": "production",
    "fqdn": "app-server-01.example.com",
    "entity": "jvm",
    "entityLabel": "checkout-service",
    "container": "checkout-pod-7b8f9"
  }
}
```
Key fields for EDA rulebook conditions:
| Payload Field | EDA Accessor | Description |
|---|---|---|
| issue.severity | event.payload.issue.severity | 5 = Warning, 10 = Critical |
| issue.text | event.payload.issue.text | Alert title (used for pattern matching in rulebook conditions) |
| issue.state | event.payload.issue.state | OPEN or CLOSED |
| issue.entity | event.payload.issue.entity | Entity type (jvm, Host, mysql, etc.) |
| issue.fqdn | event.payload.issue.fqdn | Target host FQDN (passed as limit to the job template) |
| issue.suggestion | event.payload.issue.suggestion | Instana’s remediation suggestion |
| issue.link | event.payload.issue.link | Direct link to the Incident in Instana UI |
Operational Impact: Low
The following rulebook uses the dedicated ibm.instana.instana_webhook source plugin and handles all three use cases covered in this guide. Each rule matches on specific Instana event patterns and triggers the corresponding automation controller job template via the run_job_template action.
```yaml
---
- name: Instana Incident Remediation
  hosts: all
  sources:
    - ibm.instana.instana_webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Service Latency - restart affected service
      condition: >
        event.payload.issue.severity == 10 and
        event.payload.issue.text is match(".*Slow.*|.*latency.*|.*response time.*", ignorecase=true) and
        event.payload.issue.state == "OPEN"
      action:
        run_job_template:
          name: "Instana - Service Latency Remediation"
          organization: "Default"
          job_args:
            extra_vars:
              target_host: "{{ event.payload.issue.fqdn }}"
              entity_label: "{{ event.payload.issue.entityLabel }}"
              instana_link: "{{ event.payload.issue.link }}"
              instana_suggestion: "{{ event.payload.issue.suggestion }}"

    - name: Database Performance - clear cache and recycle connections
      condition: >
        event.payload.issue.severity >= 5 and
        event.payload.issue.entity is match(".*sql.*|.*db.*|.*database.*|.*mysql.*|.*postgres.*", ignorecase=true) and
        event.payload.issue.state == "OPEN"
      action:
        run_job_template:
          name: "Instana - Database Performance Remediation"
          organization: "Default"
          job_args:
            extra_vars:
              target_host: "{{ event.payload.issue.fqdn }}"
              entity_label: "{{ event.payload.issue.entityLabel }}"

    - name: Deployment Error Spike - trigger rollback
      condition: >
        event.payload.issue.severity == 10 and
        event.payload.issue.text is match(".*error rate.*|.*erroneous call.*", ignorecase=true) and
        event.payload.issue.state == "OPEN"
      action:
        run_job_template:
          name: "Instana - Deployment Rollback"
          organization: "Default"
          job_args:
            extra_vars:
              target_host: "{{ event.payload.issue.fqdn }}"
              entity_label: "{{ event.payload.issue.entityLabel }}"
              instana_link: "{{ event.payload.issue.link }}"

    - name: Log all unmatched events for debugging
      condition: event.payload.issue is defined
      action:
        debug:
          msg: >
            Unmatched Instana event: {{ event.payload.issue.text }}
            (severity={{ event.payload.issue.severity }})
```
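If the dedicated source plugin is not yet available in your decision environment, the ansible.eda webhook source listed as the fallback can be used to confirm that events arrive at all. The following is a minimal sketch rather than a replacement for the rulebook above. It assumes the Instana payload is delivered unchanged under `event.payload`, which is how the generic webhook source exposes a JSON request body.

```yaml
---
- name: Instana webhook fallback (sketch)
  hosts: all
  sources:
    # Generic webhook source from the ansible.eda collection
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Print any Instana issue that arrives
      condition: event.payload.issue is defined
      action:
        debug:
          msg: "Received: {{ event.payload.issue.text }}"
```

Once delivery is confirmed, switching the source back to ibm.instana.instana_webhook keeps the rule conditions unchanged, as long as both sources expose the same `event.payload.issue` structure.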
Create a rulebook activation in the Event-Driven Ansible controller:
| Field | Value |
|---|---|
| Name | Instana Incident Remediation |
| Project | Instana AIOps (Git repo containing rulebooks) |
| Rulebook | instana_remediation.yml |
| Decision environment | Custom image with ibm.instana and aiohttp>=3.8.4 installed |
| Credential | Automation controller credential (for run_job_template) |
| Restart policy | Always |
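The custom decision environment named in the table can be described with an ansible-builder definition file. The sketch below assumes the version 3 definition schema; the base image name and the package manager path are assumptions to verify against your registry and base image, not values taken from this guide.

```yaml
---
# decision-environment.yml (hypothetical file name)
version: 3
images:
  base_image:
    # Assumed minimal decision environment image; replace with the one you use
    name: registry.redhat.io/ansible-automation-platform-25/de-minimal-rhel9:latest
dependencies:
  galaxy:
    collections:
      - name: ibm.instana
      - name: ansible.eda
  python:
    - aiohttp>=3.8.4
options:
  # Minimal images typically ship microdnf; adjust if your base image differs
  package_manager_path: /usr/bin/microdnf
```

After building and pushing the image, reference it as the decision environment of the rulebook activation.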
Operational Impact: Low
The Automation Action Ansible sensor runs on the Instana host agent and connects to automation controller via the Ansible automation connector.
Add the following to the Instana agent configuration file:
```yaml
com.instana.plugin.action.ansible:
  enabled: true
  url: https://aap-controller.example.com
  token: <aap_api_token>
  apiPath: /api/v2 # optional, default
  maxConcurrentActions: 10 # optional, default
  defaultTimeout: 300 # optional, seconds
```
Key capabilities:
Operational Impact: None
Operational Impact: Medium
Alternative: Route through Event-Driven Ansible
If you prefer EDA for complex multi-step orchestration, create a Script action in Instana that forwards event data to the EDA webhook:
```bash
#!/bin/bash
# Send a fixed test message when no event data is provided; otherwise forward the event.
if [ -z "${INSTANA_EVENT}" ]; then
  curl -s -H 'Content-Type: application/json' \
    -d '{"message": "Test event from Instana automation framework"}' \
    @@eda_server@@/instana
else
  curl -s -H 'Content-Type: application/json' \
    -d "${INSTANA_EVENT}" \
    @@eda_server@@/instana
fi
```
Operational Impact: Medium
Instana detects response time exceeding the adaptive threshold on a microservice. A Smart Alert fires based on seasonality-adjusted latency thresholds. The EDA rulebook matches on event.payload.issue.text containing latency-related patterns and triggers the remediation job template on automation controller.
Remediation playbook – featured tasks:
```yaml
- name: Gather service state before restart
  ansible.builtin.systemd:
    name: "{{ service_name }}"
  register: service_state

- name: Restart service to clear thread pool exhaustion
  ansible.builtin.systemd:
    name: "{{ service_name }}"
    state: restarted
  when: service_state.status.ActiveState == "active"

- name: Wait for service health check to pass
  ansible.builtin.uri:
    url: "http://{{ inventory_hostname }}:{{ service_port }}/health"
    status_code: 200
  retries: 10
  delay: 6
  register: health_check
  until: health_check.status == 200

- name: Post remediation annotation to Instana via Host Agent REST API
  ansible.builtin.uri:
    url: "http://localhost:42699/com.instana.plugin.generic.event"
    method: POST
    body_format: json
    body:
      title: "AAP Remediation: {{ service_name }} restarted"
      text: >
        Service {{ service_name }} restarted by automation controller
        due to latency spike. Health check passed.
      severity: -1
      duration: 30000
    status_code: [200, 201, 204]
```
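The featured tasks cover the successful path. One way to make a failed restart equally visible on the Instana timeline is to wrap the restart and health check in a block with a rescue section. This is a sketch under the assumption that the Host Agent event endpoint accepts the same severity scale as the webhook payload (5 = Warning); the annotation wording is illustrative.

```yaml
- name: Restart and verify, recording a failed attempt in Instana (sketch)
  block:
    - name: Restart service
      ansible.builtin.systemd:
        name: "{{ service_name }}"
        state: restarted

    - name: Wait for service health check to pass
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:{{ service_port }}/health"
        status_code: 200
      register: health_check
      retries: 10
      delay: 6
      until: health_check.status == 200
  rescue:
    - name: Post a warning annotation so the failed attempt is visible
      ansible.builtin.uri:
        url: "http://localhost:42699/com.instana.plugin.generic.event"
        method: POST
        body_format: json
        body:
          title: "AAP Remediation FAILED: {{ service_name }}"
          text: "Restart did not restore a healthy state; escalating to on-call."
          severity: 5  # assumed to map to Warning
          duration: 30000
        status_code: [200, 201, 204]

    - name: Fail the job so automation controller records the failure
      ansible.builtin.fail:
        msg: "Remediation of {{ service_name }} failed health validation."
```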
Job template configuration in automation controller:
| Field | Value |
|---|---|
| Name | Instana - Service Latency Remediation |
| Inventory | Application Servers |
| Project | Instana AIOps Playbooks |
| Playbook | remediate_service_latency.yml |
| Credentials | Machine credential |
| Extra variables | target_host (prompt on launch), service_name, service_port |
| Limit | {{ target_host }} (dynamic, passed from Event-Driven Ansible) |
The Host Agent REST API annotation (severity -1 = Change) creates a visible marker on the Instana timeline, linking the remediation action to the original Incident for post-incident review.
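The job template settings listed above can also be kept as code. The following sketch uses the `ansible.controller.job_template` module and assumes that controller connection details are supplied through the module's environment variables or an automation controller credential. The `extra_vars` values are placeholders for illustration only.

```yaml
- name: Define the latency remediation job template (sketch)
  ansible.controller.job_template:
    name: "Instana - Service Latency Remediation"
    organization: "Default"
    inventory: "Application Servers"
    project: "Instana AIOps Playbooks"
    playbook: "remediate_service_latency.yml"
    credentials:
      - "Machine credential"
    extra_vars:
      service_name: "checkout-service"  # placeholder value
      service_port: 8080                # placeholder value
    ask_variables_on_launch: true  # lets the rulebook pass target_host
    ask_limit_on_launch: true      # lets the caller scope the run to one host
    state: present
```

Prompting for variables and limit on launch is what allows Event-Driven Ansible to supply the target host at trigger time instead of hardcoding it in the template.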
Operational Impact: Medium
Instana detects slow query execution times or connection pool exhaustion on a monitored MySQL database. The entity field in the webhook payload matches database-related patterns. Automation controller runs a job template that identifies and kills idle connections.
Remediation playbook – featured tasks:
```yaml
- name: Check current connection count
  community.mysql.mysql_query:
    login_host: "{{ db_host }}"
    login_user: "{{ db_admin_user }}"
    login_password: "{{ db_admin_password }}"
    query: "SHOW STATUS WHERE Variable_name = 'Threads_connected'"
  register: db_connections

# Build one KILL statement per connection that has been idle for more than 300 seconds
- name: Find idle connections exceeding threshold
  community.mysql.mysql_query:
    login_host: "{{ db_host }}"
    login_user: "{{ db_admin_user }}"
    login_password: "{{ db_admin_password }}"
    query: >
      SELECT CONCAT('KILL ', id) AS kill_stmt FROM information_schema.processlist
      WHERE command = 'Sleep' AND time > 300
  register: idle_connections
  when: db_connections.query_result[0][0].Value | int > connection_threshold | int

# Run the generated statements; the loop is empty when the lookup above was skipped
- name: Kill idle connections
  community.mysql.mysql_query:
    login_host: "{{ db_host }}"
    login_user: "{{ db_admin_user }}"
    login_password: "{{ db_admin_password }}"
    query: "{{ item.kill_stmt }}"
  loop: "{{ idle_connections.query_result[0] | default([]) }}"

- name: Post remediation annotation to Instana via Host Agent REST API
  ansible.builtin.uri:
    url: "http://localhost:42699/com.instana.plugin.generic.event"
    method: POST
    body_format: json
    body:
      title: "AAP Remediation: DB connections recycled on {{ db_host }}"
      text: >
        Idle connections killed. Previous count:
        {{ db_connections.query_result[0][0].Value }}
      severity: -1
      duration: 60000
    status_code: [200, 201, 204]
```
Tip: Store database credentials in an automation controller credential type – never hardcode db_admin_user or db_admin_password in playbook variables. Use injectors to pass them as extra variables or environment variables at runtime.
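A custom credential type for the database variables can follow the same input and injector pattern shown earlier for the Instana API token. The sketch below defines it as code with the `ansible.controller.credential_type` module. The type name "Database Admin" is hypothetical, and the `!unsafe` tag is there so that Ansible passes the injector expressions through to automation controller instead of templating them itself.

```yaml
- name: Create a credential type for database admin access (sketch)
  ansible.controller.credential_type:
    name: "Database Admin"  # hypothetical name
    kind: cloud
    inputs:
      fields:
        - id: db_admin_user
          type: string
          label: Database Admin User
        - id: db_admin_password
          type: string
          label: Database Admin Password
          secret: true
      required:
        - db_admin_user
        - db_admin_password
    injectors:
      extra_vars:
        db_admin_user: !unsafe "{{ db_admin_user }}"
        db_admin_password: !unsafe "{{ db_admin_password }}"
    state: present
```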
Operational Impact: High
Instana detects a spike in erroneous call rate correlated with a recent deployment Change event. Probable Root Cause identifies the deployment as the likely cause. Automation controller runs a job template that triggers a Kubernetes rollback.
Warning: This use case has high operational impact – a rollback reverts production code. Use automation controller approval workflow nodes (see Maturity Path) until this pattern is validated in your environment.
Remediation playbook – featured tasks:
```yaml
- name: Get deployment rollout history
  kubernetes.core.k8s_info:
    kind: Deployment
    name: "{{ app_name }}"
    namespace: "{{ app_namespace }}"
  register: current_deploy

- name: Roll back to previous revision
  ansible.builtin.command:
    cmd: >
      kubectl rollout undo deployment/{{ app_name }}
      -n {{ app_namespace }}
  register: rollback_result

- name: Wait for rollout to complete
  ansible.builtin.command:
    cmd: >
      kubectl rollout status deployment/{{ app_name }}
      -n {{ app_namespace }} --timeout=120s
  register: rollout_status

- name: Post rollback annotation to Instana via Host Agent REST API
  ansible.builtin.uri:
    url: "http://localhost:42699/com.instana.plugin.generic.event"
    method: POST
    body_format: json
    body:
      title: "AAP Remediation: {{ app_name }} rolled back"
      text: >
        Deployment rolled back due to error rate spike.
        {{ rollback_result.stdout }}
      severity: -1
      duration: 120000
    status_code: [200, 201, 204]
```
Tip: For Kubernetes remediation, store the kubeconfig in an automation controller credential of type “OpenShift or Kubernetes API Bearer Token” and ensure the execution environment includes the kubernetes.core Ansible Certified Content Collection.
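Where the kubectl binary is not available in the execution environment, the post-rollback check can be approximated with the kubernetes.core collection alone. The sketch below waits for the Deployment to report the Available condition. Note the assumption: Available is a weaker signal than a completed rollout, so treat this as a complement to the rollout status check, not an exact equivalent.

```yaml
- name: Wait for the Deployment to report Available (sketch)
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    name: "{{ app_name }}"
    namespace: "{{ app_namespace }}"
    wait: true
    wait_condition:
      type: Available
      status: "True"
    wait_timeout: 120  # seconds, mirrors the --timeout used with kubectl
  register: deploy_status
```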
The three use cases above use deterministic EDA rulebook conditions to select the right job template. For most well-understood failure modes, deterministic routing is the right approach because it is predictable, testable, and auditable.
For situations where the mapping is ambiguous – a novel failure mode, an alert that could match multiple existing playbooks, or an entity type your rulebook does not yet cover – you can add an AI inference step that dynamically selects from your existing job template catalog rather than generating new automation.
This uses a workflow job template pattern in automation controller: Event-Driven Ansible triggers a workflow that queries the available job templates, passes them with the event context to an AI inference endpoint, and conditionally runs the recommended template.
Remediation playbook – featured tasks:
```yaml
- name: Get available job templates from automation controller
  ansible.builtin.uri:
    url: "https://{{ controller_host }}/api/controller/v2/job_templates/?page_size=100"
    method: GET
    headers:
      Authorization: "Bearer {{ controller_token }}"
    validate_certs: true
  register: job_templates_response

- name: Build job template catalog for AI context
  ansible.builtin.set_fact:
    job_template_catalog: >-
      {{ job_templates_response.json.results | map(attribute='name')
      | zip(job_templates_response.json.results | map(attribute='description'))
      | map('join', ': ')
      | join('\n') }}

- name: Ask AI to recommend a job template from existing catalog
  ansible.builtin.uri:
    url: "{{ ai_inference_url }}/v1/chat/completions"
    method: POST
    headers:
      Authorization: "Bearer {{ ai_api_token }}"
      Content-Type: "application/json"
    body_format: json
    body:
      model: "{{ ai_model }}"
      messages:
        - role: system
          content: >
            You are an SRE assistant. Given an Instana Incident and a catalog of
            available Ansible job templates, recommend the single best job template
            to remediate the issue. If no existing template is a good match, respond
            with NO_MATCH. Respond in JSON format:
            {"template": "<exact template name>", "confidence": "high|medium|low",
            "reasoning": "<one sentence>", "variables": {"key": "value"}}
        - role: user
          content: |
            INCIDENT:
            Text: {{ instana_issue_text }}
            Entity: {{ instana_entity_label }}
            Entity Type: {{ instana_entity_type }}
            Severity: {{ instana_severity }}
            FQDN: {{ instana_fqdn }}
            Suggestion: {{ instana_suggestion }}
            AVAILABLE JOB TEMPLATES:
            {{ job_template_catalog }}
  register: ai_response

- name: Parse AI recommendation
  ansible.builtin.set_fact:
    ai_recommendation: "{{ ai_response.json.choices[0].message.content }}"
```
As teams add new job templates to automation controller, the AI automatically considers them without requiring rulebook changes. The AI selects from existing templates – it does not generate new playbooks.
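The conditional step of the workflow could look like the sketch below. It assumes the model returned bare JSON (no surrounding code fences), reuses `ai_recommendation`, `job_templates_response`, and `instana_fqdn` from the tasks above, and launches through the `ansible.controller.job_launch` module. `ai_choice` and `known_templates` are names introduced only for this example. The target job template must prompt for variables and limit on launch for the passed values to take effect.

```yaml
- name: Decode the recommendation unless the model answered NO_MATCH
  ansible.builtin.set_fact:
    ai_choice: >-
      {{ ai_recommendation if ai_recommendation is mapping
         else (ai_recommendation | from_json) }}
  when: "'NO_MATCH' not in (ai_recommendation | string)"

- name: Launch the recommended template only for a known, high-confidence match
  ansible.controller.job_launch:
    job_template: "{{ ai_choice.template }}"
    extra_vars: "{{ ai_choice.variables | default({}) }}"
    limit: "{{ instana_fqdn }}"
  vars:
    # Guard against a template name that does not exist in the catalog
    known_templates: "{{ job_templates_response.json.results | map(attribute='name') | list }}"
  when:
    - ai_choice is defined
    - ai_choice.confidence | default('low') == 'high'
    - ai_choice.template in known_templates
```

Anything that does not satisfy all three conditions falls through without launching a job, which leaves room for a notification or approval step for medium and low confidence results.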
When to use AI-assisted routing vs. direct remediation:
| Scenario | Approach |
|---|---|
| Well-understood, recurring failure with a single correct response | Direct remediation – deterministic rulebook routing to an existing job template |
| Novel failure mode or alert that could match multiple existing playbooks | AI-assisted routing – LLM selects the best existing job template from your catalog |
| Instana Probable Root Cause provides a clear suggestion | Direct remediation using the suggestion field to select the matching job template |
Tip: The AI recommendation includes the exact template name, confidence level, reasoning, and suggested variables – enough detail for an operator to execute immediately or for a workflow to run conditionally based on confidence threshold.
Tip: The generic AIOps automation with Ansible guide covers the AI inference pattern in depth, including how to use Red Hat AI, OpenAI, or any OpenAI-compatible endpoint.
| Stage | What to Verify | Success Indicator |
|---|---|---|
| Instana API token | Token has correct permissions | Settings > API Tokens shows token with alert channel scope |
| Webhook alert channel | Channel is configured and reachable | Test Channel button returns success |
| Alert rule | Alert is bound to webhook alert channel | Settings > Alerts shows the channel under the rule |
| Rulebook activation | Activation is running | Event-Driven Ansible controller shows status Running |
| Webhook delivery | Instana event reaches Event-Driven Ansible | Rulebook activation log shows received event JSON |
| Condition match | Correct rule fires for test event | Activation log shows matched rule name |
| Job template launch | Job template triggered by Event-Driven Ansible | Automation controller job history shows new job |
| Remediation | Service/DB/deployment recovers | Target system returns to healthy state |
| Instana annotation | Remediation visible in Instana | Timeline shows Change event with AAP remediation details |
Send a synthetic Instana-format event to the Event-Driven Ansible controller to validate the end-to-end flow without waiting for a real Incident:
```bash
curl -X POST https://eda.example.com:5000/instana \
  -H "Content-Type: application/json" \
  -d '{
    "issue": {
      "id": "test-001",
      "type": "issue",
      "state": "OPEN",
      "start": 1709500000000,
      "severity": 10,
      "text": "Slow response time detected on checkout-service",
      "suggestion": "Check thread pool configuration",
      "link": "https://instana.example.com/#/?snapshotId=test",
      "fqdn": "app-server-01.example.com",
      "entity": "jvm",
      "entityLabel": "checkout-service",
      "zone": "production"
    }
  }'
```
The automation controller job output should show:
```
PLAY [Remediate service latency] ************************************************

TASK [Gather service state before restart] **************************************
ok: [app-server-01.example.com]

TASK [Restart service to clear thread pool exhaustion] **************************
changed: [app-server-01.example.com]

TASK [Wait for service health check to pass] ************************************
ok: [app-server-01.example.com]

TASK [Post remediation annotation to Instana via Host Agent REST API] ***********
ok: [app-server-01.example.com]

PLAY RECAP *********************************************************************
app-server-01.example.com : ok=4 changed=1 unreachable=0 failed=0
```
| Symptom | Likely Cause | Fix |
|---|---|---|
| Rulebook activation fails to start | ibm.instana collection not in decision environment | Build a custom decision environment image with ibm.instana and aiohttp>=3.8.4 |
| Webhook events received but no rule fires | Condition field paths don’t match payload | Add the catch-all debug rule (included in the rulebook) to log raw payload and verify field paths |
| Rule fires but job template fails to launch | Job template name mismatch or missing credential | Verify the exact job template name and organization in automation controller; check credential type is “Red Hat Ansible Automation Platform” |
| Instana Test Channel returns error | Event-Driven Ansible controller not reachable | Verify firewall rules allow inbound HTTPS from Instana SaaS IPs to the EDA port |
| Remediation runs but Instana still shows Incident | Instana auto-closes Issues based on metric recovery | Wait for Instana’s evaluation cycle; verify the root metric recovered |
| Annotation POST returns connection refused | Instana host agent not running on target | Verify agent status: systemctl status instana-agent |
| Automation Action Ansible sensor can’t reach automation controller | Ansible automation connector misconfigured | Verify the URL and API token in the Instana host agent configuration |
| Maturity | Description | What to Build |
|---|---|---|
| Crawl | Instana alerts forwarded to Event-Driven Ansible, which enriches and routes notifications – no remediation, human decides. Ticket enrichment is the essential starting point – every organization should have this in place before progressing further | Instana webhook -> EDA rulebook -> run_job_template that sends a notification (Slack, email, or ITSM ticket) with Incident context, Instana link, and optionally an AI-recommended job template with suggested variables so the operator has a ready-to-execute recommendation. This stage alone reduces triage time by giving on-call engineers everything they need to act in a single notification instead of manually correlating alerts across tools. For full ITSM enrichment, see the ServiceNow ITSM Ticket Enrichment guide. |
| Walk | Event-Driven Ansible triggers automation controller job templates with a human approval gate before execution | Enable ask_variables_on_launch on job templates; use automation controller approval workflow nodes; operator reviews before remediation runs |
| Run | Fully automated closed-loop: Instana detects -> Event-Driven Ansible triggers -> automation controller remediates -> Instana confirms resolution | Remove approval gates for well-understood failure patterns; add policy guardrails (e.g., max 3 auto-remediations per hour per service, business-hours-only for critical services) |
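One guardrail for the Run stage can be expressed directly in the rulebook with the ansible-rulebook `throttle` option. The rule below is a sketch that would sit under `rules:` in the remediation rulebook. It does not implement a count-based limit such as "3 per hour"; instead it approximates one by allowing at most one launch per service every 20 minutes. The interval and the grouping attribute are assumptions to tune for your environment.

```yaml
- name: Service Latency - restart affected service (throttled)
  condition: >
    event.payload.issue.severity == 10 and
    event.payload.issue.text is match(".*latency.*|.*response time.*", ignorecase=true) and
    event.payload.issue.state == "OPEN"
  throttle:
    once_within: 20 minutes
    group_by_attributes:
      - event.payload.issue.entityLabel  # one window per service
  action:
    run_job_template:
      name: "Instana - Service Latency Remediation"
      organization: "Default"
      job_args:
        extra_vars:
          target_host: "{{ event.payload.issue.fqdn }}"
          entity_label: "{{ event.payload.issue.entityLabel }}"
```

Events suppressed by the throttle window do not launch a job, so pair this with an Instana alert channel notification to keep operators aware of repeated Incidents.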
By connecting IBM Instana to Ansible Automation Platform, you have turned your existing automation library into a governed, event-driven remediation pipeline:
Start capturing these metrics before enabling automated remediation – having a baseline makes the impact measurable from day one.
| Metric | What to Capture | Where to Find It |
|---|---|---|
| Mean time to resolution (MTTR) | Time from Instana Incident open to close, before and after automation | Instana Incident timeline; automation controller job duration |
| Mean time to remediate (MTTR-auto) | Time from alert firing to successful remediation completion | EDA activation logs (event received timestamp) vs. automation controller job completion |
| Alert-to-action ratio | Percentage of alerts that trigger automated remediation vs. manual escalation | EDA activation logs (matched rules vs. unmatched debug events) |
| Remediation success rate | Percentage of automated remediations that resolve the Incident without human intervention | Automation controller job status (successful vs. failed); Instana Incident auto-close |
| Change success rate | Percentage of automated changes that complete without rollback | Automation controller job history; deployment revision history |
| Repeat Incidents | Number of recurring Incidents for the same service or failure pattern | Instana Incident history filtered by entity and alert type |
| On-call escalation volume | Number of Incidents that still require human triage after automation is enabled | PagerDuty/OpsGenie/Instana alert channel delivery counts |
| SLA compliance | Service uptime percentage for mission-critical applications | Instana Smart Alert history; service-level objectives dashboard |
Tip: Identify 3-5 metrics most relevant to your environment and begin capturing baselines during the Crawl stage. Organizations that define success metrics before enabling automation can demonstrate measurable impact within the first quarter.