Incidents
Incident list, detail, and execution detail.
Incidents is the most heavily used section of the dashboard. It has three routes: the incident list, the incident detail, and the per-execution live log.
List
Route: /incidents
Role gating: none.
Searchable, filterable table of every incident in the tenant.
Columns
| Column | Notes |
|---|---|
| Severity | critical / high / medium / low / info |
| Type | service_down, disk_full, cpu_high, memory_high, port_unavailable, custom, etc. |
| Server | Hostname; links to the server detail |
| Status | open → classifying → recipe_proposed → awaiting_approval → executing → resolved (or failed / escalated) |
| Occurrences | Dedup counter for repeat alerts |
| Assigned | User or agent name |
| Source | daemon, webhook, manual, proactive |
| Created | Absolute timestamp |
| Resolution timer | Live ticker; turns red on SLA breach |
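The status progression in the Status column can be sketched as a small enum with a happy-path order. This is an illustrative model only: the status strings come from the table above, but the names `IncidentStatus`, `HAPPY_PATH`, `TERMINAL`, and `next_on_happy_path` are hypothetical, and treating `failed` and `escalated` as terminal states is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class IncidentStatus(str, Enum):
    OPEN = "open"
    CLASSIFYING = "classifying"
    RECIPE_PROPOSED = "recipe_proposed"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    RESOLVED = "resolved"
    FAILED = "failed"
    ESCALATED = "escalated"


# Order in which the Status column lists the normal progression.
HAPPY_PATH = [
    IncidentStatus.OPEN,
    IncidentStatus.CLASSIFYING,
    IncidentStatus.RECIPE_PROPOSED,
    IncidentStatus.AWAITING_APPROVAL,
    IncidentStatus.EXECUTING,
    IncidentStatus.RESOLVED,
]

# Assumption: these three statuses end the lifecycle.
TERMINAL = {
    IncidentStatus.RESOLVED,
    IncidentStatus.FAILED,
    IncidentStatus.ESCALATED,
}


def is_terminal(status: IncidentStatus) -> bool:
    """Return True if no further progression is expected."""
    return status in TERMINAL


def next_on_happy_path(status: IncidentStatus) -> Optional[IncidentStatus]:
    """Return the next status in the normal progression, or None if there is none."""
    if status not in HAPPY_PATH or status is IncidentStatus.RESOLVED:
        return None
    return HAPPY_PATH[HAPPY_PATH.index(status) + 1]
```

For example, `next_on_happy_path(IncidentStatus.OPEN)` returns `IncidentStatus.CLASSIFYING`, while any terminal status returns `None`.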
Filters
- Status, severity, source.
- Free-text search across hostname, type, evidence.
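To make the filter semantics concrete, here is a minimal sketch of how the three filters and the free-text search might combine. The `Incident` dataclass and `matches` function are hypothetical, and case-insensitive substring matching over the serialized evidence is an assumption; the dashboard's actual search may run server-side and behave differently.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Incident:
    hostname: str
    type: str
    status: str
    severity: str
    source: str
    evidence: Dict[str, Any] = field(default_factory=dict)


def matches(
    incident: Incident,
    *,
    status: Optional[str] = None,
    severity: Optional[str] = None,
    source: Optional[str] = None,
    query: str = "",
) -> bool:
    """Return True if the incident passes every filter that is set."""
    if status is not None and incident.status != status:
        return False
    if severity is not None and incident.severity != severity:
        return False
    if source is not None and incident.source != source:
        return False
    if query:
        # Free-text search spans hostname, type, and evidence.
        haystack = " ".join(
            [incident.hostname, incident.type, json.dumps(incident.evidence)]
        ).lower()
        return query.lower() in haystack
    return True
```

An incident on `web-01` of type `disk_full` would match `query="web"` or `query="disk"`, and would be excluded by `severity="low"` if its severity is `high`.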
Actions
- Create incident manually. Modal: pick a server, type, severity, and initial evidence. Setting type to `custom` triggers an informational query: the agent runs the requested check fresh and resolves with the output.
- Delete an incident.
- Click a row → incident detail.
Incident detail
Route: /incidents/{id}
Role gating: none for read; approve / reject requires admin.
Header
- Severity and status badges.
- Live SLA timer.
- Assignment control. The incident may be assigned to a human user or to an AI agent; the Assign Agent button starts the agent pipeline immediately.
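The live SLA timer in the header (and the Resolution timer column in the list) can be thought of as elapsed time since creation plus a breach flag. The sketch below is illustrative: the function name `sla_timer`, the `HH:MM:SS` display format, and the idea of a single per-incident SLA duration are all assumptions, since the source does not say how SLA targets are configured.

```python
from datetime import datetime, timedelta
from typing import Tuple


def sla_timer(created_at: datetime, now: datetime, sla: timedelta) -> Tuple[str, bool]:
    """Return (elapsed as HH:MM:SS, whether the SLA has been breached)."""
    elapsed = now - created_at
    # Clamp to zero so clock skew never produces a negative display.
    total_seconds = max(0, int(elapsed.total_seconds()))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    label = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    # Hypothetical rule: the timer turns red once elapsed time exceeds the SLA.
    return label, elapsed > sla
```

A UI would call this on every tick and switch the timer to red when the second element is `True`.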
Tabs
- Timeline. The agent's full reasoning trace. It records every stage handoff (triage → diagnose → execute → review), every tool call with its arguments and output, and every event recorded by the agent. This is the audit trail for what the agent did and why.
- Evidence. JSON dump of all evidence collected on the incident (monitor output, daemon report, alert payload, anything the agent observed).
- Executions. List of recipe executions tied to this incident. Pending executions show Approve and Reject buttons; approval starts the playbook immediately. Each row links to the execution detail.
- Report. Post-mortem RCA generated by the review agent on resolved incidents.
Review section
Resolved incidents show a Review Agent Performance panel that lets operators score the diagnosis and remediation quality. Reviews feed into agent learning over time.
Execution detail
Route: /executions/{id}
Role gating: none for read; rollback requires admin.
Reached from the Executions tab on an incident.
Sections
- Metadata. Recipe, server, parent incident, who approved it, timestamps.
- Live Output. WebSocket-fed stdout from the running playbook (`/ws/executions/{id}`).
- Playbook Output. Per-Ansible-task summary with status (`ok` / `changed` / `failed`), expandable stdout/stderr, return code.
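A client that wants to follow Live Output outside the dashboard would first need the WebSocket URL for an execution. The helper below is a sketch under stated assumptions: the function name `live_output_url` is hypothetical, and it assumes the WebSocket endpoint is served from the same host as the dashboard with no extra path prefix. Only the `/ws/executions/{id}` path comes from the text above; authentication and message format are not documented here and are left out.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def live_output_url(dashboard_base_url: str, execution_id: str) -> str:
    """Build the live-output WebSocket URL for an execution."""
    parts = urlsplit(dashboard_base_url)
    # Use the secure WebSocket scheme when the dashboard is served over HTTPS.
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Percent-encode the id so it is always a single path segment.
    path = "/ws/executions/" + quote(str(execution_id), safe="")
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

The resulting URL can be passed to any WebSocket client library to stream stdout while the playbook runs.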
Actions
- Rollback — only enabled when the execution succeeded and the recipe defines a rollback playbook. Re-runs the rollback against the same target.
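The conditions for the Rollback action, combined with the admin gating stated for this route, reduce to a single predicate. This is a minimal sketch: `can_rollback` and its parameter names are hypothetical, and representing the role as the string `"admin"` is an assumption about how roles are encoded.

```python
def can_rollback(*, succeeded: bool, has_rollback_playbook: bool, role: str) -> bool:
    """Return True if the Rollback button should be enabled for this user."""
    # All three conditions must hold: the execution succeeded, the recipe
    # defines a rollback playbook, and the user is an admin.
    return succeeded and has_rollback_playbook and role == "admin"
```

A failed execution, a recipe without a rollback playbook, or a non-admin user each leave the action disabled.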