Security model
Required environment, authentication, isolation, audit.
This document consolidates the security posture of OpenRemedy: the boundaries the platform enforces, the credentials it requires, and the mitigations applied to its agentic surface. It is intended for operators deploying OpenRemedy and for auditors reviewing the platform.
The model has been hardened through a focused security pass; every mitigation below corresponds to a closed finding. Where relevant, the source file and behaviour are cited.
Required environment variables
The deploy aborts at compose interpolation time if any of these are missing. The application also validates the values on boot and refuses to start with known dev defaults.
| Variable | Purpose | Constraints |
|---|---|---|
| OREMEDY_SECRET_KEY | JWT signing key (HS256) | At least 32 characters. Rejected if it matches the historical dev defaults (changeme-dev-secret-key-32chars!!, dev-secret-key-change-in-production, changeme). Generate with openssl rand -base64 48. |
| OREMEDY_ENCRYPTION_KEY | AES-256-GCM data key for stored secrets | Exactly 64 hex characters (32 bytes). Rejected if it matches dev placeholders such as 64×a or the example 0123…. Generate with openssl rand -hex 32. |
| POSTGRES_PASSWORD | Database password | Required, no default. |
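The key constraints in the table can be read as boot-time checks. The following is a minimal sketch, not OpenRemedy's actual validator: the function names validate_secret_key and validate_encryption_key are hypothetical, and the placeholder sets only contain the defaults quoted above (the application's full list may be longer).

```python
import re
import secrets

# Historical dev defaults quoted in the table (the real list may be longer).
DEV_SECRET_KEYS = {
    "changeme-dev-secret-key-32chars!!",
    "dev-secret-key-change-in-production",
    "changeme",
}
# Hypothetical placeholder set: only the 64 x "a" example is reproduced here.
DEV_ENCRYPTION_KEYS = {"a" * 64}


def validate_secret_key(value: str) -> str:
    """Reject known dev defaults and keys shorter than 32 characters."""
    if value in DEV_SECRET_KEYS:
        raise ValueError("OREMEDY_SECRET_KEY matches a known dev default")
    if len(value) < 32:
        raise ValueError("OREMEDY_SECRET_KEY must be at least 32 characters")
    return value


def validate_encryption_key(value: str) -> bytes:
    """Require exactly 64 hex characters and return the 32-byte AES-256 key."""
    if not re.fullmatch(r"[0-9a-fA-F]{64}", value):
        raise ValueError("OREMEDY_ENCRYPTION_KEY must be exactly 64 hex characters")
    if value.lower() in DEV_ENCRYPTION_KEYS:
        raise ValueError("OREMEDY_ENCRYPTION_KEY matches a dev placeholder")
    return bytes.fromhex(value)


# Usage: values comparable to `openssl rand -base64 48` and `openssl rand -hex 32` pass.
validate_secret_key(secrets.token_urlsafe(48))  # 64 URL-safe base64 characters
aes_key = validate_encryption_key(secrets.token_hex(32))  # 32 raw bytes
```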
In production:
- OREMEDY_ENV=production activates the production validators.
- OREMEDY_DEBUG=true is rejected when OREMEDY_ENV=production.
- OREMEDY_CORS_ORIGINS must not contain *. The default in the production compose file is https://${DOMAIN}; override it per deployment to add origins.
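The production-only rules can be sketched as a single check. This is an illustration under assumptions: validate_production is a hypothetical name, the environment value is compared as an exact string, and the origin list is parsed as the comma-separated form described in the CORS section.

```python
from __future__ import annotations


def validate_production(env: str, debug: bool, cors_origins: str) -> list[str]:
    """Parse OREMEDY_CORS_ORIGINS and enforce the production-only rules."""
    # Assumed format: comma-separated origins, surrounding whitespace ignored.
    origins = [o.strip() for o in cors_origins.split(",") if o.strip()]
    if env == "production":
        if debug:
            raise ValueError("OREMEDY_DEBUG=true is rejected when OREMEDY_ENV=production")
        # Any wildcard, bare or inside a pattern, is refused in production.
        if any("*" in origin for origin in origins):
            raise ValueError("OREMEDY_CORS_ORIGINS must not contain * in production")
    return origins


# Usage: the compose default for DOMAIN=example.com plus one extra origin.
origins = validate_production("production", False, "https://example.com, https://ops.example.com")
```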
Authentication
Web sessions: HttpOnly cookies
Browser sessions use two cookies, both HttpOnly + Secure +
SameSite=strict, set on a successful POST /auth/login or
POST /auth/register:
| Cookie | Lifetime | Purpose |
|---|---|---|
| access_token | OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES (default 480) | API auth |
| refresh_token | OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS (default 30) | Refresh path |
POST /auth/refresh reads the refresh cookie, validates the JWT, and
re-issues a new pair via Set-Cookie. POST /auth/logout clears
both. Tokens never appear in JavaScript scope, so an XSS payload
cannot read them out of localStorage or out of a fetch() response
body.
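For reference, the cookie attributes above can be rendered with Python's standard http.cookies module. This is a framework-neutral sketch, not the platform's code: session_cookie_headers is a hypothetical name, the real application presumably sets cookies through its web framework, and Path/Domain are omitted because the text does not specify them. The defaults work out to 480 × 60 = 28800 seconds and 30 × 86400 = 2592000 seconds.

```python
from __future__ import annotations

from http.cookies import SimpleCookie

ACCESS_TOKEN_EXPIRE_MINUTES = 480  # OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES default
REFRESH_TOKEN_EXPIRE_DAYS = 30     # OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS default


def session_cookie_headers(access_jwt: str, refresh_jwt: str) -> list[str]:
    """Build Set-Cookie values for the access/refresh pair (hypothetical helper)."""
    jar = SimpleCookie()
    lifetimes = {
        "access_token": (access_jwt, ACCESS_TOKEN_EXPIRE_MINUTES * 60),      # 28800 s
        "refresh_token": (refresh_jwt, REFRESH_TOKEN_EXPIRE_DAYS * 86400),  # 2592000 s
    }
    for name, (value, max_age) in lifetimes.items():
        jar[name] = value
        jar[name]["httponly"] = True      # invisible to document.cookie / JavaScript
        jar[name]["secure"] = True        # sent over HTTPS only
        jar[name]["samesite"] = "Strict"  # never attached to cross-site requests
        jar[name]["max-age"] = max_age
    return [morsel.OutputString() for morsel in jar.values()]


headers = session_cookie_headers("a.b.c", "d.e.f")
```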
Programmatic clients: Bearer header
The get_current_user dependency reads the JWT from the
access_token cookie first and falls back to
Authorization: Bearer <jwt>. CLI tools, the Go daemon, and any
non-browser caller can keep using Bearer.
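The cookie-first, Bearer-second lookup in get_current_user can be sketched as follows. extract_jwt is a hypothetical helper that only selects the token (signature validation is a separate step), and it assumes header names arrive lowercased, as they do in ASGI scopes.

```python
from __future__ import annotations

from typing import Mapping


def extract_jwt(cookies: Mapping[str, str], headers: Mapping[str, str]) -> str | None:
    """Return the access_token cookie if present, else a Bearer token, else None."""
    token = cookies.get("access_token")
    if token:
        return token
    # Fallback for CLI tools, the Go daemon, and other non-browser callers.
    scheme, _, credentials = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    return None
```

A request carrying both a cookie and a Bearer header resolves to the cookie, which matches the order described above.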
WebSocket handshake
/ws/incidents and /ws/executions/{id} accept the cookie (the
browser sends it automatically on a same-origin upgrade) or, as a
fallback for non-browser clients, the
Sec-WebSocket-Protocol: bearer, <jwt> slot. URL query params are
not supported because they leak into proxy access logs. Pre-handshake
auth failures close the WS with policy-violation status.
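The subprotocol fallback packs the JWT into the second entry of the Sec-WebSocket-Protocol list. A minimal parsing sketch, assuming exactly two comma-separated entries with "bearer" first (jwt_from_subprotocol is a hypothetical name):

```python
from __future__ import annotations


def jwt_from_subprotocol(header_value: str | None) -> str | None:
    """Parse `Sec-WebSocket-Protocol: bearer, <jwt>` into the JWT, or None."""
    if not header_value:
        return None
    parts = [p.strip() for p in header_value.split(",")]
    if len(parts) == 2 and parts[0].lower() == "bearer" and parts[1]:
        return parts[1]
    # Anything else is treated as "no token"; the handler then refuses the upgrade.
    return None
```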
Login rate limiting
POST /auth/login is rate-limited at 10 requests per minute per
client IP via slowapi. The bucket key uses
the leftmost X-Forwarded-For value only when the immediate TCP peer
is in trusted (RFC1918 / loopback) space; otherwise it uses the actual
peer address.
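The key selection can be sketched with the standard ipaddress module. client_ip is a hypothetical helper; the trusted list mirrors the RFC1918 and loopback ranges named above and deliberately adds nothing else.

```python
from __future__ import annotations

import ipaddress

# Trusted proxy space as described: RFC1918 plus loopback.
TRUSTED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8", "::1/128")
]


def client_ip(peer: str, x_forwarded_for: str | None) -> str:
    """Leftmost X-Forwarded-For entry if the TCP peer is trusted, else the peer."""
    try:
        peer_addr = ipaddress.ip_address(peer)
    except ValueError:
        return peer
    if x_forwarded_for and any(peer_addr in net for net in TRUSTED_NETWORKS):
        leftmost = x_forwarded_for.split(",")[0].strip()
        if leftmost:
            return leftmost
    return peer
```

Trusting the leftmost entry is only sound if the trusted proxy (Caddy in this deployment) overwrites, rather than appends to, any X-Forwarded-For value the client sent; otherwise a client could choose its own bucket key.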
Webhook authentication
POST /api/v1/webhooks/alerts/{tenant_slug} requires every request
to carry an HMAC-SHA256 signature of the raw body, computed against
the tenant's webhook_secret:
```
X-OpenRemedy-Signature: sha256=<lowercase hex digest>
```

Each tenant has a unique 32-byte URL-safe webhook_secret,
auto-generated at tenant creation (or backfilled by Alembic
migration m9c2e8f1a4d3 for pre-existing tenants). Verification
uses hmac.compare_digest for constant-time comparison.
The endpoint is also rate-limited at 60 requests per minute per client IP.
Signing examples in bash, Python, and Node.js are in integrations.
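The signing and verification steps can be sketched with Python's hmac module. This assumes the secret is used as its UTF-8 string bytes (not base64-decoded) and that secrets.token_urlsafe(32) is a fair stand-in for the generated secret; sign_body and verify_signature are hypothetical names.

```python
import hashlib
import hmac
import secrets


def sign_body(webhook_secret: str, raw_body: bytes) -> str:
    """Header value: sha256=<lowercase hex HMAC-SHA256 of the raw body>."""
    digest = hmac.new(webhook_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(webhook_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Constant-time comparison against the exact bytes received, before JSON parsing."""
    expected = sign_body(webhook_secret, raw_body)
    return hmac.compare_digest(expected.encode(), header_value.encode())


# Usage (assumed secret format: 32 random bytes, URL-safe base64 encoded).
secret = secrets.token_urlsafe(32)
body = b'{"alert": "disk_full"}'
header = sign_body(secret, body)
```

Verification must run on the raw body: re-serialising parsed JSON can change whitespace or key order and break the digest.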
Daemon authentication and command signing
Session token
The Go daemon authenticates every call with its session token. On
/daemon/v1/heartbeat and /daemon/v1/evidence the token sits in
the JSON body. On /daemon/v1/tasks the token is sent in the
Authorization: Bearer header. The legacy query-string form
(?session_token=…) is still accepted for backwards compatibility
but logs a deprecation warning on every call, because tokens in query
strings leak into reverse-proxy access logs. The migration to
header-based auth is in progress.
Custom monitor command signatures
Monitors of type=custom carry an HMAC-SHA256 signature in the
/daemon/v1/tasks response. The signature is keyed by the daemon's
own session token:
```
signature = HMAC-SHA256(session_token, command).hex()
```

The daemon recomputes the HMAC before exec and refuses to run unsigned or mismatched commands. Daemon binary v0.2.0 or later is required for signature enforcement. Older daemons silently ignore the new field and remain at their previous risk level until updated.
The threat closed: an attacker with DB write access (SQL injection,
leaked credentials) who flips a custom monitor's command no longer
gets RCE. The platform-computed HMAC will not match their tampered
command and the daemon catches the mismatch before exec.
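Both sides of the check can be illustrated in Python, although the real daemon is written in Go. The sketch assumes the token and command are HMAC'd as UTF-8 bytes; sign_command and may_execute are hypothetical names.

```python
from __future__ import annotations

import hashlib
import hmac


def sign_command(session_token: str, command: str) -> str:
    """Platform side: HMAC-SHA256 keyed by the daemon's session token, hex-encoded."""
    return hmac.new(session_token.encode(), command.encode(), hashlib.sha256).hexdigest()


def may_execute(session_token: str, command: str, signature: str | None) -> bool:
    """Daemon side (illustrative): refuse unsigned or mismatched commands."""
    if not signature:
        return False
    expected = sign_command(session_token, command)
    return hmac.compare_digest(expected.encode(), signature.encode())


token = "example-session-token"  # hypothetical token value
sig = sign_command(token, "df -h /")
```

An attacker who rewrites the command column cannot produce a matching signature without the session token, which is the property the paragraph above relies on.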
Tenant isolation
Database scoping
Most resources carry a non-nullable tenant_id column with an index.
The exceptions:
- audit_logs.tenant_id is nullable (Alembic migration k4f7a3b2c8d9) so system events such as failed logins from unknown emails can be recorded without inventing a placeholder UUID.
- The recipes table is global by design: recipes are a curated catalog. Read and execute are open to all tenant roles. Write (create, update, delete) requires superadmin, so a tenant admin cannot inject playbook_path values that other tenants would execute.
WebSocket fanout
Both real-time channels are tenant-scoped server-side. The publishing
code paths resolve tenant_id from the incident before publishing; the
WS handler then drops messages that do not match the connection's
JWT-bound tenant_id. Superadmin connections see all tenants.
/ws/executions/{id} additionally verifies, before subscribing, that
the connection's tenant owns the execution.
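The fanout rule reduces to a per-message predicate. A minimal sketch, assuming each connection records its JWT-bound tenant and a superadmin flag; Connection and should_deliver are hypothetical names, and the sketch does not model the separate ownership check on /ws/executions/{id}.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class Connection:
    tenant_id: UUID | None  # taken from the connection's JWT
    is_superadmin: bool


def should_deliver(conn: Connection, message_tenant_id: UUID) -> bool:
    """Superadmins see every tenant; everyone else only their own."""
    return conn.is_superadmin or conn.tenant_id == message_tenant_id
```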
Impersonation
Superadmin can POST /admin/impersonate/{tenant_id} to switch their
session into another tenant. The endpoint writes:
- a fresh tenant-scoped access_token cookie (30-minute TTL);
- the original superadmin token, preserved in original_access_token (also HttpOnly).
POST /admin/stop-impersonating swaps them back. The frontend banner
reads the impersonated tenant name from sessionStorage (cosmetic
only); the tokens themselves are never visible to JavaScript.
Approval gate (trust × risk)
swarm/guardrails.py implements the only path through which a recipe
can be auto-executed without human review:
| Trust × risk | low | medium | high |
|---|---|---|---|
| autonomous | auto | approval | approval |
| supervised | approval | approval | approval |
| manual | approval | approval | approval |
The LLM cannot self-approve. Risk classification on a recipe is operator-controlled at create time and cannot be modified by the agent at runtime.
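The table has exactly one "auto" cell, so the gate can be written as deny-by-default. This is an illustrative sketch, not the contents of swarm/guardrails.py; the Trust, Risk, and decision names are hypothetical.

```python
from enum import Enum


class Trust(str, Enum):
    AUTONOMOUS = "autonomous"
    SUPERVISED = "supervised"
    MANUAL = "manual"


class Risk(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def decision(trust: Trust, risk: Risk) -> str:
    """Only autonomous + low runs unattended; every other cell needs approval."""
    if trust is Trust.AUTONOMOUS and risk is Risk.LOW:
        return "auto"
    return "approval"
```

Writing the single allowed cell explicitly, rather than enumerating the approval cells, means any new trust level or risk class defaults to requiring approval.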
Custom tool sandbox
Tenant operators can define custom tools the agents may call. Each type has explicit guardrails.
shell_command
Operator template + LLM-supplied parameters, executed via Ansible's
shell module on the target server.
_render_shell_template (in the platform code) wraps every
parameter value with shlex.quote before substituting it into the
template. Operator-controlled shell features (|, &&, redirects)
keep working as written; LLM-supplied values cannot break out of
their argument slot. A value such as "nginx; rm -rf /" becomes the
literal argument 'nginx; rm -rf /', which reaches the target binary
as a single invalid argument rather than as a second command.
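The quoting step can be sketched with shlex.quote. This assumes the operator template uses str.format-style {name} placeholders; the real _render_shell_template may use a different placeholder syntax, and render_shell_template here is a hypothetical stand-in.

```python
from __future__ import annotations

import shlex


def render_shell_template(template: str, params: dict[str, str]) -> str:
    """Quote every LLM-supplied value, then substitute into the operator's template."""
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    # Operator-written pipes, && and redirects in `template` are left untouched.
    return template.format(**quoted)


cmd = render_shell_template(
    "systemctl status {service} | head -n 20",
    {"service": "nginx; rm -rf /"},
)
# cmd == "systemctl status 'nginx; rm -rf /' | head -n 20"
```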
http_request
Outbound HTTP call.
_is_safe_public_url resolves the URL host and rejects anything that
falls into:
- RFC1918 (10/8, 172.16/12, 192.168/16)
- Loopback (127/8, ::1)
- Link-local / cloud metadata (169.254/16)
- IPv6 ULA (fc00::/7) and link-local (fe80::/10)
This blocks SSRF to internal services and to cloud-provider metadata endpoints. Header values containing CRLF are rejected. TLS verification is enabled.
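A resolve-then-check filter over the listed ranges might look like the sketch below. It is an approximation of _is_safe_public_url, not its source: the scheme restriction and the IPv4-mapped IPv6 handling are assumptions added here, and is_safe_header_value is a hypothetical helper for the CRLF rule.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16",                                  # link-local / cloud metadata
        "fc00::/7", "fe80::/10",                           # IPv6 ULA and link-local
    )
]


def is_safe_public_url(url: str) -> bool:
    """Resolve the host; reject if ANY resolved address is in a blocked range."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or None)
    except (socket.gaierror, ValueError):  # unresolvable host or invalid port
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # drop IPv6 zone id
        if ip.version == 6 and ip.ipv4_mapped:  # ::ffff:10.0.0.1 is 10.0.0.1
            ip = ip.ipv4_mapped
        if any(ip in net for net in BLOCKED_NETWORKS):
            return False
    return True


def is_safe_header_value(value: str) -> bool:
    """Reject CR/LF so a value cannot smuggle in extra header lines."""
    return "\r" not in value and "\n" not in value
```

Checking at resolution time still leaves a DNS-rebinding window unless the HTTP client connects to the same address that was checked; this sketch does not close that gap.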
python_script
Disabled. Running LLM-supplied Python on the API container is a
remote shell with a JSON Schema; there is no safe lightweight
sandbox available. Existing tools of this type return a clear error
pointing at the migration paths (a fixed shell_command template, or
an Ansible playbook recipe).
run_diagnostic_command (built-in)
The most-used built-in diagnostic tool no longer accepts a free-form shell command. It accepts an enum verb plus a regex-validated argument (see dashboard/tools for the verb list). Anything outside the enum requires the agent to propose a recipe — a curated, operator-reviewed playbook — instead of improvising a shell command.
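The enum-plus-regex shape can be sketched as follows. The verbs and the argument pattern are hypothetical placeholders, because the real verb list lives in dashboard/tools and the actual regex is not given here.

```python
from __future__ import annotations

import re
from enum import Enum


class DiagnosticVerb(str, Enum):
    # Hypothetical verbs for illustration; the real list is in dashboard/tools.
    DISK_USAGE = "disk_usage"
    SERVICE_STATUS = "service_status"
    MEMORY = "memory"


# Hypothetical argument rule: unit-like names only, no spaces or shell metacharacters.
ARGUMENT_RE = re.compile(r"[A-Za-z0-9_.@-]{1,64}")


def validate_diagnostic_call(verb: str, argument: str) -> tuple[DiagnosticVerb, str]:
    """Accept only a known verb plus an argument that fully matches ARGUMENT_RE."""
    try:
        parsed = DiagnosticVerb(verb)
    except ValueError:
        # Outside the enum: the agent should propose a recipe instead.
        raise ValueError(f"unknown verb {verb!r}; propose a recipe instead") from None
    if not ARGUMENT_RE.fullmatch(argument):
        raise ValueError(f"argument {argument!r} does not match the allowed pattern")
    return parsed, argument
```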
LLM client TLS
build_client in the platform code disables TLS verification
only when the resolved base URL points at a literal local host
(localhost, 127.0.0.1, ::1, 0.0.0.0). A provider configured
with verify_ssl: false still gets full certificate validation if
the URL points anywhere else.
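The rule can be expressed as a single boolean. This sketch assumes the local-host exception applies only when the provider sets verify_ssl: false; effective_verify is a hypothetical name, not build_client itself.

```python
from urllib.parse import urlsplit

# Literal local hosts named above; exact matches only.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def effective_verify(base_url: str, verify_ssl: bool) -> bool:
    """Verification stays on unless verify_ssl is False AND the host is literally local."""
    host = (urlsplit(base_url).hostname or "").lower()
    return verify_ssl or host not in LOCAL_HOSTS
```

Matching exact hostnames (rather than prefixes) means a name like localhost.attacker.example still gets full certificate validation.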
Encryption at rest
SSH private keys, bearer tokens, and other secrets stored in the
secrets table are encrypted with AES-256-GCM keyed by
OREMEDY_ENCRYPTION_KEY. A fresh 12-byte nonce is generated for
every encryption operation.
Audit log
Every state-changing action writes a row to audit_logs:
- Resource type and ID.
- Action (created, updated, deleted, executed, approved, auth.login, auth.login_failed, etc.).
- Actor (user email, agent name, or NULL for unauthenticated events such as failed logins).
- Tenant (nullable for system events).
- Timestamp (UTC, second precision).
- Detail JSON.
- IP address, captured from X-Forwarded-For only when the immediate TCP peer is in trusted (RFC1918 / loopback) space; otherwise the actual peer address is used. Pure header trust (the unconditional pre-hardening behaviour) is gone.
The table is append-only; the application never updates or deletes rows. Read access is tenant-scoped.
Phoenix tracing
Phoenix (Arize) ingests every LLM prompt and response for debugging and observability. Because that data is sensitive, the Phoenix container is on the internal Docker network only — it has no public Caddy route. Operator access is via SSH tunnel:
```bash
ssh -L 6006:phoenix:6006 alberto@<host>
# then open http://localhost:6006
```

CORS
The API uses CORSMiddleware with explicit origins. The production
compose file passes https://${DOMAIN} as the default; multiple
origins can be supplied as a comma-separated list via
OREMEDY_CORS_ORIGINS. * is rejected when OREMEDY_ENV=production
because wildcard origins paired with credentialed cookies form a
known XSS amplifier.
Threat model recap
What is mitigated:
- Tenant admin compromise injecting cross-tenant playbook paths.
- DB tampering injecting daemon shell commands.
- LLM prompt injection escalating to RCE on the API container or managed servers.
- XSS exfiltrating session tokens.
- Cross-tenant leakage on real-time channels.
- Credential stuffing on /auth/login.
- Webhook spam and unauthenticated alert injection.
- SSRF from custom HTTP tools to internal services.
- TLS bypass on remote LLM providers.
What is not mitigated by this layer (out of scope, requires the infrastructure underneath):
- Compromise of the host running the API process.
- Compromise of the deployed OREMEDY_SECRET_KEY or OREMEDY_ENCRYPTION_KEY.
- Compromise of the underlying PostgreSQL / Redis / SeaweedFS.
- Network-level attacks against Caddy.
These are deployment-layer concerns and require the same standard hardening as any other production service (private networking, disk-level encryption, OS hardening, runtime EDR, etc.).