Security model

Required environment, authentication, isolation, audit.

This document consolidates the security posture of OpenRemedy: the boundaries the platform enforces, the credentials it requires, and the mitigations applied to its agentic surface. It is intended for operators deploying OpenRemedy and for auditors reviewing the platform.

The model has been hardened through a focused security pass; every mitigation below corresponds to a closed finding. Where relevant, the source file and behaviour are cited.

Required environment variables

The deploy aborts at compose interpolation time if any of these are missing. The application also validates the values on boot and refuses to start with known dev defaults.

Variable	Purpose	Constraints
`OREMEDY_SECRET_KEY`	JWT signing key (HS256)	At least 32 characters. Rejected if it matches the historical dev defaults (`changeme-dev-secret-key-32chars!!`, `dev-secret-key-change-in-production`, `changeme`). Generate with `openssl rand -base64 48`.
`OREMEDY_ENCRYPTION_KEY`	AES-256-GCM data key for stored secrets	Exactly 64 hex characters (32 bytes). Rejected if it matches dev placeholders such as 64×`a` or the example `0123…`. Generate with `openssl rand -hex 32`.
`POSTGRES_PASSWORD`	Database password	Required, no default.
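
The boot-time checks in the table can be sketched as follows. This is a minimal illustration, not the platform's actual validator: the function and constant names are hypothetical, and only the placeholders spelled out in this document are listed (the full `0123…` example key is not reproduced here).

```python
import re

# Dev defaults named in the table above. The set names are illustrative.
DEV_SECRET_KEYS = {
    "changeme-dev-secret-key-32chars!!",
    "dev-secret-key-change-in-production",
    "changeme",
}
# Only the 64 x "a" placeholder is spelled out in this document.
DEV_ENCRYPTION_KEYS = {"a" * 64}

HEX_64 = re.compile(r"[0-9a-fA-F]{64}")


def validate_secret_key(value: str) -> None:
    """Raise ValueError if the JWT signing key is a dev default or too short."""
    if value in DEV_SECRET_KEYS:
        raise ValueError("OREMEDY_SECRET_KEY matches a known dev default")
    if len(value) < 32:
        raise ValueError("OREMEDY_SECRET_KEY must be at least 32 characters")


def validate_encryption_key(value: str) -> None:
    """Raise ValueError unless the key is exactly 64 hex characters (32 bytes)."""
    if HEX_64.fullmatch(value) is None:
        raise ValueError("OREMEDY_ENCRYPTION_KEY must be exactly 64 hex characters")
    if value.lower() in DEV_ENCRYPTION_KEYS:
        raise ValueError("OREMEDY_ENCRYPTION_KEY matches a dev placeholder")
```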

In production:

OREMEDY_ENV=production activates the production validators.
OREMEDY_DEBUG=true is rejected when OREMEDY_ENV=production.
OREMEDY_CORS_ORIGINS must not contain *. Default in the production compose file is https://${DOMAIN}; override per deployment to add origins.

Authentication

Web sessions: HttpOnly cookies

Browser sessions use two cookies, both HttpOnly + Secure + SameSite=strict, set on a successful POST /auth/login or POST /auth/register:

Cookie	Lifetime	Purpose
`access_token`	`OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES` (default 480)	API auth
`refresh_token`	`OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS` (default 30)	Refresh path

POST /auth/refresh reads the refresh cookie, validates the JWT, and re-issues a new pair via Set-Cookie. POST /auth/logout clears both. Tokens never appear in JavaScript scope, so an XSS payload cannot read them out of localStorage or out of a fetch() response body.

Programmatic clients: Bearer header

The get_current_user dependency reads the JWT from the access_token cookie first and falls back to Authorization: Bearer <jwt>. CLI tools, the Go daemon, and any non-browser caller can keep using Bearer.
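
The cookie-first, Bearer-fallback order can be illustrated with a small framework-free helper. The name `extract_access_token` is hypothetical, and the sketch assumes header names have already been lowercased; the real get_current_user dependency also validates the JWT, which is omitted here.

```python
from typing import Mapping, Optional


def extract_access_token(
    cookies: Mapping[str, str], headers: Mapping[str, str]
) -> Optional[str]:
    """Return the JWT from the access_token cookie, else from a Bearer header."""
    token = cookies.get("access_token")
    if token:
        return token
    # Fallback for CLI tools, the daemon, and other non-browser callers.
    scheme, _, credentials = headers.get("authorization", "").partition(" ")
    credentials = credentials.strip()
    if scheme.lower() == "bearer" and credentials:
        return credentials
    return None
```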

WebSocket handshake

/ws/incidents and /ws/executions/{id} accept the cookie (the browser sends it automatically on a same-origin upgrade) or, as a fallback for non-browser clients, the Sec-WebSocket-Protocol: bearer, <jwt> slot. URL query params are not supported because they leak into proxy access logs. Pre-handshake auth failures close the WS with policy-violation status.
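
For non-browser clients, the subprotocol fallback amounts to parsing a two-element list. The sketch below shows one way to do that; `token_from_subprotocols` is a hypothetical name, and it assumes the header carries exactly the two entries `bearer` and the JWT.

```python
from typing import Optional


def token_from_subprotocols(header_value: str) -> Optional[str]:
    """Parse 'bearer, <jwt>' from a Sec-WebSocket-Protocol header value."""
    parts = [p.strip() for p in header_value.split(",") if p.strip()]
    if len(parts) == 2 and parts[0].lower() == "bearer":
        return parts[1]
    # Anything else is treated as "no token offered".
    return None
```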

Login rate limiting

POST /auth/login is rate-limited at 10 requests per minute per client IP via slowapi. The bucket key uses the leftmost X-Forwarded-For value only when the immediate TCP peer is in trusted (RFC1918 / loopback) space; otherwise it uses the actual peer address.
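
The client-IP rule can be sketched with the standard ipaddress module. The names `TRUSTED_NETWORKS`, `is_trusted_peer`, and `client_ip` are illustrative, and the network list is an assumption derived from the "RFC1918 / loopback" wording above; wiring the result into slowapi as a key function is left out.

```python
import ipaddress
from typing import Optional

# Assumed trusted ranges: RFC1918 plus IPv4/IPv6 loopback.
TRUSTED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8", "::1/128")
]


def is_trusted_peer(peer: str) -> bool:
    """True when the immediate TCP peer is a private or loopback address."""
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_NETWORKS)


def client_ip(peer: str, x_forwarded_for: Optional[str]) -> str:
    """Use the leftmost X-Forwarded-For entry only behind a trusted peer."""
    if x_forwarded_for and is_trusted_peer(peer):
        leftmost = x_forwarded_for.split(",")[0].strip()
        if leftmost:
            return leftmost
    return peer
```

A request arriving directly from a public address therefore cannot choose its own rate-limit bucket by sending a forged header.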

Webhook authentication

POST /api/v1/webhooks/alerts/{tenant_slug} requires every request to carry an HMAC-SHA256 signature of the raw body, computed against the tenant's webhook_secret:

X-OpenRemedy-Signature: sha256=<lowercase hex digest>

Each tenant has a unique 32-byte URL-safe webhook_secret, auto-generated at tenant creation (or backfilled by Alembic migration m9c2e8f1a4d3 for pre-existing tenants). Verification uses hmac.compare_digest for constant-time comparison.
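
A minimal sketch of signing and verification follows. The helper names are hypothetical, and it assumes the webhook_secret is used as UTF-8 bytes for the HMAC key, which this document does not state.

```python
import hashlib
import hmac


def sign_body(webhook_secret: str, raw_body: bytes) -> str:
    """Build the X-OpenRemedy-Signature value for a raw request body."""
    digest = hmac.new(webhook_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(webhook_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header against the expected one."""
    expected = sign_body(webhook_secret, raw_body)
    # Compare as bytes so non-ASCII header input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

The signature covers the raw bytes, so a sender must sign exactly what it transmits; re-serialising the JSON before signing would produce a different digest.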

The endpoint is also rate-limited at 60 requests per minute per client IP.

Signing examples in bash, Python, and Node.js are in integrations.

Daemon authentication and command signing

Session token

The Go daemon authenticates every call with its session token. On /daemon/v1/heartbeat and /daemon/v1/evidence the token sits in the JSON body. On /daemon/v1/tasks the token is sent in the Authorization: Bearer header. The legacy query-string form (?session_token=…) is still accepted for backwards compatibility but logs a deprecation warning on every call, because tokens in query strings leak into reverse-proxy access logs. The migration to header-based auth is in progress.

Custom monitor command signatures

Monitors of type=custom carry an HMAC-SHA256 signature in the /daemon/v1/tasks response. The signature is keyed by the daemon's own session token:

signature = HMAC-SHA256(session_token, command).hex()

The daemon recomputes the HMAC before exec and refuses to run unsigned or mismatched commands. Daemon binary v0.2.0 or later is required for signature enforcement. Older daemons silently ignore the new field and remain at their previous risk level until updated.
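
The daemon itself is written in Go; the check it performs can nevertheless be illustrated in Python. The function names are hypothetical, and the sketch assumes both the token and the command are hashed as UTF-8 bytes.

```python
import hashlib
import hmac
from typing import Optional


def sign_command(session_token: str, command: str) -> str:
    """HMAC-SHA256(session_token, command) as lowercase hex."""
    return hmac.new(session_token.encode(), command.encode(), hashlib.sha256).hexdigest()


def should_execute(session_token: str, command: str, signature: Optional[str]) -> bool:
    """Refuse unsigned commands and commands whose signature does not match."""
    if not signature:
        return False
    expected = sign_command(session_token, command)
    return hmac.compare_digest(expected.encode(), signature.encode())
```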

The threat closed: an attacker with DB write access (SQL injection, leaked credentials) who flips a custom monitor's command no longer gets RCE. The platform-computed HMAC will not match their tampered command and the daemon catches the mismatch before exec.

Tenant isolation

Database scoping

Most resources carry a non-nullable tenant_id column with an index. The exceptions:

audit_logs.tenant_id is nullable (Alembic k4f7a3b2c8d9) so system events such as failed logins from unknown emails can be recorded without inventing a placeholder UUID.
The recipes table is global by design — recipes are a curated catalog. Read and execute are open to all tenant roles. Write (create, update, delete) requires superadmin so a tenant admin cannot inject playbook_paths that other tenants would execute.

WebSocket fanout

Both real-time channels are tenant-scoped server-side. For each channel, the platform code resolves tenant_id from the incident before publishing; the WS handler then drops messages that do not match the connection's JWT-bound tenant_id. Superadmin connections see all tenants.

/ws/executions/{id} additionally verifies, before subscribing, that the connection's tenant owns the execution.
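
The per-message drop rule reduces to a small predicate. The sketch below is an assumption about its shape: `Connection` and `should_deliver` are hypothetical names, and messages without a resolved tenant are treated as undeliverable to ordinary connections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """Identity bound to a WebSocket connection from its JWT (illustrative)."""
    tenant_id: Optional[str]
    is_superadmin: bool = False


def should_deliver(conn: Connection, message_tenant_id: Optional[str]) -> bool:
    """Deliver only messages for the connection's own tenant; superadmin sees all."""
    if conn.is_superadmin:
        return True
    return message_tenant_id is not None and message_tenant_id == conn.tenant_id
```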

Impersonation

Superadmin can POST /admin/impersonate/{tenant_id} to switch their session into another tenant. The endpoint writes:

a fresh tenant-scoped access_token cookie (30-minute TTL),
the original superadmin token preserved in original_access_token (also HttpOnly).

POST /admin/stop-impersonating swaps them back. The frontend banner reads the impersonated tenant name from sessionStorage (cosmetic only); the tokens themselves are never visible to JavaScript.

Approval gate (trust × risk)

swarm/guardrails.py implements the only path through which a recipe can be auto-executed without human review:

Trust × risk	`low`	`medium`	`high`
`autonomous`	auto	approval	approval
`supervised`	approval	approval	approval
`manual`	approval	approval	approval

The LLM cannot self-approve. Risk classification on a recipe is operator-controlled at create time and cannot be modified by the agent at runtime.
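
The matrix collapses to a single condition, sketched below. This is an illustration of the table, not the contents of swarm/guardrails.py; the function name and the choice to reject unknown values are assumptions.

```python
TRUST_LEVELS = ("autonomous", "supervised", "manual")
RISK_LEVELS = ("low", "medium", "high")


def requires_approval(trust: str, risk: str) -> bool:
    """Return False only for the single auto-execute cell: autonomous x low."""
    if trust not in TRUST_LEVELS:
        raise ValueError(f"unknown trust level: {trust!r}")
    if risk not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk!r}")
    return not (trust == "autonomous" and risk == "low")
```

Rejecting unknown values instead of defaulting keeps a typo in either field from silently landing in the auto-execute cell.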

Custom tool sandbox

Tenant operators can define custom tools the agents may call. Each type has explicit guardrails.

`shell_command`

Operator template + LLM-supplied parameters, executed via Ansible's shell module on the target server.

_render_shell_template (in the platform code) wraps every parameter value with shlex.quote before substituting it into the template. Operator-controlled shell features (|, &&, redirects) keep working as written; LLM-supplied values cannot break out of their argument slot. A value such as "nginx; rm -rf /" becomes the single literal argument 'nginx; rm -rf /', which the target binary will typically reject as invalid input instead of the shell interpreting it as a second command.
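
The quoting step can be sketched as follows. The placeholder syntax is an assumption: this version uses `{name}` slots filled with str.format, which the document does not confirm, and the function name is a stand-in for the real _render_shell_template.

```python
import shlex
from typing import Mapping


def render_shell_template(template: str, params: Mapping[str, object]) -> str:
    """Quote every parameter value, then substitute it into the operator template."""
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    # Assumed placeholder style: "{name}". Literal braces in a template
    # would need escaping as "{{" and "}}" under this scheme.
    return template.format(**quoted)
```

For example, the template `systemctl status {service} | head -n 5` keeps its operator-written pipe, while a hostile `service` value arrives as one quoted argument.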

`http_request`

Outbound HTTP call.

_is_safe_public_url resolves the URL host and rejects anything that falls into:

RFC1918 (10/8, 172.16/12, 192.168/16)
Loopback (127/8, ::1)
Link-local / cloud metadata (169.254/16)
IPv6 ULA (fc00::/7) and link-local (fe80::/10)

This blocks SSRF to internal services and to cloud-provider metadata endpoints. Header values containing CRLF are rejected. TLS verification is enabled.
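
A sketch of the address check, using the ranges listed above, might look like this. It is not the platform's _is_safe_public_url: the helper name `is_blocked_address`, the scheme restriction, and the unwrapping of IPv4-mapped IPv6 addresses are assumptions added for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Ranges taken from the list above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16",                                  # link-local / metadata
        "fc00::/7", "fe80::/10",                           # IPv6 ULA, link-local
    )
]


def is_blocked_address(ip: str) -> bool:
    """True when the address falls in any blocked range."""
    addr = ipaddress.ip_address(ip)
    # Extra precaution (not stated in this document): unwrap ::ffff:a.b.c.d.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return any(addr in net for net in BLOCKED_NETWORKS)


def is_safe_public_url(url: str) -> bool:
    """Resolve the host and require every resolved address to be public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    # Strip any IPv6 zone suffix ("%eth0") before parsing the address.
    addresses = [info[4][0].split("%")[0] for info in infos]
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

A check of this kind only holds if the request then connects to the address that was validated; resolving a second time at connect time would reopen the door to DNS rebinding.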

`python_script`

Disabled. Running LLM-supplied Python on the API container is a remote shell with a JSON Schema; there is no safe lightweight sandbox available. Existing tools of this type return a clear error pointing at the migration paths (a fixed shell_command template, or an Ansible playbook recipe).

`run_diagnostic_command` (built-in)

The most-used built-in diagnostic tool no longer accepts a free-form shell command. It accepts an enum verb plus a regex-validated argument (see dashboard/tools for the verb list). Anything outside the enum requires the agent to propose a recipe — a curated, operator-reviewed playbook — instead of improvising a shell command.
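
The enum-plus-regex shape can be sketched as below. The verbs and patterns are hypothetical placeholders, since the real verb list lives in dashboard/tools and is not reproduced in this document.

```python
import re
from enum import Enum
from typing import Tuple


class DiagnosticVerb(Enum):
    # Hypothetical verbs for illustration only.
    SERVICE_STATUS = "service_status"
    DISK_USAGE = "disk_usage"


# One argument pattern per verb; also hypothetical.
ARGUMENT_PATTERNS = {
    DiagnosticVerb.SERVICE_STATUS: re.compile(r"[A-Za-z0-9_.@-]{1,64}"),
    DiagnosticVerb.DISK_USAGE: re.compile(r"/[A-Za-z0-9_./-]{0,255}"),
}


def validate_diagnostic(verb: str, argument: str) -> Tuple[DiagnosticVerb, str]:
    """Accept only a known verb whose argument fully matches its pattern."""
    try:
        parsed = DiagnosticVerb(verb)
    except ValueError:
        raise ValueError(f"unsupported verb: {verb!r}") from None
    if ARGUMENT_PATTERNS[parsed].fullmatch(argument) is None:
        raise ValueError(f"invalid argument for {parsed.value}: {argument!r}")
    return parsed, argument
```

Using fullmatch instead of match matters here: a prefix match would accept `nginx; rm -rf /` because its first characters satisfy the pattern.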

LLM client TLS

build_client in the platform code disables TLS verification only when the resolved base URL points at a literal local host (localhost, 127.0.0.1, ::1, 0.0.0.0). A provider configured with verify_ssl: false still gets full certificate validation if the URL points anywhere else.
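
The decision reduces to one function over the base URL and the configured flag. The sketch below is illustrative; `effective_verify_ssl` is a hypothetical name, and how build_client passes the result to its HTTP library is not shown.

```python
from urllib.parse import urlsplit

# Literal local hosts named above.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def effective_verify_ssl(base_url: str, verify_ssl: bool) -> bool:
    """Honour verify_ssl=False only when the URL host is a literal local host."""
    if verify_ssl:
        return True
    host = urlsplit(base_url).hostname  # lowercased, IPv6 brackets removed
    return host not in LOCAL_HOSTS
```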

Encryption at rest

SSH private keys, bearer tokens, and other secrets stored in the secrets table are encrypted with AES-256-GCM keyed by OREMEDY_ENCRYPTION_KEY. A fresh 12-byte nonce is generated for every encryption operation.

Audit log

Every state-changing action writes a row to audit_logs:

Resource type and ID.
Action (created, updated, deleted, executed, approved, auth.login, auth.login_failed, etc.).
Actor (user email, agent name, or NULL for unauthenticated events such as failed logins).
Tenant (nullable for system events).
Timestamp (UTC, second precision).
Detail JSON.
IP address, captured from X-Forwarded-For only when the immediate TCP peer is in trusted (RFC1918 / loopback) space. Otherwise the actual peer address is used. The pre-hardening behaviour of trusting the header unconditionally has been removed.

The table is append-only; the application never updates or deletes rows. Read access is tenant-scoped.

Phoenix tracing

Phoenix (Arize) ingests every LLM prompt and response for debugging and observability. Because that data is sensitive, the Phoenix container is on the internal Docker network only — it has no public Caddy route. Operator access is via SSH tunnel:

ssh -L 6006:phoenix:6006 <user>@<host>
# then open http://localhost:6006

CORS

The API uses CORSMiddleware with explicit origins. The production compose file passes https://${DOMAIN} as the default; multiple origins can be supplied as a comma-separated list via OREMEDY_CORS_ORIGINS. * is rejected when OREMEDY_ENV=production because wildcard origins paired with credentialed cookies form a known XSS amplifier.
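
Parsing the comma-separated value and rejecting the wildcard can be sketched as follows. The function name and signature are hypothetical; the resulting list is what would be handed to CORSMiddleware as its allowed origins.

```python
from typing import List


def parse_cors_origins(raw: str, env: str) -> List[str]:
    """Split OREMEDY_CORS_ORIGINS on commas and reject '*' in production."""
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    if env == "production" and "*" in origins:
        raise ValueError("OREMEDY_CORS_ORIGINS must not contain * in production")
    return origins
```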

Threat model recap

What is mitigated:

Tenant admin compromise injecting cross-tenant playbook paths.
DB tampering injecting daemon shell commands.
LLM prompt injection escalating to RCE on the API container or managed servers.
XSS exfiltrating session tokens.
Cross-tenant leakage on real-time channels.
Credential stuffing on /auth/login.
Webhook spam and unauthenticated alert injection.
SSRF from custom HTTP tools to internal services.
TLS bypass on remote LLM providers.

What is not mitigated by this layer (out of scope, requires the infrastructure underneath):

Compromise of the host running the API process.
Compromise of the deployed OREMEDY_SECRET_KEY or OREMEDY_ENCRYPTION_KEY.
Compromise of the underlying PostgreSQL / Redis / SeaweedFS.
Network-level attacks against Caddy.

These are deployment-layer concerns and require the same standard hardening as any other production service (private networking, disk-level encryption, OS hardening, runtime EDR, etc.).