Agent Security Model

BaselineOS provides defense-in-depth for AI agent operations. Every layer is designed to contain the blast radius if any single component is compromised.

The Threat Model

AI agents operate with real credentials and real infrastructure access. The table below lists the main threats, with the traditional defense and the BaselineOS defense for each:

Threat	Example	Traditional defense	BaselineOS defense
Credential exfiltration	Prompt injection causes agent to leak API keys	Don’t put keys in prompts	Opaque grants — agent never sees raw values
Credential persistence	Compromised agent retains access indefinitely	Rotate keys manually	Time-boxed leases — auto-expire in seconds
Lateral movement	Agent with deploy access uses it for data exfiltration	Network segmentation	Per-agent scoping — each agent sees only its credentials
Privilege escalation	Low-trust agent performs high-risk operations	RBAC	Trust scoring — higher-risk steps require higher trust
No accountability	Unknown agent accessed production credentials	Manual log review	Persistent audit trail — every access, every accessor, every timestamp
Uncontrolled execution	Agent invents its own workflow with arbitrary commands	Hope for the best	Execution protocols — agent follows defined steps with defined access

Defense Layers

┌─────────────────────────────────────────────────────────────────┐
│ Layer 5: Execution Protocols                                     │
│   Agent follows defined steps, not freeform commands             │
├─────────────────────────────────────────────────────────────────┤
│ Layer 4: Trust Scoring                                           │
│   Per-protocol, per-step, per-credential trust gates             │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: Opaque Credential Grants                                │
│   Agent gets handle, not value — can use but can't exfiltrate    │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: Time-Boxed Leases                                       │
│   Credentials auto-expire after step timeout                     │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: Encrypted Vault                                         │
│   AES-256-CBC at rest, per-agent scoping, persistent audit trail │
└─────────────────────────────────────────────────────────────────┘

Each layer limits the damage when a layer above it fails:

Protocol defines the steps → but what if the step is compromised?
Trust scoring gates access → but what if trust is too high?
Opaque grants hide values → but what if the runner is compromised?
Leases auto-expire → but what if the vault is compromised?
Encryption protects storage → bottom of the stack

What Happens When Something Is Compromised

Agent is compromised (prompt injection, jailbreak)

Traditional: Agent has VERCEL_TOKEN=vcel_... in its environment. An attacker who reads it keeps access until the token is manually rotated.

BaselineOS:

Agent has a lease handle (lease-xxx), not the token value
The lease expires in seconds (step timeout)
Even if the attacker extracts the handle, it resolves to nothing after the step ends
The agent never had the raw value in its memory or context
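
The handle-and-expiry behavior above can be sketched roughly as follows. This is a minimal illustration, not the BaselineOS implementation: `LeaseTable`, `issue`, and `resolve` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and a real system would use unguessable handles rather than a counter.

```typescript
// Hypothetical sketch: LeaseTable and its methods are illustrative names,
// not the BaselineOS API.
type Lease = { credential: string; expiresAt: number };

class LeaseTable {
  private secrets: Map<string, string>;
  private leases = new Map<string, Lease>();
  private nextId = 1;

  constructor(secrets: Map<string, string>) {
    this.secrets = secrets;
  }

  // Issue an opaque handle. The caller (the agent) never receives the raw value.
  issue(credential: string, ttlMs: number, now: number): string {
    if (!this.secrets.has(credential)) {
      throw new Error(`unknown credential: ${credential}`);
    }
    const handle = `lease-${this.nextId++}`;
    this.leases.set(handle, { credential, expiresAt: now + ttlMs });
    return handle;
  }

  // Intended for the runner only. Returns undefined once the lease has expired.
  resolve(handle: string, now: number): string | undefined {
    const lease = this.leases.get(handle);
    if (!lease) return undefined;
    if (now >= lease.expiresAt) {
      this.leases.delete(handle);
      return undefined;
    }
    return this.secrets.get(lease.credential);
  }
}

const table = new LeaseTable(
  new Map<string, string>([["vercel", "example-token-value"]])
);
const handle = table.issue("vercel", 60_000, 0);  // 60s lease issued at t=0
const duringStep = table.resolve(handle, 30_000); // still inside the TTL
const afterStep = table.resolve(handle, 60_000);  // TTL elapsed
```

In this sketch a handle extracted by an attacker is only useful to whoever can call `resolve`, and only until the TTL elapses.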

Credential is leaked

Traditional: A leaked ANTHROPIC_API_KEY stays valid until it is manually rotated, which could take months.

BaselineOS:

Leased credentials auto-expire after TTL (seconds to minutes)
Vault audit trail shows exactly when and how the credential was accessed
vault.rotate('anthropic', newKey) immediately invalidates all active leases
vault.getAccessLog() shows the full forensic trail
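
To illustrate how rotation and the audit trail interact, here is a small in-memory model. Only the method names `rotate` and `getAccessLog` come from the text above. Their signatures, the `SketchVault` class, the `store`, `lease`, and `resolve` helpers, and the `AccessEvent` shape are all assumptions made for this example.

```typescript
// Hypothetical in-memory model; signatures and types are assumed, not the real API.
type AccessEvent = {
  at: number;
  credential: string;
  accessor: string;
  action: "lease" | "rotate";
};

class SketchVault {
  private values = new Map<string, string>();
  private leases = new Map<string, string>(); // handle -> credential name
  private log: AccessEvent[] = [];
  private nextLease = 1;

  store(credential: string, value: string): void {
    this.values.set(credential, value);
  }

  // Issue a handle and record who asked for it and when.
  lease(credential: string, accessor: string, now: number): string {
    const handle = `lease-${this.nextLease++}`;
    this.leases.set(handle, credential);
    this.log.push({ at: now, credential, accessor, action: "lease" });
    return handle;
  }

  resolve(handle: string): string | undefined {
    const credential = this.leases.get(handle);
    return credential === undefined ? undefined : this.values.get(credential);
  }

  // Replace the value and drop every active lease on that credential.
  rotate(credential: string, newValue: string, by = "operator", now = Date.now()): void {
    this.values.set(credential, newValue);
    this.leases.forEach((name, handle) => {
      if (name === credential) this.leases.delete(handle);
    });
    this.log.push({ at: now, credential, accessor: by, action: "rotate" });
  }

  // Return a copy so callers cannot edit the trail.
  getAccessLog(): AccessEvent[] {
    return this.log.slice();
  }
}

const vault = new SketchVault();
vault.store("anthropic", "old-key");
const leaked = vault.lease("anthropic", "research-agent", 1000);
const beforeRotation = vault.resolve(leaked);
vault.rotate("anthropic", "new-key", "operator", 2000);
const afterRotation = vault.resolve(leaked);
const actions = vault.getAccessLog().map((e) => e.action);
```

After the rotation, the previously issued handle no longer resolves, and the log holds one entry for the lease and one for the rotation.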

Agent attempts unauthorized access

Traditional: Agent reads process.env.DATABASE_URL — nothing stops it.

BaselineOS:

Agent requests capability through protocol → trust score checked
Agent trust 60 < required 85 for production database → denied
Denial logged to audit trail with agent ID, trust score, and reason
Agent cannot bypass — credential is not in env, only in vault
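
The denial path above might look something like the following sketch. The policy values mirror the `minTrustScore` and `allowedAccessors` fields used in this document's configuration example. The `checkAccess` function, the `AuditEntry` shape, and the order of the checks are assumptions for illustration.

```typescript
// Hypothetical trust gate; checkAccess and AuditEntry are illustrative names.
type CredentialPolicy = { minTrustScore: number; allowedAccessors?: string[] };
type AuditEntry = {
  agentId: string;
  credential: string;
  trustScore: number;
  allowed: boolean;
  reason: string;
};

const policies: Record<string, CredentialPolicy> = {
  vercel: { minTrustScore: 60 },
  "prod-db": { minTrustScore: 85, allowedAccessors: ["data-agent"] },
};

const auditTrail: AuditEntry[] = [];

// Decide whether an agent may use a credential, and log the decision either way.
function checkAccess(agentId: string, trustScore: number, credential: string): boolean {
  const policy = policies[credential];
  let reason = "ok";
  if (!policy) {
    reason = "unknown credential";
  } else if (policy.allowedAccessors && policy.allowedAccessors.indexOf(agentId) === -1) {
    reason = "agent not in allowedAccessors";
  } else if (trustScore < policy.minTrustScore) {
    reason = `trust ${trustScore} < required ${policy.minTrustScore}`;
  }
  const allowed = reason === "ok";
  auditTrail.push({ agentId, credential, trustScore, allowed, reason });
  return allowed;
}

const denied = checkAccess("data-agent", 60, "prod-db");  // below the required 85
const granted = checkAccess("data-agent", 90, "prod-db"); // meets the threshold
```

Logging grants as well as denials is a choice made here so the trail shows every access attempt.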

Credential Flow Diagram

Organization defines:
  defineBaseline({
    credentials: {
      vercel: { envVar: 'VERCEL_TOKEN', minTrustScore: 60 },
      'prod-db': { envVar: 'DATABASE_URL', minTrustScore: 85, scope: 'agent', allowedAccessors: ['data-agent'] },
    }
  })
         │
         ▼
  CredentialVault auto-resolves from env
  Encrypts with AES-256-CBC
  Persists to SQLite
         │
         ▼
  Protocol defines steps:
    S1: "Run tests" (no credentials needed)
    S2: "Deploy" (needs vercel, trust 60)
    S3: "Migrate DB" (needs prod-db, trust 85)
         │
         ▼
  Agent starts execution:
    trust score: 75
         │
         ▼
  S1: ✓ passes (no credentials)
  S2: ✓ vercel → lease issued (TTL 60s) → handle to agent → runner injects → lease revoked
  S3: ✗ prod-db → DENIED (trust 75 < required 85) → logged → step fails
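
The step-by-step outcome in the diagram can be reproduced with a short simulation. This is a sketch under assumptions: the `Step` type and `runProtocol` function are hypothetical, credential keys follow the `defineBaseline` block, and stopping the run at the first failed step is a choice made for this example.

```typescript
// Hypothetical walk through the three steps from the diagram.
type Step = {
  id: string;
  name: string;
  needs?: { credential: string; minTrust: number };
};

const steps: Step[] = [
  { id: "S1", name: "Run tests" },
  { id: "S2", name: "Deploy", needs: { credential: "vercel", minTrust: 60 } },
  { id: "S3", name: "Migrate DB", needs: { credential: "prod-db", minTrust: 85 } },
];

// Returns a human-readable trace of what happened at each step.
function runProtocol(protocol: Step[], agentTrust: number): string[] {
  const trace: string[] = [];
  for (const step of protocol) {
    if (!step.needs) {
      trace.push(`${step.id}: pass (no credentials)`);
      continue;
    }
    const { credential, minTrust } = step.needs;
    if (agentTrust < minTrust) {
      trace.push(`${step.id}: denied ${credential} (trust ${agentTrust} < required ${minTrust})`);
      break; // assumption: a failed step ends the run
    }
    trace.push(`${step.id}: lease issued for ${credential}`);
    // The runner would inject the credential and execute the step here.
    trace.push(`${step.id}: lease revoked`);
  }
  return trace;
}

const trace = runProtocol(steps, 75);
```

With a trust score of 75, the trace contains one entry for S1, two for S2 (lease issued, lease revoked), and a denial for S3.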

Secret Scanning

The Context Integrity Engine scans all context entries for embedded secrets:

// Context entries matching any of these patterns are rejected on registration:
const secretPatterns = [
  /(?:sk-|pk-|api[_-]?key)[a-zA-Z0-9\-_]{20,}/,   // API keys
  /(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}/,     // GitHub tokens
  /xox[bpors]-[A-Za-z0-9\-]+/,                       // Slack tokens
  /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/,         // Private keys
  /AKIA[0-9A-Z]{16}/,                                // AWS access keys
];

If an agent tries to register context containing secrets, the registration is blocked before anything is stored.
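
As an illustration of how such patterns could gate registration, the sketch below wraps the same five regular expressions in a check that runs before anything is written. The `findSecrets` and `registerContext` functions, the labels, and the array-backed store are hypothetical, not the Context Integrity Engine's actual interface.

```typescript
// Hypothetical wrapper around the patterns listed above.
const labeledPatterns: { label: string; pattern: RegExp }[] = [
  { label: "api-key", pattern: /(?:sk-|pk-|api[_-]?key)[a-zA-Z0-9\-_]{20,}/ },
  { label: "github-token", pattern: /(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}/ },
  { label: "slack-token", pattern: /xox[bpors]-[A-Za-z0-9\-]+/ },
  { label: "private-key", pattern: /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/ },
  { label: "aws-access-key", pattern: /AKIA[0-9A-Z]{16}/ },
];

// Return the label of every pattern that matches the content.
function findSecrets(content: string): string[] {
  return labeledPatterns
    .filter(({ pattern }) => pattern.test(content))
    .map(({ label }) => label);
}

// Store the entry only if the scan finds nothing.
function registerContext(store: string[], content: string): boolean {
  if (findSecrets(content).length > 0) {
    return false; // blocked before storage
  }
  store.push(content);
  return true;
}

const contextStore: string[] = [];
const cleanAccepted = registerContext(contextStore, "Deploy notes for the staging site");
const secretAccepted = registerContext(contextStore, "key=AKIAIOSFODNN7EXAMPLE");
```

The second call uses the example access key ID from AWS documentation, so it is rejected and the store holds only the first entry.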

API Server Security

The /api/context/* endpoints enforce the following controls:

Control	Implementation
Authentication	API key required (Bearer token or X-API-Key header)
Authorization	`authorizedBy` derived from auth context, not user input
Rate limiting	20 destructive operations per hour
Input validation	Content size limits (100KB), find/replace limits (10K chars)
Response sanitization	Snapshots stripped — no sensitive content in responses
Scope validation	Enum-validated scope parameter
Body size limit	1MB max request body
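
One way to enforce a limit like "20 destructive operations per hour" is a sliding-window log, sketched below. The table does not say which algorithm the server uses or what the limit is keyed on. The sliding window, the per-API-key scope, and the `DestructiveOpLimiter` class are assumptions made for this example.

```typescript
// Hypothetical sliding-window limiter keyed by API key.
const MAX_DESTRUCTIVE_OPS = 20;
const WINDOW_MS = 60 * 60 * 1000; // one hour

class DestructiveOpLimiter {
  private hits = new Map<string, number[]>();

  // Returns true and records the operation if the caller is under the limit.
  allow(apiKey: string, now: number): boolean {
    const recent = (this.hits.get(apiKey) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_DESTRUCTIVE_OPS) {
      this.hits.set(apiKey, recent); // keep the pruned list, do not record the denied call
      return false;
    }
    recent.push(now);
    this.hits.set(apiKey, recent);
    return true;
  }
}

const limiter = new DestructiveOpLimiter();
const firstCallAllowed = limiter.allow("example-key", 0);
```

Timestamps are passed in rather than read from the clock so the behavior is easy to test. A denied call is not counted against the caller in this sketch.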

Compliance Mapping

Standard	Control	BaselineOS Implementation
OWASP Agentic Top 10	Tool Misuse	Execution protocols define allowed tools per step
OWASP Agentic Top 10	Prompt Injection	Input sanitizer (11 patterns), opaque credentials
OWASP Agentic Top 10	Excessive Agency	Trust scoring gates autonomous actions
SOC 2	Access Control	Per-agent credential scoping, trust scoring
SOC 2	Audit Trail	SQLite-persisted access log, lease lifecycle
SOC 2	Encryption	AES-256-CBC at rest, TLS in transit
GDPR Art. 32	Security of Processing	Encryption, access control, audit trail
ISO 27001	A.9 Access Control	Trust-scored, scoped, time-boxed access