git.stella-ops.org/docs/modules/opsmemory/architecture.md
2026-01-09 18:27:46 +02:00

OpsMemory Architecture

Technical deep-dive into the Decision Ledger

Overview

OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision: the situation context, the action taken, and the eventual outcome.

Design Principles

1. Determinism First

All operations produce deterministic, reproducible results:

  • Similarity vectors are computed from stable inputs
  • Confidence scores use fixed formulas
  • No randomness in suggestion ranking

2. Multi-Tenant Isolation

Every operation is scoped to a tenant:

  • Records cannot be accessed across tenants
  • Similarity search is tenant-isolated
  • Statistics are per-tenant

3. Fire-and-Forget Integration

Decision recording is async and non-blocking:

  • UI decisions complete immediately
  • OpsMemory recording happens in background
  • Failures don't affect the primary flow

4. Offline Capable

All features work without network access:

  • Local PostgreSQL storage
  • No external API dependencies
  • Self-contained similarity computation

Component Architecture

┌────────────────────────────────────────────────────────────────────┐
│                         WebService Layer                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                   OpsMemoryEndpoints                          │  │
│  │  POST /decisions  GET /decisions  GET /suggestions  GET /stats│  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                         Service Layer                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────────┐  │
│  │ PlaybookSuggest │  │ OutcomeTracking │  │ SimilarityVector   │  │
│  │    Service      │  │    Service      │  │    Generator       │  │
│  └────────┬────────┘  └────────┬────────┘  └─────────┬──────────┘  │
│           │                    │                     │             │
│           ▼                    ▼                     ▼             │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                      IOpsMemoryStore                          │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                       Storage Layer                                 │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                 PostgresOpsMemoryStore                        │  │
│  │  - Decision CRUD                                              │  │
│  │  - Outcome updates                                            │  │
│  │  - Similarity search (array-based cosine)                     │  │
│  │  - Query with pagination                                      │  │
│  │  - Statistics aggregation                                     │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
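To make the layering concrete, here is a minimal Python sketch of a store abstraction with an in-memory stand-in. The real interface is `IOpsMemoryStore` in C#, and its members are not listed in this document, so the `save`/`get` methods, the `Record` type, and the `(tenant_id, memory_id)` key are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical, trimmed-down record used only by this sketch."""
    memory_id: str
    tenant_id: str
    action: str
    outcome: Optional[str] = None


class OpsMemoryStore(Protocol):
    """Assumed shape of the store abstraction that services depend on."""
    def save(self, record: Record) -> None: ...
    def get(self, tenant_id: str, memory_id: str) -> Optional[Record]: ...


class InMemoryStore:
    """Stand-in for the storage layer, useful for exercising service code."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], Record] = {}

    def save(self, record: Record) -> None:
        self._rows[(record.tenant_id, record.memory_id)] = record

    def get(self, tenant_id: str, memory_id: str) -> Optional[Record]:
        # The tenant is part of the key, so a lookup under another tenant misses.
        return self._rows.get((tenant_id, memory_id))
```

Because services only see the abstraction, the same service code can run against this stand-in in unit tests and against PostgreSQL in production.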

Data Model

OpsMemoryRecord

The core aggregate containing all decision information:

using System.Collections.Immutable;  // for ImmutableArray<T>

public sealed record OpsMemoryRecord
{
    public required string MemoryId { get; init; }
    public required string TenantId { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
    public required SituationContext Situation { get; init; }
    public required DecisionRecord Decision { get; init; }
    public OutcomeRecord? Outcome { get; init; }
    public ImmutableArray<float> SimilarityVector { get; init; }
}

SituationContext

Captures the security context at decision time:

public sealed record SituationContext
{
    public string? CveId { get; init; }
    public string? Component { get; init; }       // PURL
    public string? Severity { get; init; }        // low/medium/high/critical
    public ReachabilityStatus Reachability { get; init; }
    public double? EpssScore { get; init; }       // 0-1
    public double? CvssScore { get; init; }       // 0-10
    public bool IsKev { get; init; }
    public ImmutableArray<string> ContextTags { get; init; }
}

DecisionRecord

The action taken and why:

public sealed record DecisionRecord
{
    public required DecisionAction Action { get; init; }
    public required string Rationale { get; init; }
    public required string DecidedBy { get; init; }
    public required DateTimeOffset DecidedAt { get; init; }
    public string? PolicyReference { get; init; }
    public MitigationDetails? Mitigation { get; init; }
}

OutcomeRecord

The result of the decision:

public sealed record OutcomeRecord
{
    public required OutcomeStatus Status { get; init; }
    public TimeSpan? ResolutionTime { get; init; }
    public string? ActualImpact { get; init; }
    public string? LessonsLearned { get; init; }
    public required string RecordedBy { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
}

Similarity Algorithm

Vector Generation

The SimilarityVectorGenerator creates 50-dimensional feature vectors:

Vector Layout:
[0-9]   : CVE category one-hot (memory, injection, auth, crypto, dos, 
          info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29]    : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo, 
          deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging, 
          external-facing, internal, payment, auth, data, api, frontend)
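The layout above can be turned into a small encoder. The following Python sketch is illustrative only (the actual `SimilarityVectorGenerator` is C#). The function names, the lower-inclusive band boundaries, and the choice to leave EPSS/CVSS slots at zero when a score is missing are assumptions, since the document does not specify them.

```python
DIMENSIONS = 50

CVE_CATEGORIES = ["memory", "injection", "auth", "crypto", "dos",
                  "info-disclosure", "privilege-escalation", "xss",
                  "path-traversal", "other"]
SEVERITIES = ["none", "low", "medium", "high", "critical"]
REACHABILITY = ["unknown", "reachable", "not-reachable", "potential"]
COMPONENT_TYPES = ["npm", "maven", "pypi", "nuget", "go", "cargo",
                   "deb", "rpm", "apk", "other"]
CONTEXT_TAGS = ["production", "development", "staging", "external-facing",
                "internal", "payment", "auth", "data", "api", "frontend"]


def band(value: float, upper: float) -> int:
    """Map a score in [0, upper] to one of five equal-width bands (0..4)."""
    return min(int(value / upper * 5), 4)


def build_vector(category, severity, reachability, epss, cvss, is_kev,
                 component_type, tags):
    """Encode one situation as a 50-dimensional list of 0.0/1.0 features."""
    v = [0.0] * DIMENSIONS
    v[0 + CVE_CATEGORIES.index(category)] = 1.0
    v[10 + SEVERITIES.index(severity)] = 1.0
    v[15 + REACHABILITY.index(reachability)] = 1.0
    if epss is not None:          # assumption: missing score leaves slots at 0
        v[19 + band(epss, 1.0)] = 1.0
    if cvss is not None:
        v[24 + band(cvss, 10.0)] = 1.0
    v[29] = 1.0 if is_kev else 0.0
    v[30 + COMPONENT_TYPES.index(component_type)] = 1.0
    for tag in tags:
        if tag in CONTEXT_TAGS:   # unknown tags are ignored
            v[40 + CONTEXT_TAGS.index(tag)] = 1.0
    return v
```

For example, a critical, reachable, KEV-listed injection flaw in an npm package tagged `production` and `api`, with EPSS 0.93 and CVSS 9.8, sets exactly nine positions: 1, 14, 16, 23, 28, 29, 30, 40, and 48.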

Cosine Similarity

Similarity between vectors A and B:

similarity = (A · B) / (||A|| × ||B||)

Where A · B is the dot product and ||A|| is the L2 norm.
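As a worked example of the formula, here is a short Python sketch. Returning 0.0 for a zero-norm vector is an assumption made here to avoid division by zero; the document does not say how that case is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)
```

For A = [1, 0, 1, 1] and B = [1, 1, 0, 1], the dot product is 2 and each norm is √3, so the similarity is 2/3. For one-hot style vectors, the score therefore grows with the number of features the two situations share.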

CVE Classification

CVEs are classified by analyzing keywords in the CVE ID and description:

Category              Keywords
memory                buffer, overflow, heap, stack, use-after-free
injection             sql, command, code injection, ldap
auth                  authentication, authorization, bypass
crypto                cryptographic, encryption, key
dos                   denial of service, resource exhaustion
info-disclosure       information disclosure, leak
privilege-escalation  privilege escalation, elevation
xss                   cross-site scripting, xss
path-traversal        path traversal, directory traversal
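A keyword classifier over this table might look like the Python sketch below. The matching strategy (case-insensitive substring search, first matching category in table order wins, `other` as the fallback) is an assumption; the document lists the keywords but not the matching rules.

```python
CATEGORY_KEYWORDS = [
    ("memory", ["buffer", "overflow", "heap", "stack", "use-after-free"]),
    ("injection", ["sql", "command", "code injection", "ldap"]),
    ("auth", ["authentication", "authorization", "bypass"]),
    ("crypto", ["cryptographic", "encryption", "key"]),
    ("dos", ["denial of service", "resource exhaustion"]),
    ("info-disclosure", ["information disclosure", "leak"]),
    ("privilege-escalation", ["privilege escalation", "elevation"]),
    ("xss", ["cross-site scripting", "xss"]),
    ("path-traversal", ["path traversal", "directory traversal"]),
]


def classify_cve(text: str) -> str:
    """Return the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

Plain substring matching is simple and deterministic, but short keywords such as `key` can match unrelated words, so a production classifier would likely need word-boundary checks.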

Playbook Suggestion Algorithm

Confidence Calculation

confidence = baseSimilarity 
           × successRateBonus 
           × recencyBonus 
           × evidenceCountBonus

Where:

  • baseSimilarity: Highest similarity score from matching records
  • successRateBonus: 1 + (successRate - 0.5) * 0.5, which ranges from 0.75 (no successes) to 1.25 (all successes) and is neutral (1.0) at a 50% success rate
  • recencyBonus: More recent decisions weighted higher
  • evidenceCountBonus: More evidence = higher confidence
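The product above can be sketched in Python as follows. Only the `successRateBonus` formula is given in this document, so the recency and evidence-count bonuses are taken as caller-supplied multipliers rather than invented here; the function names are hypothetical.

```python
def success_rate_bonus(success_rate: float) -> float:
    """Formula from the text: neutral at 50%, from 0.75 up to 1.25."""
    return 1 + (success_rate - 0.5) * 0.5


def confidence(base_similarity: float, success_rate: float,
               recency_bonus: float, evidence_count_bonus: float) -> float:
    """Multiply the four factors; the last two formulas are not specified here."""
    return (base_similarity
            * success_rate_bonus(success_rate)
            * recency_bonus
            * evidence_count_bonus)
```

With a base similarity of 0.9, an 80% success rate, and both remaining bonuses at 1.0, the result is 0.9 × 1.15 = 1.035. The raw product can exceed 1, so whether the real implementation clamps it before display is left open here.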

Suggestion Ranking

  1. Group past decisions by action taken
  2. For each action, calculate:
    • Average similarity of records with that action
    • Success rate for that action
    • Number of similar decisions
  3. Compute confidence score
  4. Rank by confidence descending
  5. Return top N suggestions
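The ranking steps can be sketched in Python like this. The sketch is simplified: it applies only the similarity and success-rate factors and omits the recency and evidence-count bonuses, whose formulas are not given. The tuple input format and the tie-break on action name (added to keep the ordering deterministic) are assumptions.

```python
from collections import defaultdict


def rank_suggestions(matches, top_n=3):
    """matches: iterable of (action, similarity, succeeded) tuples."""
    # Step 1: group past decisions by the action taken.
    groups = defaultdict(list)
    for action, similarity, succeeded in matches:
        groups[action].append((similarity, succeeded))

    suggestions = []
    for action, rows in groups.items():
        # Step 2: per-action average similarity, success rate, and count.
        avg_similarity = sum(s for s, _ in rows) / len(rows)
        success_rate = sum(1 for _, ok in rows if ok) / len(rows)
        # Step 3: simplified confidence (similarity x success-rate bonus).
        score = avg_similarity * (1 + (success_rate - 0.5) * 0.5)
        suggestions.append({"action": action, "confidence": score,
                            "success_rate": success_rate, "count": len(rows)})

    # Step 4: confidence descending; action name breaks ties deterministically.
    suggestions.sort(key=lambda s: (-s["confidence"], s["action"]))
    # Step 5: keep the top N.
    return suggestions[:top_n]
```

Given two successful `Remediate` records (similarity 0.9 and 0.8) and two `Accept` records (0.95 failed, 0.85 succeeded), `Remediate` scores 0.85 × 1.25 = 1.0625 and `Accept` scores 0.9 × 1.0 = 0.9, so `Remediate` ranks first.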

Rationale Generation

Rationales are generated programmatically:

"{confidence}% confidence based on {count} similar past decisions. 
{action} succeeded in {successRate}% of {factors}."
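Filling the template is plain string formatting, as in the Python sketch below. Rounding percentages to whole numbers and passing `factors` as a ready-made phrase are assumptions; the document does not define how `{factors}` is built.

```python
def build_rationale(confidence: float, count: int, action: str,
                    success_rate: float, factors: str) -> str:
    """Fill the rationale template; confidence and success_rate are fractions."""
    return (
        f"{round(confidence * 100)}% confidence based on {count} similar past decisions. "
        f"{action} succeeded in {round(success_rate * 100)}% of {factors}."
    )
```

Because the text is derived only from the computed numbers, the same inputs always yield the same rationale, in line with the determinism principle.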

Storage Design

PostgreSQL Schema

CREATE TABLE opsmemory.decisions (
    memory_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    
    -- Denormalized situation fields for indexing
    cve_id TEXT,
    component TEXT,
    severity TEXT,
    
    -- Full data as JSONB
    situation JSONB NOT NULL,
    decision JSONB NOT NULL,
    outcome JSONB,
    
    -- Similarity vector as array (not pgvector)
    similarity_vector REAL[] NOT NULL
);

-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;

Why Not pgvector?

The current implementation uses PostgreSQL arrays instead of pgvector:

  1. Simpler deployment: No extension installation required
  2. Smaller dataset: OpsMemory is per-org, not global
  3. Adequate performance: Array operations are fast enough for <100K records
  4. Future option: Can migrate to pgvector if needed

Cosine Similarity in SQL

-- Cosine similarity between the query vector and each stored vector.
-- NULLIF guards against division by zero when either vector has zero norm;
-- such rows get a NULL similarity and sort last.
SELECT memory_id,
       (
         SELECT SUM(a * b)
         FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
       ) / NULLIF(
         SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
         SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b))),
         0
       ) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC NULLS LAST
LIMIT @top_k;

API Design

Endpoint Overview

Method  Path                                       Description
POST    /api/v1/opsmemory/decisions                Record a new decision
GET     /api/v1/opsmemory/decisions/{id}           Get decision details
POST    /api/v1/opsmemory/decisions/{id}/outcome   Record outcome
GET     /api/v1/opsmemory/suggestions              Get playbook suggestions
GET     /api/v1/opsmemory/decisions                Query past decisions
GET     /api/v1/opsmemory/stats                    Get statistics

Request/Response DTOs

The API uses string-based DTOs that convert to/from internal enums:

// API accepts strings
public record RecordDecisionRequest
{
    public required string Action { get; init; }  // "Remediate", "Accept", etc.
    public string? Reachability { get; init; }    // "reachable", "not-reachable"
}

// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine, ... }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
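The string-to-enum conversion at the API boundary can be sketched in Python as below. This is not the project's C# mapping code: the parser names, the case-insensitive matching, and the fallback of a missing reachability value to `Unknown` are assumptions. Only the three action values named above are included, because the source elides the rest.

```python
from enum import Enum
from typing import Optional


class DecisionAction(Enum):
    ACCEPT = "Accept"
    REMEDIATE = "Remediate"
    QUARANTINE = "Quarantine"  # the source lists further values that are elided


class ReachabilityStatus(Enum):
    UNKNOWN = "unknown"
    REACHABLE = "reachable"
    NOT_REACHABLE = "not-reachable"
    POTENTIAL = "potential"


def parse_action(value: str) -> DecisionAction:
    """Convert an API string to the enum, rejecting unknown values."""
    wanted = value.strip().lower()
    for member in DecisionAction:
        if member.value.lower() == wanted:
            return member
    raise ValueError(f"unknown action: {value!r}")


def parse_reachability(value: Optional[str]) -> ReachabilityStatus:
    """Missing or blank input maps to UNKNOWN (an assumption of this sketch)."""
    if value is None or not value.strip():
        return ReachabilityStatus.UNKNOWN
    return ReachabilityStatus(value.strip().lower())  # raises ValueError if unknown
```

Rejecting unrecognized strings at the boundary keeps invalid values out of the stored records and lets the endpoint report a validation error to the caller.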

Testing Strategy

Unit Tests (26 tests)

SimilarityVectorGeneratorTests:

  • Vector dimension validation
  • Feature encoding (severity, reachability, EPSS, CVSS, KEV)
  • Component type classification
  • Context tag encoding
  • Vector normalization
  • Cosine similarity computation
  • Matching factor detection

PlaybookSuggestionServiceTests:

  • Empty history handling
  • Single record suggestions
  • Multiple record ranking
  • Confidence calculation
  • Rationale generation
  • Evidence linking

Integration Tests (5 tests)

PostgresOpsMemoryStoreTests:

  • Decision persistence and retrieval
  • Outcome updates
  • Tenant isolation
  • Query filtering
  • Statistics calculation

Performance Considerations

Indexing Strategy

  • Primary key on memory_id for direct lookups
  • Index on tenant_id for isolation
  • Index on recorded_at for recent-first queries
  • Partial indexes on cve_id and component for filtered queries

Query Optimization

  • Limit similarity search to last N days by default
  • Return only top-K similar records
  • Use cursor-based pagination for large result sets
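Cursor-based (keyset) pagination can be illustrated with the in-memory Python sketch below. It assumes results are ordered by `recorded_at` descending with `memory_id` as a tie-breaker and that the cursor is the key of the last row returned; the document does not specify the actual cursor format, and `fetch_page` is a hypothetical name.

```python
def fetch_page(rows, limit, cursor=None):
    """rows: list of (recorded_at, memory_id); returns (items, next_cursor)."""
    # Order by recorded_at descending, then memory_id ascending.
    # Two stable sorts: secondary key first, then the primary key.
    ordered = sorted(rows, key=lambda r: r[1])
    ordered.sort(key=lambda r: r[0], reverse=True)

    if cursor is not None:
        c_time, c_id = cursor
        # Keep only rows strictly after the cursor in the ordering above.
        ordered = [r for r in ordered
                   if r[0] < c_time or (r[0] == c_time and r[1] > c_id)]

    items = ordered[:limit]
    # A cursor is returned only when more rows remain after this page.
    next_cursor = items[-1] if len(ordered) > limit else None
    return items, next_cursor
```

Unlike OFFSET-based paging, the cursor identifies a position by key, so rows inserted between requests do not shift later pages or cause duplicates.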

Caching

Currently no caching (records are infrequently accessed). Future options:

  • Cache similarity vectors in memory
  • Cache recent suggestions per tenant
  • Use read replicas for heavy read loads

Future Enhancements

pgvector Migration

If dataset grows significantly:

  1. Install pgvector extension
  2. Add vector column with IVFFlat index
  3. Replace array-based similarity with vector operations
  4. ~100x speedup for large datasets

ML-Based Suggestions

Replace rule-based confidence with ML model:

  1. Train on historical decision-outcome pairs
  2. Include more features (time of day, team, etc.)
  3. Use gradient boosting or neural network
  4. Continuous learning from new outcomes

Outcome Prediction

Predict outcome before decision is made:

  1. Use past outcomes as training data
  2. Predict success probability per action
  3. Show predicted outcomes in UI
  4. Track prediction accuracy over time