2026-01-09 18:27:46 +02:00

OpsMemory Architecture

Technical deep-dive into the Decision Ledger

Overview

OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision: the situation context, the action taken, and the eventual outcome.

Design Principles

1. Determinism First

All operations produce deterministic, reproducible results:

Similarity vectors are computed from stable inputs
Confidence scores use fixed formulas
No randomness in suggestion ranking

2. Multi-Tenant Isolation

Every operation is scoped to a tenant:

Records cannot be accessed across tenants
Similarity search is tenant-isolated
Statistics are per-tenant

3. Fire-and-Forget Integration

Decision recording is asynchronous and non-blocking:

UI decisions complete immediately
OpsMemory recording happens in background
Failures don't affect the primary flow
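
A minimal sketch of the fire-and-forget pattern described above. The helper name `FireAndForget.RunInBackground` and its delegate-based signature are assumptions made for illustration; the actual OpsMemory integration code is not shown in this document.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Hypothetical helper: runs the recording delegate on the thread pool.
    // Any exception is handed to onError and swallowed, so the returned
    // task never faults and the caller's primary flow is unaffected.
    public static Task RunInBackground(Func<Task> record, Action<Exception> onError)
    {
        return Task.Run(async () =>
        {
            try
            {
                await record();
            }
            catch (Exception ex)
            {
                onError(ex); // report (for example, log) and continue
            }
        });
    }
}
```

A caller would typically discard the returned task (`_ = FireAndForget.RunInBackground(...)`) so the UI decision completes without waiting for the recording.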

4. Offline Capable

All features work without network access:

Local PostgreSQL storage
No external API dependencies
Self-contained similarity computation

Component Architecture

┌────────────────────────────────────────────────────────────────────┐
│                         WebService Layer                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                   OpsMemoryEndpoints                          │  │
│  │  POST /decisions  GET /decisions  GET /suggestions  GET /stats│  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                         Service Layer                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────────┐  │
│  │ PlaybookSuggest │  │ OutcomeTracking │  │ SimilarityVector   │  │
│  │    Service      │  │    Service      │  │    Generator       │  │
│  └────────┬────────┘  └────────┬────────┘  └─────────┬──────────┘  │
│           │                    │                     │             │
│           ▼                    ▼                     ▼             │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                      IOpsMemoryStore                          │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                       Storage Layer                                 │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                 PostgresOpsMemoryStore                        │  │
│  │  - Decision CRUD                                              │  │
│  │  - Outcome updates                                            │  │
│  │  - Similarity search (array-based cosine)                     │  │
│  │  - Query with pagination                                      │  │
│  │  - Statistics aggregation                                     │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘

Data Model

OpsMemoryRecord

The core aggregate containing all decision information:

using System.Collections.Immutable; // for ImmutableArray<T>

public sealed record OpsMemoryRecord
{
    public required string MemoryId { get; init; }
    public required string TenantId { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
    public required SituationContext Situation { get; init; }
    public required DecisionRecord Decision { get; init; }
    public OutcomeRecord? Outcome { get; init; }
    public ImmutableArray<float> SimilarityVector { get; init; }
}

SituationContext

Captures the security context at decision time:

public sealed record SituationContext
{
    public string? CveId { get; init; }
    public string? Component { get; init; }       // PURL
    public string? Severity { get; init; }        // low/medium/high/critical
    public ReachabilityStatus Reachability { get; init; }
    public double? EpssScore { get; init; }       // 0-1
    public double? CvssScore { get; init; }       // 0-10
    public bool IsKev { get; init; }
    public ImmutableArray<string> ContextTags { get; init; }
}

DecisionRecord

The action taken and why:

public sealed record DecisionRecord
{
    public required DecisionAction Action { get; init; }
    public required string Rationale { get; init; }
    public required string DecidedBy { get; init; }
    public required DateTimeOffset DecidedAt { get; init; }
    public string? PolicyReference { get; init; }
    public MitigationDetails? Mitigation { get; init; }
}

OutcomeRecord

The result of the decision:

public sealed record OutcomeRecord
{
    public required OutcomeStatus Status { get; init; }
    public TimeSpan? ResolutionTime { get; init; }
    public string? ActualImpact { get; init; }
    public string? LessonsLearned { get; init; }
    public required string RecordedBy { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
}

Similarity Algorithm

Vector Generation

The SimilarityVectorGenerator creates 50-dimensional feature vectors:

Vector Layout:
[0-9]   : CVE category one-hot (memory, injection, auth, crypto, dos, 
          info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29]    : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo, 
          deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging, 
          external-facing, internal, payment, auth, data, api, frontend)
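
The layout above can be illustrated with a partial encoder covering only the severity, reachability, EPSS, CVSS, and KEV segments. This is a sketch, not the actual SimilarityVectorGenerator: the class name `VectorSketch`, the handling of missing values (segment left at zero), and the assignment of boundary scores to bands are all assumptions. Normalization is not shown.

```csharp
using System;

public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }

public static class VectorSketch
{
    public const int Dimensions = 50;

    // Order matches indices [10-14] in the layout.
    private static readonly string[] Severities = { "none", "low", "medium", "high", "critical" };

    public static float[] Encode(string? severity, ReachabilityStatus reachability,
                                 double? epss, double? cvss, bool isKev)
    {
        var v = new float[Dimensions];

        // [10-14] severity one-hot; unrecognized or missing severity sets nothing (assumption).
        int sev = severity is null ? -1 : Array.IndexOf(Severities, severity.ToLowerInvariant());
        if (sev >= 0) v[10 + sev] = 1f;

        // [15-18] reachability one-hot; enum order matches the layout order.
        v[15 + (int)reachability] = 1f;

        // [19-23] EPSS bands of width 0.2, [24-28] CVSS bands of width 2.
        if (epss is double e) v[19 + Band(e, 0.2)] = 1f;
        if (cvss is double c) v[24 + Band(c, 2.0)] = 1f;

        // [29] KEV flag.
        if (isKev) v[29] = 1f;
        return v;
    }

    // Maps a score to a band index 0..4; the top of the range is clamped into the last band.
    private static int Band(double score, double width) =>
        Math.Clamp((int)(score / width), 0, 4);
}
```

For example, a high-severity, reachable finding with EPSS 0.5, CVSS 9.8, and the KEV flag set would populate indices 13, 16, 21, 28, and 29.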

Cosine Similarity

Similarity between vectors A and B:

similarity = (A · B) / (||A|| × ||B||)

Where A · B is the dot product and ||A|| is the L2 norm.
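
The formula translates directly into code. The sketch below is an in-memory illustration with a hypothetical class name `Cosine`; returning 0 for an all-zero vector is an assumption, since the formula is undefined in that case.

```csharp
using System;
using System.Collections.Generic;

public static class Cosine
{
    // Computes (A · B) / (||A|| × ||B||) for two vectors of equal length.
    public static double Similarity(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Assumption: treat a zero-length vector as having no similarity.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Both `float[]` and `ImmutableArray<float>` implement `IReadOnlyList<float>`, so either can be passed in.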

CVE Classification

CVEs are classified by matching keywords in the CVE ID and description. A CVE that matches none of the keywords below falls into the other category from the vector layout:

Category	Keywords
memory	buffer, overflow, heap, stack, use-after-free
injection	sql, command, code injection, ldap
auth	authentication, authorization, bypass
crypto	cryptographic, encryption, key
dos	denial of service, resource exhaustion
info-disclosure	information disclosure, leak
privilege-escalation	privilege escalation, elevation
xss	cross-site scripting, xss
path-traversal	path traversal, directory traversal
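
The keyword table can be sketched as a simple lookup. The class name `CveClassifier`, the case-insensitive substring matching, the first-match-wins ordering, and the fallback to `other` are assumptions for illustration; the actual matching rules may differ.

```csharp
using System;

public static class CveClassifier
{
    // Categories and keywords copied from the table, in table order.
    private static readonly (string Category, string[] Keywords)[] Rules =
    {
        ("memory", new[] { "buffer", "overflow", "heap", "stack", "use-after-free" }),
        ("injection", new[] { "sql", "command", "code injection", "ldap" }),
        ("auth", new[] { "authentication", "authorization", "bypass" }),
        ("crypto", new[] { "cryptographic", "encryption", "key" }),
        ("dos", new[] { "denial of service", "resource exhaustion" }),
        ("info-disclosure", new[] { "information disclosure", "leak" }),
        ("privilege-escalation", new[] { "privilege escalation", "elevation" }),
        ("xss", new[] { "cross-site scripting", "xss" }),
        ("path-traversal", new[] { "path traversal", "directory traversal" }),
    };

    // Returns the first category whose keyword occurs in the text, else "other".
    public static string Classify(string? text)
    {
        if (string.IsNullOrWhiteSpace(text)) return "other";

        foreach (var (category, keywords) in Rules)
            foreach (var keyword in keywords)
                if (text.Contains(keyword, StringComparison.OrdinalIgnoreCase))
                    return category;

        return "other";
    }
}
```

With plain substring matching, short keywords such as `key` or `command` can match unrelated text, so rule order affects the result.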

Playbook Suggestion Algorithm

Confidence Calculation

confidence = baseSimilarity 
           × successRateBonus 
           × recencyBonus 
           × evidenceCountBonus

Where:

baseSimilarity: Highest similarity score from matching records
successRateBonus: 1 + (successRate - 0.5) * 0.5, which ranges from 0.75 at a 0% success rate to 1.25 at 100%
recencyBonus: More recent decisions are weighted higher
evidenceCountBonus: More supporting records yield higher confidence
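
As a worked sketch of the product above: only the successRateBonus formula is given in this document, so the example takes recencyBonus and evidenceCountBonus as already-computed inputs rather than guessing their formulas. The class name `ConfidenceSketch` is hypothetical.

```csharp
public static class ConfidenceSketch
{
    // 1 + (successRate - 0.5) * 0.5, with successRate expressed as a fraction in [0, 1].
    public static double SuccessRateBonus(double successRate) =>
        1 + (successRate - 0.5) * 0.5;

    // Product of the four factors. recencyBonus and evidenceCountBonus are
    // supplied by the caller because their formulas are not specified here.
    public static double Confidence(double baseSimilarity, double successRate,
                                    double recencyBonus, double evidenceCountBonus) =>
        baseSimilarity * SuccessRateBonus(successRate) * recencyBonus * evidenceCountBonus;
}
```

Because the bonus factors can exceed 1, the raw product is not bounded by 1; whether the result is clamped before display is not stated here.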

Suggestion Ranking

1. Group past decisions by action taken
2. For each action, calculate:
   - Average similarity of records with that action
   - Success rate for that action
   - Number of similar decisions
3. Compute confidence score
4. Rank by confidence descending
5. Return top N suggestions
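
The ranking steps can be sketched with LINQ. The record types `SimilarDecision` and `Suggestion` are hypothetical. The sketch uses the highest similarity in each group as baseSimilarity, following the confidence definition, and treats recencyBonus and evidenceCountBonus as 1 because their formulas are not given. The tie-break on action name is an assumption added to keep the ordering deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: one similar past decision and whether its outcome was a success.
public sealed record SimilarDecision(string Action, double Similarity, bool Succeeded);

// Hypothetical output: one ranked suggestion per action.
public sealed record Suggestion(string Action, double Confidence, double SuccessRate, int Count);

public static class RankingSketch
{
    public static IReadOnlyList<Suggestion> Rank(IEnumerable<SimilarDecision> matches, int topN)
    {
        return matches
            .GroupBy(m => m.Action)
            .Select(g =>
            {
                int count = g.Count();
                double successRate = g.Count(m => m.Succeeded) / (double)count;
                double baseSimilarity = g.Max(m => m.Similarity);
                // Recency and evidence-count bonuses omitted (treated as 1).
                double confidence = baseSimilarity * (1 + (successRate - 0.5) * 0.5);
                return new Suggestion(g.Key, confidence, successRate, count);
            })
            .OrderByDescending(s => s.Confidence)
            .ThenBy(s => s.Action, StringComparer.Ordinal) // deterministic tie-break
            .Take(topN)
            .ToList();
    }
}
```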

Rationale Generation

Rationales are generated programmatically:

"{confidence}% confidence based on {count} similar past decisions. 
{action} succeeded in {successRate}% of {factors}."
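
A minimal sketch of filling this template. The class name `RationaleSketch` is hypothetical, and the sketch assumes that confidence and successRate arrive as fractions and are shown as whole percentages, and that `factors` is a preformatted description. Invariant culture keeps the output identical across locales.

```csharp
using System.Globalization;

public static class RationaleSketch
{
    // Fills the rationale template; confidence and successRate are fractions (assumption).
    public static string Build(double confidence, int count, string action,
                               double successRate, string factors) =>
        string.Format(
            CultureInfo.InvariantCulture,
            "{0:0}% confidence based on {1} similar past decisions. {2} succeeded in {3:0}% of {4}.",
            confidence * 100, count, action, successRate * 100, factors);
}
```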

Storage Design

PostgreSQL Schema

CREATE TABLE opsmemory.decisions (
    memory_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    
    -- Denormalized situation fields for indexing
    cve_id TEXT,
    component TEXT,
    severity TEXT,
    
    -- Full data as JSONB
    situation JSONB NOT NULL,
    decision JSONB NOT NULL,
    outcome JSONB,
    
    -- Similarity vector as array (not pgvector)
    similarity_vector REAL[] NOT NULL
);

-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;

Why Not pgvector?

The current implementation uses PostgreSQL arrays instead of pgvector:

Simpler deployment: No extension installation required
Smaller dataset: OpsMemory is per-org, not global
Adequate performance: Array operations are fast enough for <100K records
Future option: Can migrate to pgvector if needed

Cosine Similarity in SQL

-- Cosine similarity between the query vector and each stored vector.
-- NULLIF guards against division by zero if either vector is all zeros.
SELECT memory_id,
       (
         SELECT SUM(a * b)
         FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
       ) / NULLIF(
         SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
         SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b))),
         0
       ) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC NULLS LAST
LIMIT @top_k;

API Design

Endpoint Overview

Method	Path	Description
POST	`/api/v1/opsmemory/decisions`	Record a new decision
GET	`/api/v1/opsmemory/decisions/{id}`	Get decision details
POST	`/api/v1/opsmemory/decisions/{id}/outcome`	Record outcome
GET	`/api/v1/opsmemory/suggestions`	Get playbook suggestions
GET	`/api/v1/opsmemory/decisions`	Query past decisions
GET	`/api/v1/opsmemory/stats`	Get statistics

Request/Response DTOs

The API uses string-based DTOs that convert to/from internal enums:

// API accepts strings
public record RecordDecisionRequest
{
    public required string Action { get; init; }  // "Remediate", "Accept", etc.
    public string? Reachability { get; init; }    // "reachable", "not-reachable"
}

// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine /* ... */ }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
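
The string-to-enum conversion might look like the sketch below. The class name `ReachabilityParser`, the mapping of a missing value to `Unknown`, and the choice to throw on unrecognized input are assumptions; the actual endpoint may map or reject values differently.

```csharp
using System;

public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }

public static class ReachabilityParser
{
    // Converts API strings such as "reachable" or "not-reachable" to the enum.
    public static ReachabilityStatus Parse(string? value)
    {
        // Assumption: an omitted value means the reachability is unknown.
        if (string.IsNullOrWhiteSpace(value)) return ReachabilityStatus.Unknown;

        // "not-reachable" -> "notreachable", which matches NotReachable case-insensitively.
        string normalized = value.Replace("-", "").Replace("_", "");

        return Enum.TryParse<ReachabilityStatus>(normalized, ignoreCase: true, out var parsed)
               && Enum.IsDefined(typeof(ReachabilityStatus), parsed)
            ? parsed
            : throw new ArgumentException($"Unknown reachability value '{value}'.", nameof(value));
    }
}
```

The `Enum.IsDefined` check matters because `Enum.TryParse` also accepts numeric strings that do not correspond to any named member.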

Testing Strategy

Unit Tests (26 tests)

SimilarityVectorGeneratorTests:

Vector dimension validation
Feature encoding (severity, reachability, EPSS, CVSS, KEV)
Component type classification
Context tag encoding
Vector normalization
Cosine similarity computation
Matching factor detection

PlaybookSuggestionServiceTests:

Empty history handling
Single record suggestions
Multiple record ranking
Confidence calculation
Rationale generation
Evidence linking

Integration Tests (5 tests)

PostgresOpsMemoryStoreTests:

Decision persistence and retrieval
Outcome updates
Tenant isolation
Query filtering
Statistics calculation

Performance Considerations

Indexing Strategy

Primary key on memory_id for direct lookups
Index on tenant_id for isolation
Index on recorded_at for recent-first queries
Partial indexes on cve_id and component for filtered queries

Query Optimization

Limit similarity search to the last N days by default
Return only top-K similar records
Use cursor-based pagination for large result sets

Caching

There is currently no caching because records are accessed infrequently. Future options:

Cache similarity vectors in memory
Cache recent suggestions per tenant
Use read replicas for heavy read loads

Future Enhancements

pgvector Migration

If the dataset grows significantly:

Install pgvector extension
Add vector column with IVFFlat index
Replace array-based similarity with vector operations
Expected result: roughly 100x speedup for large datasets

ML-Based Suggestions

Replace rule-based confidence with an ML model:

Train on historical decision-outcome pairs
Include more features (time of day, team, etc.)
Use gradient boosting or neural network
Continuous learning from new outcomes

Outcome Prediction

Predict the outcome before a decision is made:

Use past outcomes as training data
Predict success probability per action
Show predicted outcomes in UI
Track prediction accuracy over time
