# OpsMemory Architecture

> **Technical deep-dive into the Decision Ledger**

## Overview

OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision: the situation context at decision time, the action taken, and the eventual outcome.

## Design Principles

### 1. Determinism First

All operations produce deterministic, reproducible results:
- Similarity vectors are computed from stable inputs
- Confidence scores use fixed formulas
- No randomness in suggestion ranking

### 2. Multi-Tenant Isolation

Every operation is scoped to a tenant:
- Records cannot be accessed across tenants
- Similarity search is tenant-isolated
- Statistics are per-tenant

### 3. Fire-and-Forget Integration

Decision recording is async and non-blocking:
- UI decisions complete immediately
- OpsMemory recording happens in the background
- Failures don't affect the primary flow
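
A minimal sketch of this pattern, assuming an in-process queue built on `System.Threading.Channels`. `BackgroundDecisionRecorder<T>` and its `persist` delegate are hypothetical names for illustration, not the OpsMemory API: callers only enqueue, and a single background loop persists each item and counts failures instead of propagating them.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical recorder: Enqueue returns immediately; a background consumer
// persists items one at a time and isolates any failure from the caller.
public sealed class BackgroundDecisionRecorder<T>
{
    private readonly Channel<T> _queue = Channel.CreateUnbounded<T>();
    private readonly Func<T, Task> _persist;
    private readonly Task _consumer;
    private int _failures;

    public BackgroundDecisionRecorder(Func<T, Task> persist)
    {
        _persist = persist;
        _consumer = Task.Run(() => ConsumeAsync());
    }

    public int Failures => Volatile.Read(ref _failures);

    // Non-blocking: TryWrite on an unbounded channel succeeds unless the writer is completed.
    public bool Enqueue(T item) => _queue.Writer.TryWrite(item);

    // Stops accepting new items and returns a task that finishes once the queue is drained.
    public Task CompleteAsync()
    {
        _queue.Writer.TryComplete();
        return _consumer;
    }

    private async Task ConsumeAsync()
    {
        await foreach (var item in _queue.Reader.ReadAllAsync())
        {
            try
            {
                await _persist(item);
            }
            catch (Exception)
            {
                // A real service would log here; the primary (UI) flow never sees the error.
                Interlocked.Increment(ref _failures);
            }
        }
    }
}
```

An unbounded channel never blocks the writer; a bounded channel (`Channel.CreateBounded` with a `BoundedChannelFullMode` such as `DropOldest`) would cap memory at the cost of dropping records under load.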

### 4. Offline Capable

All features work without network access:
- Local PostgreSQL storage
- No external API dependencies
- Self-contained similarity computation

## Component Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                         WebService Layer                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                   OpsMemoryEndpoints                          │  │
│  │  POST /decisions  GET /decisions  GET /suggestions  GET /stats│  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                         Service Layer                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────────┐  │
│  │ PlaybookSuggest │  │ OutcomeTracking │  │ SimilarityVector   │  │
│  │    Service      │  │    Service      │  │    Generator       │  │
│  └────────┬────────┘  └────────┬────────┘  └─────────┬──────────┘  │
│           │                    │                     │             │
│           ▼                    ▼                     ▼             │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                      IOpsMemoryStore                          │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬───────────────────────────────────┘
                                 │
┌────────────────────────────────┼───────────────────────────────────┐
│                       Storage Layer                                 │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                 PostgresOpsMemoryStore                        │  │
│  │  - Decision CRUD                                              │  │
│  │  - Outcome updates                                            │  │
│  │  - Similarity search (array-based cosine)                     │  │
│  │  - Query with pagination                                      │  │
│  │  - Statistics aggregation                                     │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
```

## Data Model

### OpsMemoryRecord

The core aggregate containing all decision information:

```csharp
using System;
using System.Collections.Immutable;

public sealed record OpsMemoryRecord
{
    public required string MemoryId { get; init; }
    public required string TenantId { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
    public required SituationContext Situation { get; init; }
    public required DecisionRecord Decision { get; init; }
    public OutcomeRecord? Outcome { get; init; }                 // null until an outcome is recorded
    public ImmutableArray<float> SimilarityVector { get; init; } // 50-dimensional feature vector
}
```

### SituationContext

Captures the security context at decision time:

```csharp
using System.Collections.Immutable;

public sealed record SituationContext
{
    public string? CveId { get; init; }
    public string? Component { get; init; }       // PURL
    public string? Severity { get; init; }        // low/medium/high/critical
    public ReachabilityStatus Reachability { get; init; }
    public double? EpssScore { get; init; }       // 0-1
    public double? CvssScore { get; init; }       // 0-10
    public bool IsKev { get; init; }
    public ImmutableArray<string> ContextTags { get; init; }
}
```

### DecisionRecord

The action taken and why:

```csharp
public sealed record DecisionRecord
{
    public required DecisionAction Action { get; init; }
    public required string Rationale { get; init; }
    public required string DecidedBy { get; init; }
    public required DateTimeOffset DecidedAt { get; init; }
    public string? PolicyReference { get; init; }
    public MitigationDetails? Mitigation { get; init; }
}
```

### OutcomeRecord

The result of the decision:

```csharp
public sealed record OutcomeRecord
{
    public required OutcomeStatus Status { get; init; }
    public TimeSpan? ResolutionTime { get; init; }
    public string? ActualImpact { get; init; }
    public string? LessonsLearned { get; init; }
    public required string RecordedBy { get; init; }
    public required DateTimeOffset RecordedAt { get; init; }
}
```

## Similarity Algorithm

### Vector Generation

The `SimilarityVectorGenerator` creates 50-dimensional feature vectors:

```
Vector Layout:
[0-9]   : CVE category one-hot (memory, injection, auth, crypto, dos,
          info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29]    : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo,
          deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging,
          external-facing, internal, payment, auth, data, api, frontend)
```
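
The layout above can be sketched as an encoder. This is a minimal illustration, not the actual `SimilarityVectorGenerator`: `VectorSketch.Encode` is a hypothetical name, the CVE category slot is passed in pre-computed, band edges are assumed lower-inclusive with the maximum value falling into the top band, an unrecognized severity is assumed to map to `none`, missing EPSS/CVSS scores are assumed to leave their segment at zero, and the PURL type `golang` is assumed to map to the `go` slot.

```csharp
using System;
using System.Collections.Generic;

// Same member order as the layout's reachability slots [15-18].
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }

public static class VectorSketch
{
    static readonly string[] Severities = { "none", "low", "medium", "high", "critical" };
    // Slot 9 ("other") is used for any PURL type not listed here.
    static readonly string[] ComponentTypes = { "npm", "maven", "pypi", "nuget", "go", "cargo", "deb", "rpm", "apk" };
    static readonly string[] Tags =
        { "production", "development", "staging", "external-facing", "internal",
          "payment", "auth", "data", "api", "frontend" };

    public static float[] Encode(int cveCategory, string? severity, ReachabilityStatus reachability,
        double? epss, double? cvss, bool isKev, string? purl, IEnumerable<string> contextTags)
    {
        var v = new float[50];
        v[Math.Clamp(cveCategory, 0, 9)] = 1f;                          // [0-9]   CVE category
        int sev = Array.IndexOf(Severities, severity?.ToLowerInvariant() ?? "none");
        v[10 + Math.Max(sev, 0)] = 1f;                                  // [10-14] unknown -> "none"
        v[15 + (int)reachability] = 1f;                                 // [15-18]
        if (epss is double e) v[19 + Band(e, 1.0)] = 1f;                // [19-23] missing -> all zero
        if (cvss is double c) v[24 + Band(c, 10.0)] = 1f;               // [24-28] missing -> all zero
        v[29] = isKev ? 1f : 0f;                                        // [29]    KEV flag
        v[30 + ComponentSlot(purl)] = 1f;                               // [30-39]
        foreach (var tag in contextTags)                                // [40-49] tag presence
        {
            int t = Array.IndexOf(Tags, tag.ToLowerInvariant());
            if (t >= 0) v[40 + t] = 1f;
        }
        return v;
    }

    // Five equal-width bands over [0, max]: lower edges inclusive, max itself lands in the top band.
    static int Band(double value, double max) => Math.Clamp((int)(value / max * 5), 0, 4);

    // "pkg:npm/lodash@4.17.21" -> "npm"; the PURL type "golang" is mapped to the "go" slot.
    static int ComponentSlot(string? purl)
    {
        if (purl is null || !purl.StartsWith("pkg:", StringComparison.OrdinalIgnoreCase)) return 9;
        int slash = purl.IndexOf('/');
        if (slash < 0) return 9;
        string type = purl.Substring(4, slash - 4).ToLowerInvariant();
        if (type == "golang") type = "go";
        int i = Array.IndexOf(ComponentTypes, type);
        return i < 0 ? 9 : i;
    }
}
```

Because cosine similarity is scale-invariant, whether the generator also L2-normalizes its output (the unit tests mention vector normalization) does not change similarity scores or rankings.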

### Cosine Similarity

Similarity between vectors A and B:

```
similarity = (A · B) / (||A|| × ||B||)
```

Where `A · B` is the dot product and `||A||` is the L2 norm.
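
A direct C# version of the formula, assuming a zero-norm vector should score 0 rather than fail (the formula is undefined in that case). `Cosine.Similarity` is a hypothetical helper; it accepts `IReadOnlyList<float>`, which both `float[]` and the record's `ImmutableArray<float>` implement, and accumulates in `double`.

```csharp
using System;
using System.Collections.Generic;

public static class Cosine
{
    // Returns (A · B) / (||A|| × ||B||), or 0 when either vector has zero norm.
    public static double Similarity(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0; // undefined for zero vectors
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

For example, A = (1, 0, 1, 0) and B = (1, 0, 0, 0) give A · B = 1, ||A|| = √2 and ||B|| = 1, so the similarity is 1/√2 ≈ 0.707.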

### CVE Classification

CVEs are classified by analyzing keywords in the CVE ID and description:

| Category | Keywords |
|----------|----------|
| memory | buffer, overflow, heap, stack, use-after-free |
| injection | sql, command, code injection, ldap |
| auth | authentication, authorization, bypass |
| crypto | cryptographic, encryption, key |
| dos | denial of service, resource exhaustion |
| info-disclosure | information disclosure, leak |
| privilege-escalation | privilege escalation, elevation |
| xss | cross-site scripting, xss |
| path-traversal | path traversal, directory traversal |
| other | *(no keyword match)* |
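
A keyword classifier in the spirit of the table, assuming case-insensitive substring matching where the first matching row (in table order) wins and anything unmatched falls into the `other` slot. `CveClassifierSketch` is hypothetical; the real classifier may weigh or order keywords differently.

```csharp
using System;

public static class CveClassifierSketch
{
    // Categories and keywords from the table above; array order matches vector slots [0-8].
    static readonly (string Category, string[] Keywords)[] Rules =
    {
        ("memory", new[] { "buffer", "overflow", "heap", "stack", "use-after-free" }),
        ("injection", new[] { "sql", "command", "code injection", "ldap" }),
        ("auth", new[] { "authentication", "authorization", "bypass" }),
        ("crypto", new[] { "cryptographic", "encryption", "key" }),
        ("dos", new[] { "denial of service", "resource exhaustion" }),
        ("info-disclosure", new[] { "information disclosure", "leak" }),
        ("privilege-escalation", new[] { "privilege escalation", "elevation" }),
        ("xss", new[] { "cross-site scripting", "xss" }),
        ("path-traversal", new[] { "path traversal", "directory traversal" }),
    };

    // Returns the vector slot and category; the first rule with a case-insensitive substring hit wins.
    public static (int Slot, string Category) Classify(string? text)
    {
        if (!string.IsNullOrEmpty(text))
        {
            for (int i = 0; i < Rules.Length; i++)
                foreach (var keyword in Rules[i].Keywords)
                    if (text.Contains(keyword, StringComparison.OrdinalIgnoreCase))
                        return (i, Rules[i].Category);
        }
        return (9, "other"); // slot 9 in the vector layout
    }
}
```

Plain substring matching is simple and deterministic but over-matches: `key` also hits words such as `monkey`, so a word-boundary regular expression may be a better fit.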

## Playbook Suggestion Algorithm

### Confidence Calculation

```csharp
double confidence = baseSimilarity
                  * successRateBonus
                  * recencyBonus
                  * evidenceCountBonus;
```

Where:
- `baseSimilarity`: Highest similarity score from matching records
- `successRateBonus`: `1 + (successRate - 0.5) * 0.5`, which rewards a high success rate (0.75 at 0% success, 1.0 at 50%, 1.25 at 100%)
- `recencyBonus`: Weights more recent decisions higher
- `evidenceCountBonus`: Increases with the number of similar past decisions
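
The formula can be sketched as below. Only `successRateBonus` is specified above; the `RecencyBonus` and `EvidenceCountBonus` shapes (a linear one-year decay and a +10% cap reached at ten decisions) and the final clamp to [0, 1] are placeholder assumptions, not the service's actual constants. `ConfidenceSketch` is a hypothetical name.

```csharp
using System;

public static class ConfidenceSketch
{
    // successRateBonus exactly as specified above.
    public static double SuccessRateBonus(double successRate) =>
        1 + (successRate - 0.5) * 0.5;

    // HYPOTHETICAL: up to +10% for a decision made today, decaying linearly to 0 over a year.
    public static double RecencyBonus(double ageDays) =>
        1 + 0.1 * Math.Clamp(1 - ageDays / 365.0, 0, 1);

    // HYPOTHETICAL: up to +10%, saturating at 10 similar decisions.
    public static double EvidenceCountBonus(int count) =>
        1 + 0.1 * Math.Min(count, 10) / 10.0;

    public static double Confidence(double baseSimilarity, double successRate, double ageDays, int count)
    {
        double raw = baseSimilarity
                   * SuccessRateBonus(successRate)
                   * RecencyBonus(ageDays)
                   * EvidenceCountBonus(count);
        return Math.Clamp(raw, 0, 1); // ASSUMPTION: confidence is reported on a 0-1 scale
    }
}
```

The product can exceed 1 before clamping; with these placeholder bonuses, 0.95 × 1.25 × 1.1 × 1.1 ≈ 1.44, so a clamp (or a different normalization) is needed if confidence is shown as a percentage.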

### Suggestion Ranking

1. Group past decisions by action taken
2. For each action, calculate:
   - Average similarity of records with that action
   - Success rate for that action
   - Number of similar decisions
3. Compute confidence score
4. Rank by confidence descending
5. Return top N suggestions
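
A LINQ sketch of the five steps, assuming each similar past decision arrives with its action, its similarity to the current situation, and an optional success flag. `SimilarDecision`, `Suggestion`, and `RankingSketch` are hypothetical types; decisions without an outcome are excluded from the success rate, an action with no recorded outcomes is assumed to get a neutral 0.5, and the recency and evidence bonuses are left out. Ties are broken by ordinal action name so they do not depend on input order, in line with the determinism principle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input and output shapes for illustration.
public sealed record SimilarDecision(string Action, double Similarity, bool? Succeeded);
public sealed record Suggestion(string Action, double AverageSimilarity, double SuccessRate, int Count, double Confidence);

public static class RankingSketch
{
    public static List<Suggestion> Rank(IEnumerable<SimilarDecision> similar, int topN) =>
        similar
            .GroupBy(d => d.Action, StringComparer.Ordinal)                  // 1. group by action
            .Select(g =>
            {
                double avgSimilarity = g.Average(d => d.Similarity);         // 2. average similarity,
                var decided = g.Where(d => d.Succeeded.HasValue).ToList();   //    success rate,
                double successRate = decided.Count == 0
                    ? 0.5                                                    //    (ASSUMPTION: neutral if no outcomes)
                    : decided.Count(d => d.Succeeded == true) / (double)decided.Count;
                int count = g.Count();                                       //    and evidence count
                // 3. Confidence: success-rate bonus only; recency/evidence bonuses omitted here.
                double confidence = avgSimilarity * (1 + (successRate - 0.5) * 0.5);
                return new Suggestion(g.Key, avgSimilarity, successRate, count, confidence);
            })
            .OrderByDescending(s => s.Confidence)                            // 4. rank by confidence
            .ThenBy(s => s.Action, StringComparer.Ordinal)                   //    ordinal tie-break
            .Take(topN)                                                      // 5. top N
            .ToList();
}
```

This sketch feeds the average similarity from step 2 into the confidence; the formula above names the highest similarity as `baseSimilarity`, so `g.Max(d => d.Similarity)` would be the drop-in alternative if that is what the service uses.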

### Rationale Generation

Rationales are generated from a fixed template:

```
"{confidence}% confidence based on {count} similar past decisions.
{action} succeeded in {successRate}% of {factors}."
```
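
A sketch of filling the template deterministically. `RationaleSketch.Format` is hypothetical; percentages are assumed to be rounded half away from zero, the two template lines are assumed to be joined by a space, `InvariantCulture` keeps the output identical across server locales, and `factors` is passed through unchanged because the template does not define how it is built.

```csharp
using System;
using System.Globalization;

public static class RationaleSketch
{
    public static string Format(double confidence, int count, string action, double successRate, string factors)
    {
        int confidencePct = (int)Math.Round(confidence * 100, MidpointRounding.AwayFromZero);
        int successPct = (int)Math.Round(successRate * 100, MidpointRounding.AwayFromZero);
        // InvariantCulture: number formatting does not vary with the server's locale.
        return string.Format(CultureInfo.InvariantCulture,
            "{0}% confidence based on {1} similar past decisions. {2} succeeded in {3}% of {4}.",
            confidencePct, count, action, successPct, factors);
    }
}
```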

## Storage Design

### PostgreSQL Schema

```sql
CREATE SCHEMA IF NOT EXISTS opsmemory;

CREATE TABLE opsmemory.decisions (
    memory_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,

    -- Denormalized situation fields for indexing
    cve_id TEXT,
    component TEXT,
    severity TEXT,

    -- Full data as JSONB
    situation JSONB NOT NULL,
    decision JSONB NOT NULL,
    outcome JSONB,

    -- Similarity vector as array (not pgvector)
    similarity_vector REAL[] NOT NULL
);

-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;
```

### Why Not pgvector?

The current implementation uses PostgreSQL arrays instead of pgvector:

1. **Simpler deployment**: No extension installation required
2. **Smaller dataset**: OpsMemory is per-org, not global
3. **Adequate performance**: Array operations are fast enough for <100K records
4. **Future option**: Can migrate to pgvector if needed

### Cosine Similarity in SQL

```sql
-- Cosine similarity between the query vector and each stored vector.
-- NULLIF avoids a division-by-zero error when either vector has zero norm;
-- those rows get a NULL similarity and sort last.
SELECT memory_id,
       (
         SELECT SUM(a * b)
         FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
       ) / NULLIF(
         SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
         SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b))),
         0
       ) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC NULLS LAST,
         memory_id          -- deterministic tie-break
LIMIT @top_k;
```

## API Design

### Endpoint Overview

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/opsmemory/decisions` | Record a new decision |
| GET | `/api/v1/opsmemory/decisions/{id}` | Get decision details |
| POST | `/api/v1/opsmemory/decisions/{id}/outcome` | Record outcome |
| GET | `/api/v1/opsmemory/suggestions` | Get playbook suggestions |
| GET | `/api/v1/opsmemory/decisions` | Query past decisions |
| GET | `/api/v1/opsmemory/stats` | Get statistics |

### Request/Response DTOs

The API uses string-based DTOs that convert to/from internal enums:

```csharp
// API accepts strings
public record RecordDecisionRequest
{
    public required string Action { get; init; }  // "Remediate", "Accept", etc.
    public string? Reachability { get; init; }    // "reachable", "not-reachable"
}

// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine /* , ... */ }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
```
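
A sketch of the string-to-enum conversion for reachability, assuming API values are kebab-case forms of the enum names (`not-reachable` → `NotReachable`), that a missing value means `Unknown`, and that anything unrecognized should be rejected rather than silently mapped. `ApiEnumMapperSketch.TryParseReachability` is a hypothetical helper.

```csharp
using System;

public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }

public static class ApiEnumMapperSketch
{
    public static bool TryParseReachability(string? value, out ReachabilityStatus status)
    {
        status = ReachabilityStatus.Unknown;
        if (string.IsNullOrWhiteSpace(value))
            return true; // ASSUMPTION: an omitted reachability means "unknown"

        // Accept kebab-case ("not-reachable") by dropping hyphens before matching enum names.
        string normalized = value.Trim().Replace("-", "");

        // Reject digits, commas, spaces, etc. before calling Enum.TryParse.
        foreach (char ch in normalized)
            if (!char.IsLetter(ch))
                return false;

        return Enum.TryParse(normalized, ignoreCase: true, out status);
    }
}
```

The letters-only check matters because `Enum.TryParse` also accepts numeric strings such as `"2"` and comma-separated name lists, which would otherwise let arbitrary input through.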

## Testing Strategy

### Unit Tests (26 tests)

**SimilarityVectorGeneratorTests:**
- Vector dimension validation
- Feature encoding (severity, reachability, EPSS, CVSS, KEV)
- Component type classification
- Context tag encoding
- Vector normalization
- Cosine similarity computation
- Matching factor detection

**PlaybookSuggestionServiceTests:**
- Empty history handling
- Single record suggestions
- Multiple record ranking
- Confidence calculation
- Rationale generation
- Evidence linking

### Integration Tests (5 tests)

**PostgresOpsMemoryStoreTests:**
- Decision persistence and retrieval
- Outcome updates
- Tenant isolation
- Query filtering
- Statistics calculation

## Performance Considerations

### Indexing Strategy

- Primary key on `memory_id` for direct lookups
- Index on `tenant_id` for isolation
- Index on `recorded_at` for recent-first queries
- Partial indexes on `cve_id` and `component` for filtered queries

### Query Optimization

- Limit similarity search to the last N days by default
- Return only top-K similar records
- Use cursor-based pagination for large result sets
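
A minimal in-memory sketch of cursor-based (keyset) pagination, assuming results are ordered newest first by `recorded_at` with `memory_id` as a unique tie-breaker, and that the cursor is an opaque Base64 token holding the last row's timestamp and ID. `DecisionRow` and `KeysetPagingSketch` are hypothetical; the equivalent PostgreSQL predicate is noted in a code comment, and ordinal string comparison here matches PostgreSQL's `C` collation for ASCII IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical projection of a decision row used for paging.
public sealed record DecisionRow(string MemoryId, DateTimeOffset RecordedAt);

public static class KeysetPagingSketch
{
    // Opaque cursor: Base64 of "<UTC ticks>|<memory_id>" for the last row of the previous page.
    static string EncodeCursor(DecisionRow last) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(
            last.RecordedAt.UtcTicks.ToString(CultureInfo.InvariantCulture) + "|" + last.MemoryId));

    static (long Ticks, string MemoryId) DecodeCursor(string cursor)
    {
        string raw = Encoding.UTF8.GetString(Convert.FromBase64String(cursor));
        int sep = raw.IndexOf('|');
        if (sep < 0) throw new FormatException("Malformed cursor.");
        return (long.Parse(raw.Substring(0, sep), CultureInfo.InvariantCulture), raw.Substring(sep + 1));
    }

    // Newest first, memory_id as a unique tie-breaker. SQL analogue (assumption):
    //   WHERE tenant_id = @tenant_id AND (recorded_at, memory_id) < (@cursor_at, @cursor_id)
    //   ORDER BY recorded_at DESC, memory_id DESC LIMIT @page_size + 1
    public static (List<DecisionRow> Page, string? NextCursor) GetPage(
        IEnumerable<DecisionRow> rows, int pageSize, string? cursor)
    {
        IEnumerable<DecisionRow> ordered = rows
            .OrderByDescending(r => r.RecordedAt.UtcTicks)
            .ThenByDescending(r => r.MemoryId, StringComparer.Ordinal);

        if (cursor is not null)
        {
            var (ticks, id) = DecodeCursor(cursor);
            ordered = ordered.Where(r =>
                r.RecordedAt.UtcTicks < ticks ||
                (r.RecordedAt.UtcTicks == ticks && string.CompareOrdinal(r.MemoryId, id) < 0));
        }

        // Fetch one extra row to learn whether another page exists.
        var page = ordered.Take(pageSize + 1).ToList();
        string? next = null;
        if (page.Count > pageSize)
        {
            page.RemoveAt(pageSize);
            next = EncodeCursor(page[page.Count - 1]);
        }
        return (page, next);
    }
}
```

Unlike `OFFSET`, the cursor predicate does not scan and discard earlier rows, and decisions recorded after the first page is served do not shift later pages.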

### Caching

There is currently no caching, since records are accessed infrequently. Future options:
- Cache similarity vectors in memory
- Cache recent suggestions per tenant
- Use read replicas for heavy read loads

## Future Enhancements

### pgvector Migration

If the dataset grows significantly:
1. Install pgvector extension
2. Add vector column with IVFFlat index
3. Replace array-based similarity with vector operations
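4. Potentially ~100x faster similarity search on large datasets (IVFFlat is an approximate index, so results can differ slightly from an exact scan)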

### ML-Based Suggestions

Replace rule-based confidence with an ML model:
1. Train on historical decision-outcome pairs
2. Include more features (time of day, team, etc.)
3. Use gradient boosting or a neural network
4. Continuous learning from new outcomes

### Outcome Prediction

Predict the outcome before a decision is made:
1. Use past outcomes as training data
2. Predict success probability per action
3. Show predicted outcomes in UI
4. Track prediction accuracy over time