save progress

This commit is contained in:
master
2026-01-09 18:27:36 +02:00
parent e608752924
commit a21d3dbc1f
361 changed files with 63068 additions and 1192 deletions

View File

@@ -0,0 +1,327 @@
# OpsMemory Module
> **Decision Ledger for Playbook Learning**
OpsMemory is a structured ledger of prior security decisions and their outcomes. It enables playbook learning - understanding which decisions led to good outcomes and surfacing institutional knowledge for similar situations.
## What OpsMemory Is
-**Decision + Outcome pairs**: Every security decision is recorded with its eventual outcome
-**Success/failure classification**: Learn what worked and what didn't
-**Similar situation matching**: Find past decisions in comparable scenarios
-**Playbook suggestions**: Surface recommendations based on historical success
## What OpsMemory Is NOT
- ❌ Chat history (that's conversation storage)
- ❌ Audit logs (that's the Timeline)
- ❌ VEX statements (that's Excititor)
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ OpsMemory Service │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────────┐ ┌───────────────┐ │
│ │ Decision │ │ Playbook │ │ Outcome │ │
│ │ Recording │ │ Suggestion │ │ Tracking │ │
│ └──────┬──────┘ └────────┬─────────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ IOpsMemoryStore │ │
│ │ (PostgreSQL with similarity vectors) │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
## Core Components
### OpsMemoryRecord
The core data structure capturing a decision and its context:
```json
{
"memoryId": "mem-abc123",
"tenantId": "tenant-xyz",
"recordedAt": "2026-01-07T12:00:00Z",
"situation": {
"cveId": "CVE-2023-44487",
"component": "pkg:npm/http2@1.0.0",
"severity": "high",
"reachability": "reachable",
"epssScore": 0.97,
"isKev": true,
"contextTags": ["production", "external-facing", "payment-service"]
},
"decision": {
"action": "Remediate",
"rationale": "KEV + reachable + payment service = immediate remediation",
"decidedBy": "security-team",
"decidedAt": "2026-01-07T12:00:00Z",
"policyReference": "policy/critical-kev.rego"
},
"outcome": {
"status": "Success",
"resolutionTime": "4:30:00",
"lessonsLearned": "Upgrade was smooth, no breaking changes",
"recordedAt": "2026-01-07T16:30:00Z"
}
}
```
### Decision Actions
| Action | Description |
|--------|-------------|
| `Accept` | Accept the risk (no action) |
| `Remediate` | Upgrade/patch the component |
| `Quarantine` | Isolate the component |
| `Mitigate` | Apply compensating controls (WAF, config) |
| `Defer` | Defer for later review |
| `Escalate` | Escalate to security team |
| `FalsePositive` | Mark as not applicable |
### Outcome Status
| Status | Description |
|--------|-------------|
| `Success` | Decision led to successful resolution |
| `PartialSuccess` | Decision led to partial resolution |
| `Ineffective` | Decision was ineffective |
| `NegativeOutcome` | Decision led to negative consequences |
| `Pending` | Outcome still pending |
## API Reference
### Record a Decision
```http
POST /api/v1/opsmemory/decisions
Content-Type: application/json
{
"tenantId": "tenant-xyz",
"cveId": "CVE-2023-44487",
"componentPurl": "pkg:npm/http2@1.0.0",
"severity": "high",
"reachability": "reachable",
"epssScore": 0.97,
"action": "Remediate",
"rationale": "KEV + reachable + payment service",
"decidedBy": "alice@example.com",
"contextTags": ["production", "payment-service"]
}
```
**Response:**
```json
{
"memoryId": "abc123def456",
"recordedAt": "2026-01-07T12:00:00Z"
}
```
### Record an Outcome
```http
POST /api/v1/opsmemory/decisions/{memoryId}/outcome?tenantId=tenant-xyz
Content-Type: application/json
{
"status": "Success",
"resolutionTimeMinutes": 270,
"lessonsLearned": "Upgrade was smooth, no breaking changes",
"recordedBy": "alice@example.com"
}
```
### Get Playbook Suggestions
```http
GET /api/v1/opsmemory/suggestions?tenantId=tenant-xyz&cveId=CVE-2024-1234&severity=high&reachability=reachable
```
**Response:**
```json
{
"suggestions": [
{
"suggestedAction": "Remediate",
"confidence": 0.87,
"rationale": "87% confidence based on 15 similar past decisions. Remediation succeeded in 93% of high-severity reachable vulnerabilities.",
"successRate": 0.93,
"similarDecisionCount": 15,
"averageResolutionTimeMinutes": 180,
"evidence": [
{
"memoryId": "abc123",
"similarity": 0.92,
"action": "Remediate",
"outcome": "Success",
"cveId": "CVE-2023-44487"
}
],
"matchingFactors": [
"Same severity: high",
"Same reachability: Reachable",
"Both are KEV",
"Shared context: production"
]
}
],
"analyzedRecords": 15,
"topSimilarity": 0.92
}
```
### Query Past Decisions
```http
GET /api/v1/opsmemory/decisions?tenantId=tenant-xyz&action=Remediate&pageSize=20
```
### Get Statistics
```http
GET /api/v1/opsmemory/stats?tenantId=tenant-xyz
```
**Response:**
```json
{
"tenantId": "tenant-xyz",
"totalDecisions": 1250,
"decisionsWithOutcomes": 980,
"successRate": 0.87
}
```
## Similarity Algorithm
OpsMemory uses a 50-dimensional vector to represent each security situation:
| Dimensions | Feature |
|------------|---------|
| 0-9 | CVE category (memory, injection, auth, crypto, dos, etc.) |
| 10-14 | Severity (none, low, medium, high, critical) |
| 15-18 | Reachability (unknown, reachable, not-reachable, potential) |
| 19-23 | EPSS band (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0) |
| 24-28 | CVSS band (0-2, 2-4, 4-6, 6-8, 8-10) |
| 29 | KEV flag |
| 30-39 | Component type (npm, maven, pypi, nuget, go, cargo, etc.) |
| 40-49 | Context tags (production, external-facing, payment, etc.) |
Similarity is computed using **cosine similarity** between normalized vectors.
## Integration Points
### Decision Recording Hook
OpsMemory integrates with the Findings Ledger to automatically capture decisions:
```csharp
public class OpsMemoryHook : IDecisionHook
{
public async Task OnDecisionRecordedAsync(FindingDecision decision)
{
var record = new OpsMemoryRecord
{
TenantId = decision.TenantId,
Situation = ExtractSituation(decision),
Decision = ExtractDecision(decision)
};
// Fire-and-forget to not block the decision flow
_ = _store.RecordDecisionAsync(record);
}
}
```
### Outcome Tracking
The OutcomeTrackingService monitors for resolution events and prompts users:
1. **Auto-detect resolution**: When a finding is marked resolved
2. **Calculate resolution time**: Time from decision to resolution
3. **Prompt for classification**: Ask user about outcome quality
4. **Link to original decision**: Update the OpsMemory record
## Configuration
```yaml
opsmemory:
connectionString: "Host=localhost;Database=stellaops"
similarity:
minThreshold: 0.6 # Minimum similarity for suggestions
maxResults: 10 # Maximum similar records to analyze
suggestions:
maxSuggestions: 3 # Maximum suggestions to return
minConfidence: 0.5 # Minimum confidence threshold
outcomeTracking:
autoPromptDelay: 24h # Delay before prompting for outcome
reminderInterval: 7d # Reminder interval for pending outcomes
```
## Database Schema
```sql
CREATE SCHEMA IF NOT EXISTS opsmemory;
CREATE TABLE opsmemory.decisions (
memory_id TEXT PRIMARY KEY,
tenant_id TEXT NOT NULL,
recorded_at TIMESTAMPTZ NOT NULL,
-- Situation (JSONB for flexibility)
situation JSONB NOT NULL,
-- Decision (JSONB)
decision JSONB NOT NULL,
-- Outcome (nullable, updated later)
outcome JSONB,
-- Similarity vector (array for simple cosine similarity)
similarity_vector REAL[] NOT NULL
);
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions((situation->>'cveId'));
```
## Best Practices
### Recording Decisions
1. **Include context tags**: The more context, the better similarity matching
2. **Document rationale**: Future users benefit from understanding why
3. **Reference policies**: Link to the policy that guided the decision
### Recording Outcomes
1. **Be timely**: Record outcomes as soon as resolution is confirmed
2. **Be honest**: Failed decisions are valuable learning data
3. **Add lessons learned**: Help future users avoid pitfalls
### Using Suggestions
1. **Review evidence**: Look at the similar past decisions
2. **Check matching factors**: Ensure the situations are truly comparable
3. **Trust but verify**: Suggestions are guidance, not mandates
## Related Modules
- [Findings Ledger](../findings-ledger/README.md) - Source of decision events
- [Timeline](../timeline-indexer/README.md) - Audit trail
- [Excititor](../excititor/README.md) - VEX statement management
- [Risk Engine](../risk-engine/README.md) - Risk scoring

View File

@@ -0,0 +1,393 @@
# OpsMemory Architecture
> **Technical deep-dive into the Decision Ledger**
## Overview
OpsMemory provides a structured approach to organizational learning from security decisions. It captures the complete lifecycle of a decision - from the situation context through the action taken to the eventual outcome.
## Design Principles
### 1. Determinism First
All operations produce deterministic, reproducible results:
- Similarity vectors are computed from stable inputs
- Confidence scores use fixed formulas
- No randomness in suggestion ranking
### 2. Multi-Tenant Isolation
Every operation is scoped to a tenant:
- Records cannot be accessed across tenants
- Similarity search is tenant-isolated
- Statistics are per-tenant
### 3. Fire-and-Forget Integration
Decision recording is async and non-blocking:
- UI decisions complete immediately
- OpsMemory recording happens in background
- Failures don't affect the primary flow
### 4. Offline Capable
All features work without network access:
- Local PostgreSQL storage
- No external API dependencies
- Self-contained similarity computation
## Component Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│ WebService Layer │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ OpsMemoryEndpoints │ │
│ │ POST /decisions GET /decisions GET /suggestions GET /stats│ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
┌────────────────────────────────┼───────────────────────────────────┐
│ Service Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────────┐ │
│ │ PlaybookSuggest │ │ OutcomeTracking │ │ SimilarityVector │ │
│ │ Service │ │ Service │ │ Generator │ │
│ └────────┬────────┘ └────────┬────────┘ └─────────┬──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ IOpsMemoryStore │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬───────────────────────────────────┘
┌────────────────────────────────┼───────────────────────────────────┐
│ Storage Layer │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ PostgresOpsMemoryStore │ │
│ │ - Decision CRUD │ │
│ │ - Outcome updates │ │
│ │ - Similarity search (array-based cosine) │ │
│ │ - Query with pagination │ │
│ │ - Statistics aggregation │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
## Data Model
### OpsMemoryRecord
The core aggregate containing all decision information:
```csharp
public sealed record OpsMemoryRecord
{
public required string MemoryId { get; init; }
public required string TenantId { get; init; }
public required DateTimeOffset RecordedAt { get; init; }
public required SituationContext Situation { get; init; }
public required DecisionRecord Decision { get; init; }
public OutcomeRecord? Outcome { get; init; }
public ImmutableArray<float> SimilarityVector { get; init; }
}
```
### SituationContext
Captures the security context at decision time:
```csharp
public sealed record SituationContext
{
public string? CveId { get; init; }
public string? Component { get; init; } // PURL
public string? Severity { get; init; } // low/medium/high/critical
public ReachabilityStatus Reachability { get; init; }
public double? EpssScore { get; init; } // 0-1
public double? CvssScore { get; init; } // 0-10
public bool IsKev { get; init; }
public ImmutableArray<string> ContextTags { get; init; }
}
```
### DecisionRecord
The action taken and why:
```csharp
public sealed record DecisionRecord
{
public required DecisionAction Action { get; init; }
public required string Rationale { get; init; }
public required string DecidedBy { get; init; }
public required DateTimeOffset DecidedAt { get; init; }
public string? PolicyReference { get; init; }
public MitigationDetails? Mitigation { get; init; }
}
```
### OutcomeRecord
The result of the decision:
```csharp
public sealed record OutcomeRecord
{
public required OutcomeStatus Status { get; init; }
public TimeSpan? ResolutionTime { get; init; }
public string? ActualImpact { get; init; }
public string? LessonsLearned { get; init; }
public required string RecordedBy { get; init; }
public required DateTimeOffset RecordedAt { get; init; }
}
```
## Similarity Algorithm
### Vector Generation
The `SimilarityVectorGenerator` creates 50-dimensional feature vectors:
```
Vector Layout:
[0-9] : CVE category one-hot (memory, injection, auth, crypto, dos,
info-disclosure, privilege-escalation, xss, path-traversal, other)
[10-14] : Severity one-hot (none, low, medium, high, critical)
[15-18] : Reachability one-hot (unknown, reachable, not-reachable, potential)
[19-23] : EPSS band one-hot (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0)
[24-28] : CVSS band one-hot (0-2, 2-4, 4-6, 6-8, 8-10)
[29] : KEV flag (0 or 1)
[30-39] : Component type one-hot (npm, maven, pypi, nuget, go, cargo,
deb, rpm, apk, other)
[40-49] : Context tag presence (production, development, staging,
external-facing, internal, payment, auth, data, api, frontend)
```
### Cosine Similarity
Similarity between vectors A and B:
```
similarity = (A · B) / (||A|| × ||B||)
```
Where `A · B` is the dot product and `||A||` is the L2 norm.
### CVE Classification
CVEs are classified by analyzing keywords in the CVE ID and description:
| Category | Keywords |
|----------|----------|
| memory | buffer, overflow, heap, stack, use-after-free |
| injection | sql, command, code injection, ldap |
| auth | authentication, authorization, bypass |
| crypto | cryptographic, encryption, key |
| dos | denial of service, resource exhaustion |
| info-disclosure | information disclosure, leak |
| privilege-escalation | privilege escalation, elevation |
| xss | cross-site scripting, xss |
| path-traversal | path traversal, directory traversal |
## Playbook Suggestion Algorithm
### Confidence Calculation
```csharp
confidence = baseSimilarity
× successRateBonus
× recencyBonus
× evidenceCountBonus
```
Where:
- `baseSimilarity`: Highest similarity score from matching records
- `successRateBonus`: `1 + (successRate - 0.5) * 0.5` (rewards high success rate)
- `recencyBonus`: More recent decisions weighted higher
- `evidenceCountBonus`: More evidence = higher confidence
### Suggestion Ranking
1. Group past decisions by action taken
2. For each action, calculate:
- Average similarity of records with that action
- Success rate for that action
- Number of similar decisions
3. Compute confidence score
4. Rank by confidence descending
5. Return top N suggestions
### Rationale Generation
Rationales are generated programmatically:
```
"{confidence}% confidence based on {count} similar past decisions.
{action} succeeded in {successRate}% of {factors}."
```
## Storage Design
### PostgreSQL Schema
```sql
CREATE TABLE opsmemory.decisions (
memory_id TEXT PRIMARY KEY,
tenant_id TEXT NOT NULL,
recorded_at TIMESTAMPTZ NOT NULL,
-- Denormalized situation fields for indexing
cve_id TEXT,
component TEXT,
severity TEXT,
-- Full data as JSONB
situation JSONB NOT NULL,
decision JSONB NOT NULL,
outcome JSONB,
-- Similarity vector as array (not pgvector)
similarity_vector REAL[] NOT NULL
);
-- Indexes
CREATE INDEX idx_decisions_tenant ON opsmemory.decisions(tenant_id);
CREATE INDEX idx_decisions_recorded ON opsmemory.decisions(recorded_at DESC);
CREATE INDEX idx_decisions_cve ON opsmemory.decisions(cve_id) WHERE cve_id IS NOT NULL;
CREATE INDEX idx_decisions_component ON opsmemory.decisions(component) WHERE component IS NOT NULL;
```
### Why Not pgvector?
The current implementation uses PostgreSQL arrays instead of pgvector:
1. **Simpler deployment**: No extension installation required
2. **Smaller dataset**: OpsMemory is per-org, not global
3. **Adequate performance**: Array operations are fast enough for <100K records
4. **Future option**: Can migrate to pgvector if needed
### Cosine Similarity in SQL
```sql
-- Cosine similarity between query vector and stored vectors
SELECT memory_id,
(
SELECT SUM(a * b)
FROM UNNEST(similarity_vector, @query_vector) AS t(a, b)
) / (
SQRT((SELECT SUM(a * a) FROM UNNEST(similarity_vector) AS t(a))) *
SQRT((SELECT SUM(b * b) FROM UNNEST(@query_vector) AS t(b)))
) AS similarity
FROM opsmemory.decisions
WHERE tenant_id = @tenant_id
ORDER BY similarity DESC
LIMIT @top_k;
```
## API Design
### Endpoint Overview
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/opsmemory/decisions` | Record a new decision |
| GET | `/api/v1/opsmemory/decisions/{id}` | Get decision details |
| POST | `/api/v1/opsmemory/decisions/{id}/outcome` | Record outcome |
| GET | `/api/v1/opsmemory/suggestions` | Get playbook suggestions |
| GET | `/api/v1/opsmemory/decisions` | Query past decisions |
| GET | `/api/v1/opsmemory/stats` | Get statistics |
### Request/Response DTOs
The API uses string-based DTOs that convert to/from internal enums:
```csharp
// API accepts strings
public record RecordDecisionRequest
{
public required string Action { get; init; } // "Remediate", "Accept", etc.
public string? Reachability { get; init; } // "reachable", "not-reachable"
}
// Internal uses enums
public enum DecisionAction { Accept, Remediate, Quarantine, ... }
public enum ReachabilityStatus { Unknown, Reachable, NotReachable, Potential }
```
## Testing Strategy
### Unit Tests (26 tests)
**SimilarityVectorGeneratorTests:**
- Vector dimension validation
- Feature encoding (severity, reachability, EPSS, CVSS, KEV)
- Component type classification
- Context tag encoding
- Vector normalization
- Cosine similarity computation
- Matching factor detection
**PlaybookSuggestionServiceTests:**
- Empty history handling
- Single record suggestions
- Multiple record ranking
- Confidence calculation
- Rationale generation
- Evidence linking
### Integration Tests (5 tests)
**PostgresOpsMemoryStoreTests:**
- Decision persistence and retrieval
- Outcome updates
- Tenant isolation
- Query filtering
- Statistics calculation
## Performance Considerations
### Indexing Strategy
- Primary key on `memory_id` for direct lookups
- Index on `tenant_id` for isolation
- Index on `recorded_at` for recent-first queries
- Partial indexes on `cve_id` and `component` for filtered queries
### Query Optimization
- Limit similarity search to last N days by default
- Return only top-K similar records
- Use cursor-based pagination for large result sets
### Caching
Currently no caching (records are infrequently accessed). Future options:
- Cache similarity vectors in memory
- Cache recent suggestions per tenant
- Use read replicas for heavy read loads
## Future Enhancements
### pgvector Migration
If dataset grows significantly:
1. Install pgvector extension
2. Add vector column with IVFFlat index
3. Replace array-based similarity with vector operations
4. ~100x speedup for large datasets
### ML-Based Suggestions
Replace rule-based confidence with ML model:
1. Train on historical decision-outcome pairs
2. Include more features (time of day, team, etc.)
3. Use gradient boosting or neural network
4. Continuous learning from new outcomes
### Outcome Prediction
Predict outcome before decision is made:
1. Use past outcomes as training data
2. Predict success probability per action
3. Show predicted outcomes in UI
4. Track prediction accuracy over time