Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**. Some change what’s *reachable* at runtime (and thus the security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call graphs, (2) dependency graphs, and (3) dataflow.

---

### Why this matters (background)

* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to reachable flows from external entry points (HTTP, gRPC, CLI, message consumers).
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
* **Dataflow‑aware diffs** shrink noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.”

---

### Minimal architecture (fits Stella Ops)

1. **Entrypoint map** (per service): controllers, handlers, consumers.
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
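4. **Smart‑Diff** = set difference in both directions: `reachable_paths(commit_B) - reachable_paths(commit_A)` gives the newly reachable paths, and `reachable_paths(commit_A) - reachable_paths(commit_B)` gives the removed ones.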

   * If a path to a sensitive sink is newly reachable → **High**.
   * If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.
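
The set difference in step 4 can be sketched in a few lines of Go. This is a minimal illustration, assuming each reachable path has already been flattened to an `entrypoint -> sink` string key; `PathSet`, `NewPathSet`, and `Minus` are hypothetical names, not part of any existing tool.

```go
package main

import (
	"fmt"
	"sort"
)

// PathSet holds the reachable entrypoint->sink paths of one commit,
// keyed by a hypothetical "entrypoint -> sink" string.
type PathSet map[string]struct{}

// NewPathSet builds a set from a list of path keys.
func NewPathSet(keys ...string) PathSet {
	s := make(PathSet, len(keys))
	for _, k := range keys {
		s[k] = struct{}{}
	}
	return s
}

// Minus returns the keys present in a but not in b, sorted for stable output.
func Minus(a, b PathSet) []string {
	var out []string
	for k := range a {
		if _, ok := b[k]; !ok {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	base := NewPathSet("POST /login -> Db.Query", "GET /export -> File.Write")
	head := NewPathSet("POST /login -> Db.Query", "POST /Invoices -> PdfExporter.Save")

	// head - base: newly reachable paths, candidates for a High finding.
	for _, p := range Minus(head, base) {
		fmt.Println("ADDED  ", p)
	}
	// base - head: paths that disappeared, candidates for a "not affected" VEX.
	for _, p := range Minus(base, head) {
		fmt.Println("REMOVED", p)
	}
}
```

Sorting the result keeps CI output deterministic, which matters once findings are compared between runs.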

---

### Tiny working seeds

**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**

```csharp
// SmartDiff.csproj targets net10.0
// Assumed packages: Microsoft.CodeAnalysis.CSharp.Workspaces,
// Microsoft.CodeAnalysis.Workspaces.MSBuild, Microsoft.Build.Locator.
// Call MSBuildLocator.RegisterDefaults() once at startup, before this method runs.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        using var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();

        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;

            // Resolve entrypoints & sinks by display name, e.g. "Ns.Type.Method(string)".
            // GlobalNamespace also covers referenced assemblies, so framework sinks resolve too.
            var methods = Descend(comp.GlobalNamespace).OfType<IMethodSymbol>().ToList();
            var epSymbols = methods.Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = methods.Where(m => sinks.Contains(m.ToDisplayString())).ToList();

            foreach (var sink in sinkSymbols)
            {
                // Placeholder heuristic: a sink referenced anywhere in the solution is paired
                // with every entrypoint. This over-approximates; replace it with a real
                // call-graph walk from each entrypoint.
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (!refs.SelectMany(r => r.Locations).Any()) continue;

                foreach (var ep in epSymbols)
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;

        // Recursively yields every member of a namespace or type.
        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```

**Go — SSA & callgraph seed**

```go
// go.mod: require golang.org/x/tools latest
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	// Load every package under the current module with syntax and type info.
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}
	if packages.PrintErrors(pkgs) > 0 {
		log.Fatal("packages contain errors")
	}

	// Build SSA for the loaded packages and all of their dependencies.
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()

	// CHA is fast but over-approximates interface dispatch.
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```

---

### How to use it in your pipeline (fast win)

* **Pre‑merge job**:

  1. Build call graph for `HEAD` and `HEAD^`.
  2. Compute Smart‑Diff.
  3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
     “New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”
* **Post‑scan VEX**:

  * For each CVE on a package, mark **Affected** only if any EP can reach a symbol that uses that package’s vulnerable surface.

---

### Evidence to show in the UI

* “**Path card**”: EP → … → Sink, with file:line hop‑list and commit hash.
* “**What changed**”: before/after path diff (green removed, red added).
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).

---

### Developer checklist (Stella Ops style)

* [ ] Define entrypoints per service (attribute or YAML).
* [ ] Define sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF plus VEX (CycloneDX VEX or OpenVEX).
* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear.

If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).

---

Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).

---

## 1) Product definition

### Problem you’re solving

Security noise comes from:

* “Vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
* Git diffs look big even when behavior is unchanged
* Teams struggle to triage “is this change actually risky?”

### What Smart‑Diff should do (core behavior)

Given **base commit A** and **head commit B**:

1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
4. Emit **Smart‑Diff**:

   * **Newly reachable** EP→sink paths (risk ↑)
   * **Removed** EP→sink paths (risk ↓)
   * **Changed** paths (same sink but different sanitization/guards)
5. Attach **dependency vulnerability context**:

   * If a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
   * Otherwise generate **VEX**: “not affected” / “not exploitable” with evidence

### MVP definition (minimum shippable)

A PR check that:

* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
* Produces:

  * SARIF report (for code scanning UI)
  * JSON artifact containing proof paths (EP → … → sink with file:line)
  * Optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)

---

## 2) Architecture you can actually build

### High‑level components

1. **Policy & Taxonomy Service**

   * Defines entrypoints, sources, sinks, sanitizers, confidence rules
   * Versioned and centrally managed (but supports repo overrides)

2. **Analyzer Workers (language adapters)**

   * .NET analyzer (Roslyn + control flow)
   * Go analyzer (SSA + callgraph)
   * Outputs standardized IR (Intermediate Representation)

3. **Graph Store + Reachability Engine**

   * Stores symbol nodes + call edges + dataflow edges
   * Computes reachable sinks per entrypoint
   * Computes diff between commits A and B

4. **Vulnerability Mapper + VEX Generator**

   * Maps vulnerable packages/functions → “surfaces”
   * Joins with reachability results
   * Emits OpenVEX (or CycloneDX VEX) with evidence links

5. **CI/PR Integrations**

   * CLI that runs in CI
   * Optional server mode (cache + incremental processing)

6. **UI/API**

   * Path cards: “what changed”, “why it matters”, “proof”
   * Filters by sink class, confidence, service, entrypoint

### Data contracts (standardized IR)

Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic:

* **Symbols**

  * `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
  * metadata: file, line ranges, kind (method/function), accessibility

* **Edges**

  * Call edge: `caller_symbol_id -> callee_symbol_id`
  * Dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
  * Edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)

* **Entrypoints / Sources / Sinks**

  * entrypoint: (symbol_id, route/topic/command metadata)
  * sink: (symbol_id, sink_type, severity, cwe mapping optional)

* **Paths**

  * `entrypoint -> ... -> sink`
  * hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
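
As a rough illustration of the IR shapes above, the symbol and edge records could look like this in Go. The JSON field names and the choice of SHA-256 with a NUL separator are assumptions made for the sketch; the plan only requires that `symbol_id` be a stable hash of (lang, module, fully-qualified name, signature).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// Symbol is one node in the language-agnostic IR.
type Symbol struct {
	ID        string `json:"symbol_id"`
	Lang      string `json:"lang"`
	Module    string `json:"module"`
	FQN       string `json:"fqn"`
	Signature string `json:"signature"`
	File      string `json:"file"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
}

// Edge is a call or dataflow edge between two symbols.
type Edge struct {
	From       string `json:"from"`
	To         string `json:"to"`
	Kind       string `json:"kind"`       // "call" or "dataflow"
	Confidence string `json:"confidence"` // "high", "medium", "low"
	Reason     string `json:"reason"`     // e.g. "static", "interface dispatch"
}

// SymbolID hashes (lang, module, fqn, signature). File and line are left out
// on purpose so that moving code within a file does not change the ID.
func SymbolID(lang, module, fqn, signature string) string {
	// A NUL separator keeps ("a","bc") and ("ab","c") from colliding.
	joined := strings.Join([]string{lang, module, fqn, signature}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])
}

func main() {
	s := Symbol{
		Lang: "go", Module: "example.com/invoices", FQN: "handlers.Upload",
		Signature: "func(http.ResponseWriter, *http.Request)",
		File:      "handlers/upload.go", LineStart: 12, LineEnd: 40,
	}
	s.ID = SymbolID(s.Lang, s.Module, s.FQN, s.Signature)

	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping file and line out of the hash is what lets a pure refactor (moving a method within a file) leave the graph unchanged.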

---

## 3) Workstreams and deliverables

### Workstream A — Policy, taxonomy, configuration

**Deliverables**

* `smartdiff.policy.yaml` schema and validator
* A default sink taxonomy:

  * `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)
* Initial sanitizer patterns:

  * For example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists

**Implementation notes**

* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
* Provide repo-level overrides:

  * `smartdiff.policy.yaml` in repo root
  * Central policies referenced by version tag

**Acceptance criteria**

* A service can onboard by configuring:

  * entrypoint discovery mode (auto + manual)
  * sink classes to enforce
  * severity threshold to fail PR

---

### Workstream B — .NET analyzer (Roslyn)

**Deliverables**

* Build pipeline that produces:

  * call graph (methods and invocations)
  * basic control-flow guards for reachability (optional for MVP)
  * taint propagation for common patterns (MVP: parameter → sink)
* Entry point discovery for:

  * ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
  * Minimal APIs (`MapGet/MapPost`)
  * gRPC service methods
  * message consumers (configurable attributes/interfaces)

**Implementation notes (practical path)**

* MVP static callgraph:

  * Use Roslyn semantic model to resolve invocation targets
  * For virtual/interface calls: conservative resolution to possible implementations within the compilation
* MVP taint:

  * “Sources”: request params/body, headers, query string, message payloads
  * “Sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
  * Propagate taint across:

    * parameter → local → argument
    * return values
    * simple assignments and concatenations (heuristic)
* Confidence scoring:

  * Direct static call resolution: high
  * Reflection/dynamic: low (flag separately)

**Acceptance criteria**

* On a demo ASP.NET service, if a PR adds:

  * `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)`
    Smart‑Diff flags **new EP→FILE_WRITE path** and shows hops with file/line.

---

### Workstream C — Go analyzer (SSA)

**Deliverables**

* SSA build + callgraph extraction
* Entrypoint discovery for:

  * `net/http` handlers
  * common routers (Gin/Echo/Chi) via adapter rules
  * gRPC methods
  * consumers (Kafka/NATS/etc.) by config
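
For the `net/http` case, a purely syntactic first pass is possible with the standard library alone. The sketch below assumes the file imports `net/http` under its default name and only inspects top-level function declarations; a production adapter would resolve types with `go/types` or SSA instead. `FindHandlers` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isSel reports whether e is the selector expression pkg.name.
func isSel(e ast.Expr, pkg, name string) bool {
	sel, ok := e.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	id, ok := sel.X.(*ast.Ident)
	return ok && id.Name == pkg && sel.Sel.Name == name
}

// paramTypes expands a field list into one type expression per parameter,
// so "a, b int" yields two entries.
func paramTypes(fl *ast.FieldList) []ast.Expr {
	var out []ast.Expr
	for _, f := range fl.List {
		n := len(f.Names)
		if n == 0 {
			n = 1 // unnamed parameter
		}
		for i := 0; i < n; i++ {
			out = append(out, f.Type)
		}
	}
	return out
}

// FindHandlers returns the names of top-level functions and methods shaped
// like func(http.ResponseWriter, *http.Request).
func FindHandlers(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, d := range file.Decls {
		fn, ok := d.(*ast.FuncDecl)
		if !ok {
			continue
		}
		ps := paramTypes(fn.Type.Params)
		if len(ps) != 2 || !isSel(ps[0], "http", "ResponseWriter") {
			continue
		}
		if star, ok := ps[1].(*ast.StarExpr); ok && isSel(star.X, "http", "Request") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package demo
import "net/http"
func Upload(w http.ResponseWriter, r *http.Request) {}
func helper(path string) {}
`
	names, err := FindHandlers(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Signature matching finds candidate handlers but not their routes; route labels still have to come from the router registration calls (`http.HandleFunc`, Gin/Echo/Chi adapters).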

**Implementation notes**

* Use `golang.org/x/tools/go/packages` + `ssa` build
* Callgraph:

  * start with CHA (Class Hierarchy Analysis) for speed
  * later add a more precise algorithm (e.g., RTA or VTA from `golang.org/x/tools/go/callgraph`, or a full pointer analysis) for precision on interfaces
* Taint:

  * sources: `http.Request`, router params, message payloads
  * sinks: `os/exec`, `database/sql` raw query, file I/O, `net/http` outbound, unsafe deserialization libs
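
To make the source→sink idea concrete, here is a toy taint pass over straight-line code. It runs on a hypothetical mini-IR (`Stmt`), not on real SSA, and `allowlist.Lookup` is an invented sanitizer name; the point is only to show how sources, sanitizers, and sinks interact.

```go
package main

import "fmt"

// Stmt is one straight-line statement in a hypothetical mini-IR:
// Dst = Call(Args...). Dst is empty when the result is unused.
type Stmt struct {
	Dst  string
	Call string
	Args []string
}

// Policy lists which calls introduce, remove, or consume taint.
type Policy struct {
	Sources    map[string]bool
	Sanitizers map[string]bool
	Sinks      map[string]bool
}

// TaintedSinks walks the statements in order and returns the sink calls
// that receive at least one tainted argument.
func TaintedSinks(stmts []Stmt, p Policy) []string {
	tainted := map[string]bool{}
	var hits []string
	for _, s := range stmts {
		argTainted := false
		for _, a := range s.Args {
			if tainted[a] {
				argTainted = true
			}
		}
		switch {
		case p.Sinks[s.Call]:
			if argTainted {
				hits = append(hits, s.Call)
			}
		case s.Dst == "":
			// Result unused, nothing to track.
		case p.Sources[s.Call]:
			tainted[s.Dst] = true
		case p.Sanitizers[s.Call]:
			tainted[s.Dst] = false // sanitizer output is trusted
		default:
			tainted[s.Dst] = argTainted // taint flows from arguments to result
		}
	}
	return hits
}

func main() {
	p := Policy{
		Sources:    map[string]bool{"req.FormValue": true},
		Sanitizers: map[string]bool{"allowlist.Lookup": true},
		Sinks:      map[string]bool{"exec.Command": true},
	}
	stmts := []Stmt{
		{Dst: "cmd", Call: "req.FormValue", Args: []string{`"cmd"`}},
		{Dst: "trimmed", Call: "strings.TrimSpace", Args: []string{"cmd"}},
		{Call: "exec.Command", Args: []string{"trimmed"}}, // tainted argument
		{Dst: "safe", Call: "allowlist.Lookup", Args: []string{"cmd"}},
		{Call: "exec.Command", Args: []string{"safe"}}, // sanitized argument
	}
	fmt.Println(TaintedSinks(stmts, p)) // only the first exec.Command is reported
}
```

A real implementation would need branches, loops, and inter-procedural flow, which is where SSA form and the call graph come in.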

**Acceptance criteria**

* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.

---

### Workstream D — Graph store + reachability computation

**Deliverables**

* Schema in Postgres (recommended first) for:

  * commits, services, languages
  * symbols, edges, entrypoints, sinks
  * computed reachable “facts” (entrypoint→sink with shortest path(s))
* Reachability engine:

  * BFS/DFS per entrypoint with early cutoffs
  * path reconstruction storage (store predecessor map or store k-shortest paths)
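
The BFS-with-predecessor-map option can be sketched as follows. The graph is a plain in-memory adjacency list here (an assumption; the plan stores edges in Postgres), symbol IDs are assumed to be non-empty strings, and `ShortestPath` is a hypothetical name.

```go
package main

import "fmt"

// Graph is an adjacency list over symbol IDs (call edges only).
type Graph map[string][]string

// ShortestPath runs BFS from entry and returns the hop list to the first
// sink it reaches, or nil when no sink is reachable.
func ShortestPath(g Graph, entry string, sinks map[string]bool) []string {
	pred := map[string]string{entry: ""} // predecessor map doubles as visited set
	queue := []string{entry}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if sinks[cur] {
			// Early cutoff: rebuild the path by walking predecessors back.
			var path []string
			for n := cur; n != ""; n = pred[n] {
				path = append([]string{n}, path...)
			}
			return path
		}
		for _, next := range g[cur] {
			if _, seen := pred[next]; !seen {
				pred[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	g := Graph{
		"InvoicesController.Upload": {"UploadService.Save", "Logger.Info"},
		"UploadService.Save":        {"System.IO.File.WriteAllBytes"},
	}
	sinks := map[string]bool{"System.IO.File.WriteAllBytes": true}
	fmt.Println(ShortestPath(g, "InvoicesController.Upload", sinks))
}
```

Because BFS visits nodes in order of distance, the first sink found yields a shortest proof path, which is usually the easiest one for a reviewer to read.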

**Implementation notes**

* Don’t start with a graph DB unless you must.
* Use Postgres tables + indexes:

  * `edges(from_symbol, to_symbol, commit_id, kind)`
  * `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
  * `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`
* Cache:

  * keyed by (commit, policy_version, analyzer_version)
  * avoids recompute on re-runs

**Acceptance criteria**

* For any analyzed commit, you can answer:

  * “Which sinks are reachable from these entrypoints?”
  * “Show me one proof path per (entrypoint, sink_type).”

---

### Workstream E — Smart‑Diff engine (the “diff” part)

**Deliverables**

* Diff algorithm producing three buckets:

  * `added_paths`, `removed_paths`, `changed_paths`
* “Changed” means:

  * same entrypoint + sink type, but path differs OR taint/sanitization differs OR confidence changes

**Implementation notes**

* Identify a path by a stable fingerprint:

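  * `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`
  * the hop list is deliberately left out, so the same EP→sink pair keeps its `path_id` across commits; detect “changed” paths by comparing a separate hash of the hops (the `path_hash` column from Workstream D)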
* Store:

  * top-k paths for each pair for evidence (k=1 for MVP, add more later)
* Severity gating rules:

  * Example:

    * New path to `CMD_EXEC` = fail
    * New path to `FILE_WRITE` = warn unless under `/tmp` allowlist
    * New path to `SQL_RAW` = fail unless parameterized sanitizer present
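
One way to get the three buckets from the fingerprint is sketched below. It assumes both commits were analyzed with the same policy and analyzer versions, and it introduces a second hash (`Shape`, a hypothetical name) over the hops and taint flag to detect “changed”; the truncated SHA-256 is for readability only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Finding is one reachable entrypoint->sink pair with its proof path.
type Finding struct {
	Entrypoint, Sink, SinkType     string
	PolicyVersion, AnalyzerVersion string
	Hops                           []string
	Tainted                        bool
}

// hash joins the parts with a NUL separator and returns a short hex digest.
func hash(parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:8]) // truncated for readability
}

// PathID identifies the (entrypoint, sink) pair and ignores the hops, so
// the same pair matches across commits.
func (f Finding) PathID() string {
	return hash(f.Entrypoint, f.Sink, f.SinkType, f.PolicyVersion, f.AnalyzerVersion)
}

// Shape captures what may differ between commits for the same pair.
func (f Finding) Shape() string {
	return hash(append([]string{fmt.Sprint(f.Tainted)}, f.Hops...)...)
}

// Diff buckets findings into added, removed, and changed PathIDs.
func Diff(base, head []Finding) (added, removed, changed []string) {
	old := map[string]string{}
	for _, f := range base {
		old[f.PathID()] = f.Shape()
	}
	seen := map[string]bool{}
	for _, f := range head {
		id := f.PathID()
		seen[id] = true
		prev, ok := old[id]
		switch {
		case !ok:
			added = append(added, id)
		case prev != f.Shape():
			changed = append(changed, id)
		}
	}
	for id := range old {
		if !seen[id] {
			removed = append(removed, id)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(changed)
	return added, removed, changed
}

func main() {
	base := []Finding{{Entrypoint: "POST /upload", Sink: "File.WriteAllBytes", SinkType: "FILE_WRITE",
		Hops: []string{"Upload", "Save", "WriteAllBytes"}}}
	head := []Finding{
		{Entrypoint: "POST /upload", Sink: "File.WriteAllBytes", SinkType: "FILE_WRITE",
			Hops: []string{"Upload", "Validate", "Save", "WriteAllBytes"}},
		{Entrypoint: "POST /run", Sink: "Process.Start", SinkType: "CMD_EXEC",
			Hops: []string{"Run", "Start"}},
	}
	added, removed, changed := Diff(base, head)
	fmt.Println(len(added), "added,", len(removed), "removed,", len(changed), "changed")
}
```

Including the versions in `PathID` means an analyzer or policy upgrade between base and head shows up as remove-plus-add rather than as a silent match, so both commits should be analyzed with the same versions before diffing.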

**Acceptance criteria**

* Given commits A and B:

  * If B introduces a new reachable sink, CI fails with a single actionable card:

    * **EP**: route / handler
    * **Sink**: type + symbol
    * **Proof**: hop list
    * **Why**: policy rule triggered

---

### Workstream F — Vulnerability mapping + VEX

**Deliverables**

* Ingest dependency inventory (SBOM or lockfiles)
* Map vulnerabilities to “surfaces”

  * package → vulnerable module/function patterns
  * minimal version/range matching (from your existing vuln feed)
* Decision logic:

  * **Affected** if any reachable path intersects vulnerable surface OR dataflow reaches vulnerable sink
  * else **Not affected / Not exploitable** with justification

**Implementation notes**

* Start with a pragmatic approach:

  * package‑level reachability: “is any symbol in that package reachable?”
  * then iterate toward function‑level surfaces
* VEX output:

  * include commit hash, policy version, evidence paths
  * embed links to internal “path card” URLs if available
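
The package-level starting point could look roughly like this. `Statement` is a simplified, hypothetical record rather than a complete OpenVEX document; only the `affected` / `not_affected` status values and the `vulnerable_code_not_in_execute_path` justification are taken from OpenVEX naming. The CVE ID, package paths, and the `pkg.Symbol` naming convention are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Statement is a simplified stand-in for one VEX statement.
type Statement struct {
	Vulnerability string   `json:"vulnerability"`
	Product       string   `json:"product"`
	Status        string   `json:"status"`
	Justification string   `json:"justification,omitempty"`
	Evidence      []string `json:"evidence,omitempty"`
}

// Decide applies package-level reachability: the product is "affected" when
// any reachable symbol belongs to the vulnerable package.
func Decide(cve, product, vulnPkg string, reachable []string) Statement {
	var hits []string
	for _, sym := range reachable {
		// Symbols are assumed to be named "<package path>.<name>".
		if strings.HasPrefix(sym, vulnPkg+".") {
			hits = append(hits, sym)
		}
	}
	if len(hits) > 0 {
		return Statement{Vulnerability: cve, Product: product, Status: "affected", Evidence: hits}
	}
	return Statement{
		Vulnerability: cve,
		Product:       product,
		Status:        "not_affected",
		Justification: "vulnerable_code_not_in_execute_path",
	}
}

func main() {
	reachable := []string{"example.com/invoices/handlers.Upload", "os.WriteFile"}
	// Placeholder CVE ID and package path, for illustration only.
	st := Decide("CVE-0000-00000", "pkg:golang/example.com/invoices@abc123",
		"example.com/legacyyaml", reachable)

	out, err := json.MarshalIndent(st, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Package-level matching is coarse: it will report “affected” for any use of the package, which is the conservative direction and the reason to iterate toward function-level surfaces later.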

**Acceptance criteria**

* For a known vulnerable dependency, the system emits:

  * VEX “not affected” if package code is never reached from any entrypoint, with proof references.

---

### Workstream G — CI integration + developer UX

**Deliverables**

* A single CLI:

  * `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
  * `smartdiff diff --base <shaA> --head <shaB> --out sarif`
* CI templates for:

  * GitHub Actions / GitLab CI
* Outputs:

  * SARIF
  * JSON evidence bundle
  * optional OpenVEX file

**Acceptance criteria**

* Teams can enable Smart‑Diff by adding:

  * CI job + config file
  * no additional infra required for MVP (local artifacts mode)
* When infra is available, enable server caching mode for speed.

---

### Workstream H — UI “Path Cards”

**Deliverables**

* UI components:

  * Path card list with filters (sink type, severity, confidence)
  * “What changed” diff view:

    * red = added hops
    * green = removed hops
  * “Evidence” panel:

    * file:line for each hop
    * code snippets (optional)
* APIs:

  * `GET /smartdiff/{repo}/{pr}/findings`
  * `GET /smartdiff/{repo}/{commit}/path/{path_id}`

**Acceptance criteria**

* A developer can click one finding and understand:

  * how the data got there
  * exactly what line introduced the risk
  * how to fix (sanitize/guard/allowlist)

---

## 4) Milestone plan (sequenced, no time promises)

### Milestone 0 — Foundation

* Repo scaffolding:

  * `smartdiff-cli/`
  * `analyzers/dotnet/`
  * `analyzers/go/`
  * `core-ir/` (schemas + validation)
  * `server/` (optional; can come later)
* Define IR JSON schema + versioning rules
* Implement policy YAML + validator + sample policies
* Implement “local mode” artifact output

**Exit criteria**

* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.

---

### Milestone 1 — Callgraph reachability MVP

* .NET: build call edges + entrypoint discovery (basic)
* Go: build call edges + entrypoint discovery (basic)
* Graph store: in-memory, or a local SQLite or Postgres database
* Compute reachable sinks (callgraph only, no taint)

**Exit criteria**

* On a demo repo, you can list:

  * entrypoints
  * reachable sinks (callgraph reachability only)
  * a proof path (hop list)

---

### Milestone 2 — Smart‑Diff MVP (PR gating)

* Compute diff between base/head reachable sink sets
* Produce SARIF with:

  * rule id = sink type
  * message includes entrypoint + sink + link to evidence JSON
* CI templates + documentation
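
A minimal SARIF 2.1.0 log with the rule id set to the sink type might be assembled like this. The sketch uses generic maps instead of a full SARIF model; the `Finding` fields, the `smartdiff` tool name, and the evidence path are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding carries just the fields needed for one SARIF result.
type Finding struct {
	SinkType, Entrypoint, Sink string
	File                       string
	Line                       int
	EvidenceURI                string
}

// BuildSARIF maps findings onto a minimal SARIF 2.1.0 structure:
// one run, one result per finding, ruleId = sink type.
func BuildSARIF(findings []Finding) map[string]any {
	results := make([]map[string]any, 0, len(findings))
	for _, f := range findings {
		results = append(results, map[string]any{
			"ruleId": f.SinkType,
			"level":  "error",
			"message": map[string]any{
				"text": fmt.Sprintf("New reachable path: %s -> %s (evidence: %s)",
					f.Entrypoint, f.Sink, f.EvidenceURI),
			},
			"locations": []map[string]any{{
				"physicalLocation": map[string]any{
					"artifactLocation": map[string]any{"uri": f.File},
					"region":           map[string]any{"startLine": f.Line},
				},
			}},
		})
	}
	return map[string]any{
		"version": "2.1.0",
		"runs": []map[string]any{{
			"tool":    map[string]any{"driver": map[string]any{"name": "smartdiff"}},
			"results": results,
		}},
	}
}

func main() {
	log := BuildSARIF([]Finding{{
		SinkType: "CMD_EXEC", Entrypoint: "POST /run", Sink: "os/exec.Command",
		File: "handlers/run.go", Line: 27, EvidenceURI: "evidence/run.json",
	}})
	out, err := json.MarshalIndent(log, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Pointing the result location at the first hop that changed in the PR, rather than at the sink itself, tends to put the annotation on the line a reviewer can act on.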

**Exit criteria**

* In PR checks, the job fails on new EP→sink paths and links to a proof.

---

### Milestone 3 — Taint/dataflow MVP (high-value sinks only)

* Add taint propagation to reduce false positives:

  * differentiate “sink reachable” vs “untrusted data reaches sink”
* Add sanitizer recognition
* Add confidence scoring + suppression mechanisms (policy allowlists)

**Exit criteria**

* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).

---

### Milestone 4 — VEX integration MVP

* Join reachability with dependency vulnerabilities
* Emit OpenVEX (and/or CycloneDX VEX)
* Store evidence references (paths) inside VEX justification

**Exit criteria**

* For a repo with a vulnerable dependency, you can automatically produce:

  * affected/not affected with evidence.

---

### Milestone 5 — Scale and precision improvements

* Incremental analysis (only analyze changed projects/packages)
* Better dynamic dispatch handling (Go pointer analysis, .NET interface dispatch expansion)
* Optional runtime telemetry integration:

  * import production traces to prioritize “actually observed” entrypoints

**Exit criteria**

* Works on large services with acceptable run time and stable noise levels.

---

## 5) Backlog you can paste into Jira (epics + key stories)

### Epic: Policy & taxonomy

* Story: Define `smartdiff.policy.yaml` schema and validator
  **AC:** invalid configs fail with clear errors; configs are versioned.
* Story: Provide default sink list and severities
  **AC:** at least 10 sink rules with test cases.

### Epic: .NET analyzer

* Story: Resolve method invocations to symbols (Roslyn)
  **AC:** correct targets for direct calls; conservative handling for virtual calls.
* Story: Discover ASP.NET routes and bind to entrypoint symbols
  **AC:** entrypoints include route/method metadata.

### Epic: Go analyzer

* Story: SSA build and callgraph extraction
  **AC:** function nodes and edges generated for a multi-package repo.
* Story: net/http entrypoint discovery
  **AC:** handler functions recognized as entrypoints with path labels.

### Epic: Reachability engine

* Story: Compute reachable sinks per entrypoint
  **AC:** store at least one path with hop list.
* Story: Smart‑Diff A vs B
  **AC:** added/removed paths computed deterministically.

### Epic: CI/SARIF

* Story: Emit SARIF results
  **AC:** findings appear in code scanning UI; include file/line.

### Epic: Taint analysis

* Story: Propagate taint from request to sink for 3 sink classes
  **AC:** produces “tainted” evidence with a variable/argument trace.
* Story: Sanitizer recognition
  **AC:** path marked “sanitized” and downgraded per policy.

### Epic: VEX

* Story: Generate OpenVEX statements from reachability + vuln feed
  **AC:** for “not affected” includes justification and evidence references.

---

## 6) Key engineering decisions (recommended defaults)

### Storage

* Start with **Postgres** (or even a local SQLite file for the MVP) for simplicity.
* Introduce a graph DB only if:

  * you need very large multi-commit graph queries at low latency
  * Postgres performance becomes a hard blocker

### Confidence model

Every edge/path should carry:

* `confidence`: High/Med/Low
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`

This lets you:

* gate only on high-confidence paths in early rollout
* keep low-confidence as “informational”
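
One plausible way to derive a path-level confidence is to take the weakest edge on the path. That rule is an assumption of this sketch, not something the plan prescribes, and `PathConfidence` and `ShouldGate` are hypothetical names.

```go
package main

import "fmt"

// Confidence is ordered so that Low < Med < High.
type Confidence int

const (
	Low Confidence = iota
	Med
	High
)

func (c Confidence) String() string { return [...]string{"Low", "Med", "High"}[c] }

// Edge is one call edge with its confidence and the reason behind it.
type Edge struct {
	From, To string
	Conf     Confidence
	Reason   string
}

// PathConfidence returns the weakest edge confidence on the path and the
// reasons of the edges at that level. An empty path counts as High.
func PathConfidence(path []Edge) (Confidence, []string) {
	worst := High
	for _, e := range path {
		if e.Conf < worst {
			worst = e.Conf
		}
	}
	var reasons []string
	for _, e := range path {
		if e.Conf == worst {
			reasons = append(reasons, e.Reason)
		}
	}
	return worst, reasons
}

// ShouldGate reports whether a path is confident enough to fail the build.
func ShouldGate(c, threshold Confidence) bool { return c >= threshold }

func main() {
	path := []Edge{
		{"Upload", "IStore.Save", High, "DirectCall"},
		{"IStore.Save", "DiskStore.Save", Med, "InterfaceDispatch"},
		{"DiskStore.Save", "os.WriteFile", High, "DirectCall"},
	}
	c, why := PathConfidence(path)
	fmt.Println(c, why, "gate:", ShouldGate(c, High))
}
```

Returning the reasons alongside the level lets the path card explain why a finding was downgraded to informational.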

### Suppression model

* Local suppressions:

  * `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry
* Policy allowlists:

  * allow file writes only under certain directories
  * allow outbound network only to configured domains

---

## 7) Testing strategy (to avoid “cool demo, unusable tool”)

### Unit tests

* Symbol hashing stability tests
* Call resolution tests:

  * overloads, generics, interfaces, lambdas
* Policy parsing/validation tests

### Integration tests (must-have)

* Golden repos in `testdata/`:

  * one ASP.NET minimal API
  * one MVC controller app
  * one Go net/http + one Gin app
* Golden outputs:

  * expected entrypoints
  * expected reachable sinks
  * expected diff between commits

### Regression tests

* A curated corpus of “known issues”:

  * false positives you fixed should never return
  * false negatives: ensure known risky path is always found

### Performance tests

* Measure:

  * analysis time per 50k LOC
  * memory peak
  * graph size
* Budget enforcement:

  * if over budget, degrade gracefully (lower precision, mark low confidence)

---

## 8) Example configs and outputs (to make onboarding easy)

### Example policy YAML (minimal)

```yaml
version: 1
service: invoices-api
entrypoints:
  autodiscover:
    dotnet:
      aspnet: true
    go:
      net_http: true

sinks:
  - type: CMD_EXEC
    severity: high
    match:
      dotnet:
        symbols:
          - "System.Diagnostics.Process.Start(string)"
      go:
        symbols:
          - "os/exec.Command"
  - type: FILE_WRITE
    severity: medium
    match:
      dotnet:
        namespaces: ["System.IO"]
      go:
        symbols: ["os.WriteFile"]

gating:
  fail_on:
    - sink_type: CMD_EXEC
      when: "added && confidence >= medium"
    - sink_type: FILE_WRITE
      when: "added && tainted && confidence >= medium"
```

### Evidence JSON shape (what the UI consumes)

```json
{
  "commit": "abc123",
  "entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
  "sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
  "confidence": "high",
  "tainted": true,
  "path": [
    {"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
    {"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
    {"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
  ]
}
```

---

## 9) Risks and mitigations (explicit)

1. **Dynamic behavior (reflection, DI, router magic)**

   * Mitigation: conservative fallbacks + confidence labels + optional runtime traces later

2. **Noise from huge callgraphs**

   * Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only

3. **Large repo build failures**

   * Mitigation: analyzer runs inside build containers; allow partial analysis with explicit “incomplete” result flag

4. **Teams rejecting gating**

   * Mitigation: staged rollout:

     * Observe-only mode → warn-only → fail-only for high-confidence CMD_EXEC/UNSAFE_DESER
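
The sink-first slicing mentioned under risk 2 can be sketched as a backward walk over the reversed call graph: only symbols that can reach some sink survive, and entrypoints outside that slice never need a forward search. The graph shape and function names below are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a caller symbol to its callees.
type Graph map[string][]string

// Reverse flips every caller->callee edge into callee->caller.
func Reverse(g Graph) Graph {
	r := Graph{}
	for from, tos := range g {
		for _, to := range tos {
			r[to] = append(r[to], from)
		}
	}
	return r
}

// BackwardSlice returns every symbol that can reach at least one sink,
// found by walking the reversed graph outward from the sinks.
func BackwardSlice(g Graph, sinks []string) map[string]bool {
	rev := Reverse(g)
	seen := map[string]bool{}
	stack := append([]string(nil), sinks...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[n] {
			continue
		}
		seen[n] = true
		stack = append(stack, rev[n]...)
	}
	return seen
}

// LiveEntrypoints keeps only the entrypoints inside the slice; the rest
// cannot reach any sink and can be skipped by the forward search.
func LiveEntrypoints(eps []string, slice map[string]bool) []string {
	var out []string
	for _, ep := range eps {
		if slice[ep] {
			out = append(out, ep)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := Graph{
		"Upload": {"Save"},
		"Save":   {"os.WriteFile"},
		"Health": {"Log"},
		"Log":    {"fmt.Println"},
	}
	slice := BackwardSlice(g, []string{"os.WriteFile"})
	fmt.Println(LiveEntrypoints([]string{"Upload", "Health"}, slice))
}
```

Since sinks are usually far fewer than entrypoints, one backward pass per commit tends to be cheaper than a forward search from every handler.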

---

## 10) Definition of done (what “implemented” means)

You should consider Smart‑Diff “implemented” when:

* A repo can enable it with one config + one CI job.
* PRs get:

  * a small number of **actionable** findings (not hundreds)
  * each finding has a proof path with file/line hops
* It reliably detects at least:

  * new command execution paths
  * new unsafe deserialization paths
  * new tainted filesystem write paths
* It can optionally emit VEX decisions backed by reachability evidence.

---

If you want the next step, I can also give you:

* a **concrete repo layout** with module boundaries,
* the **Postgres schema** (tables + indexes),
* and a **language adapter interface** (so adding Java/Python later is straightforward).