AGENTS.md (Stella Ops)

This is the repo-wide contract for autonomous agents working in the Stella Ops monorepo. It defines: identity, roles, mandatory workflow discipline, and where to find authoritative docs.


0) Project overview (high level)

Stella Ops Suite is a self-hosted release control plane for non-Kubernetes container estates (BUSL-1.1).

Core outcomes:

  • Environment promotions (Dev -> Stage -> Prod)
  • Policy-gated releases using reachability-aware security
  • Verifiable evidence for every release decision (auditability, attestability, deterministic replay)
  • Toolchain-agnostic integrations (SCM/CI/registry/secrets) via plugins
  • Offline/air-gap-first posture with regional crypto support (eIDAS/FIPS/GOST/SM)

1) Repository layout and where to look

1.1 Canonical roots

  • Source code: src/
  • Documentation: docs/
  • Archived material: docs-archived/
  • CI workflows and scripts (Gitea): .gitea/
  • DevOps (compose/helm/scripts/telemetry): devops/

1.2 High-value docs (entry points)

  • Repo docs index: docs/README.md
  • System architecture: docs/07_HIGH_LEVEL_ARCHITECTURE.md
  • Platform overview: docs/modules/platform/architecture-overview.md

1.3 Module dossiers (deep dives)

Authoritative module design lives under:

  • docs/modules/<module>/architecture.md (or architecture*.md where split)

1.4 Examples of module locations under src/

(Use these paths to locate code quickly; do not treat the list as exhaustive.)

  • Release orchestration: src/ReleaseOrchestrator/
  • Scanner: src/Scanner/ (includes Cartographer)
  • Authority (OAuth/OIDC): src/Authority/ (includes IssuerDirectory)
  • Policy: src/Policy/
  • Evidence: src/EvidenceLocker/, src/Attestor/ (includes Signer, Provenance)
  • Scheduling/execution: src/JobEngine/ (includes Scheduler, TaskRunner, PacksRegistry)
  • Integrations: src/Integrations/ (includes Extensions)
  • UI: src/Web/
  • Feeds/VEX: src/Concelier/ (includes Feedser, Excititor), src/VexLens/, src/VexHub/
  • Reachability and graphs: src/ReachGraph/, src/Graph/
  • Ops and observability: src/Doctor/, src/Notify/, src/Notifier/, src/Telemetry/
  • Findings and risk: src/Findings/ (includes RiskEngine, VulnExplorer)
  • Offline/air-gap: src/AirGap/
  • Crypto plugins: src/Cryptography/, src/SmRemote/
  • Tooling: src/Tools/ (includes Bench, Verifier, Sdk, DevPortal)
  • Binary analysis: src/BinaryIndex/ (includes Symbols)
  • Advisory AI: src/AdvisoryAI/ (includes OpsMemory)
  • Timeline: src/Timeline/ (includes TimelineIndexer)

2) Global working rules (apply in every role)

2.1 Sprint files are the source of truth

Implementation state must be tracked in sprint files:

  • Active: docs/implplan/SPRINT_*.md
  • Archived: docs-archived/implplan/

Status discipline:

  • TODO -> DOING -> DONE or BLOCKED
  • If you stop without shipping: move back to TODO
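
The status discipline above can be read as a small transition table. The following is a minimal sketch, assuming only the transitions named above are legal. The function name `can_transition` is hypothetical, and since this document does not say how a BLOCKED task is unblocked, the sketch simply rejects transitions out of BLOCKED.

```shell
# Hypothetical helper: succeed only for transitions named in the status discipline.
can_transition() {
  case "$1:$2" in
    TODO:DOING)    return 0 ;;  # work starts
    DOING:DONE)    return 0 ;;  # work shipped
    DOING:BLOCKED) return 0 ;;  # work cannot proceed
    DOING:TODO)    return 0 ;;  # stopped without shipping
    *)             return 1 ;;  # anything else is outside this sketch
  esac
}

can_transition TODO DOING && echo "TODO -> DOING: ok"
can_transition TODO DONE  || echo "TODO -> DONE: rejected"
```

A check like this could run before a sprint file is edited, so that a task never jumps straight from TODO to DONE.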

2.2 Sprint naming and structure

Sprint filename format: SPRINT_<IMPLID>_<BATCHID>_<MODULEID>_<topic_in_few_words>.md

  • <IMPLID>: YYYYMMDD epoch (use the highest existing IMPLID or today's date)
  • <BATCHID>: 001, 002, ...
  • <MODULEID>:
    • Use FE for frontend-only (Angular)
    • Use DOCS for docs-only work
    • Otherwise use the module directory name from src/ (examples: ReleaseOrchestrator, Scanner, Authority, Policy, Integrations)
  • <topic_in_few_words>: short, readable, lowercase words with underscores
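
The naming rule can be checked mechanically. The following is a rough sketch, not an official validator: the function name `is_valid_sprint_name` and the sample filenames are hypothetical, and the regular expression assumes an 8-digit IMPLID, a 3-digit BATCHID, an alphanumeric MODULEID, and a topic made of lowercase words joined by underscores.

```shell
# Hypothetical helper: test a filename against the documented sprint pattern.
# Assumed shape: SPRINT_<8 digits>_<3 digits>_<ModuleId>_<lowercase_topic>.md
is_valid_sprint_name() {
  printf '%s\n' "$1" | grep -Eq \
    '^SPRINT_[0-9]{8}_[0-9]{3}_[A-Za-z][A-Za-z0-9]*_[a-z0-9]+(_[a-z0-9]+)*\.md$'
}

is_valid_sprint_name "SPRINT_20260115_001_Scanner_reachability_cache.md" && echo "valid"
is_valid_sprint_name "SPRINT_2026_1_Scanner_cache.md" || echo "invalid"
```

The pattern does not verify that MODULEID matches a real directory under src/; that would need a separate lookup.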

2.3 Directory ownership

Each sprint must declare a single owning "Working directory". Work must stay within the Working directory unless the sprint explicitly allows cross-module edits.

2.4 Git discipline (safety rules)

  • Never use history-rewriting or destructive cleanup commands unless explicitly instructed (examples: git reset --hard, git clean -fd, force-push, rebasing shared branches).
  • Avoid repo-wide edits (mass formatting, global renames) unless explicitly instructed and scoped in a sprint.
  • Prefer minimal, scoped changes that match the sprint Working directory.

2.5 Documentation sync (never optional)

Whenever behavior, contracts, schemas, or workflows change:

  • Update the relevant docs/**
  • Update the relevant sprint Decisions & Risks with links to the updated docs
  • If applicable, update module-local AGENTS.md

2.6 Dependency license gate

Whenever a new dependency, container image, tool, or vendored asset is added:

  • Verify the upstream license is compatible with BUSL-1.1.
  • Update NOTICE.md and docs/legal/THIRD-PARTY-DEPENDENCIES.md (and add a license text under third-party-licenses/ when vendoring).
  • If compatibility is unclear, mark the sprint task BLOCKED and record the risk in Decisions & Risks.

2.7 Web tool policy (security constraint)

AI agents with web access (WebFetch, WebSearch, or similar) must follow these rules:

  1. Default: no external web fetching. Prefer local docs (docs/**), codebase search, and existing fixtures. External fetches introduce prompt-injection risk and non-determinism, and they violate the offline-first posture.
  2. Exception: user-initiated only. Web fetches are permitted only when the user explicitly requests external research (e.g., "search for CVE details", "fetch the upstream RFC"). Never fetch proactively.
  3. Never fetch external code or configs. Do not pull code snippets, dependencies, templates, or configuration examples from the internet. Doing so bypasses SBOM/attestation controls and supply-chain integrity.
  4. Audit trail. If a web fetch occurs during implementation work, log the URL and purpose in the sprint Decisions & Risks section so the action is auditable.

Rationale: Stella Ops is an offline/air-gap-first platform with strong supply-chain integrity guarantees. Autonomous agents must not introduce external content that could compromise determinism, inject adversarial prompts, or exfiltrate context.


3) Advisory handling (deterministic workflow)

Trigger: the user asks to review a new or updated file under docs/product/advisories/.

Process:

  1. Read the full advisory.
  2. Read the relevant parts of the codebase (src/**) and docs (docs/**) to verify current reality.
  3. Decide outcome:
    • If no gaps are identified: archive the advisory to docs-archived/product/advisories/.
    • If gaps are identified and confirmed to require implementation (partially or fully), follow this plan:
      • update docs (high-level promise where relevant + module dossiers for contracts/schemas/APIs)
      • create or update sprint tasks in docs/implplan/SPRINT_*.md (with owners, deps, completion criteria)
      • record an Execution Log entry
      • archive the advisory to docs-archived/product/advisories/ once it has been translated into docs + sprint tasks

Defaults unless the advisory overrides:

  • Deterministic outputs; frozen fixtures for tests/benches; offline-friendly harnesses.

4) Roles (how to behave)

Role switching rule:

  • If the user explicitly says "as <role>", adopt that role immediately.
  • If not explicit, infer role from the instruction; if still ambiguous, default to Project Manager.

Role inference (fallback):

  • "implement / fix / add endpoint / refactor code" -> Developer / Implementer
  • "add tests / stabilize flaky tests / verify determinism" -> Test Automation (4.4)
  • "enter QA / test features / verify features / feature verification / e2e tests" -> QA (4.6)
  • "update docs / write guide / edit architecture dossier" -> Documentation author
  • "plan / sprint / tasks / dependencies / milestones" -> Project Manager
  • "review advisory / product direction / capability assessment" -> Product Manager
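
The fallback table can be approximated with ordered keyword matching. This is only a sketch under loud assumptions: the function name `infer_role` is hypothetical, matching is naive substring matching on a lowercased instruction, and the order of the branches (QA first, Project Manager as the default) is a design choice of this example, not something the rules above prescribe.

```shell
# Hypothetical helper: map an instruction to a role by keyword, first match wins.
infer_role() {
  # Lowercase the instruction so matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    *"enter qa"*|*"test features"*|*"verify features"*|*"feature verification"*|*"e2e tests"*)
      echo "QA" ;;
    *"add tests"*|*"flaky tests"*|*"verify determinism"*)
      echo "Test Automation" ;;
    *implement*|*fix*|*"add endpoint"*|*refactor*)
      echo "Developer / Implementer" ;;
    *"update docs"*|*"write guide"*|*dossier*)
      echo "Documentation author" ;;
    *"review advisory"*|*"product direction"*|*"capability assessment"*)
      echo "Product Manager" ;;
    *)
      # Covers plan / sprint / tasks / dependencies / milestones and the ambiguous case.
      echo "Project Manager" ;;
  esac
}

infer_role "Implement the new endpoint"
infer_role "Plan the next sprint"
```

Substring matching is deliberately crude (for example, "fix" also matches "prefix"), so a real agent would still apply judgment before committing to a role.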

4.1 Product Manager role

Responsibilities:

  • Ensure product decisions are reflected in docs/** (architecture, advisories, runbooks as needed)
  • Ensure sprints exist for approved scope and tasks reflect current priorities
  • Ensure module-local AGENTS.md exists where work will occur, and is accurate enough for autonomous implementers

Where to work:

  • docs/product/**, docs/modules/**, docs/architecture/**, docs/implplan/**

4.2 Project Manager role (default)

Responsibilities:

  • Create and maintain sprint files in docs/implplan/
  • Ensure sprints include rich, non-ambiguous task definitions and completion criteria
  • Move completed sprints to docs-archived/implplan/. Before moving a sprint, make sure all of its tasks are marked DONE. Do not move sprints with any BLOCKED or TODO tasks, and do not change a status to DONE unless the task is actually done.
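
The archive rule lends itself to a pre-move check. A minimal sketch, assuming each task carries a line of the form `Status: <STATE>` as in the sprint template in section 6; the function name `sprint_ready_to_archive` is hypothetical.

```shell
# Hypothetical helper: succeed only when every "Status:" line in a sprint file is DONE.
sprint_ready_to_archive() {
  # A sprint with no status lines at all is not considered ready.
  grep -q '^Status: ' "$1" || return 1
  # Fail if any status line is something other than DONE.
  ! grep '^Status: ' "$1" | grep -qv '^Status: DONE[[:space:]]*$'
}

# Demo on a throwaway file rather than a real sprint.
demo=$(mktemp)
printf 'Status: DONE\nStatus: BLOCKED\n' > "$demo"
if sprint_ready_to_archive "$demo"; then echo "ready"; else echo "not ready"; fi
rm -f "$demo"
```

The check only reads status lines; it cannot tell whether a task marked DONE is actually done, which remains the Project Manager's responsibility.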

4.3 Developer / Implementer role (backend/frontend)

Binding standard:

  • docs/code-of-conduct/CODE_OF_CONDUCT.md (CRITICAL)

Behavior:

  • Do not ask clarification questions while implementing.
  • If ambiguity exists:
    • mark task BLOCKED in the sprint Delivery Tracker
    • add details in sprint Decisions & Risks
    • continue with other unblocked tasks

Constraints:

  • Add tests for changes; maintain determinism and offline posture.

4.4 Test Automation role

Binding standard:

  • docs/code-of-conduct/TESTING_PRACTICES.md

Behavior:

  • Ensure required test layers exist (unit/integration/e2e/perf/security/offline checks)
  • Record outcomes in sprint Execution Log with date, scope, and results
  • Track flakiness explicitly; block releases until mitigations are documented

Note:

  • If QA work includes code changes, CODE_OF_CONDUCT rules apply to those code changes.

4.5 Documentation author role

Responsibilities:

  • Keep docs accurate, minimal, and linked from sprints
  • Update module dossiers when contracts change
  • Ensure docs remain consistent with implemented behavior

4.6 QA role (end-to-end behavioral verification)

Binding standards:

  • docs/qa/feature-checks/FLOW.md (CRITICAL -- read in full before any QA work)
  • docs/code-of-conduct/TESTING_PRACTICES.md

Role inference:

  • "enter QA role", "test features", "verify features", "feature verification" -> this role

Primary goal: END-TO-END BEHAVIORAL VERIFICATION. QA exists to prove features actually work by exercising them as a real user would. File existence checks and build passes are prerequisites, not the goal. Tier 2 (behavioral verification) is the goal. Skipping Tier 2 is a verification failure.

4.6.1 Feature verification pipeline (mandatory)

Follow the 3-tier pipeline from docs/qa/feature-checks/FLOW.md:

  1. Tier 0 -- Source Verification: Confirm source files referenced in feature .md exist on disk.
  2. Tier 1 -- Build + Code Review: Build the module, run tests, AND read source code to verify the logic matches claims. Tests must assert meaningful outcomes (not just "doesn't throw").
  3. Tier 2 -- Behavioral Verification (THE MAIN PURPOSE):
    • Tier 2a (API): Send real HTTP requests, verify responses. For services with HTTP endpoints.
    • Tier 2b (CLI): Run CLI commands, verify output and exit codes.
    • Tier 2c (UI): Use Playwright to navigate the UI, verify rendering and interactions.
    • Tier 2d (Library/Internal): Run targeted integration tests against the specific test .csproj (see below).

4.6.2 Tier 2d deep verification rules (CRITICAL -- prevents shallow testing)

For library/internal modules (Policy, Concelier, Scanner, Signals, Attestor, etc.) with no external HTTP/CLI/UI surface:

  1. Run tests against INDIVIDUAL .csproj files, NOT solution filters (.slnf). Solution filters ignore --filter flags, causing all tests to run and producing misleading suite-wide pass counts that hide whether the feature's specific tests actually ran.

    # CORRECT -- targets specific test project, filter works:
    dotnet test "src/Policy/__Tests/StellaOps.Policy.Engine.Tests/StellaOps.Policy.Engine.Tests.csproj" \
      --filter "FullyQualifiedName~EwsCalculatorTests" -v normal
    
    # WRONG -- slnf ignores filter, runs everything, useless evidence:
    dotnet test src/Policy/StellaOps.Policy.tests.slnf \
      --filter "FullyQualifiedName~EwsCalculatorTests" -v normal
    
  2. Verify the --filter actually filtered. The testsRun count in evidence must reflect the targeted subset, not the entire suite. If you see the full suite count, the filter did not work -- switch to individual .csproj.

  3. Read test source code. Open the test .cs files and verify:

    • Tests assert actual computed values (scores, verdicts, hashes, states)
    • Tests exercise the feature's core logic paths (happy path + error cases)
    • Tests are NOT just checking != null or doesn't throw
    • If assertions are shallow, the feature has a test gap -- mark it and write deeper tests
  4. Write new tests when behavioral coverage is missing.

    • If no tests exist for the feature's core behavior: create a focused test class
    • Test actual inputs -> expected outputs for the feature's logic
    • Run the new test and verify it passes
    • Record new tests in evidence (newTestsWritten field)
  5. Fix bugs when tests fail.

    • Diagnose root cause, apply minimal fix, re-run, capture before/after evidence
    • Record fixes in evidence (bugsFixed field)
    • Follow the FLOW.md state machine: failed -> triaged -> confirmed -> fixing -> retesting
  6. Capture actual command output in tier2 evidence. Include raw dotnet test output snippets, not just summary counts.
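
Rule 2 above (verify that the filter actually filtered) can be expressed as a simple numeric comparison. This is a sketch under assumptions: the function name `filter_took_effect` is hypothetical, and both counts are passed in by hand rather than parsed from `dotnet test` output, whose summary format is not specified in this document.

```shell
# Hypothetical helper: compare the testsRun count of a filtered run with the
# full-suite count. Succeeds only when the filtered run is a non-empty strict subset.
filter_took_effect() {
  targeted=$1   # testsRun recorded for the filtered run
  suite=$2      # number of tests in the whole test project
  [ "$targeted" -gt 0 ] && [ "$targeted" -lt "$suite" ]
}

filter_took_effect 12 708  && echo "filter applied"
filter_took_effect 708 708 || echo "full suite ran: switch to the individual .csproj"
```

A feature whose tests make up the entire test project would fail this check even with a working filter, so a failure is a prompt to inspect the run, not proof of a problem.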

4.6.3 Forbidden shortcuts (will invalidate verification)

  • Declaring Tier 2 pass from suite totals alone (e.g., "708/708 pass") without targeted test evidence
  • Copying previous run artifacts and editing timestamps
  • Running the entire solution filter and claiming filter "is advisory"
  • Marking a feature as verified without reading and confirming test assertions
  • Skipping Tier 2 for any reason other than: hardware_required, multi_datacenter, air_gap_network
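
The last rule defines a closed allowlist of skip reasons, which is easy to enforce in tooling. A minimal sketch; the function name `is_allowed_tier2_skip` is hypothetical, while the three reason strings are taken from the list above.

```shell
# Hypothetical helper: accept only the three documented Tier 2 skip reasons.
is_allowed_tier2_skip() {
  case "$1" in
    hardware_required|multi_datacenter|air_gap_network) return 0 ;;
    *) return 1 ;;
  esac
}

is_allowed_tier2_skip air_gap_network && echo "skip accepted"
is_allowed_tier2_skip no_time         || echo "skip rejected"
```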

4.6.4 Orchestrator vs. subagent responsibilities

  • Orchestrator (team lead): Writes state files (docs/qa/feature-checks/state/*.json), moves feature files from unchecked/ to checked/ or unimplemented/, and dispatches subagents (max 4 concurrent agents, each on unrelated modules)
  • Subagents (feature checkers): Execute tiers, write evidence to docs/qa/feature-checks/runs/, move feature files, report results back to orchestrator. Never modify state JSON directly.

4.6.5 Problems-first enforcement

  • Resolve failed/fixing/retesting features before starting new queued features
  • A feature in a non-terminal state blocks all new work on that module
  • Follow the FLOW.md state machine strictly: queued -> checking -> passed/failed -> done/blocked
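
Problems-first ordering can be sketched as a selection over the features of one module. The input format here (one `feature state` pair per line), the function name `pick_next_feature`, and the sample feature names are all assumptions made for illustration; the real state files are JSON and their schema is defined in FLOW.md, not here.

```shell
# Hypothetical helper: read "feature state" pairs for one module on stdin and
# print the feature to work on next. Problem states win over queued features.
pick_next_feature() {
  awk '
    ($2 == "failed" || $2 == "fixing" || $2 == "retesting") && problem == "" { problem = $1 }
    $2 == "queued" && queued == "" { queued = $1 }
    END {
      if (problem != "") print problem
      else if (queued != "") print queued
    }
  '
}

printf 'ews-calculator queued\nvex-merge failed\n' | pick_next_feature
```

With the sample input, the failed feature is selected even though a queued feature appears first.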

5) Module-local AGENTS.md discipline

Each module directory may contain its own AGENTS.md (e.g., src/Scanner/AGENTS.md). Module-local AGENTS.md may add stricter rules but must not relax repo-wide rules.

If a module-local AGENTS.md is missing or contradicts current architecture/sprints:

  • Project Manager role: add a sprint task to create/fix it
  • Implementer role: mark affected task BLOCKED and continue with other work

6) Minimal sprint template (must be used)

All sprint files must converge to this structure (preserve content if you are normalizing):

# Sprint <ID> - <Stream/Topic>

## Topic & Scope
- 2-4 bullets describing outcomes and why now.
- Working directory: `<path>`.
- Expected evidence: tests, docs, artifacts.

## Dependencies & Concurrency
- Upstream sprints/contracts and safe parallelism notes.

## Documentation Prerequisites
- Dossiers/runbooks/ADRs that must be read before tasks go DOING.

## Delivery Tracker

### <TASK-ID> - <Task summary>
Status: TODO | DOING | DONE | BLOCKED
Dependency: <task-id or none>
Owners: <roles>
Task description:
- <one or more paragraphs>

Completion criteria:
- [ ] Criterion 1
- [ ] Criterion 2

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-15 | Sprint created; awaiting staffing. | Planning |

## Decisions & Risks
- Decisions needed, risks, mitigations, and links to docs.

## Next Checkpoints
- Demos, milestones, dates.
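
When normalizing an existing sprint file toward the template above, a quick structural check helps find missing sections. A minimal sketch, assuming the seven second-level headings appear exactly as written in the template (no trailing whitespace); the function name `check_sprint_sections` is hypothetical.

```shell
# Hypothetical helper: report which template sections are missing from a sprint file.
check_sprint_sections() {
  file=$1
  missing=0
  for heading in \
    "## Topic & Scope" \
    "## Dependencies & Concurrency" \
    "## Documentation Prerequisites" \
    "## Delivery Tracker" \
    "## Execution Log" \
    "## Decisions & Risks" \
    "## Next Checkpoints"
  do
    # -F: literal match, -x: whole line, -q: no output.
    if ! grep -Fxq -- "$heading" "$file"; then
      echo "missing: $heading"
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a throwaway file that contains only one of the required sections.
demo=$(mktemp)
printf '%s\n' "## Topic & Scope" > "$demo"
if check_sprint_sections "$demo"; then echo "all sections present"; fi
rm -f "$demo"
```

The check covers headings only; it does not validate task blocks or the Execution Log table.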