git.stella-ops.org/.gitea/docs/architecture.md
StellaOps Bot e6c47c8f50 save progress
2025-12-28 23:49:56 +02:00

CI/CD Architecture

Extended Documentation: See docs/cicd/ for comprehensive CI/CD guides.

Overview

StellaOps CI/CD infrastructure is built on Gitea Actions with a modular, layered architecture designed for:

  • Determinism: Reproducible builds and tests across environments
  • Offline-first: Support for air-gapped deployments
  • Security: Cryptographic signing and attestation at every stage
  • Scalability: Parallel execution with intelligent caching

| Document | Purpose |
| --- | --- |
| CI/CD Overview | High-level architecture and getting started |
| Workflow Triggers | Complete trigger matrix and dependency chains |
| Release Pipelines | Suite, module, and bundle release flows |
| Security Scanning | SAST, secrets, container, and dependency scanning |
| Troubleshooting | Common issues and solutions |
| Script Reference | CI/CD script documentation |

Workflow Trigger Summary

Trigger Matrix (100 Workflows)

| Trigger Type | Count | Examples |
| --- | --- | --- |
| PR + Main Push | 15 | test-matrix.yml, build-test-deploy.yml |
| Tag-Based | 3 | release-suite.yml, release.yml, module-publish.yml |
| Scheduled | 8 | nightly-regression.yml, renovate.yml |
| Manual Only | 25+ | rollback.yml, cli-build.yml |
| Module-Specific | 50+ | Scanner, Concelier, Authority workflows |

Tag Patterns

| Pattern | Workflow | Example |
| --- | --- | --- |
| `suite-*` | Suite release | `suite-2026.04` |
| `v*` | Bundle release | `v2025.12.1` |
| `module-*-v*` | Module publish | `module-authority-v1.2.3` |
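
The tag patterns above can be read as shell-style globs. The following is a minimal sketch of that reading, not the parsing logic the workflows actually use; `classify_tag` and `TAG_PATTERNS` are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns from the table above; checked in order, first match wins.
TAG_PATTERNS = [
    ("suite-*", "Suite release"),
    ("module-*-v*", "Module publish"),
    ("v*", "Bundle release"),
]


def classify_tag(tag: str) -> Optional[str]:
    """Return the release flow for a tag, or None if no pattern matches."""
    for pattern, flow in TAG_PATTERNS:
        if fnmatchcase(tag, pattern):
            return flow
    return None


print(classify_tag("module-authority-v1.2.3"))
```

Tags that match none of the three patterns return `None`, which a caller could treat as "no release workflow applies".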

Schedule Overview

| Time (UTC) | Workflow | Purpose |
| --- | --- | --- |
| 2:00 AM Daily | nightly-regression.yml | Full regression |
| 3:00 AM/PM Daily | renovate.yml | Dependency updates |
| 3:30 AM Monday | sast-scan.yml | Weekly security scan |
| 5:00 AM Daily | test-matrix.yml | Extended tests |
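
For reference, the schedule above can be written as five-field cron expressions. The sketch below is an illustration only: the cron strings are a translation of the table rather than values copied from the workflow files, and the matcher handles just `*` and comma-separated integers.

```python
from datetime import datetime, timezone

# Cron strings derived from the schedule table (assumption, not the workflow source).
SCHEDULES = {
    "nightly-regression.yml": "0 2 * * *",
    "renovate.yml": "0 3,15 * * *",
    "sast-scan.yml": "30 3 * * 1",
    "test-matrix.yml": "0 5 * * *",
}


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*' and comma-separated integers only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron convention: Sunday=0, Monday=1
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        and _field_matches(weekday, cron_weekday)
    )


def due_workflows(when: datetime) -> list:
    """List the scheduled workflows that fire at the given UTC time."""
    return sorted(name for name, expr in SCHEDULES.items() if cron_matches(expr, when))


print(due_workflows(datetime(2025, 12, 29, 3, 30, tzinfo=timezone.utc)))
```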

Full Details: See Workflow Triggers

Pipeline Architecture

Release Pipeline Flow

graph TD
    subgraph "Trigger Layer"
        TAG[Git Tag] --> PARSE[Parse Tag]
        DISPATCH[Manual Dispatch] --> PARSE
        SCHEDULE[Scheduled] --> PARSE
    end

    subgraph "Validation Layer"
        PARSE --> VALIDATE[Validate Inputs]
        VALIDATE --> RESOLVE[Resolve Versions]
    end

    subgraph "Build Layer"
        RESOLVE --> BUILD[Build Modules]
        BUILD --> TEST[Run Tests]
        TEST --> DETERMINISM[Determinism Check]
    end

    subgraph "Artifact Layer"
        DETERMINISM --> CONTAINER[Build Container]
        CONTAINER --> SBOM[Generate SBOM]
        SBOM --> SIGN[Sign Artifacts]
    end

    subgraph "Release Layer"
        SIGN --> MANIFEST[Update Manifest]
        MANIFEST --> CHANGELOG[Generate Changelog]
        CHANGELOG --> DOCS[Generate Docs]
        DOCS --> PUBLISH[Publish Release]
    end

    subgraph "Post-Release"
        PUBLISH --> VERIFY[Verify Release]
        VERIFY --> NOTIFY[Notify Stakeholders]
    end

Service Release Pipeline

graph LR
    subgraph "Trigger"
        A["service-{name}-v{semver}"] --> B[Parse Service & Version]
    end

    subgraph "Build"
        B --> C[Read Directory.Versions.props]
        C --> D[Bump Version]
        D --> E[Build Service]
        E --> F[Run Tests]
    end

    subgraph "Package"
        F --> G[Build Container]
        G --> H[Generate Docker Tag]
        H --> I[Push to Registry]
    end

    subgraph "Attestation"
        I --> J[Generate SBOM]
        J --> K[Sign with Cosign]
        K --> L[Create Attestation]
    end

    subgraph "Finalize"
        L --> M[Update Manifest]
        M --> N[Commit Changes]
    end

Test Matrix Execution

graph TD
    subgraph "Matrix Strategy"
        TRIGGER[PR/Push] --> FILTER[Path Filter]
        FILTER --> MATRIX[Generate Matrix]
    end

    subgraph "Parallel Execution"
        MATRIX --> UNIT[Unit Tests]
        MATRIX --> INT[Integration Tests]
        MATRIX --> DET[Determinism Tests]
    end

    subgraph "Test Types"
        UNIT --> UNIT_FAST[Fast Unit]
        UNIT --> UNIT_SLOW[Slow Unit]
        INT --> INT_PG[PostgreSQL]
        INT --> INT_VALKEY[Valkey]
        DET --> DET_SCANNER[Scanner]
        DET --> DET_BUILD[Build Output]
    end

    subgraph "Reporting"
        UNIT_FAST --> TRX[TRX Reports]
        UNIT_SLOW --> TRX
        INT_PG --> TRX
        INT_VALKEY --> TRX
        DET_SCANNER --> TRX
        DET_BUILD --> TRX
        TRX --> SUMMARY[Job Summary]
    end

Workflow Dependencies

Core Dependencies

graph TD
    BTD[build-test-deploy.yml] --> TM[test-matrix.yml]
    BTD --> DG[determinism-gate.yml]

    TM --> TL[test-lanes.yml]
    TM --> ITG[integration-tests-gate.yml]

    RS[release-suite.yml] --> BTD
    RS --> MP[module-publish.yml]
    RS --> AS[artifact-signing.yml]

    SR[service-release.yml] --> BTD
    SR --> AS

    MP --> AS
    MP --> AB[attestation-bundle.yml]
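
To see which workflows a given entry point touches, the edges of the diagram above can be walked programmatically. This is a sketch that follows the arrows exactly as drawn; whether an arrow means "invokes" or "depends on" is not stated here, and `reachable` is a hypothetical helper.

```python
from collections import deque

# Edges copied from the diagram above; an arrow A --> B is stored as A: [B].
EDGES = {
    "build-test-deploy.yml": ["test-matrix.yml", "determinism-gate.yml"],
    "test-matrix.yml": ["test-lanes.yml", "integration-tests-gate.yml"],
    "release-suite.yml": ["build-test-deploy.yml", "module-publish.yml", "artifact-signing.yml"],
    "service-release.yml": ["build-test-deploy.yml", "artifact-signing.yml"],
    "module-publish.yml": ["artifact-signing.yml", "attestation-bundle.yml"],
}


def reachable(start: str) -> list:
    """Breadth-first walk returning every workflow reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        for nxt in EDGES.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(reachable("service-release.yml"))
```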

Security Chain

graph LR
    BUILD[Build] --> SBOM[SBOM Generation]
    SBOM --> SIGN[Cosign Signing]
    SIGN --> ATTEST[Attestation]
    ATTEST --> VERIFY[Verification]
    VERIFY --> PUBLISH[Publish]

Execution Stages

Stage 1: Validation

| Step | Purpose | Tools |
| --- | --- | --- |
| Parse trigger | Extract tag/input parameters | bash |
| Validate config | Check required files exist | bash |
| Resolve versions | Read from Directory.Versions.props | Python |
| Check permissions | Verify secrets available | Gitea Actions |
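
The "Resolve versions" step reads `Directory.Versions.props`, which is an MSBuild XML file. A minimal sketch of such a reader follows; the property names in `SAMPLE` are hypothetical, since the real file layout is not given here, and `read_versions` is not the project's actual script.

```python
import xml.etree.ElementTree as ET


def read_versions(props_xml: str) -> dict:
    """Collect every element whose name ends in 'Version' into a dict."""
    root = ET.fromstring(props_xml)
    versions = {}
    for element in root.iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
        if name.endswith("Version") and element.text:
            versions[name] = element.text.strip()
    return versions


# Hypothetical file content; the real property names are an assumption.
SAMPLE = """
<Project>
  <PropertyGroup>
    <ScannerVersion>1.2.3</ScannerVersion>
    <AuthorityVersion>1.0.0</AuthorityVersion>
  </PropertyGroup>
</Project>
"""

print(read_versions(SAMPLE))
```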

Stage 2: Build

| Step | Purpose | Tools |
| --- | --- | --- |
| Restore packages | NuGet/npm dependencies | dotnet restore, npm ci |
| Build solution | Compile all projects | dotnet build |
| Run analyzers | Code analysis | dotnet analyzers |

Stage 3: Test

| Step | Purpose | Tools |
| --- | --- | --- |
| Unit tests | Component testing | xUnit |
| Integration tests | Service integration | Testcontainers |
| Determinism tests | Output reproducibility | Custom scripts |

Stage 4: Package

| Step | Purpose | Tools |
| --- | --- | --- |
| Build container | Docker image | docker build |
| Generate SBOM | Software bill of materials | Syft |
| Sign artifacts | Cryptographic signing | Cosign |
| Create attestation | in-toto/DSSE envelope | Custom tools |

Stage 5: Publish

| Step | Purpose | Tools |
| --- | --- | --- |
| Push container | Registry upload | docker push |
| Upload attestation | Rekor transparency | Cosign |
| Update manifest | Version tracking | Python |
| Generate docs | Release documentation | Python |

Concurrency Control

Strategy

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
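
The effect of this setting can be illustrated with a small simulation: runs that share a workflow and ref land in the same group, and a new run cancels the one already in progress. This sketch models only the `cancel-in-progress: true` case; `Scheduler` is a hypothetical stand-in, not part of Gitea Actions.

```python
def concurrency_group(workflow: str, ref: str) -> str:
    """Mirror the group expression: <workflow>-<ref>."""
    return f"{workflow}-{ref}"


class Scheduler:
    """Toy model of cancel-in-progress: one active run per group."""

    def __init__(self):
        self.running = {}    # group name -> id of the active run
        self.cancelled = []  # ids of runs superseded by a newer run

    def start(self, run_id: int, workflow: str, ref: str) -> str:
        group = concurrency_group(workflow, ref)
        previous = self.running.get(group)
        if previous is not None:
            self.cancelled.append(previous)  # newer run replaces the older one
        self.running[group] = run_id
        return group


scheduler = Scheduler()
scheduler.start(1, "build-test-deploy", "refs/heads/main")
scheduler.start(2, "build-test-deploy", "refs/pull/7/head")
scheduler.start(3, "build-test-deploy", "refs/heads/main")
print(scheduler.cancelled)
```

Run 2 is unaffected because its ref differs, so it belongs to a separate group.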

Workflow Groups

| Group | Behavior | Workflows |
| --- | --- | --- |
| Build | Cancel in-progress | build-test-deploy.yml |
| Release | No cancel (sequential) | release-suite.yml |
| Deploy | Environment-locked | promote.yml |
| Scheduled | Allow concurrent | renovate.yml |

Caching Strategy

Cache Layers

graph TD
    subgraph "Package Cache"
        NUGET[NuGet Cache<br>~/.nuget/packages]
        NPM[npm Cache<br>~/.npm]
    end

    subgraph "Build Cache"
        OBJ[Object Files<br>**/obj]
        BIN[Binaries<br>**/bin]
    end

    subgraph "Test Cache"
        TC[Testcontainers<br>Images]
        FIX[Test Fixtures]
    end

    subgraph "Keys"
        K1[runner.os-nuget-hash] --> NUGET
        K2[runner.os-npm-hash] --> NPM
        K3[runner.os-dotnet-hash] --> OBJ
        K3 --> BIN
    end

Cache Configuration

| Cache | Key Pattern | Restore Keys |
| --- | --- | --- |
| NuGet | `${{ runner.os }}-nuget-${{ hashFiles('**/*.csproj') }}` | `${{ runner.os }}-nuget-` |
| npm | `${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}` | `${{ runner.os }}-npm-` |
| .NET Build | `${{ runner.os }}-dotnet-${{ github.sha }}` | `${{ runner.os }}-dotnet-` |
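
The interplay between a key and its restore keys can be sketched as follows: an exact key match is preferred, otherwise the most recent entry sharing a restore-key prefix is used. This is an approximation under stated assumptions; `hash_files` follows the documented GitHub Actions behavior of hashing per-file SHA-256 digests, and `resolve_cache` is a hypothetical helper, not the cache action's code.

```python
import hashlib
from typing import List, Optional


def hash_files(contents: List[bytes]) -> str:
    """Approximate hashFiles(): SHA-256 over the SHA-256 digest of each file."""
    outer = hashlib.sha256()
    for data in contents:
        outer.update(hashlib.sha256(data).digest())
    return outer.hexdigest()


def resolve_cache(stored_keys: List[str], key: str, restore_keys: List[str]) -> Optional[str]:
    """Pick a cache entry; stored_keys is ordered oldest to newest."""
    if key in stored_keys:
        return key  # exact hit
    for prefix in restore_keys:
        for candidate in reversed(stored_keys):  # newest first
            if candidate.startswith(prefix):
                return candidate  # partial hit via restore key
    return None  # cache miss


stored = ["Linux-nuget-aaa", "Linux-nuget-bbb", "Linux-npm-ccc"]
print(resolve_cache(stored, "Linux-nuget-zzz", ["Linux-nuget-"]))
```

Because the .NET build key ends in the commit SHA, an exact hit is only possible when the same commit is rebuilt; other runs rely on the prefix fallback.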

Runner Requirements

Self-Hosted Runners

| Label | Purpose | Requirements |
| --- | --- | --- |
| ubuntu-latest | General builds | 4 CPU, 16 GB RAM, 100 GB disk |
| linux-arm64 | ARM builds | ARM64 host |
| windows-latest | Windows builds | Windows Server 2022 |
| macos-latest | macOS builds | macOS 13+ |

Docker-in-Docker

Required for:

  • Testcontainers integration tests
  • Multi-architecture builds
  • Container scanning

Network Requirements

| Endpoint | Purpose | Required |
| --- | --- | --- |
| git.stella-ops.org | Source, Registry | Always |
| nuget.org | NuGet packages | Online mode |
| registry.npmjs.org | npm packages | Online mode |
| ghcr.io | GitHub Container Registry | Optional |

Artifact Flow

Build Artifacts

artifacts/
├── binaries/
│   ├── StellaOps.Cli-linux-x64
│   ├── StellaOps.Cli-linux-arm64
│   ├── StellaOps.Cli-win-x64
│   └── StellaOps.Cli-osx-arm64
├── containers/
│   ├── scanner:1.2.3+20250128143022
│   └── authority:1.0.0+20250128143022
├── sbom/
│   ├── scanner.cyclonedx.json
│   └── authority.cyclonedx.json
└── attestations/
    ├── scanner.intoto.jsonl
    └── authority.intoto.jsonl
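
The container names above appear to combine a semantic version with a UTC build timestamp in `YYYYMMDDhhmmss` form. A minimal sketch of producing that string, assuming this reading is correct (`build_tag` is a hypothetical helper):

```python
from datetime import datetime, timezone


def build_tag(version: str, built_at: datetime) -> str:
    """Append a UTC build timestamp (YYYYMMDDhhmmss) to a semantic version."""
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{version}+{stamp}"


print(build_tag("1.2.3", datetime(2025, 1, 28, 14, 30, 22, tzinfo=timezone.utc)))
```

Note that `+` is not a character Docker accepts in an image tag, so the tag pushed to a registry may use a different separator; which one is not specified here.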

Release Artifacts

docs/releases/2026.04/
├── README.md
├── CHANGELOG.md
├── services.md
├── docker-compose.yml
├── docker-compose.airgap.yml
├── upgrade-guide.md
├── checksums.txt
└── manifest.yaml

Error Handling

Retry Strategy

| Step Type | Retries | Backoff |
| --- | --- | --- |
| Network calls | 3 | Exponential |
| Docker push | 3 | Linear (30s) |
| Tests | 0 | N/A |
| Signing | 2 | Linear (10s) |
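
The retry policy above can be sketched as a list of delays per step type. Two assumptions are made here: "Linear (30s)" is read as a delay that grows by 30 seconds per attempt (a fixed 30-second pause is another plausible reading), and the exponential base of 1 second is a placeholder. `run_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def exponential(retries: int, base: float = 1.0) -> list:
    """Delays of base, 2*base, 4*base, ... (base value is an assumption)."""
    return [base * 2**attempt for attempt in range(retries)]


def linear(retries: int, step: float) -> list:
    """Delays of step, 2*step, 3*step, ... (one reading of 'Linear')."""
    return [step * (attempt + 1) for attempt in range(retries)]


def run_with_retries(action: Callable[[], T], delays: Iterable[float], sleep=time.sleep) -> T:
    """Call action once, then once more after each delay until it succeeds."""
    delays = list(delays)
    for attempt in range(len(delays) + 1):
        try:
            return action()
        except Exception:
            if attempt == len(delays):
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])


# Delay schedules per step type, following the table above.
POLICIES = {
    "network": exponential(3),
    "docker_push": linear(3, 30),
    "tests": [],
    "signing": linear(2, 10),
}
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.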

Failure Actions

| Failure Type | Action |
| --- | --- |
| Build failure | Fail fast, notify |
| Test failure | Continue, report |
| Signing failure | Fail, alert security |
| Deploy failure | Rollback, notify |

Security Architecture

Secret Management

graph TD
    subgraph "Gitea Secrets"
        GS[Organization Secrets]
        RS[Repository Secrets]
        ES[Environment Secrets]
    end

    subgraph "Usage"
        GS --> BUILD[Build Workflows]
        RS --> SIGN[Signing Workflows]
        ES --> DEPLOY[Deploy Workflows]
    end

    subgraph "Rotation"
        ROTATE[Key Rotation] --> RS
        ROTATE --> ES
    end

Signing Chain

  1. Build outputs: SHA-256 checksums
  2. Container images: Cosign keyless/keyed signing
  3. SBOMs: in-toto attestation
  4. Releases: GPG-signed tags
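
The first link in the chain, SHA-256 checksums over build outputs, can be sketched as below. The output format mimics `sha256sum` (digest, two spaces, path); whether `checksums.txt` uses exactly this layout is an assumption, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(directory: Path) -> list:
    """Return 'digest  relative/path' lines, sorted so the output is reproducible."""
    files = sorted(p for p in directory.rglob("*") if p.is_file())
    return [f"{sha256_file(p)}  {p.relative_to(directory).as_posix()}" for p in files]
```

Sorting the paths keeps the checksum file byte-identical across runs, in line with the determinism goal stated in the overview.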

Monitoring & Observability

Workflow Metrics

| Metric | Source | Dashboard |
| --- | --- | --- |
| Build duration | Gitea Actions | Grafana |
| Test pass rate | TRX reports | Grafana |
| Cache hit rate | Actions cache | Prometheus |
| Artifact size | Upload artifact | Prometheus |

Alerts

| Alert | Condition | Action |
| --- | --- | --- |
| Build time > 30m | Duration threshold | Investigate |
| Test failures > 5% | Rate threshold | Review |
| Cache miss streak | 3 consecutive | Clear cache |
| Security scan critical | Any critical CVE | Block merge |
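
The alert rules above amount to four threshold checks. A minimal sketch of evaluating them against one run's statistics follows; `RunStats` and its field names are hypothetical, and in practice these rules would presumably live in the Grafana or Prometheus alerting configuration rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    build_minutes: float
    tests_total: int
    tests_failed: int
    consecutive_cache_misses: int
    critical_cves: int


def evaluate_alerts(stats: RunStats) -> list:
    """Return the actions from the alert table whose condition is met."""
    actions = []
    if stats.build_minutes > 30:
        actions.append("Investigate")
    if stats.tests_total and stats.tests_failed / stats.tests_total > 0.05:
        actions.append("Review")
    if stats.consecutive_cache_misses >= 3:
        actions.append("Clear cache")
    if stats.critical_cves > 0:
        actions.append("Block merge")
    return actions


print(evaluate_alerts(RunStats(31, 100, 6, 3, 1)))
```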