Fix build and code structure improvements. New but essential UI functionality. CI improvements. Documentation improvements. AI module improvements.
432
.gitea/docs/architecture.md
Normal file
@@ -0,0 +1,432 @@
# CI/CD Architecture

> **Extended Documentation:** See [docs/cicd/](../../docs/cicd/) for comprehensive CI/CD guides.

## Overview

StellaOps CI/CD infrastructure is built on Gitea Actions with a modular, layered architecture designed for:
- **Determinism**: Reproducible builds and tests across environments
- **Offline-first**: Support for air-gapped deployments
- **Security**: Cryptographic signing and attestation at every stage
- **Scalability**: Parallel execution with intelligent caching

## Quick Links

| Document | Purpose |
|----------|---------|
| [CI/CD Overview](../../docs/cicd/README.md) | High-level architecture and getting started |
| [Workflow Triggers](../../docs/cicd/workflow-triggers.md) | Complete trigger matrix and dependency chains |
| [Release Pipelines](../../docs/cicd/release-pipelines.md) | Suite, module, and bundle release flows |
| [Security Scanning](../../docs/cicd/security-scanning.md) | SAST, secrets, container, and dependency scanning |
| [Troubleshooting](./troubleshooting.md) | Common issues and solutions |
| [Script Reference](./scripts.md) | CI/CD script documentation |

## Workflow Trigger Summary

### Trigger Matrix (100 Workflows)

| Trigger Type | Count | Examples |
|--------------|-------|----------|
| PR + Main Push | 15 | `test-matrix.yml`, `build-test-deploy.yml` |
| Tag-Based | 3 | `release-suite.yml`, `release.yml`, `module-publish.yml` |
| Scheduled | 8 | `nightly-regression.yml`, `renovate.yml` |
| Manual Only | 25+ | `rollback.yml`, `cli-build.yml` |
| Module-Specific | 50+ | Scanner, Concelier, Authority workflows |

### Tag Patterns

| Pattern | Workflow | Example |
|---------|----------|---------|
| `suite-*` | Suite release | `suite-2026.04` |
| `v*` | Bundle release | `v2025.12.1` |
| `module-*-v*` | Module publish | `module-authority-v1.2.3` |
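
As a rough illustration of how a parse step might route a pushed tag, the sketch below matches a tag name against the three patterns in the table. The function name `classify_tag` and the first-match-wins ordering are assumptions made for this example; the real workflows rely on the tag filters of Gitea Actions rather than on this code.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and workflow names are copied from the Tag Patterns table.
# More specific patterns come first so they win if patterns ever overlap.
TAG_PATTERNS = [
    ("module-*-v*", "Module publish"),
    ("suite-*", "Suite release"),
    ("v*", "Bundle release"),
]


def classify_tag(tag: str) -> Optional[str]:
    """Return the workflow a tag would trigger, or None if no pattern matches."""
    for pattern, workflow in TAG_PATTERNS:
        if fnmatchcase(tag, pattern):
            return workflow
    return None


for example in ("suite-2026.04", "v2025.12.1", "module-authority-v1.2.3", "feature/foo"):
    print(example, "->", classify_tag(example))
```

A tag that matches none of the patterns returns `None`, which a caller could treat as "no release workflow applies".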

### Schedule Overview

| Time (UTC) | Workflow | Purpose |
|------------|----------|---------|
| 2:00 AM Daily | `nightly-regression.yml` | Full regression |
| 3:00 AM/PM Daily | `renovate.yml` | Dependency updates |
| 3:30 AM Monday | `sast-scan.yml` | Weekly security scan |
| 5:00 AM Daily | `test-matrix.yml` | Extended tests |

> **Full Details:** See [Workflow Triggers](../../docs/cicd/workflow-triggers.md)

## Pipeline Architecture

### Release Pipeline Flow

```mermaid
graph TD
    subgraph "Trigger Layer"
        TAG[Git Tag] --> PARSE[Parse Tag]
        DISPATCH[Manual Dispatch] --> PARSE
        SCHEDULE[Scheduled] --> PARSE
    end

    subgraph "Validation Layer"
        PARSE --> VALIDATE[Validate Inputs]
        VALIDATE --> RESOLVE[Resolve Versions]
    end

    subgraph "Build Layer"
        RESOLVE --> BUILD[Build Modules]
        BUILD --> TEST[Run Tests]
        TEST --> DETERMINISM[Determinism Check]
    end

    subgraph "Artifact Layer"
        DETERMINISM --> CONTAINER[Build Container]
        CONTAINER --> SBOM[Generate SBOM]
        SBOM --> SIGN[Sign Artifacts]
    end

    subgraph "Release Layer"
        SIGN --> MANIFEST[Update Manifest]
        MANIFEST --> CHANGELOG[Generate Changelog]
        CHANGELOG --> DOCS[Generate Docs]
        DOCS --> PUBLISH[Publish Release]
    end

    subgraph "Post-Release"
        PUBLISH --> VERIFY[Verify Release]
        VERIFY --> NOTIFY[Notify Stakeholders]
    end
```

### Service Release Pipeline

```mermaid
graph LR
    subgraph "Trigger"
        A["service-{name}-v{semver}"] --> B["Parse Service & Version"]
    end

    subgraph "Build"
        B --> C[Read Directory.Versions.props]
        C --> D[Bump Version]
        D --> E[Build Service]
        E --> F[Run Tests]
    end

    subgraph "Package"
        F --> G[Build Container]
        G --> H[Generate Docker Tag]
        H --> I[Push to Registry]
    end

    subgraph "Attestation"
        I --> J[Generate SBOM]
        J --> K[Sign with Cosign]
        K --> L[Create Attestation]
    end

    subgraph "Finalize"
        L --> M[Update Manifest]
        M --> N[Commit Changes]
    end
```

### Test Matrix Execution

```mermaid
graph TD
    subgraph "Matrix Strategy"
        TRIGGER[PR/Push] --> FILTER[Path Filter]
        FILTER --> MATRIX[Generate Matrix]
    end

    subgraph "Parallel Execution"
        MATRIX --> UNIT[Unit Tests]
        MATRIX --> INT[Integration Tests]
        MATRIX --> DET[Determinism Tests]
    end

    subgraph "Test Types"
        UNIT --> UNIT_FAST[Fast Unit]
        UNIT --> UNIT_SLOW[Slow Unit]
        INT --> INT_PG[PostgreSQL]
        INT --> INT_VALKEY[Valkey]
        DET --> DET_SCANNER[Scanner]
        DET --> DET_BUILD[Build Output]
    end

    subgraph "Reporting"
        UNIT_FAST --> TRX[TRX Reports]
        UNIT_SLOW --> TRX
        INT_PG --> TRX
        INT_VALKEY --> TRX
        DET_SCANNER --> TRX
        DET_BUILD --> TRX
        TRX --> SUMMARY[Job Summary]
    end
```

## Workflow Dependencies

### Core Dependencies

```mermaid
graph TD
    BTD[build-test-deploy.yml] --> TM[test-matrix.yml]
    BTD --> DG[determinism-gate.yml]

    TM --> TL[test-lanes.yml]
    TM --> ITG[integration-tests-gate.yml]

    RS[release-suite.yml] --> BTD
    RS --> MP[module-publish.yml]
    RS --> AS[artifact-signing.yml]

    SR[service-release.yml] --> BTD
    SR --> AS

    MP --> AS
    MP --> AB[attestation-bundle.yml]
```
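
To make the dependency chain easier to reason about, the edges of the diagram can be written down as a plain mapping and walked. This is only a sketch: the `DEPENDENCIES` dictionary mirrors the arrows shown in the diagram, and the helper name `downstream` is an assumption for illustration, not part of the repository.

```python
# Edges copied from the Core Dependencies diagram: caller -> workflows it invokes.
DEPENDENCIES = {
    "build-test-deploy.yml": ["test-matrix.yml", "determinism-gate.yml"],
    "test-matrix.yml": ["test-lanes.yml", "integration-tests-gate.yml"],
    "release-suite.yml": ["build-test-deploy.yml", "module-publish.yml", "artifact-signing.yml"],
    "service-release.yml": ["build-test-deploy.yml", "artifact-signing.yml"],
    "module-publish.yml": ["artifact-signing.yml", "attestation-bundle.yml"],
}


def downstream(workflow):
    """Return every workflow reachable from `workflow`, in depth-first discovery order."""
    seen = []
    stack = list(reversed(DEPENDENCIES.get(workflow, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        stack.extend(reversed(DEPENDENCIES.get(current, [])))
    return seen


print(downstream("service-release.yml"))
```

Walking from `release-suite.yml` reaches every other workflow in the diagram except `service-release.yml`, which is itself an entry point.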

### Security Chain

```mermaid
graph LR
    BUILD[Build] --> SBOM[SBOM Generation]
    SBOM --> SIGN[Cosign Signing]
    SIGN --> ATTEST[Attestation]
    ATTEST --> VERIFY[Verification]
    VERIFY --> PUBLISH[Publish]
```

## Execution Stages

### Stage 1: Validation

| Step | Purpose | Tools |
|------|---------|-------|
| Parse trigger | Extract tag/input parameters | bash |
| Validate config | Check required files exist | bash |
| Resolve versions | Read from Directory.Versions.props | Python |
| Check permissions | Verify secrets available | Gitea Actions |

### Stage 2: Build

| Step | Purpose | Tools |
|------|---------|-------|
| Restore packages | NuGet/npm dependencies | dotnet restore, npm ci |
| Build solution | Compile all projects | dotnet build |
| Run analyzers | Code analysis | dotnet analyzers |

### Stage 3: Test

| Step | Purpose | Tools |
|------|---------|-------|
| Unit tests | Component testing | xUnit |
| Integration tests | Service integration | Testcontainers |
| Determinism tests | Output reproducibility | Custom scripts |

### Stage 4: Package

| Step | Purpose | Tools |
|------|---------|-------|
| Build container | Docker image | docker build |
| Generate SBOM | Software bill of materials | Syft |
| Sign artifacts | Cryptographic signing | Cosign |
| Create attestation | in-toto/DSSE envelope | Custom tools |

### Stage 5: Publish

| Step | Purpose | Tools |
|------|---------|-------|
| Push container | Registry upload | docker push |
| Upload attestation | Rekor transparency | Cosign |
| Update manifest | Version tracking | Python |
| Generate docs | Release documentation | Python |

## Concurrency Control

### Strategy

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```
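
The effect of this setting can be pictured with a toy model: runs that share a group key replace one another, while runs on other refs are left alone. The `Scheduler` class below is a hypothetical simulation written for this explanation, not a description of how the Gitea Actions runner is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy model of `concurrency.group` combined with `cancel-in-progress: true`."""

    running: dict = field(default_factory=dict)    # group key -> active run id
    cancelled: list = field(default_factory=list)  # run ids that were superseded

    def start(self, workflow: str, ref: str, run_id: int) -> None:
        group = f"{workflow}-{ref}"  # mirrors ${{ github.workflow }}-${{ github.ref }}
        previous = self.running.get(group)
        if previous is not None:
            self.cancelled.append(previous)  # the older run in the same group is cancelled
        self.running[group] = run_id


scheduler = Scheduler()
scheduler.start("build-test-deploy", "refs/heads/main", run_id=1)
scheduler.start("build-test-deploy", "refs/heads/feature-x", run_id=2)
scheduler.start("build-test-deploy", "refs/heads/main", run_id=3)
print(scheduler.cancelled)  # [1]
```

Only run 1 is cancelled: run 2 belongs to a different ref and therefore to a different group.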

### Workflow Groups

| Group | Behavior | Workflows |
|-------|----------|-----------|
| Build | Cancel in-progress | `build-test-deploy.yml` |
| Release | No cancel (sequential) | `release-suite.yml` |
| Deploy | Environment-locked | `promote.yml` |
| Scheduled | Allow concurrent | `renovate.yml` |

## Caching Strategy

### Cache Layers

```mermaid
graph TD
    subgraph "Package Cache"
        NUGET[NuGet Cache<br>~/.nuget/packages]
        NPM[npm Cache<br>~/.npm]
    end

    subgraph "Build Cache"
        OBJ[Object Files<br>**/obj]
        BIN[Binaries<br>**/bin]
    end

    subgraph "Test Cache"
        TC[Testcontainers<br>Images]
        FIX[Test Fixtures]
    end

    subgraph "Keys"
        K1[runner.os-nuget-hash] --> NUGET
        K2[runner.os-npm-hash] --> NPM
        K3[runner.os-dotnet-hash] --> OBJ
        K3 --> BIN
    end
```

### Cache Configuration

| Cache | Key Pattern | Restore Keys |
|-------|-------------|--------------|
| NuGet | `${{ runner.os }}-nuget-${{ hashFiles('**/*.csproj') }}` | `${{ runner.os }}-nuget-` |
| npm | `${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}` | `${{ runner.os }}-npm-` |
| .NET Build | `${{ runner.os }}-dotnet-${{ github.sha }}` | `${{ runner.os }}-dotnet-` |
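
The interplay between an exact key and its restore-key prefix can be sketched as follows. This is a simplified model under stated assumptions: `cache_key` hashes raw file contents with SHA-256, which only approximates what `hashFiles` does, and `restore` assumes the list of existing keys is ordered oldest to newest. Both function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Optional


def cache_key(os_name: str, kind: str, file_contents: Iterable[bytes]) -> str:
    """Build a key shaped like `<os>-<kind>-<hash>` from the hashed input files."""
    digest = hashlib.sha256()
    for content in file_contents:
        digest.update(content)
    return f"{os_name}-{kind}-{digest.hexdigest()}"


def restore(existing_keys: List[str], key: str, restore_prefix: str) -> Optional[str]:
    """Prefer an exact hit; otherwise fall back to the newest key sharing the prefix."""
    if key in existing_keys:
        return key
    candidates = [k for k in existing_keys if k.startswith(restore_prefix)]
    return candidates[-1] if candidates else None  # assumes oldest-to-newest ordering


old = cache_key("Linux", "nuget", [b"<Project v1/>"])
new = cache_key("Linux", "nuget", [b"<Project v2/>"])
print(restore([old], new, "Linux-nuget-") == old)  # True: partial restore from the older cache
```

When a project file changes, the exact key misses, but the prefix still finds the previous cache, so only the changed packages need to be fetched again.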

## Runner Requirements

### Self-Hosted Runners

| Label | Purpose | Requirements |
|-------|---------|--------------|
| `ubuntu-latest` | General builds | 4 CPU, 16GB RAM, 100GB disk |
| `linux-arm64` | ARM builds | ARM64 host |
| `windows-latest` | Windows builds | Windows Server 2022 |
| `macos-latest` | macOS builds | macOS 13+ |

### Docker-in-Docker

Required for:
- Testcontainers integration tests
- Multi-architecture builds
- Container scanning

### Network Requirements

| Endpoint | Purpose | Required |
|----------|---------|----------|
| `git.stella-ops.org` | Source, Registry | Always |
| `nuget.org` | NuGet packages | Online mode |
| `registry.npmjs.org` | npm packages | Online mode |
| `ghcr.io` | GitHub Container Registry | Optional |

## Artifact Flow

### Build Artifacts

```
artifacts/
├── binaries/
│   ├── StellaOps.Cli-linux-x64
│   ├── StellaOps.Cli-linux-arm64
│   ├── StellaOps.Cli-win-x64
│   └── StellaOps.Cli-osx-arm64
├── containers/
│   ├── scanner:1.2.3+20250128143022
│   └── authority:1.0.0+20250128143022
├── sbom/
│   ├── scanner.cyclonedx.json
│   └── authority.cyclonedx.json
└── attestations/
    ├── scanner.intoto.jsonl
    └── authority.intoto.jsonl
```

### Release Artifacts

```
docs/releases/2026.04/
├── README.md
├── CHANGELOG.md
├── services.md
├── docker-compose.yml
├── docker-compose.airgap.yml
├── upgrade-guide.md
├── checksums.txt
└── manifest.yaml
```

## Error Handling

### Retry Strategy

| Step Type | Retries | Backoff |
|-----------|---------|---------|
| Network calls | 3 | Exponential |
| Docker push | 3 | Linear (30s) |
| Tests | 0 | N/A |
| Signing | 2 | Linear (10s) |
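
A retry policy like the one in the table can be sketched with a small helper. Several details here are assumptions rather than documented behavior: "Linear (30s)" is read as a constant 30-second wait between attempts, the exponential policy starts from a one-second base, and the names `retry`, `linear`, and `exponential` are invented for this example. The `sleep` parameter is injectable so the demo does not actually wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], retries: int, delay: Callable[[int], float],
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying up to `retries` times with `delay(attempt)` seconds between tries."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(delay(attempt))
    raise AssertionError("unreachable when retries >= 0")


def exponential(base: float = 1.0) -> Callable[[int], float]:
    """Delay doubles on each attempt: base, 2*base, 4*base, ..."""
    return lambda attempt: base * (2 ** attempt)


def linear(seconds: float) -> Callable[[int], float]:
    """Constant delay between attempts (one reading of 'Linear (30s)')."""
    return lambda attempt: seconds


waits = []
calls = {"n": 0}


def flaky_push() -> str:
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("registry unavailable")
    return "pushed"


print(retry(flaky_push, retries=3, delay=linear(30), sleep=waits.append))  # pushed
print(waits)  # [30, 30]
```

Passing `retries=0` reproduces the "Tests" row: the action runs once and any failure is raised immediately.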

### Failure Actions

| Failure Type | Action |
|--------------|--------|
| Build failure | Fail fast, notify |
| Test failure | Continue, report |
| Signing failure | Fail, alert security |
| Deploy failure | Rollback, notify |

## Security Architecture

### Secret Management

```mermaid
graph TD
    subgraph "Gitea Secrets"
        GS[Organization Secrets]
        RS[Repository Secrets]
        ES[Environment Secrets]
    end

    subgraph "Usage"
        GS --> BUILD[Build Workflows]
        RS --> SIGN[Signing Workflows]
        ES --> DEPLOY[Deploy Workflows]
    end

    subgraph "Rotation"
        ROTATE[Key Rotation] --> RS
        ROTATE --> ES
    end
```

### Signing Chain

1. **Build outputs**: SHA-256 checksums
2. **Container images**: Cosign keyless/keyed signing
3. **SBOMs**: in-toto attestation
4. **Releases**: GPG-signed tags

## Monitoring & Observability

### Workflow Metrics

| Metric | Source | Dashboard |
|--------|--------|-----------|
| Build duration | Gitea Actions | Grafana |
| Test pass rate | TRX reports | Grafana |
| Cache hit rate | Actions cache | Prometheus |
| Artifact size | Upload artifact | Prometheus |

### Alerts

| Alert | Condition | Action |
|-------|-----------|--------|
| Build time > 30m | Duration threshold | Investigate |
| Test failures > 5% | Rate threshold | Review |
| Cache miss streak | 3 consecutive | Clear cache |
| Security scan critical | Any critical CVE | Block merge |
736
.gitea/docs/scripts.md
Normal file
@@ -0,0 +1,736 @@
# CI/CD Scripts Inventory

Complete documentation of all scripts in `.gitea/scripts/`.

## Directory Structure

```
.gitea/scripts/
├── build/          # Build orchestration
├── evidence/       # Evidence bundle management
├── metrics/        # Performance metrics
├── release/        # Release automation
├── sign/           # Artifact signing
├── test/           # Test execution
├── util/           # Utilities
└── validate/       # Validation scripts
```

## Exit Code Conventions

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Missing configuration/key |
| 3 | Missing required file |
| 69 | Tool not found (EX_UNAVAILABLE) |
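
For the Python scripts in this inventory, the same convention could be captured as an enumeration so that exit codes are named rather than scattered as literals. This is a sketch only: the `ExitCode` class and the `require_tool` helper are hypothetical names, not taken from the repository.

```python
import shutil
import sys
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the conventions table."""

    SUCCESS = 0
    GENERAL_ERROR = 1
    MISSING_CONFIG = 2
    MISSING_FILE = 3
    TOOL_NOT_FOUND = 69  # EX_UNAVAILABLE in sysexits.h


def require_tool(name: str) -> ExitCode:
    """Return SUCCESS if `name` is on PATH, otherwise TOOL_NOT_FOUND."""
    if shutil.which(name) is None:
        print(f"{name} not found", file=sys.stderr)
        return ExitCode.TOOL_NOT_FOUND
    return ExitCode.SUCCESS


status = require_tool("no-such-tool-stellaops-example")
print(int(status))  # 69
```

A script entry point would typically end with `sys.exit(status)` so the calling workflow step sees the documented code.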

---

## Build Scripts (`scripts/build/`)

### build-cli.sh

Multi-platform CLI build with SBOM generation and signing.

**Usage:**
```bash
RIDS=linux-x64,win-x64,osx-arm64 ./build-cli.sh
```

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `RIDS` | `linux-x64,win-x64,osx-arm64` | Comma-separated runtime identifiers |
| `CONFIG` | `Release` | Build configuration |
| `SBOM_TOOL` | `syft` | SBOM generator (`syft` or `none`) |
| `SIGN` | `false` | Enable artifact signing |
| `COSIGN_KEY` | - | Path to Cosign key file |

**Output:**
```
out/cli/
├── linux-x64/
│   ├── publish/
│   ├── stella-cli-linux-x64.tar.gz
│   ├── stella-cli-linux-x64.tar.gz.sha256
│   └── stella-cli-linux-x64.tar.gz.sbom.json
├── win-x64/
│   ├── publish/
│   ├── stella-cli-win-x64.zip
│   └── ...
└── manifest.json
```

**Features:**
- Builds self-contained single-file executables
- Includes CLI plugins (Aoc, Symbols)
- Generates SHA-256 checksums
- Optional SBOM generation via Syft
- Optional Cosign signing

---

### build-multiarch.sh

Multi-architecture Docker image builds using buildx.

**Usage:**
```bash
IMAGE=scanner PLATFORMS=linux/amd64,linux/arm64 ./build-multiarch.sh
```

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `IMAGE` | - | Image name (required) |
| `PLATFORMS` | `linux/amd64,linux/arm64` | Target platforms |
| `REGISTRY` | `git.stella-ops.org` | Container registry |
| `TAG` | `latest` | Image tag |
| `PUSH` | `false` | Push to registry |

---

### build-airgap-bundle.sh

Build offline/air-gapped deployment bundle.

**Usage:**
```bash
VERSION=2026.04 ./build-airgap-bundle.sh
```

**Output:**
```
out/airgap/
├── images.tar          # All container images
├── helm-charts.tar.gz  # Helm charts
├── compose.tar.gz      # Docker Compose files
├── checksums.txt
└── manifest.json
```

---

## Test Scripts (`scripts/test/`)

### determinism-run.sh

Run determinism verification tests.

**Usage:**
```bash
./determinism-run.sh
```

**Purpose:**
- Executes tests filtered by `Determinism` category
- Collects TRX test results
- Generates summary and artifacts archive

**Output:**
```
out/scanner-determinism/
├── determinism.trx
├── summary.txt
└── determinism-artifacts.tgz
```

---

### run-fixtures-check.sh

Validate test fixtures against expected schemas.

**Usage:**
```bash
./run-fixtures-check.sh [--update]
```

**Options:**
- `--update`: Update golden fixtures if mismatched

---

## Validation Scripts (`scripts/validate/`)

### validate-sbom.sh

Validate CycloneDX SBOM files.

**Usage:**
```bash
./validate-sbom.sh <sbom-file>
./validate-sbom.sh --all
./validate-sbom.sh --schema custom.json sample.json
```

**Options:**

| Option | Description |
|--------|-------------|
| `--all` | Validate all fixtures in `src/__Tests/__Benchmarks/golden-corpus/` |
| `--schema <path>` | Custom schema file |

**Dependencies:**
- `sbom-utility` (auto-installed if missing)

**Exit Codes:**
- `0`: All validations passed
- `1`: Validation failed

---

### validate-spdx.sh

Validate SPDX SBOM files.

**Usage:**
```bash
./validate-spdx.sh <spdx-file>
```

---

### validate-vex.sh

Validate VEX documents (OpenVEX, CSAF).

**Usage:**
```bash
./validate-vex.sh <vex-file>
```

---

### validate-helm.sh

Validate Helm charts.

**Usage:**
```bash
./validate-helm.sh [chart-path]
```

**Default Path:** `devops/helm/stellaops`

**Checks:**
- `helm lint`
- Template rendering
- Schema validation

---

### validate-compose.sh

Validate Docker Compose files.

**Usage:**
```bash
./validate-compose.sh [profile]
```

**Profiles:**
- `dev` - Development
- `stage` - Staging
- `prod` - Production
- `airgap` - Air-gapped

---

### validate-licenses.sh

Check dependency licenses for compliance.

**Usage:**
```bash
./validate-licenses.sh
```

**Checks:**
- NuGet packages via `dotnet-delice`
- npm packages via `license-checker`
- Reports blocked licenses (GPL-2.0-only, SSPL, etc.)

---

### validate-migrations.sh

Validate database migrations.

**Usage:**
```bash
./validate-migrations.sh
```

**Checks:**
- Migration naming conventions
- Forward/rollback pairs
- Idempotency

---

### validate-workflows.sh

Validate Gitea Actions workflow YAML files.

**Usage:**
```bash
./validate-workflows.sh
```

**Checks:**
- YAML syntax
- Required fields
- Action version pinning

---

### verify-binaries.sh

Verify binary integrity.

**Usage:**
```bash
./verify-binaries.sh <binary-path> [checksum-file]
```
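
The source does not describe how `verify-binaries.sh` performs its check, so the following is only a sketch of what a SHA-256 integrity check of this kind usually looks like. It assumes the checksum file uses the `sha256sum` layout (`<hex digest>  <file name>` per line); the function names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(binary: Path, checksum_file: Path) -> bool:
    """Check `binary` against a sha256sum-style file of `<hex>  <name>` lines."""
    actual = sha256_of(binary)
    for line in checksum_file.read_text().splitlines():
        parts = line.split(maxsplit=1)
        # sha256sum prefixes the name with '*' in binary mode; ignore that marker.
        if len(parts) == 2 and parts[1].lstrip("*") == binary.name:
            return parts[0].lower() == actual
    return False  # no entry for this binary
```

A missing entry is treated as a failure here, on the assumption that an unlisted binary should not be trusted.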

---

## Signing Scripts (`scripts/sign/`)

### sign-signals.sh

Sign Signals artifacts with Cosign.

**Usage:**
```bash
./sign-signals.sh
```

**Environment Variables:**

| Variable | Description |
|----------|-------------|
| `COSIGN_KEY_FILE` | Path to signing key |
| `COSIGN_PRIVATE_KEY_B64` | Base64-encoded private key |
| `COSIGN_PASSWORD` | Key password |
| `COSIGN_ALLOW_DEV_KEY` | Allow development key (`1`) |
| `OUT_DIR` | Output directory |

**Key Resolution Order:**
1. `COSIGN_KEY_FILE` environment variable
2. `COSIGN_PRIVATE_KEY_B64` environment variable (decoded)
3. `tools/cosign/cosign.key`
4. `tools/cosign/cosign.dev.key` (if `COSIGN_ALLOW_DEV_KEY=1`)
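
The resolution order above can be expressed as a short fallback chain. The sketch below is an illustration under assumptions, not the script's actual logic: the function name `resolve_cosign_key` is hypothetical, a `COSIGN_KEY_FILE` that does not point at an existing file is skipped rather than treated as an error, and the decoded Base64 key is written to a temporary file that the caller must delete.

```python
import base64
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_cosign_key(env: Mapping[str, str], repo_root: Path) -> Optional[Path]:
    """Return the first usable key following the documented order, or None."""
    key_file = env.get("COSIGN_KEY_FILE")
    if key_file and Path(key_file).is_file():
        return Path(key_file)

    encoded = env.get("COSIGN_PRIVATE_KEY_B64")
    if encoded:
        # mkstemp creates the file readable by the current user only.
        fd, tmp_path = tempfile.mkstemp(suffix=".key")
        with os.fdopen(fd, "wb") as handle:
            handle.write(base64.b64decode(encoded))
        return Path(tmp_path)

    default_key = repo_root / "tools" / "cosign" / "cosign.key"
    if default_key.is_file():
        return default_key

    dev_key = repo_root / "tools" / "cosign" / "cosign.dev.key"
    if env.get("COSIGN_ALLOW_DEV_KEY") == "1" and dev_key.is_file():
        return dev_key

    return None
```

A caller would pass `os.environ` and the repository root; a `None` result corresponds to exit code 2 (missing configuration/key) in the conventions table.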

**Signed Artifacts:**
- `confidence_decay_config.yaml`
- `unknowns_scoring_manifest.json`
- `heuristics.catalog.json`

**Output:**
```
evidence-locker/signals/{date}/
├── confidence_decay_config.sigstore.json
├── unknowns_scoring_manifest.sigstore.json
├── heuristics_catalog.sigstore.json
└── SHA256SUMS
```

---

### sign-policy.sh

Sign policy artifacts.

**Usage:**
```bash
./sign-policy.sh <policy-file>
```

---

### sign-authority-gaps.sh

Sign authority gap attestations.

**Usage:**
```bash
./sign-authority-gaps.sh
```

---

## Release Scripts (`scripts/release/`)

### build_release.py

Main release pipeline orchestration.

**Usage:**
```bash
python build_release.py --channel stable --version 2026.04
```

**Arguments:**

| Argument | Description |
|----------|-------------|
| `--channel` | Release channel (`stable`, `beta`, `nightly`) |
| `--version` | Version string |
| `--config` | Component config file |
| `--dry-run` | Don't push artifacts |

**Dependencies:**
- docker (with buildx)
- cosign
- helm
- npm/node
- dotnet SDK

---

### verify_release.py

Post-release verification.

**Usage:**
```bash
python verify_release.py --version 2026.04
```

---

### bump-service-version.py

Manage service versions in `Directory.Versions.props`.

**Usage:**
```bash
# Bump version
python bump-service-version.py --service scanner --bump minor

# Set explicit version
python bump-service-version.py --service scanner --version 2.0.0

# List versions
python bump-service-version.py --list
```

**Arguments:**

| Argument | Description |
|----------|-------------|
| `--service` | Service name (e.g., `scanner`, `authority`) |
| `--bump` | Bump type (`major`, `minor`, `patch`) |
| `--version` | Explicit version to set |
| `--list` | List all service versions |
| `--dry-run` | Don't write changes |
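
The three bump types follow ordinary semantic-versioning arithmetic, which can be sketched in a few lines. The `bump` function below is a hypothetical illustration of that arithmetic and says nothing about how `bump-service-version.py` reads or writes `Directory.Versions.props`.

```python
def bump(version: str, part: str) -> str:
    """Increment one component of a `major.minor.patch` version, resetting the lower ones."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {part}")


print(bump("1.2.3", "minor"))  # 1.3.0
```

Resetting the lower components on a major or minor bump is the usual convention, so `1.2.3` with `--bump major` would become `2.0.0`.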

---

### read-service-version.sh

Read current service version.

**Usage:**
```bash
./read-service-version.sh scanner
```

**Output:**
```
1.2.3
```

---

### generate-docker-tag.sh

Generate Docker tag with datetime suffix.

**Usage:**
```bash
./generate-docker-tag.sh 1.2.3
```

**Output:**
```
1.2.3+20250128143022
```
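
The suffix in the sample output reads as a `YYYYMMDDHHMMSS` timestamp appended after a `+`. A minimal sketch of that formatting follows; the function name `docker_tag` is hypothetical, and the use of UTC is an assumption, since the source does not state which time zone the script uses.

```python
from datetime import datetime, timezone
from typing import Optional


def docker_tag(version: str, now: Optional[datetime] = None) -> str:
    """Append a `+YYYYMMDDHHMMSS` suffix to `version` (UTC is assumed here)."""
    moment = now or datetime.now(timezone.utc)
    return f"{version}+{moment.strftime('%Y%m%d%H%M%S')}"


fixed = datetime(2025, 1, 28, 14, 30, 22, tzinfo=timezone.utc)
print(docker_tag("1.2.3", fixed))  # 1.2.3+20250128143022
```

Accepting the timestamp as a parameter keeps the function deterministic under test, which fits the determinism goal stated for the pipeline.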

---

### generate_changelog.py

AI-assisted changelog generation.

**Usage:**
```bash
python generate_changelog.py --version 2026.04 --codename Nova
```

**Environment Variables:**

| Variable | Description |
|----------|-------------|
| `AI_API_KEY` | AI service API key |
| `AI_API_URL` | AI service endpoint (optional) |

**Features:**
- Parses git commits since last release
- Categorizes by type (Breaking, Security, Features, Fixes)
- Groups by module
- AI-assisted summary generation
- Fallback to rule-based generation

---

### generate_suite_docs.py

Generate suite release documentation.

**Usage:**
```bash
python generate_suite_docs.py --version 2026.04 --codename Nova
```

**Output:**
```
docs/releases/2026.04/
├── README.md
├── CHANGELOG.md
├── services.md
├── upgrade-guide.md
├── checksums.txt
└── manifest.yaml
```

---

### generate_compose.py

Generate pinned Docker Compose files.

**Usage:**
```bash
python generate_compose.py --version 2026.04
```

**Output:**
- `docker-compose.yml` - Standard deployment
- `docker-compose.airgap.yml` - Air-gapped deployment

---

### collect_versions.py

Collect service versions from `Directory.Versions.props`.

**Usage:**
```bash
python collect_versions.py --format json
python collect_versions.py --format yaml
python collect_versions.py --format markdown
python collect_versions.py --format env
```

---

### check_cli_parity.py

Verify CLI version parity across platforms.

**Usage:**
```bash
python check_cli_parity.py
```

---

## Evidence Scripts (`scripts/evidence/`)

### upload-all-evidence.sh

Upload all evidence bundles to Evidence Locker.

**Usage:**
```bash
./upload-all-evidence.sh
```

---

### signals-upload-evidence.sh

Upload Signals evidence.

**Usage:**
```bash
./signals-upload-evidence.sh
```

---

### zastava-upload-evidence.sh

Upload Zastava evidence.

**Usage:**
```bash
./zastava-upload-evidence.sh
```

---

## Metrics Scripts (`scripts/metrics/`)

### compute-reachability-metrics.sh

Compute reachability analysis metrics.

**Usage:**
```bash
./compute-reachability-metrics.sh
```

**Output Metrics:**
- Total functions analyzed
- Reachable functions
- Coverage percentage
- Analysis duration

---

### compute-ttfs-metrics.sh

Compute Time-to-First-Scan metrics.

**Usage:**
```bash
./compute-ttfs-metrics.sh
```

---

### enforce-performance-slos.sh

Enforce performance SLOs.

**Usage:**
```bash
./enforce-performance-slos.sh
```

**Checked SLOs:**
- Build time < 30 minutes
- Test coverage > 80%
- TTFS < 60 seconds
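
The three thresholds can be checked with straightforward comparisons. The sketch below uses the limits listed above; the function name `check_slos` and its parameter names are hypothetical, and how the real script obtains the measurements is not covered by the source.

```python
def check_slos(build_minutes: float, coverage_percent: float, ttfs_seconds: float) -> list:
    """Return human-readable SLO violations; an empty list means every SLO is met."""
    violations = []
    if not build_minutes < 30:
        violations.append(f"build time {build_minutes} min is not < 30 min")
    if not coverage_percent > 80:
        violations.append(f"test coverage {coverage_percent}% is not > 80%")
    if not ttfs_seconds < 60:
        violations.append(f"TTFS {ttfs_seconds} s is not < 60 s")
    return violations


print(check_slos(build_minutes=12, coverage_percent=85, ttfs_seconds=40))  # []
```

The comparisons are strict, so a value exactly on a boundary (for example, 80% coverage) counts as a violation.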

---

## Utility Scripts (`scripts/util/`)

### cleanup-runner-space.sh

Clean up runner disk space.

**Usage:**
```bash
./cleanup-runner-space.sh
```

**Actions:**
- Remove Docker build cache
- Clean NuGet cache
- Remove old test results
- Prune unused images

---

### dotnet-filter.sh

Filter .NET projects for selective builds.

**Usage:**
```bash
./dotnet-filter.sh --changed
./dotnet-filter.sh --module Scanner
```

---

### enable-openssl11-shim.sh

Enable OpenSSL 1.1 compatibility shim.

**Usage:**
```bash
./enable-openssl11-shim.sh
```

**Purpose:**
Required for certain cryptographic operations on newer Linux distributions that have removed OpenSSL 1.1.

---

## Script Development Guidelines

### Required Elements

1. **Shebang:**
   ```bash
   #!/usr/bin/env bash
   ```

2. **Strict Mode:**
   ```bash
   set -euo pipefail
   ```

3. **Sprint Reference:**
   ```bash
   # DEVOPS-XXX-YY-ZZZ: Description
   # Sprint: SPRINT_XXXX_XXXX_XXXX - Topic
   ```

4. **Usage Documentation:**
   ```bash
   # Usage:
   #   ./script.sh <required-arg> [optional-arg]
   ```

### Best Practices

1. **Use environment variables with defaults:**
   ```bash
   CONFIG="${CONFIG:-Release}"
   ```

2. **Validate required tools:**
   ```bash
   if ! command -v dotnet >/dev/null 2>&1; then
     echo "dotnet CLI not found" >&2
     exit 69
   fi
   ```

3. **Use absolute paths:**
   ```bash
   ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
   ```

4. **Handle cleanup:**
   ```bash
   trap 'rm -f "$TMP_FILE"' EXIT
   ```

5. **Use logging functions:**
   ```bash
   log_info() { echo "[INFO] $*"; }
   log_error() { echo "[ERROR] $*" >&2; }
   ```
624
.gitea/docs/troubleshooting.md
Normal file
@@ -0,0 +1,624 @@
# CI/CD Troubleshooting Guide

Common issues and solutions for StellaOps CI/CD infrastructure.

## Quick Diagnostics

### Check Workflow Status

```bash
# View recent workflow runs
gh run list --limit 10

# View specific run logs
gh run view <run-id> --log

# Re-run failed workflow
gh run rerun <run-id>
```

### Verify Local Environment

```bash
# Check .NET SDK
dotnet --list-sdks

# Check Docker
docker version
docker buildx version

# Check Node.js
node --version
npm --version

# Check required tools
which cosign syft helm
```

---
|
||||
|
||||
## Build Failures
|
||||
|
||||
### NuGet Restore Failures
|
||||
|
||||
**Symptom:** `error NU1301: Unable to load the service index`
|
||||
|
||||
**Causes:**
|
||||
1. Network connectivity issues
|
||||
2. NuGet source unavailable
|
||||
3. Invalid credentials
|
||||
|
||||
**Solutions:**
|
||||
|
||||
```bash
|
||||
# Clear NuGet cache
|
||||
dotnet nuget locals all --clear
|
||||
|
||||
# Check NuGet sources
|
||||
dotnet nuget list source
|
||||
|
||||
# Restore with verbose logging
|
||||
dotnet restore src/StellaOps.sln -v detailed
|
||||
```
|
||||
|
||||
**In CI:**
|
||||
```yaml
|
||||
- name: Restore with retry
|
||||
run: |
|
||||
for i in {1..3}; do
|
||||
dotnet restore src/StellaOps.sln && break
|
||||
echo "Retry $i..."
|
||||
sleep 30
|
||||
done
|
||||
```
|
||||
|
||||
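
The retry loop can also be factored into a small shell function so that a step fails when every attempt fails. This is a sketch under assumptions: the name `retry` and its argument order are illustrative, not an existing StellaOps helper.

```bash
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempt budget is spent.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "Command failed after $n attempts: $*" >&2
      return 1
    fi
    echo "Attempt $n failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage:
#   retry 3 30 dotnet restore src/StellaOps.sln
```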
---

### SDK Version Mismatch

**Symptom:** `error MSB4236: The SDK 'Microsoft.NET.Sdk' specified could not be found`

**Solutions:**

1. Check `global.json`:
   ```bash
   cat global.json
   ```

2. Install the correct SDK:
   ```yaml
   # CI environment
   # An exact version needs no prerelease flag; setup-dotnet v3+ uses
   # `dotnet-quality` instead of `include-prerelease`.
   - uses: actions/setup-dotnet@v4
     with:
       dotnet-version: '10.0.100'
   ```

3. Bypass the SDK pin (local debugging only):
   ```bash
   # Remove the global.json pin so the newest installed SDK is used; do not commit this
   rm global.json
   ```

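
To confirm a mismatch before reinstalling anything, the version pinned in `global.json` can be compared with the output of `dotnet --list-sdks`. A rough sketch follows; `pinned_sdk_version` and `sdk_is_listed` are hypothetical helper names, and the `sed` pattern assumes a conventionally formatted `global.json` with the `"version"` key on its own line rather than being a full JSON parser. It also ignores any `rollForward` policy, which may let a different installed SDK satisfy the pin.

```bash
# pinned_sdk_version <path-to-global.json>: print the first "version" value.
pinned_sdk_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# sdk_is_listed <version>: read `dotnet --list-sdks` output on stdin and
# succeed only if the first column of some line equals <version> exactly.
sdk_is_listed() {
  awk -v v="$1" '$1 == v { found = 1 } END { exit !found }'
}

# Usage:
#   want=$(pinned_sdk_version global.json)
#   dotnet --list-sdks | sdk_is_listed "$want" || echo "SDK $want is not installed" >&2
```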
---

### Docker Build Failures

**Symptom:** `failed to solve: rpc error: code = Unknown`

**Causes:**
1. Disk space exhausted
2. Layer cache corruption
3. Network timeout

**Solutions:**

```bash
# Clean Docker system
docker system prune -af
docker builder prune -af

# Build without cache
docker build --no-cache -t myimage .

# Create a builder on the host network (can help when builds hit network timeouts)
docker buildx create --driver-opt network=host --use
```

---

### Multi-arch Build Failures

**Symptom:** `exec format error` or QEMU issues

**Solutions:**

```bash
# Install QEMU for cross-platform builds
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Create new buildx builder
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap

# Build for specific platforms
docker buildx build --platform linux/amd64 -t myimage .
```

---

## Test Failures

### Testcontainers Issues

**Symptom:** `Could not find a running Docker daemon`

**Solutions:**

1. Ensure Docker is running:
   ```bash
   docker info
   ```

2. Set Testcontainers host:
   ```bash
   export TESTCONTAINERS_HOST_OVERRIDE=host.docker.internal
   # or for Linux
   export TESTCONTAINERS_HOST_OVERRIDE=$(hostname -I | awk '{print $1}')
   ```

3. Use Ryuk container for cleanup:
   ```bash
   export TESTCONTAINERS_RYUK_DISABLED=false
   ```

4. CI configuration:
   ```yaml
   services:
     dind:
       image: docker:dind
       privileged: true
   ```

---

### PostgreSQL Test Failures

**Symptom:** `FATAL: role "postgres" does not exist`

**Solutions:**

1. Check connection string:
   ```bash
   export STELLAOPS_TEST_POSTGRES_CONNECTION="Host=localhost;Database=test;Username=postgres;Password=postgres"
   ```

2. Use Testcontainers PostgreSQL:
   ```csharp
   var container = new PostgreSqlBuilder()
       .WithDatabase("test")
       .WithUsername("postgres")
       .WithPassword("postgres")
       .Build();
   ```

3. Wait for PostgreSQL readiness:
   ```bash
   until pg_isready -h localhost -p 5432; do
     sleep 1
   done
   ```

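
The readiness loop in step 3 waits forever if PostgreSQL never starts. A bounded variant could look like the sketch below; `wait_for` is a hypothetical helper name, and the 60-second budget in the usage comment is an arbitrary assumption.

```bash
# wait_for <timeout-seconds> <command...>
# Polls the command once per second until it succeeds or the timeout elapses.
wait_for() {
  timeout=$1; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Usage:
#   wait_for 60 pg_isready -h localhost -p 5432
```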
---

### Test Timeouts

**Symptom:** `Test exceeded timeout`

**Solutions:**

1. Increase timeout:
   ```bash
   dotnet test --blame-hang-timeout 10m
   ```

2. Run tests in parallel with limited concurrency:
   ```bash
   dotnet test -maxcpucount:2
   ```

3. Identify slow tests:
   ```bash
   dotnet test --logger "console;verbosity=detailed" --logger "trx"
   ```

---

### Determinism Test Failures

**Symptom:** `Output mismatch: expected SHA256 differs`

**Solutions:**

1. Check for non-deterministic sources:
   - Timestamps
   - Random GUIDs
   - Floating-point operations
   - Dictionary ordering

2. Run determinism comparison:
   ```bash
   .gitea/scripts/test/determinism-run.sh
   diff out/scanner-determinism/run1.json out/scanner-determinism/run2.json
   ```

3. Update golden fixtures:
   ```bash
   .gitea/scripts/test/run-fixtures-check.sh --update
   ```

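
Before reading a full `diff`, it can be quicker to check whether two runs differ at all by comparing digests. The sketch below assumes GNU coreutils `sha256sum` is available on the runner; `same_digest` is a hypothetical helper, not one of the scripts under `.gitea/scripts/`.

```bash
# same_digest <file-a> <file-b>
# Returns 0 when the SHA-256 digests match, 1 when they differ,
# and 2 when either file cannot be read.
same_digest() {
  a=$(sha256sum < "$1") || return 2
  b=$(sha256sum < "$2") || return 2
  # sha256sum prints "<digest>  -" for stdin; keep only the digest.
  a=${a%% *}; b=${b%% *}
  if [ "$a" = "$b" ]; then
    echo "match: $a"
  else
    echo "mismatch: $a != $b" >&2
    return 1
  fi
}

# Usage:
#   same_digest out/scanner-determinism/run1.json out/scanner-determinism/run2.json
```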
---

## Deployment Failures

### SSH Connection Issues

**Symptom:** `ssh: connect to host X.X.X.X port 22: Connection refused`

**Solutions:**

1. Verify SSH key:
   ```bash
   ssh-keygen -lf ~/.ssh/id_rsa.pub
   ```

2. Test connection:
   ```bash
   ssh -vvv user@host
   ```

3. Add host to known_hosts:
   ```yaml
   - name: Setup SSH
     run: |
       mkdir -p ~/.ssh
       ssh-keyscan -H ${{ secrets.DEPLOY_HOST }} >> ~/.ssh/known_hosts
   ```

---

### Registry Push Failures

**Symptom:** `unauthorized: authentication required`

**Solutions:**

1. Log in to the registry (pass the password on stdin so it does not appear in the process list or shell history):
   ```bash
   echo "$REGISTRY_PASSWORD" | docker login git.stella-ops.org -u "$REGISTRY_USERNAME" --password-stdin
   ```

2. Check token permissions:
   - `write:packages` scope required
   - Token not expired

3. Use credential helper:
   ```yaml
   - name: Login to Registry
     uses: docker/login-action@v3
     with:
       registry: git.stella-ops.org
       username: ${{ secrets.REGISTRY_USERNAME }}
       password: ${{ secrets.REGISTRY_PASSWORD }}
   ```

---

### Helm Deployment Failures

**Symptom:** `Error: UPGRADE FAILED: cannot patch`

**Solutions:**

1. Check resource conflicts:
   ```bash
   kubectl get events -n stellaops --sort-by='.lastTimestamp'
   ```

2. Force upgrade:
   ```bash
   helm upgrade --install --force stellaops ./devops/helm/stellaops
   ```

3. Clean up stuck release:
   ```bash
   helm history stellaops
   helm rollback stellaops <revision>
   # or
   kubectl delete secret -l name=stellaops,owner=helm
   ```

---

## Workflow Issues

### Workflow Not Triggering

**Symptom:** Push/PR doesn't trigger workflow

**Causes:**
1. Path filter not matching
2. Branch protection rules
3. YAML syntax error

**Solutions:**

1. Check path filters:
   ```yaml
   on:
     push:
       paths:
         - 'src/**' # Check if files match
   ```

2. Validate YAML:
   ```bash
   .gitea/scripts/validate/validate-workflows.sh
   ```

3. Check branch rules:
   - Verify workflow permissions
   - Check protected branch settings

---

### Concurrency Issues

**Symptom:** Duplicate runs or stuck workflows

**Solutions:**

1. Add concurrency control:
   ```yaml
   concurrency:
     group: ${{ github.workflow }}-${{ github.ref }}
     cancel-in-progress: true
   ```

2. Cancel stale runs manually:
   ```bash
   gh run cancel <run-id>
   ```

---

### Artifact Upload/Download Failures

**Symptom:** `Unable to find any artifacts`

**Solutions:**

1. Check artifact names match:
   ```yaml
   # Upload
   - uses: actions/upload-artifact@v4
     with:
       name: my-artifact # Must match

   # Download
   - uses: actions/download-artifact@v4
     with:
       name: my-artifact # Must match
   ```

2. Check retention period:
   ```yaml
   - uses: actions/upload-artifact@v4
     with:
       retention-days: 90 # Default is 90
   ```

3. Verify job dependencies:
   ```yaml
   download-job:
     needs: [upload-job] # Must complete first
   ```

---

## Runner Issues

### Disk Space Exhausted

**Symptom:** `No space left on device`

**Solutions:**

1. Run cleanup script:
   ```bash
   .gitea/scripts/util/cleanup-runner-space.sh
   ```

2. Add cleanup step to workflow:
   ```yaml
   - name: Free disk space
     run: |
       docker system prune -af
       rm -rf /tmp/*
       df -h
   ```

3. Use larger runner:
   ```yaml
   runs-on: ubuntu-latest-4xlarge
   ```

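
A job can also fail fast, before a long build, when the runner is already low on space. The helper names `free_kb` and `require_free_gb` below are hypothetical, and the 10 GiB threshold in the usage comment is an assumption to adjust per workflow.

```bash
# free_kb <path>: available kilobytes on the filesystem holding <path>.
# `df -P` forces the POSIX one-line-per-filesystem format; column 4 is "Available".
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_gb <path> <gibibytes>: fail when less than the requested space is free.
require_free_gb() {
  avail=$(free_kb "$1")
  need=$(( $2 * 1024 * 1024 ))
  if [ "$avail" -lt "$need" ]; then
    echo "Only $((avail / 1024 / 1024)) GiB free on $1; need $2 GiB" >&2
    return 1
  fi
}

# Usage:
#   require_free_gb / 10 || exit 1
```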
---

### Out of Memory

**Symptom:** `Killed` or `OOMKilled`

**Solutions:**

1. Limit parallel jobs:
   ```yaml
   strategy:
     max-parallel: 2
   ```

2. Limit dotnet memory:
   ```bash
   export DOTNET_GCHeapHardLimit=0x40000000 # 1GB
   ```

3. Use swap:
   ```yaml
   - name: Create swap
     run: |
       sudo fallocate -l 4G /swapfile
       sudo chmod 600 /swapfile
       sudo mkswap /swapfile
       sudo swapon /swapfile
   ```

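
The value in step 2 is a hexadecimal byte count, which is easy to mistype. A small helper can derive it from mebibytes; `heap_limit_hex` is a hypothetical name used only for this illustration.

```bash
# heap_limit_hex <mebibytes>: print the equivalent byte count as 0x-prefixed hex.
heap_limit_hex() {
  printf '0x%X\n' $(( $1 * 1024 * 1024 ))
}

# Example: cap the GC heap at 1 GiB (heap_limit_hex 1024 prints 0x40000000)
#   export DOTNET_GCHeapHardLimit="$(heap_limit_hex 1024)"
```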
---

### Runner Not Picking Up Jobs

**Symptom:** Jobs stuck in `queued` state

**Solutions:**

1. Check runner status:
   ```bash
   # Self-hosted runner
   ./run.sh --check
   ```

2. Verify labels match:
   ```yaml
   runs-on: [self-hosted, linux, x64] # All labels must match
   ```

3. Restart runner service:
   ```bash
   sudo systemctl restart actions.runner.*.service
   ```

---

## Signing & Attestation Issues

### Cosign Signing Failures

**Symptom:** `error opening key: no such file`

**Solutions:**

1. Check key configuration:
   ```bash
   # From base64 secret
   echo "$COSIGN_PRIVATE_KEY_B64" | base64 -d > cosign.key

   # Verify key
   cosign public-key --key cosign.key
   ```

2. Set password:
   ```bash
   export COSIGN_PASSWORD="${{ secrets.COSIGN_PASSWORD }}"
   ```

3. Use keyless signing:
   ```yaml
   - name: Sign with keyless
     env:
       COSIGN_EXPERIMENTAL: 1
     run: cosign sign --yes $IMAGE
   ```

---

### SBOM Generation Failures

**Symptom:** `syft: command not found`

**Solutions:**

1. Install Syft:
   ```bash
   curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
   ```

2. Use container:
   ```yaml
   - name: Generate SBOM
     uses: anchore/sbom-action@v0
     with:
       image: ${{ env.IMAGE }}
   ```

---

## Debugging Tips

### Enable Debug Logging

```yaml
env:
  ACTIONS_STEP_DEBUG: true
  ACTIONS_RUNNER_DEBUG: true
```

### SSH into Runner

```yaml
- name: Debug SSH
  uses: mxschmitt/action-tmate@v3
  if: failure()
```

### Collect Diagnostic Info

```yaml
- name: Diagnostics
  if: failure()
  run: |
    echo "=== Environment ==="
    env | sort
    echo "=== Disk ==="
    df -h
    echo "=== Memory ==="
    free -m
    echo "=== Docker ==="
    docker info
    docker ps -a
```

### View Workflow Logs

```bash
# Stream logs
gh run watch <run-id>

# Download the run's artifact named "logs" (requires a workflow step that uploads one)
gh run download <run-id> --name logs
```

---

## Getting Help

1. **Check existing issues:** Search repository issues
2. **Review workflow history:** Look for similar failures
3. **Consult documentation:** `docs/` and `.gitea/docs/`
4. **Contact DevOps:** Create issue with label `ci-cd`

### Information to Include

- Workflow name and run ID
- Error message and stack trace
- Steps to reproduce
- Environment details (OS, SDK versions)
- Recent changes to affected code