VEX Raw Schema Validation - Offline Kit

This document describes how operators can validate the integrity of raw VEX evidence stored in MongoDB and confirm that Excititor holds only immutable, content-addressed documents.

Overview

The vex_raw collection stores raw VEX documents under content-addressed keys: each document is keyed by the cryptographic hash of its content. Any change to the content changes the hash, so a document modified after insertion no longer matches its key, and the mismatch can be detected.
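
As an illustration of content addressing, the following minimal sketch derives a key from raw bytes with Node's built-in crypto module. It assumes the key is the bare lowercase SHA-256 hex digest of the content; the function name contentDigest and the sample payloads are hypothetical and not part of Excititor.

```javascript
// Sketch: derive a content-addressed key from raw VEX content.
// Assumption: the key is the bare lowercase SHA-256 hex digest of the bytes.
const crypto = require("crypto");

function contentDigest(content) {
  // Strings are hashed as UTF-8; anything else is treated as raw bytes.
  const bytes = typeof content === "string"
    ? Buffer.from(content, "utf8")
    : Buffer.from(content);
  return crypto.createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical payloads that differ by a single space.
const original = '{"document":{"category":"csaf_vex"}}';
const modified = '{"document":{"category":"csaf_vex" }}';

// Same content always yields the same key.
console.log(contentDigest(original) === contentDigest(original)); // true
// Any change to the content yields a different key.
console.log(contentDigest(original) === contentDigest(modified)); // false
```

A SHA-256 hex digest produced this way is always 64 characters long.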

Schema Definition

The MongoDB JSON Schema enforces the following structure:

{
  "$jsonSchema": {
    "bsonType": "object",
    "title": "VEX Raw Document Schema",
    "description": "Schema for immutable VEX evidence storage",
    "required": ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"],
    "properties": {
      "_id": {
        "bsonType": "string",
        "description": "Content digest serving as immutable key"
      },
      "providerId": {
        "bsonType": "string",
        "minLength": 1,
        "description": "VEX provider identifier"
      },
      "format": {
        "bsonType": "string",
        "enum": ["csaf", "cyclonedx", "openvex"],
        "description": "VEX document format"
      },
      "sourceUri": {
        "bsonType": "string",
        "minLength": 1,
        "description": "Original source URI"
      },
      "retrievedAt": {
        "bsonType": "date",
        "description": "Timestamp when document was fetched"
      },
      "digest": {
        "bsonType": "string",
        "minLength": 32,
        "description": "Content hash (SHA-256 hex)"
      },
      "content": {
        "bsonType": ["binData", "string"],
        "description": "Raw document content"
      },
      "gridFsObjectId": {
        "bsonType": ["objectId", "null", "string"],
        "description": "GridFS reference for large documents"
      },
      "metadata": {
        "bsonType": "object",
        "description": "Provider-specific metadata"
      }
    }
  }
}
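
To make the rules concrete, here is a rough sketch that re-expresses the required fields, the format enum, and the length limits in plain JavaScript. It is not the MongoDB validator: BSON types are only approximated (a JavaScript Date stands in for the BSON date type), the optional fields are not checked, and the names findViolations, REQUIRED, and FORMATS are hypothetical.

```javascript
// Sketch: approximate the schema rules above in plain JavaScript.
// Assumption: a JavaScript Date stands in for the BSON "date" type.
const REQUIRED = ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"];
const FORMATS = ["csaf", "cyclonedx", "openvex"];

function findViolations(doc) {
  const violations = [];
  // Every required field must be present.
  for (const field of REQUIRED) {
    if (doc[field] === undefined) {
      violations.push({ field, message: "missing required field" });
    }
  }
  // String-typed fields must actually be strings.
  for (const field of ["_id", "providerId", "format", "sourceUri", "digest"]) {
    if (doc[field] !== undefined && typeof doc[field] !== "string") {
      violations.push({ field, message: "must be a string" });
    }
  }
  // minLength: 1 on providerId and sourceUri.
  for (const field of ["providerId", "sourceUri"]) {
    if (doc[field] === "") {
      violations.push({ field, message: "must not be empty" });
    }
  }
  if (typeof doc.format === "string" && !FORMATS.includes(doc.format)) {
    violations.push({ field: "format", message: "must be csaf, cyclonedx, or openvex" });
  }
  if (doc.retrievedAt !== undefined && !(doc.retrievedAt instanceof Date)) {
    violations.push({ field: "retrievedAt", message: "must be a date" });
  }
  // minLength: 32 on digest.
  if (typeof doc.digest === "string" && doc.digest.length < 32) {
    violations.push({ field: "digest", message: "must be at least 32 characters" });
  }
  return violations;
}

// Hypothetical document that satisfies every rule checked here.
const sample = {
  _id: "a".repeat(64),
  providerId: "example-provider",
  format: "openvex",
  sourceUri: "https://example.invalid/vex/1.json",
  retrievedAt: new Date(0),
  digest: "a".repeat(64),
};
console.log(findViolations(sample)); // prints []
```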

Offline Validation Steps

1. Export the Schema

The schema can be exported from the application using the validator tooling:

# Using the Excititor CLI
stellaops excititor schema export --collection vex_raw --output vex-raw-schema.json

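# Or via the MongoDB shell (--quiet and EJSON.stringify keep the output parseable as JSON)
mongosh "mongodb://localhost:27017/excititor" --quiet --eval "EJSON.stringify(db.getCollectionInfos({name: 'vex_raw'})[0].options.validator, null, 2)" > vex-raw-schema.json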

2. Validate Documents in MongoDB Shell

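# From the OS shell: connect to your MongoDB instance
mongosh "mongodb://localhost:27017/excititor"

// Inside mongosh: load the validator attached to the collection
var validator = db.getCollectionInfos({ name: "vex_raw" })[0].options.validator;

// List every document that does not satisfy the validator
db.vex_raw.find({ $nor: [validator] }).forEach(function (doc) {
  print("Invalid: " + doc._id);
});

// Optional: check storage-level integrity (data and index structures).
// This command does not return the offending documents in its result
// and does not accept a per-document option.
db.runCommand({
  validate: "vex_raw",
  full: true
})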

3. Programmatic Validation (C#)

using StellaOps.Excititor.Storage.Mongo.Validation;

// Validate a single document
var result = VexRawSchemaValidator.Validate(document);
if (!result.IsValid)
{
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"{violation.Field}: {violation.Message}");
    }
}

// Batch validation
var batchResult = VexRawSchemaValidator.ValidateBatch(documents);
Console.WriteLine($"Valid: {batchResult.ValidCount}, Invalid: {batchResult.InvalidCount}");

4. Export Schema for External Tools

// Get schema as JSON for external validation tools
var schemaJson = VexRawSchemaValidator.GetJsonSchemaAsJson();
File.WriteAllText("vex-raw-schema.json", schemaJson);

Verification Checklist

Use this checklist to verify schema compliance:

  • All documents have required fields (_id, providerId, format, sourceUri, retrievedAt, digest)
  • The _id matches the digest value (content-addressed)
  • Format is one of: csaf, cyclonedx, openvex
  • Digest is at least 32 characters (the schema minimum; a full SHA-256 hex digest is 64 characters)
  • No documents have been modified after insertion (verify via digest recomputation)

Immutability Verification

To verify documents haven't been tampered with:

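// mongosh - verify that the stored content still matches its digest
var crypto = require("crypto");
db.vex_raw.find({ content: { $type: ["string", "binData"] } }).forEach(function (doc) {
  // content is a string or BSON binData; a Binary value exposes its bytes via .buffer
  var bytes = typeof doc.content === "string"
    ? Buffer.from(doc.content, "utf8")
    : Buffer.from(doc.content.buffer);
  // Compute the SHA-256 of the content as lowercase hex
  var computedDigest = crypto.createHash("sha256").update(bytes).digest("hex");
  // Assumes digest is stored as bare lowercase hex; adjust if your data uses a prefix such as "sha256:"
  if (computedDigest !== doc.digest) {
    print("TAMPERED: " + doc._id);
  }
});
// Documents whose content lives in GridFS (gridFsObjectId) are not covered by this loop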

Auditing

For compliance auditing, export a validation report:

# Generate validation report
stellaops excititor validate --collection vex_raw --report validation-report.json

# The report includes:
# - Total document count
# - Valid/invalid counts
# - List of violations by document
# - Schema version used for validation
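
For orientation, the sketch below assembles a report with the four parts listed above from per-document validation results. The property names (totalCount, validCount, invalidCount, violationsByDocument, schemaVersion) and the version string are illustrative assumptions; the actual layout written by the stellaops CLI may differ.

```javascript
// Sketch: build a validation report from per-document results.
// Assumption: property names and the schemaVersion value are illustrative,
// not the format emitted by the stellaops CLI.
function buildReport(results, schemaVersion) {
  // results: [{ id: string, violations: [{ field, message }] }]
  const invalid = results.filter((r) => r.violations.length > 0);
  return {
    schemaVersion,
    totalCount: results.length,
    validCount: results.length - invalid.length,
    invalidCount: invalid.length,
    // Only documents with at least one violation are listed.
    violationsByDocument: Object.fromEntries(
      invalid.map((r) => [r.id, r.violations])
    ),
  };
}

// Hypothetical results for two documents.
const report = buildReport(
  [
    { id: "doc-1", violations: [] },
    { id: "doc-2", violations: [{ field: "format", message: "must be csaf, cyclonedx, or openvex" }] },
  ],
  "example-1"
);
console.log(JSON.stringify(report, null, 2));
```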

Troubleshooting

Common Violations

  1. Missing required field: Ensure all required fields are present
  2. Invalid format: Format must be exactly "csaf", "cyclonedx", or "openvex"
  3. Digest too short: Digest must be at least 32 characters (a full SHA-256 hex digest is 64)
  4. Wrong type: Check field types match schema requirements

Recovery

If invalid documents are found:

  1. Do NOT modify documents in place (violates immutability)
  2. Export the invalid documents for analysis
  3. Re-ingest from original sources with correct data
  4. Document the incident in audit logs
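
Step 2 can be sketched as follows: invalid documents are written to a newline-delimited JSON file for offline analysis, and the originals are left untouched. The function name exportForAnalysis and the output file name are hypothetical. Plain JSON.stringify is used here, so binary content would need an Extended JSON serializer instead.

```javascript
// Sketch: export invalid documents for analysis without modifying them.
// Assumption: documents contain only JSON-friendly values (no binData).
const fs = require("fs");
const os = require("os");
const path = require("path");

function exportForAnalysis(invalidDocs, outPath) {
  // One JSON document per line; the input objects are only read, never changed.
  const lines = invalidDocs.map((doc) => JSON.stringify(doc));
  fs.writeFileSync(outPath, lines.join("\n") + "\n", "utf8");
  return lines.length;
}

// Hypothetical invalid documents and output location.
const invalidDocs = [
  { _id: "doc-2", format: "spdx" },
  { _id: "doc-7", digest: "short" },
];
const outPath = path.join(os.tmpdir(), "vex-raw-invalid.ndjson");
const written = exportForAnalysis(invalidDocs, outPath);
console.log("Exported " + written + " documents to " + outPath);
```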