VEX Raw Schema Validation - Offline Kit
This document describes how operators can validate the integrity of VEX raw evidence stored in MongoDB, ensuring that Excititor stores only immutable, content-addressed documents.
Overview
The vex_raw collection stores raw VEX documents with content-addressed storage: each document is keyed by the cryptographic hash of its content. This makes tampering detectable, because any change to the content changes its hash, so a modified document no longer matches its key.
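As a minimal sketch of the content-addressing idea, the following Node.js snippet derives a key from a payload. The helper name `contentDigest` is hypothetical, and it assumes the key is the bare lowercase SHA-256 hex digest of the raw bytes; Excititor's actual key encoding may differ.

```javascript
const crypto = require("crypto");

// Hypothetical helper (not part of Excititor): derive a content-addressed key.
// Assumption: the key is the lowercase SHA-256 hex digest of the raw bytes.
function contentDigest(content) {
  const bytes = typeof content === "string" ? Buffer.from(content, "utf8") : content;
  return crypto.createHash("sha256").update(bytes).digest("hex");
}

const original = '{"document":{"category":"csaf_vex"}}';
const modified = '{"document":{"category":"csaf_vex "}}';

console.log(contentDigest(original).length); // 64 hex characters
console.log(contentDigest(original) === contentDigest(modified)); // false
```

Even a one-character change produces a different digest, which is what lets an operator detect a document whose content no longer matches its key.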
Schema Definition
The MongoDB JSON Schema enforces the following structure:
{
"$jsonSchema": {
"bsonType": "object",
"title": "VEX Raw Document Schema",
"description": "Schema for immutable VEX evidence storage",
"required": ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"],
"properties": {
"_id": {
"bsonType": "string",
"description": "Content digest serving as immutable key"
},
"providerId": {
"bsonType": "string",
"minLength": 1,
"description": "VEX provider identifier"
},
"format": {
"bsonType": "string",
"enum": ["csaf", "cyclonedx", "openvex"],
"description": "VEX document format"
},
"sourceUri": {
"bsonType": "string",
"minLength": 1,
"description": "Original source URI"
},
"retrievedAt": {
"bsonType": "date",
"description": "Timestamp when document was fetched"
},
"digest": {
"bsonType": "string",
"minLength": 32,
"description": "Content hash (SHA-256 hex)"
},
"content": {
"bsonType": ["binData", "string"],
"description": "Raw document content"
},
"gridFsObjectId": {
"bsonType": ["objectId", "null", "string"],
"description": "GridFS reference for large documents"
},
"metadata": {
"bsonType": "object",
"description": "Provider-specific metadata"
}
}
}
}
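For offline spot checks without a MongoDB server, the main rules above can be mirrored by hand. The sketch below is a simplified, hypothetical re-implementation (`checkVexRaw` is not the Excititor validator). It covers the required fields, string types, minimum lengths, the format enum, and the date type, and it skips content, gridFsObjectId, and metadata. It assumes documents have already been deserialized so that retrievedAt is a JavaScript Date.

```javascript
// Hand-written mirror of the $jsonSchema rules above (illustrative only).
const REQUIRED = ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"];
const FORMATS = ["csaf", "cyclonedx", "openvex"];
// Minimum lengths for string fields; 0 means "any string".
const MIN_LENGTH = { _id: 0, providerId: 1, format: 0, sourceUri: 1, digest: 32 };

function checkVexRaw(doc) {
  const violations = [];
  const add = (field, message) => violations.push({ field, message });

  for (const field of REQUIRED) {
    if (doc[field] === undefined) add(field, "missing required field");
  }
  for (const [field, minLength] of Object.entries(MIN_LENGTH)) {
    const value = doc[field];
    if (value === undefined) continue; // already reported as missing
    if (typeof value !== "string") add(field, "must be a string");
    else if (value.length < minLength) add(field, `must be at least ${minLength} characters`);
  }
  if (typeof doc.format === "string" && !FORMATS.includes(doc.format)) {
    add("format", `must be one of: ${FORMATS.join(", ")}`);
  }
  if (doc.retrievedAt !== undefined && !(doc.retrievedAt instanceof Date)) {
    add("retrievedAt", "must be a date");
  }
  return violations;
}

// Example: a document with an empty providerId, an unknown format, and a short digest.
const bad = {
  _id: "abc",
  providerId: "",
  format: "spdx",
  sourceUri: "https://example.test/vex.json",
  retrievedAt: new Date(),
  digest: "abc",
};
for (const v of checkVexRaw(bad)) console.log(`${v.field}: ${v.message}`);
```

The server-side validator remains the authority; a mirror like this is only useful for quick checks on exported data.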
Offline Validation Steps
1. Export the Schema
The schema can be exported from the application using the validator tooling:
# Using the Excititor CLI
stellaops excititor schema export --collection vex_raw --output vex-raw-schema.json
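# Or via MongoDB shell (EJSON.stringify emits valid JSON; --quiet suppresses the startup banner)
mongosh "mongodb://localhost:27017/excititor" --quiet --eval "print(EJSON.stringify(db.getCollectionInfos({name: 'vex_raw'})[0].options.validator))" > vex-raw-schema.json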
2. Validate Documents in MongoDB Shell
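// Connect to your MongoDB instance first (run from your system shell):
//   mongosh "mongodb://localhost:27017/excititor"

// Check collection and index integrity.
// Note: this command does not list the individual documents that violate the schema.
db.runCommand({
  validate: "vex_raw",
  full: true
})

// List the documents that violate the collection's validator.
// $nor matches every document that does NOT satisfy the validator expression.
var validator = db.getCollectionInfos({name: "vex_raw"})[0].options.validator;
db.vex_raw.find({ $nor: [validator] }, { _id: 1 }).forEach(function(doc) {
  print("Invalid: " + doc._id);
});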
3. Programmatic Validation (C#)
using StellaOps.Excititor.Storage.Mongo.Validation;
// Validate a single document
var result = VexRawSchemaValidator.Validate(document);
if (!result.IsValid)
{
foreach (var violation in result.Violations)
{
Console.WriteLine($"{violation.Field}: {violation.Message}");
}
}
// Batch validation
var batchResult = VexRawSchemaValidator.ValidateBatch(documents);
Console.WriteLine($"Valid: {batchResult.ValidCount}, Invalid: {batchResult.InvalidCount}");
4. Export Schema for External Tools
// Get schema as JSON for external validation tools
var schemaJson = VexRawSchemaValidator.GetJsonSchemaAsJson();
File.WriteAllText("vex-raw-schema.json", schemaJson);
Verification Checklist
Use this checklist to verify schema compliance:
- All documents have required fields (_id, providerId, format, sourceUri, retrievedAt, digest)
- The _id matches the digest value (content-addressed)
- Format is one of: csaf, cyclonedx, openvex
- Digest is at least 32 characters, the schema minimum (a full SHA-256 hex digest is 64 characters)
- No documents have been modified after insertion (verify via digest recomputation)
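The schema shown earlier has no rule tying _id to digest, so that checklist item needs a separate check. A minimal sketch follows, assuming "matches" means exact string equality; the helper name `findKeyMismatches` is hypothetical.

```javascript
// Hypothetical helper: report documents whose _id does not equal their digest.
// Assumption: "matches" means exact string equality, with no prefix or case folding.
function findKeyMismatches(docs) {
  return docs.filter((doc) => doc._id !== doc.digest).map((doc) => doc._id);
}

const sample = [
  { _id: "aaaa", digest: "aaaa" },
  { _id: "bbbb", digest: "cccc" },
];
console.log(findKeyMismatches(sample)); // [ 'bbbb' ]
```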
Immutability Verification
To verify documents haven't been tampered with:
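// MongoDB shell (mongosh) - verify content matches digest
const crypto = require("crypto");
db.vex_raw.find().forEach(function(doc) {
  var content = doc.content;
  if (content) {
    // content is either a string or BSON binData; binData exposes its bytes as .buffer
    var bytes = typeof content === "string" ? Buffer.from(content, "utf8") : content.buffer;
    // Compute SHA-256 of content as lowercase hex
    var computedDigest = crypto.createHash("sha256").update(bytes).digest("hex");
    if (computedDigest !== doc.digest) {
      print("TAMPERED: " + doc._id);
    }
  }
});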
Auditing
For compliance auditing, export a validation report:
# Generate validation report
stellaops excititor validate --collection vex_raw --report validation-report.json
# The report includes:
# - Total document count
# - Valid/invalid counts
# - List of violations by document
# - Schema version used for validation
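To illustrate how the listed report fields relate to one another, here is a hedged sketch that aggregates per-document results into a summary. The function name and every field name in it are hypothetical. The actual layout of validation-report.json is not specified in this document and may differ.

```javascript
// Hypothetical report shape; the real `stellaops excititor validate` output may differ.
function buildReport(results, schemaVersion) {
  const invalid = results.filter((r) => r.violations.length > 0);
  return {
    schemaVersion,
    totalCount: results.length,
    validCount: results.length - invalid.length,
    invalidCount: invalid.length,
    // Map of document id -> list of violation messages, invalid documents only.
    violationsByDocument: Object.fromEntries(invalid.map((r) => [r.id, r.violations])),
  };
}

const report = buildReport(
  [
    { id: "doc-1", violations: [] },
    { id: "doc-2", violations: ["format: must be one of: csaf, cyclonedx, openvex"] },
  ],
  "example-1" // placeholder schema version
);
console.log(JSON.stringify(report, null, 2));
```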
Troubleshooting
Common Violations
- Missing required field: Ensure all required fields are present
- Invalid format: Format must be exactly "csaf", "cyclonedx", or "openvex"
- Digest too short: Digest must be at least 32 hex characters
- Wrong type: Check field types match schema requirements
Recovery
If invalid documents are found:
- Do NOT modify documents in place (violates immutability)
- Export the invalid documents for analysis
- Re-ingest from original sources with correct data
- Document the incident in audit logs
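The export step can be as simple as writing the offending documents to a file for later analysis, leaving the stored originals untouched. The sketch below is one hypothetical way to do that in Node.js; the helper name, the file name, and the JSON Lines format are assumptions, not part of the Excititor tooling.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: write invalid documents to a JSON Lines file (one document per line).
// The input documents are only read, never modified.
function exportInvalid(invalidDocs, filePath) {
  const lines = invalidDocs.map((doc) => JSON.stringify(doc));
  fs.writeFileSync(filePath, lines.length ? lines.join("\n") + "\n" : "");
  return lines.length;
}

const out = path.join(os.tmpdir(), "vex-raw-invalid.jsonl");
const count = exportInvalid([{ _id: "abc", format: "spdx" }], out);
console.log(`exported ${count} document(s) to ${out}`);
```

Plain JSON.stringify does not preserve BSON-specific types such as binData, so an export that must round-trip those values would need a BSON-aware serializer instead.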