# VEX Raw Schema Validation - Offline Kit

This document describes how operators can validate the integrity of VEX raw evidence stored in MongoDB, ensuring that Excititor stores only immutable, content-addressed documents.

## Overview

The `vex_raw` collection stores raw VEX documents using content-addressed storage: each document is keyed by the cryptographic hash of its content. This makes tampering detectable. Any change to a document's content changes its hash, so the stored key no longer matches.

## Schema Definition

The MongoDB JSON Schema enforces the following structure:

```json
{
  "$jsonSchema": {
    "bsonType": "object",
    "title": "VEX Raw Document Schema",
    "description": "Schema for immutable VEX evidence storage",
    "required": ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"],
    "properties": {
      "_id": {
        "bsonType": "string",
        "description": "Content digest serving as immutable key"
      },
      "providerId": {
        "bsonType": "string",
        "minLength": 1,
        "description": "VEX provider identifier"
      },
      "format": {
        "bsonType": "string",
        "enum": ["csaf", "cyclonedx", "openvex"],
        "description": "VEX document format"
      },
      "sourceUri": {
        "bsonType": "string",
        "minLength": 1,
        "description": "Original source URI"
      },
      "retrievedAt": {
        "bsonType": "date",
        "description": "Timestamp when document was fetched"
      },
      "digest": {
        "bsonType": "string",
        "minLength": 32,
        "description": "Content hash (SHA-256 hex)"
      },
      "content": {
        "bsonType": ["binData", "string"],
        "description": "Raw document content"
      },
      "gridFsObjectId": {
        "bsonType": ["objectId", "null", "string"],
        "description": "GridFS reference for large documents"
      },
      "metadata": {
        "bsonType": "object",
        "description": "Provider-specific metadata"
      }
    }
  }
}
```

## Offline Validation Steps

### 1. Export the Schema

The schema can be exported from the application using the validator tooling:

```bash
# Using the Excititor CLI
stellaops excititor schema export --collection vex_raw --output vex-raw-schema.json

# Or via MongoDB shell (--quiet and EJSON.stringify keep the output as plain JSON)
mongosh "mongodb://localhost:27017/excititor" --quiet \
  --eval "EJSON.stringify(db.getCollectionInfos({name: 'vex_raw'})[0].options.validator, null, 2)" \
  > vex-raw-schema.json
```

### 2. Validate Documents in MongoDB Shell

Run the following inside a `mongosh` session connected to the Excititor database. The `validate` command reports collection-level results only, so individual violating documents are listed by querying for documents that do not match the collection's validator.

```javascript
// Run inside mongosh, connected with:
//   mongosh "mongodb://localhost:27017/excititor"

// Collection-level check; the result's nNonCompliantDocuments field
// counts documents that do not conform to the collection's schema
db.runCommand({ validate: "vex_raw", full: true })

// List every document that violates the schema
var validator = db.getCollectionInfos({ name: "vex_raw" })[0].options.validator;
db.vex_raw.find({ $nor: [validator] }).forEach(function (doc) {
  print("Invalid: " + doc._id);
});
```

### 3. Programmatic Validation (C#)

```csharp
using System;
using StellaOps.Excititor.Storage.Mongo.Validation;

// `document` and `documents` are assumed to be loaded from vex_raw already

// Validate a single document
var result = VexRawSchemaValidator.Validate(document);
if (!result.IsValid)
{
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"{violation.Field}: {violation.Message}");
    }
}

// Batch validation
var batchResult = VexRawSchemaValidator.ValidateBatch(documents);
Console.WriteLine($"Valid: {batchResult.ValidCount}, Invalid: {batchResult.InvalidCount}");
```

### 4. Export Schema for External Tools

```csharp
using System.IO;
using StellaOps.Excititor.Storage.Mongo.Validation;

// Get schema as JSON for external validation tools
var schemaJson = VexRawSchemaValidator.GetJsonSchemaAsJson();
File.WriteAllText("vex-raw-schema.json", schemaJson);
```

## Verification Checklist

Use this checklist to verify schema compliance:

- [ ] All documents have the required fields (`_id`, `providerId`, `format`, `sourceUri`, `retrievedAt`, `digest`)
- [ ] The `_id` matches the `digest` value (content-addressed)
- [ ] Format is one of: `csaf`, `cyclonedx`, `openvex`
- [ ] Digest is at least 32 characters (the schema minimum; a full SHA-256 hex digest is 64 characters)
- [ ] No documents have been modified after insertion (verify via digest recomputation)

## Immutability Verification

To verify that documents have not been tampered with, recompute the SHA-256 hash of each document's content and compare it with the stored digest. Documents without inline `content` (for example, those stored via GridFS) are skipped by this script.

```javascript
// mongosh - verify that stored content still hashes to the recorded digest
const crypto = require("crypto");

db.vex_raw.find().forEach(function (doc) {
  var content = doc.content;
  if (content) {
    // Strings are hashed as UTF-8; binData values expose their bytes via .buffer
    var bytes = typeof content === "string" ? content : content.buffer;
    var computedDigest = crypto.createHash("sha256").update(bytes).digest("hex");
    // Assumes the digest is stored as bare lowercase hex
    if (computedDigest !== doc.digest) {
      print("TAMPERED: " + doc._id);
    }
  }
});
```

## Auditing

For compliance auditing, export a validation report:

```bash
# Generate validation report
stellaops excititor validate --collection vex_raw --report validation-report.json

# The report includes:
# - Total document count
# - Valid/invalid counts
# - List of violations by document
# - Schema version used for validation
```

## Troubleshooting

### Common Violations

1. **Missing required field**: Ensure all required fields are present
2. **Invalid format**: Format must be exactly "csaf", "cyclonedx", or "openvex"
3. **Digest too short**: Digest must be at least 32 characters
4. **Wrong type**: Check that field types match the schema requirements

### Recovery

If invalid documents are found:

1. Do NOT modify documents in place (this violates immutability)
2. Export the invalid documents for analysis
3. Re-ingest from original sources with correct data
4. Document the incident in audit logs

## Related Documentation

- [Excititor Architecture](../modules/excititor/architecture.md)
- [VEX Storage Design](../modules/excititor/storage.md)
- [Offline Operation Guide](../24_OFFLINE_KIT.md)
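
## Example: Checklist Verification Script

The verification checklist and the digest recomputation described in this document can be sketched as a standalone Node.js script for use on exported documents in an air-gapped environment. This is a minimal illustration, not part of the Excititor tooling: `checkVexRawDocument` is a hypothetical helper, and it assumes that `content` has already been decoded to a string or `Buffer` and that `digest` is stored as bare lowercase SHA-256 hex.

```javascript
const crypto = require("crypto");

// Required fields and allowed formats, taken from the schema in this document
const REQUIRED = ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"];
const FORMATS = ["csaf", "cyclonedx", "openvex"];

// Hypothetical helper: returns the list of checklist violations for one document.
function checkVexRawDocument(doc) {
  const violations = [];
  for (const field of REQUIRED) {
    if (doc[field] === undefined || doc[field] === null) {
      violations.push("missing required field: " + field);
    }
  }
  if (doc.format !== undefined && !FORMATS.includes(doc.format)) {
    violations.push("invalid format: " + doc.format);
  }
  if (typeof doc.digest === "string") {
    if (doc.digest.length < 32) {
      violations.push("digest shorter than 32 characters");
    }
    if (doc._id !== doc.digest) {
      violations.push("_id does not match digest");
    }
    // Assumption: content is a string (hashed as UTF-8) or a Buffer
    if (doc.content !== undefined && doc.content !== null) {
      const computed = crypto.createHash("sha256").update(doc.content).digest("hex");
      if (computed !== doc.digest) {
        violations.push("content does not hash to digest");
      }
    }
  }
  return violations;
}

// Usage: one consistent document and one whose content was altered afterwards
const content = '{"document":"example"}';
const digest = crypto.createHash("sha256").update(content).digest("hex");
const good = {
  _id: digest,
  providerId: "example-provider",
  format: "openvex",
  sourceUri: "https://example.invalid/vex.json",
  retrievedAt: new Date(0),
  digest: digest,
  content: content,
};
const tampered = Object.assign({}, good, { content: content + " " });

console.log(checkVexRawDocument(good));
console.log(checkVexRawDocument(tampered));
```

Returning a list of violations rather than a boolean keeps the output usable for an audit report, since each entry names the checklist item that failed.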