# VEX Raw Schema Validation - Offline Kit

This document describes how operators can validate the integrity of VEX raw evidence stored in MongoDB, ensuring that Excititor stores only immutable, content-addressed documents.

## Overview

The `vex_raw` collection stores raw VEX documents with content-addressed storage (documents are keyed by their cryptographic hash). This ensures immutability - documents cannot be modified after insertion without changing their key.

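As a minimal sketch of what content addressing means here, the following Node.js snippet derives a key from the raw bytes and shows that any change to the content yields a different key. It assumes the key is the bare lowercase SHA-256 hex of the content; the helper names `contentKey` and `toRawRecord`, the provider name, and the example URI are hypothetical and not part of Excititor.

```javascript
const crypto = require("node:crypto");

// Derive the content-addressed key: SHA-256 of the raw content, as lowercase hex.
// Assumption: Excititor stores the digest without a prefix such as "sha256:".
function contentKey(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: build a vex_raw-shaped record keyed by its content digest.
function toRawRecord(content, providerId, format, sourceUri) {
  const digest = contentKey(content);
  return { _id: digest, providerId, format, sourceUri, retrievedAt: new Date(), digest, content };
}

const uri = "https://example.invalid/vex.json";
const original = toRawRecord('{"statements":[]}', "example-provider", "openvex", uri);
const edited = toRawRecord('{"statements":[{}]}', "example-provider", "openvex", uri);

// Different content produces a different key, so an edit cannot hide behind the old _id.
console.log(original._id === edited._id); // false
```

Because the key is a function of the content alone, re-ingesting identical bytes always maps to the same `_id`.
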
## Schema Definition

The MongoDB JSON Schema enforces the following structure:

```json
{
  "$jsonSchema": {
    "bsonType": "object",
    "title": "VEX Raw Document Schema",
    "description": "Schema for immutable VEX evidence storage",
    "required": ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"],
    "properties": {
      "_id": {
        "bsonType": "string",
        "description": "Content digest serving as immutable key"
      },
      "providerId": {
        "bsonType": "string",
        "minLength": 1,
        "description": "VEX provider identifier"
      },
      "format": {
        "bsonType": "string",
        "enum": ["csaf", "cyclonedx", "openvex"],
        "description": "VEX document format"
      },
      "sourceUri": {
        "bsonType": "string",
        "minLength": 1,
        "description": "Original source URI"
      },
      "retrievedAt": {
        "bsonType": "date",
        "description": "Timestamp when document was fetched"
      },
      "digest": {
        "bsonType": "string",
        "minLength": 32,
        "description": "Content hash (SHA-256 hex)"
      },
      "content": {
        "bsonType": ["binData", "string"],
        "description": "Raw document content"
      },
      "gridFsObjectId": {
        "bsonType": ["objectId", "null", "string"],
        "description": "GridFS reference for large documents"
      },
      "metadata": {
        "bsonType": "object",
        "description": "Provider-specific metadata"
      }
    }
  }
}
```

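When no MongoDB instance is at hand, the required-field rules of the schema above can be mirrored in plain JavaScript. The sketch below is an illustration only: `checkVexRawShape` is a hypothetical name, it covers just the six required fields (not `content`, `gridFsObjectId`, or `metadata`), and it assumes documents have already been deserialized so that `retrievedAt` is a JavaScript `Date`.

```javascript
const REQUIRED = ["_id", "providerId", "format", "sourceUri", "retrievedAt", "digest"];
const FORMATS = ["csaf", "cyclonedx", "openvex"];

// Returns a list of human-readable violations; an empty list means the
// document satisfies the required-field rules of the schema.
function checkVexRawShape(doc) {
  const violations = [];
  for (const field of REQUIRED) {
    if (doc[field] === undefined) violations.push(`${field}: missing required field`);
  }
  // String-typed fields; a present-but-null value is reported as a type error.
  for (const field of ["_id", "providerId", "format", "sourceUri", "digest"]) {
    if (doc[field] !== undefined && typeof doc[field] !== "string") {
      violations.push(`${field}: expected string`);
    }
  }
  // minLength: 1
  for (const field of ["providerId", "sourceUri"]) {
    if (doc[field] === "") violations.push(`${field}: must not be empty`);
  }
  // enum
  if (typeof doc.format === "string" && !FORMATS.includes(doc.format)) {
    violations.push(`format: must be one of ${FORMATS.join(", ")}`);
  }
  // minLength: 32
  if (typeof doc.digest === "string" && doc.digest.length < 32) {
    violations.push("digest: shorter than 32 characters");
  }
  // bsonType: date
  if (doc.retrievedAt !== undefined && !(doc.retrievedAt instanceof Date)) {
    violations.push("retrievedAt: expected Date");
  }
  return violations;
}
```

The authoritative check remains the collection validator in MongoDB; a mirror like this is only useful for spot-checking exported documents on an air-gapped workstation.
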
## Offline Validation Steps

### 1. Export the Schema

The schema can be exported from the application using the validator tooling, or read directly from the collection options:

```bash
# Using the Excititor CLI
stellaops excititor schema export --collection vex_raw --output vex-raw-schema.json

# Or via the MongoDB shell (--quiet suppresses the banner; EJSON.stringify emits valid JSON)
mongosh "mongodb://localhost:27017/excititor" --quiet \
  --eval "EJSON.stringify(db.getCollectionInfos({name: 'vex_raw'})[0].options.validator, null, 2)" > vex-raw-schema.json
```

### 2. Validate Documents in MongoDB Shell

```javascript
// Run inside mongosh, after connecting from the command line with:
//   mongosh "mongodb://localhost:27017/excititor"

// Collection-level integrity check (does not list individual schema violations)
db.runCommand({
  validate: "vex_raw",
  full: true
})

// List the individual documents that violate the collection's validator
var validator = db.getCollectionInfos({ name: "vex_raw" })[0].options.validator;
db.vex_raw.find({ $nor: [validator] }, { _id: 1 }).forEach(function(doc) {
  print("Invalid: " + doc._id);
});
```

### 3. Programmatic Validation (C#)

```csharp
using StellaOps.Excititor.Storage.Mongo.Validation;

// Validate a single document
var result = VexRawSchemaValidator.Validate(document);
if (!result.IsValid)
{
    foreach (var violation in result.Violations)
    {
        Console.WriteLine($"{violation.Field}: {violation.Message}");
    }
}

// Batch validation
var batchResult = VexRawSchemaValidator.ValidateBatch(documents);
Console.WriteLine($"Valid: {batchResult.ValidCount}, Invalid: {batchResult.InvalidCount}");
```

### 4. Export Schema for External Tools

```csharp
using System.IO;
using StellaOps.Excititor.Storage.Mongo.Validation;

// Get schema as JSON for external validation tools
var schemaJson = VexRawSchemaValidator.GetJsonSchemaAsJson();
File.WriteAllText("vex-raw-schema.json", schemaJson);
```

## Verification Checklist

Use this checklist to verify schema compliance:

- [ ] All documents have the required fields (`_id`, `providerId`, `format`, `sourceUri`, `retrievedAt`, `digest`)
- [ ] The `_id` matches the `digest` value (content-addressed)
- [ ] `format` is one of: `csaf`, `cyclonedx`, `openvex`
- [ ] `digest` is at least 32 characters, the minimum the schema enforces (a full SHA-256 hex digest is 64 characters)
- [ ] No documents have been modified after insertion (verify via digest recomputation)

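## Immutability Verification

To verify that documents have not been tampered with, recompute the SHA-256 digest of each document's content and compare it with the stored `digest`:

```javascript
// mongosh - verify content matches digest
const crypto = require("crypto");

db.vex_raw.find().forEach(function(doc) {
  var content = doc.content;
  // Documents whose content lives in GridFS have no inline content and are skipped here
  if (content) {
    // content is either a string or BSON binary; Binary exposes its bytes as .buffer
    var bytes = typeof content === "string" ? content : content.buffer;
    // Compute SHA-256 of content (assumes digest is stored as bare lowercase hex)
    var computedDigest = crypto.createHash("sha256").update(bytes).digest("hex");
    if (computedDigest !== doc.digest) {
      print("TAMPERED: " + doc._id);
    }
  }
});
```
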
## Auditing

For compliance auditing, export a validation report:

```bash
# Generate validation report
stellaops excititor validate --collection vex_raw --report validation-report.json

# The report includes:
# - Total document count
# - Valid/invalid counts
# - List of violations by document
# - Schema version used for validation
```

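To make the four report items concrete, here is a rough sketch of how such a summary could be assembled from a set of documents and a per-document check. The field names and layout are assumptions for illustration and are not the actual format written by `stellaops excititor validate`; `buildReport` and `idMatchesDigest` are hypothetical helpers.

```javascript
// Hypothetical per-document check: the content-addressed key must equal the digest.
function idMatchesDigest(doc) {
  return doc._id === doc.digest ? [] : ["_id: does not match digest"];
}

// Aggregate per-document violations into a summary with the four items
// listed above: total count, valid/invalid counts, violations by document,
// and the schema version used.
function buildReport(docs, validate, schemaVersion) {
  const violations = {};
  let invalid = 0;
  for (const doc of docs) {
    const found = validate(doc);
    if (found.length > 0) {
      invalid += 1;
      violations[String(doc._id)] = found;
    }
  }
  return {
    schemaVersion,
    total: docs.length,
    valid: docs.length - invalid,
    invalid,
    violations
  };
}

// Usage with two toy documents; the second one has a mismatched key.
const report = buildReport(
  [{ _id: "aa11", digest: "aa11" }, { _id: "bb22", digest: "cc33" }],
  idMatchesDigest,
  "example-schema-version"
);
console.log(JSON.stringify(report, null, 2));
```

Keeping the check as a parameter lets the same aggregation serve schema checks, key checks, or digest recomputation without changing the report shape.
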
## Troubleshooting

### Common Violations

1. **Missing required field**: Ensure all required fields are present
2. **Invalid format**: Format must be exactly "csaf", "cyclonedx", or "openvex"
3. **Digest too short**: Digest must be at least 32 hex characters
4. **Wrong type**: Check field types match schema requirements

### Recovery

If invalid documents are found:

1. Do NOT modify documents in place (violates immutability)
2. Export the invalid documents for analysis
3. Re-ingest from original sources with correct data
4. Document the incident in audit logs

## Related Documentation

- [Excititor Architecture](../modules/excititor/architecture.md)
- [VEX Storage Design](../modules/excititor/storage.md)
- [Offline Operation Guide](../24_OFFLINE_KIT.md)