docs/dev/scanning-engine.md
## 0) Scope at a glance

**Scan surfaces**

* **Images (static):** every file in every layer, plus Dockerfile metadata (ENV/ARG/LABEL, history).
* **Runtime (live containers):** env vars, process args, mounted volumes (e.g., `/run/secrets`), logs, selected files created at runtime.

**Detection methods**

1. **Deterministic patterns (regex)** for known secret types.
2. **Heuristics**: entropy scoring for unknown/random secrets.
3. **Contextual signals**: filename/path, key names, nearby keywords, file type hints.
4. **Structural checks**: e.g., JWT decodable, cloud key prefix/length.
5. **(Optional) Lightweight validation**: local checksum/format (no network calls by default).

**Reporting**

* JSON (and optionally SARIF) with: *where*, *what rule matched*, *snippet masked*, *confidence*, *severity*, *layer/container process*, and *remediation hint*.

---

## 1) Docker‑aware discovery workflow

### A. Images (static, pre‑runtime)

1. **Obtain filesystem + metadata**

   * Prefer the **API**: with Docker.DotNet, export the image as a tar via `Images.SaveImageAsync` (the equivalent of `docker save`) and read it as an in-memory stream.
   * Parse `manifest.json` and the image config JSON it references; capture:

     * `config.Env` (final env),
     * **history**/`created_by` for `ENV`/`ARG`/`RUN` strings,
     * labels.
2. **Scan every layer**

   * Stream‑extract each layer tar (e.g., SharpCompress).
   * Track **added/modified paths** per layer (so you can report: *layer N, file X*).
   * **Text‑only filter**: skip clearly binary files (e.g., sample N bytes; if >30% non‑printables, skip or downrank).
3. **File content & name/path analysis**

   * Apply **regex detectors** (Section 3) and **entropy** (Section 4).
   * Weigh findings with **context** (Section 5).
4. **Dockerfile/History checks**

   * Flag secrets in `ENV`/`ARG`/`RUN` strings (e.g., `ENV MYSQL_ROOT_PASSWORD=...`).
   * Flag **deleted‑later files** that were present in earlier layers (common leak).
   * Highlight missing `.dockerignore` patterns when suspicious files (.env, .pem, .tfstate) entered any layer.
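
The text‑only filter from step 2 can be sketched as follows. This is a minimal illustration, not the project's actual classifier: the class and method names are hypothetical, the 30% ratio is the figure quoted above, and treating any NUL byte as "binary" is an added assumption. Bytes at or above 0x80 are counted as printable so that UTF‑8 text is not rejected.

```csharp
using System;

public static class FileClassifierSketch
{
    // Returns true when the sampled prefix looks binary: it contains a NUL byte,
    // or more than maxNonPrintableRatio of its bytes are control characters.
    public static bool LooksBinary(ReadOnlySpan<byte> sample, double maxNonPrintableRatio = 0.30)
    {
        if (sample.IsEmpty) return false;   // nothing to scan; treat as text

        int nonPrintable = 0;
        foreach (byte b in sample)
        {
            if (b == 0) return true;        // assumption: NUL does not occur in text files
            bool isCommonWhitespace = b == (byte)'\t' || b == (byte)'\n' || b == (byte)'\r';
            bool isControl = b < 0x20 || b == 0x7F;
            if (isControl && !isCommonWhitespace) nonPrintable++;
        }
        return (double)nonPrintable / sample.Length > maxNonPrintableRatio;
    }
}
```

A caller would typically pass the first few kilobytes of each tar entry and skip or downrank the entry when the method returns `true`.
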

### B. Running containers (runtime)

1. **Enumerate** containers and **inspect**:

   * `InspectContainerAsync` → `Config.Env`, `HostConfig.Binds`, `Mounts`, image id.
2. **Env var scan**

   * Scan all `key=value` pairs with the same detectors (regex + entropy + context on the key name).
3. **Process args**

   * `docker top` or `/proc/<pid>/cmdline` via `Exec` → scan args for `--password=...`, `--api-key=...`.
4. **Mounted secret paths**

   * Default locations: `/run/secrets/*`, `/var/run/secrets/*`, K8s secret volumes, config maps that may contain creds.
   * Retrieve via `GetArchiveFromContainerAsync` and scan; if a secret sits on a tmpfs mount that the archive endpoint does not return, read it through `Exec` instead.
5. **Logs (optional but valuable)**

   * Attach/stream logs; scan lines for secret patterns; provide **live redaction** option.

> **Note**: Memory forensics is possible but heavy; treat as optional/IR-only.

---

## 2) High‑value filename/path heuristics (fast wins)

Run these **glob/name** checks before content scanning to prioritize files:

**Generic secret indicators**

```
**/*.env **/.env* **/*secret*.* **/*secr*.*
**/*credential*.* **/*creds*.* **/*passwd*
**/password* **/*token*.* **/*apikey*.* **/*api_key*.*
**/*.pem **/*.key **/*.pfx **/*.p12
**/*.jks **/*.keystore **/id_rsa **/id_dsa
**/id_ecdsa **/id_ed25519 **/private.pem **/server.key
**/tls.key **/jwt*.key
```

**Common app/config**

```
**/appsettings*.json **/secrets*.json
**/application.{yml,yaml,properties}
**/application-*.{yml,yaml,properties}
**/config.yaml **/settings.yml **/settings.py
**/wp-config.php **/config.php **/settings.php
**/nuget.config **/settings.xml (Maven) **/gradle.properties
**/docker-compose*.yml **/compose*.yml
**/PublishProfiles/*.pubxml
```

**Cloud/CLI creds**

```
**/.aws/credentials **/.aws/config
**/gcloud/application_default_credentials.json
**/.azure/** **/doctl/config.yaml **/.oci/config
**/.docker/config.json **/.dockercfg
**/.npmrc **/.yarnrc **/.pypirc **/.gem/credentials **/.netrc
```

**Infra/IaC**

```
**/*.tfstate **/*.tfvars* **/kube/config **/.kube/config **/*kubeconfig*
**/service-account*.json **/*-sa.json **/*-key.json
```

**Orchestrator runtime**

```
/run/secrets/* /var/run/secrets/*
```
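
As a rough illustration of how a subset of these globs could be evaluated without a globbing library, the sketch below uses plain string checks on the normalized path. The class name, method name, and the choice of which patterns to include are assumptions made for the example; a real implementation would likely load the full glob lists from configuration.

```csharp
using System;
using System.Linq;

public static class PathHeuristicsSketch
{
    // A small subset of the globs above, expressed as plain string checks.
    static readonly string[] RiskyExtensions   = { ".pem", ".key", ".pfx", ".p12", ".jks", ".keystore", ".tfstate" };
    static readonly string[] RiskyExactNames   = { "id_rsa", "id_dsa", "id_ecdsa", "id_ed25519", ".npmrc", ".netrc", ".pypirc", ".dockercfg" };
    static readonly string[] RiskyNameParts    = { "secret", "credential", "creds", "passwd", "password", "token", "apikey", "api_key", "kubeconfig" };
    static readonly string[] RiskyPathSuffixes = { "/.aws/credentials", "/.docker/config.json", "/.kube/config" };

    // Returns true when the path deserves priority in content scanning.
    public static bool IsHighValuePath(string path)
    {
        string normalized = path.Replace('\\', '/').ToLowerInvariant();
        string name = normalized.Substring(normalized.LastIndexOf('/') + 1);

        if (name.StartsWith(".env", StringComparison.Ordinal) || name.EndsWith(".env", StringComparison.Ordinal)) return true;
        if (RiskyExactNames.Contains(name)) return true;
        if (RiskyExtensions.Any(ext => name.EndsWith(ext, StringComparison.Ordinal))) return true;
        if (RiskyPathSuffixes.Any(s => normalized.EndsWith(s, StringComparison.Ordinal))) return true;
        if (normalized.StartsWith("/run/secrets/", StringComparison.Ordinal)
            || normalized.StartsWith("/var/run/secrets/", StringComparison.Ordinal)) return true;
        return RiskyNameParts.Any(part => name.Contains(part));
    }
}
```
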

---

## 3) **Regex detector catalog** (battle‑tested patterns)

> Use `RegexOptions.Compiled` throughout; add `RegexOptions.IgnoreCase` only to keyword‑style patterns, because token prefixes such as `AKIA` or `ghp_` are case‑sensitive.
> Always **mask** values in reports (e.g., show first 4 + last 4 chars).
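
The masking rule can be sketched as a small helper. The names are hypothetical, and fully starring out short values is an added assumption so that an 8‑character secret is not printed in full.

```csharp
using System;

public static class MaskingSketch
{
    // Keeps the first and last `keep` characters and replaces the rest with '*'.
    // Values too short to mask safely are fully starred out.
    public static string Mask(string value, int keep = 4)
    {
        if (string.IsNullOrEmpty(value)) return string.Empty;
        if (value.Length <= keep * 2) return new string('*', value.Length);
        return value.Substring(0, keep)
             + new string('*', value.Length - keep * 2)
             + value.Substring(value.Length - keep);
    }
}
```

For a 20‑character access key ID this keeps 4 + 4 characters and stars out the middle 12, which is the shape used for `valueSample` in the report example in Section 8.
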

### 3.1 Private keys / certificates

* **OpenSSH private key**
  `@"-----BEGIN OPENSSH PRIVATE KEY-----"`
* **Generic PEM private key**
  `@"-----BEGIN (?:RSA |DSA |EC |PGP )?PRIVATE KEY-----"`
* **PGP private key**
  `@"-----BEGIN PGP PRIVATE KEY BLOCK-----"`

> (Public keys/certificates are *not* secrets: `BEGIN PUBLIC KEY`, `BEGIN CERTIFICATE` → downrank/ignore.)

### 3.2 Cloud: AWS

* **Access Key ID**
  `@"\b(?:AKIA|ASIA|AGPA|AIDA|AROA|AIPA|ANPA)[A-Z0-9]{16}\b"`
  *Only the `AKIA` (long‑term) and `ASIA` (temporary) prefixes denote access keys; the other prefixes are IAM resource identifiers, so downrank them.*
* **Secret Access Key (context‑aided)**
  `@"(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+=])"`
  *Lookarounds are used instead of `\b`, which fails to match when the value starts or ends with `/` or `+`. Boost only if near `aws|secret|access[_-]?key|AWS_SECRET_ACCESS_KEY` within ~50 chars.*
* **Credentials file lines**

  * `@"aws_access_key_id\s*=\s*[A-Z0-9]{20}"`
  * `@"aws_secret_access_key\s*=\s*[A-Za-z0-9/\+=]{40}"`
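
The "context‑aided" idea for the secret access key can be sketched as a candidate regex plus a keyword search in a window around each match. This is one possible implementation, not the definitive one: the class and method names are hypothetical, the candidate pattern assumes the value carries no `=` padding (40 base64 characters encode exactly 30 bytes), and the window is measured in characters on both sides of the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AwsSecretSketch
{
    // Lookarounds instead of \b so that values starting or ending with '/' or '+' still match.
    static readonly Regex Candidate = new(
        @"(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+=])", RegexOptions.Compiled);

    static readonly Regex Keyword = new(
        @"aws|secret|access[_-]?key", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Yields 40-char candidates that have an AWS-related keyword within `window` chars on either side.
    public static IEnumerable<string> FindLikelySecretKeys(string text, int window = 50)
    {
        foreach (Match m in Candidate.Matches(text))
        {
            int matchEnd = m.Index + m.Length;
            int start = Math.Max(0, m.Index - window);
            int end = Math.Min(text.Length, matchEnd + window);
            string before = text.Substring(start, m.Index - start);
            string after = text.Substring(matchEnd, end - matchEnd);
            if (Keyword.IsMatch(before) || Keyword.IsMatch(after))
                yield return m.Value;
        }
    }
}
```

Without the keyword check, any 40‑character hex digest (a SHA‑1, for instance) would be reported, which is why the bare pattern is only a soft signal.
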

### 3.3 Cloud: GCP / Google

* **API key**
  `@"\bAIza[0-9A-Za-z\-_]{35}\b"`
* **Service Account JSON** (two‑term signature)

  * `@"""type""\s*:\s*""service_account"""`
  * `@"""private_key""\s*:\s*""-----BEGIN PRIVATE KEY-----"`

### 3.4 Cloud: Azure

* **Storage connection string**
  `@"DefaultEndpointsProtocol=https;AccountName=[^;]+;AccountKey=[A-Za-z0-9\+/=]{88};EndpointSuffix=core\.windows\.net"`
* **SAS token (simplified)**
  `@"\bsv=\d{4}-\d{2}-\d{2}[^ ]*?&sig=[A-Za-z0-9%/\+=]{40,}\b"`

### 3.5 Dev platforms / SCM

* **GitHub PAT**
  `@"\bgh[prusoa]_[A-Za-z0-9]{36}\b"`
* **GitLab PAT**
  `@"\bglpat-[A-Za-z0-9\-_]{20,}\b"`
* **NPM token**

  * in `.npmrc`: `@"//registry\.npmjs\.org/:_authToken=\s*(npm_[A-Za-z0-9]{36})"`
  * raw form: `@"\bnpm_[A-Za-z0-9]{36}\b"`
* **PyPI token**
  `@"\bpypi-AgEIcHlwaS5vcmc[A-Za-z0-9\-_]{50,}\b"`

### 3.6 Messaging / SaaS

* **Slack tokens (broad)**
  `@"\bxox[a-z]-[A-Za-z0-9-]{8,}-[A-Za-z0-9-]{8,}-[A-Za-z0-9-]{8,}(?:-[A-Za-z0-9-]{8,})?\b"`
* **Stripe**
  `@"\bsk_(?:live|test)_[0-9a-zA-Z]{24,}\b"` *(open‑ended length, so keys longer than 24 characters still match)*
* **SendGrid**
  `@"\bSG\.[A-Za-z0-9\-_]{16,32}\.[A-Za-z0-9\-_]{16,64}\b"`
* **Mailgun**
  `@"\bkey-[0-9a-zA-Z]{32}\b"`
* **Twilio**

  * SID: `@"\bAC[0-9a-f]{32}\b"`
  * Auth token (context aided): `@"\b[0-9a-f]{32}\b"` near `twilio|auth[_-]?token`
* **Discord bot**
  `@"\b[A-Za-z\d]{24}\.[A-Za-z\d\-_]{6}\.[A-Za-z\d\-_]{27}\b"`

### 3.7 Database / service connection strings

* **PostgreSQL**
  `@"\bpostgres(?:ql)?://[^:\s]+:[^@\s]+@[^/\s]+"`
* **MySQL**
  `@"\bmysql://[^:\s]+:[^@\s]+@[^/\s]+"`
* **MongoDB**
  `@"\bmongodb(?:\+srv)?://[^:\s]+:[^@\s]+@[^/\s]+"`
* **SQL Server (ADO.NET)**
  `@"\bData Source=[^;]+;Initial Catalog=[^;]+;User ID=[^;]+;Password=[^;]+;"`
* **Redis**
  `@"\bredis(?:s|\+ssl)?://[^:@\s]*:[^@\s]+@[^/\s]+"` *(the password part is required, so credential‑free URLs are not flagged; `rediss://` is the TLS scheme)*
* **Basic auth in URL (generic)**
  `@"[a-zA-Z][a-zA-Z0-9+\-.]*://[^:/\s]+:[^@/\s]+@[^/\s]+"`

### 3.8 Docker / CLI auth artifacts

* **Docker config.json auth**
  `@"""auth""\s*:\s*""[A-Za-z0-9\+/=]{20,}"""`
* **.netrc auth**
  `@"(?mi)^machine\s+\S+\s+login\s+\S+\s+password\s+\S+"`

### 3.9 Tokens / JWT

* **JWT (structural)**
  `@"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b"`
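
A regex hit on the JWT shape can be confirmed structurally by decoding the first two segments. The sketch below is an assumption about how such a check might look (class and method names are hypothetical); it only verifies that header and payload are base64url‑encoded JSON objects and does not verify the signature.

```csharp
using System;
using System.Text.Json;

public static class JwtStructureSketch
{
    // True when the token has three dot-separated parts and the first two
    // base64url-decode to JSON objects. The signature is NOT verified.
    public static bool LooksLikeJwt(string token)
    {
        string[] parts = token.Split('.');
        if (parts.Length != 3) return false;
        return IsJsonObject(parts[0]) && IsJsonObject(parts[1]);
    }

    static bool IsJsonObject(string base64Url)
    {
        try
        {
            // Convert base64url to standard base64 and restore the stripped padding.
            string b64 = base64Url.Replace('-', '+').Replace('_', '/');
            b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
            byte[] bytes = Convert.FromBase64String(b64);
            using JsonDocument doc = JsonDocument.Parse(bytes);
            return doc.RootElement.ValueKind == JsonValueKind.Object;
        }
        catch (Exception e) when (e is FormatException || e is JsonException)
        {
            return false;   // not valid base64 or not valid JSON
        }
    }
}
```

A passing check would map to the "structural check passes" signal in Section 5; a failing one suggests the regex matched a lookalike string.
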

### 3.10 Build tools / package managers

* **NuGet (cleartext)**
  `@"<add\s+key=""ClearTextPassword""\s+value=""[^""]+"""`
  `@"<add\s+key=""Password""\s+value=""[^""]+"""` *(base64 ‑ still secret)*
* **Maven settings.xml**
  `@"<server>\s*<id>[^<]+</id>\s*<username>[^<]+</username>\s*<password>[^<]+</password>"`
* **Gradle**
  `@"(?i)\bsigning\.password\s*=\s*.+"`

> Keep regexes modular; associate each with:
> `{ Id, Name, Pattern, Severity, Examples, RecommendedRemediation }`.
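
One way to make use of the `Examples` field is a self‑test that runs every rule against its own examples, so a pattern edit that silently stops matching is caught early. The record shape follows the field list above; the type names and the `FindBrokenRules` helper are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed record DetectorRule(
    string Id, string Name, string Pattern, string Severity,
    string[] Examples, string RecommendedRemediation);

public static class DetectorSelfTest
{
    // Returns the ids of rules for which at least one example fails to match the rule's pattern.
    public static List<string> FindBrokenRules(IEnumerable<DetectorRule> rules) =>
        rules.Where(r => r.Examples.Any(example => !Regex.IsMatch(example, r.Pattern)))
             .Select(r => r.Id)
             .ToList();
}
```
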

---

## 4) Entropy detector (catches “unknown” secrets)

**Why:** Many org‑specific tokens won’t match known regexes.

**Implementation**

* Extract candidate tokens by character class:

  * base64/base64url: `[A-Za-z0-9/_\-\+=]{20,}`
  * hex: `[A-Fa-f0-9]{32,}`
  * general mixed: `[A-Za-z0-9]{24,}`
* Compute **Shannon entropy** per candidate. Use **alphabet‑aware thresholds**:

  * **base64/url**: ≥ **4.0** bits/char & length ≥ 24
  * **hex**: ≥ **3.0** bits/char & length ≥ 32
  * **alnum**: ≥ **4.0** bits/char & length ≥ 24
* **Context boosts** (raise confidence) if **within 64 chars** of:
  `password|passwd|pwd|secret|token|apikey|api_key|api-key|client[_-]?secret|private[_-]?key|connectionstring|conn[_-]?str|bearer`
* **Context suppressors** (lower confidence/ignore):

  * File/path contains: `example|sample|test|fixture|dummy`
  * Surrounding line contains: `REDACTED|<redacted>|changeme`
  * Known non‑secret blocks: `BEGIN PUBLIC KEY`, `BEGIN CERTIFICATE`
* Cap **N findings per file** (e.g., 50) to avoid log floods.
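
The threshold table can be sketched as a small gate function. The names and the `"hex"`/`"b64"`/`"alnum"` kind labels are assumptions for illustration; the numeric thresholds are the ones listed above. Entropy here is computed over the characters actually present in the candidate.

```csharp
using System;
using System.Collections.Generic;

public static class EntropyGateSketch
{
    // Shannon entropy in bits per character over the characters present in s.
    public static double Shannon(string s)
    {
        if (s.Length == 0) return 0.0;
        var counts = new Dictionary<char, int>();
        foreach (char c in s) counts[c] = counts.TryGetValue(c, out int n) ? n + 1 : 1;

        double h = 0.0;
        foreach (int count in counts.Values)
        {
            double p = (double)count / s.Length;
            h -= p * Math.Log2(p);
        }
        return h;
    }

    // Applies the alphabet-aware length and entropy thresholds.
    public static bool PassesThreshold(string candidate, string kind) => kind switch
    {
        "hex"   => candidate.Length >= 32 && Shannon(candidate) >= 3.0,
        "b64"   => candidate.Length >= 24 && Shannon(candidate) >= 4.0,
        "alnum" => candidate.Length >= 24 && Shannon(candidate) >= 4.0,
        _       => false,
    };
}
```
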

---

## 5) Scoring & de‑duping

Combine signals into a **confidence score** (sum the weights, then clamp to the range 0 to 1):

* +0.9 Regex “hard” match (e.g., OpenSSH private key)
* +0.7 Regex “soft” match (e.g., AWS secret 40‑char near keyword)
* +0.4 Entropy pass
* +0.2 Suspicious filename/path
* -0.5 Suppressor keyword/file
* +0.2 Structural check passes (e.g., JWT decodes)

**Severity**

* **Critical**: private keys, cloud root creds, Docker auth, DB creds in URLs, verified JWT signing keys.
* **High**: API tokens (GitHub/GitLab/Slack/Stripe), secrets in ENV/ARG history.
* **Medium**: high‑entropy candidates with strong context.
* **Low**: weak context/entropy only, or likely sample values.

**De‑dupe** same value across files/layers/envs; keep a single canonical record with **occurrence list**.
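
De‑duplication can be sketched by grouping on a hash of the value, so the canonical record carries a fingerprint rather than the secret itself. All type and member names below are hypothetical, and SHA‑256 is an assumed choice of hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed record Occurrence(string Type, string Where);
public sealed record RawFinding(string DetectorId, string Value, Occurrence Location);
public sealed record MergedFinding(string DetectorId, string Fingerprint, IReadOnlyList<Occurrence> Occurrences);

public static class DedupeSketch
{
    // Hash the value so the canonical record never stores the secret itself.
    public static string Fingerprint(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)));

    // One merged record per (detector, value); identical locations are listed once.
    public static List<MergedFinding> Merge(IEnumerable<RawFinding> findings) =>
        findings
            .GroupBy(f => (f.DetectorId, Fingerprint: Fingerprint(f.Value)))
            .Select(g => new MergedFinding(
                g.Key.DetectorId,
                g.Key.Fingerprint,
                g.Select(f => f.Location).Distinct().ToList()))
            .ToList();
}
```
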

---

## 6) Docker‑specific checks you must implement

* **ENV/ARG leakage in history**
  Parse `config.History[].CreatedBy` or `docker history --no-trunc`.
  Flag any `ENV/ARG` with suspicious key names or values matching detectors.
* **Deleted‑later files**
  If a file existed in an earlier layer and got deleted later (common `.env` mishap), still flag it and report **layer** + **instruction** that introduced it.
* **`.dockerignore` advisory**
  If high‑risk files (.env, .pem, .tfstate, credentials) entered the build context once, suggest `.dockerignore` entries.
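
Deleted‑later files can be detected from the whiteout entries that layer tars use to record deletions: a `.wh.<name>` entry hides `<name>` from lower layers, and `.wh..wh..opq` marks a directory as opaque. The sketch below works on per‑layer path lists that a tar reader would have produced already; the record and method names are hypothetical, and mapping each layer to its Dockerfile instruction is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record LayerEntries(string LayerId, IReadOnlyList<string> Paths);
public sealed record DeletedLater(string Path, string AddedInLayer, string DeletedInLayer);

public static class WhiteoutSketch
{
    const string WhiteoutPrefix = ".wh.";
    const string OpaqueMarker = ".wh..wh..opq";

    // Layers must be ordered base-first. Paths use '/' separators, as in layer tars.
    public static List<DeletedLater> FindDeletedLater(IEnumerable<LayerEntries> layers)
    {
        var firstSeen = new Dictionary<string, string>();   // path -> layer that introduced it
        var result = new List<DeletedLater>();

        foreach (LayerEntries layer in layers)
        {
            List<string> paths = layer.Paths.Select(p => p.TrimStart('/')).ToList();

            // Pass 1: apply this layer's whiteouts to entries introduced by earlier layers.
            foreach (string path in paths)
            {
                int slash = path.LastIndexOf('/');
                string dir = path.Substring(0, slash + 1);   // "" for top-level entries
                string name = path.Substring(slash + 1);
                if (!name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal)) continue;

                // An opaque marker hides the whole directory; a plain whiteout hides one entry or subtree.
                string target = name == OpaqueMarker ? dir : dir + name.Substring(WhiteoutPrefix.Length);
                string subtree = target.TrimEnd('/') + "/";
                List<string> hidden = firstSeen.Keys
                    .Where(p => p == target || p.StartsWith(subtree, StringComparison.Ordinal))
                    .ToList();
                foreach (string p in hidden)
                {
                    result.Add(new DeletedLater(p, firstSeen[p], layer.LayerId));
                    firstSeen.Remove(p);
                }
            }

            // Pass 2: record the entries this layer adds (or re-adds).
            foreach (string path in paths)
            {
                string name = path.Substring(path.LastIndexOf('/') + 1);
                if (!name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
                    firstSeen.TryAdd(path, layer.LayerId);
            }
        }
        return result;
    }
}
```
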

---

## 7) Runtime inspection rules

* **Environment**

  * Scan all `Env` pairs; **boost** hits for keys containing:
    `PASSWORD|PASS|PWD|SECRET|TOKEN|KEY|CLIENT_SECRET|SAS|CONNECTIONSTRING`
* **Process args**

  * Flag `--password`, `--api-key`, `--token`, `--secret`, `--connection-string`.
* **Mounted secrets**

  * Enumerate `/run/secrets/*`, `/var/run/secrets/*` (Swarm/K8s).
  * Ensure permissions are restrictive; still **scan contents** (apps sometimes copy them elsewhere).
* **Logs**

  * Tail & scan. Provide **optional redaction** pipeline.
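
The environment and process‑argument rules can be sketched as two small filters. The class and method names are hypothetical, and the inputs are assumed to be the `KEY=value` strings from `Config.Env` and an already‑split argument list. The key pattern is deliberately broad (for example, `KEY` also matches `KEYBOARD_LAYOUT`), so a hit should only boost confidence rather than count as a finding on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RuntimeRulesSketch
{
    static readonly Regex SensitiveKey = new(
        @"PASSWORD|PASS|PWD|SECRET|TOKEN|KEY|CLIENT_SECRET|SAS|CONNECTIONSTRING",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    static readonly Regex SensitiveFlag = new(
        @"^--(?:password|api-key|token|secret|connection-string)(?:=(?<value>.*))?$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns the env keys that look sensitive and carry a non-empty value.
    public static IEnumerable<string> SensitiveEnvKeys(IEnumerable<string> env)
    {
        foreach (string entry in env)
        {
            int eq = entry.IndexOf('=');
            if (eq <= 0 || eq == entry.Length - 1) continue;   // no key, or empty value
            string key = entry.Substring(0, eq);
            if (SensitiveKey.IsMatch(key)) yield return key;
        }
    }

    // Returns sensitive flags that carry a value, either as --flag=value or as --flag value.
    public static IEnumerable<string> SensitiveArgs(IReadOnlyList<string> args)
    {
        for (int i = 0; i < args.Count; i++)
        {
            Match m = SensitiveFlag.Match(args[i]);
            if (!m.Success) continue;
            Group value = m.Groups["value"];
            bool inline = value.Success && value.Value.Length > 0;
            bool next = !value.Success && i + 1 < args.Count && !args[i + 1].StartsWith('-');
            if (inline || next) yield return args[i].Split('=')[0];
        }
    }
}
```
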

---

## 8) Reporting format (JSON)

Example JSON for one finding:

```json
{
  "detectorId": "aws.accessKeyId",
  "name": "AWS Access Key ID",
  "severity": "HIGH",
  "confidence": 0.92,
  "valueSample": "AKIA************WXYZ",
  "locations": [
    {
      "type": "image-layer-file",
      "image": "repo/app:1.4.2",
      "layerDigest": "sha256:...abc",
      "path": "/app/.env",
      "line": 12
    },
    {
      "type": "container-env",
      "containerId": "f3e9d...",
      "envKey": "AWS_ACCESS_KEY_ID"
    }
  ],
  "context": {
    "filePathScore": 0.2,
    "regexMatch": true,
    "entropy": null,
    "nearbyKeywords": ["AWS_ACCESS_KEY_ID"]
  },
  "remediation": "Remove from image; inject via secrets manager or runtime mount; rotate the key."
}
```

> Optionally also emit **SARIF** to plug into code‑scanning dashboards.

---

## 9) C# implementation sketch

### Project layout

```
SecretsScanner/
  Core/
    IDetector.cs            // interface: Detect(stream|text, path, context) -> Findings
    RegexDetector.cs        // holds Pattern, Hints, Confidence rules
    EntropyDetector.cs      // Shannon entropy
    JwtDetector.cs          // structural decoding check
    FileClassifier.cs       // text/binary check, ext-based hints
    Scoring.cs              // combine signals; severity
    PathsHeuristics.cs      // globs & filename rules
    ReportModel.cs          // JSON schema / SARIF
  Docker/
    ImageReader.cs          // reads image tars, layers via Docker.DotNet or stream
    HistoryParser.cs        // extracts ENV/ARG from history
    ContainerInspector.cs   // env, args, mounts, logs (Docker.DotNet)
  Catalog/
    RegexCatalog.cs         // patterns (section 3), per-detector metadata
    Keywords.cs             // boost/suppress lists
  Cli/
    Program.cs              // options: image, container, path; json output; fail-on
```

### C# snippets (illustrative)

**Regex catalog**

```csharp
using System.Text.RegularExpressions;

public static class RegexCatalog
{
    public static readonly (string Id, string Name, Regex Rx, string Severity, string Hint)[] Rules =
    {
        ("pem.openssh", "OpenSSH Private Key",
            new Regex(@"-----BEGIN OPENSSH PRIVATE KEY-----", RegexOptions.Compiled),
            "CRITICAL", "Remove private keys from images; use mounts or vault."),
        ("pem.private", "PEM Private Key",
            new Regex(@"-----BEGIN (?:RSA |DSA |EC |PGP )?PRIVATE KEY-----", RegexOptions.Compiled),
            "CRITICAL", "Remove private keys; rotate credentials."),
        ("aws.akid", "AWS Access Key ID",
            new Regex(@"\b(?:AKIA|ASIA|AGPA|AIDA|AROA|AIPA|ANPA)[A-Z0-9]{16}\b", RegexOptions.Compiled),
            "HIGH", "Rotate; use IAM roles/STS; remove from code/config."),
        ("github.pat", "GitHub Personal Access Token",
            new Regex(@"\bgh[prusoa]_[A-Za-z0-9]{36}\b", RegexOptions.Compiled),
            "HIGH", "Revoke PAT; use fine-grained tokens; remove from image."),
        // ... add remaining patterns from Section 3
    };
}
```

**Entropy**

```csharp
using System;

public static class Entropy
{
    // Shannon entropy (bits per char) over the characters of s that belong to alphabet.
    public static double Shannon(ReadOnlySpan<char> s, ReadOnlySpan<char> alphabet)
    {
        Span<int> counts = stackalloc int[256];
        counts.Clear();   // do not rely on stackalloc memory being zeroed
        int n = 0;
        foreach (var ch in s)
        {
            // ch < 256 keeps the index inside the fixed-size table.
            if (ch < 256 && alphabet.IndexOf(ch) >= 0) { counts[ch]++; n++; }
        }
        if (n == 0) return 0.0;
        double H = 0.0;
        for (int i = 0; i < counts.Length; i++)
        {
            if (counts[i] == 0) continue;
            double p = counts[i] / (double)n;
            H -= p * Math.Log(p, 2);
        }
        return H;
    }
}
```

**Candidate extraction (simplified)**

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed record Candidate(string Value, string Kind, string Line);

public static class CandidateExtractor
{
    static readonly Regex Base64Token = new(@"[A-Za-z0-9/_\-\+=]{20,}", RegexOptions.Compiled);
    static readonly Regex HexToken = new(@"[A-Fa-f0-9]{32,}", RegexOptions.Compiled);

    public static IEnumerable<Candidate> ExtractCandidates(string line)
    {
        foreach (Match m in Base64Token.Matches(line)) yield return new Candidate(m.Value, "b64", line);
        foreach (Match m in HexToken.Matches(line)) yield return new Candidate(m.Value, "hex", line); // may repeat a b64 span
    }
}
```

**Scoring**

```csharp
using System;

public sealed record DetectionSignals(
    bool RegexHard, bool RegexSoft, bool EntropyHit,
    bool SuspiciousPath, bool StructuralOk, bool Suppressor);

public static class Scoring
{
    // Sums the signal weights from Section 5 and clamps the result to [0, 1].
    public static double Score(DetectionSignals s)
    {
        double score = 0;
        if (s.RegexHard) score += 0.9;
        if (s.RegexSoft) score += 0.7;
        if (s.EntropyHit) score += 0.4;
        if (s.SuspiciousPath) score += 0.2;
        if (s.StructuralOk) score += 0.2;
        if (s.Suppressor) score -= 0.5;
        return Math.Clamp(score, 0, 1);
    }
}
```

**Docker (Docker.DotNet)**

* Images: `Images.GetImageHistoryAsync`, `Images.SaveImageAsync` + tar unpack.
* Containers: `Containers.InspectContainerAsync`, `Exec.ExecCreateContainerAsync` + `Exec.StartAndAttachContainerExecAsync`, `Containers.GetArchiveFromContainerAsync`, `Containers.GetContainerLogsAsync` (confirm the exact names against the Docker.DotNet version you use).

---

## 10) False‑positive control & hygiene

* **Ignore lists**: file globs (`test/**`, `**/*.example.*`), value lists (`REDACTED`, `example`, `dummy`, `changeme`).
* **Public materials**: downrank matches inside `BEGIN PUBLIC KEY`/`BEGIN CERTIFICATE`.
* **Thresholds**: tune entropy and minimum lengths to your codebase; keep per‑detector knobs in config.
* **Masking**: never print full values; keep secure logs.
* **Rate‑limits**: cap per‑file matches; cap per‑container to avoid spam.

---

## 11) CI/CD and policy

* **Build step**: after `docker build`, run image scan; **fail** on High/Critical (configurable).
* **Pre‑deploy**: scan the runtime environment (env vars, process args, mounts) in read‑only mode.
* **Baselining**: allow a first pass to **baseline known leftovers**, then block any **new** secrets.
* **Rotation**: auto‑emit per‑type remediation (e.g., rotate PAT, revoke AWS AK/SK, move to secret manager).
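
The build‑step gate and the baseline can be combined into one exit‑code decision, sketched below. The type names, the use of value fingerprints as baseline keys, and the exit codes are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum Severity { Low = 0, Medium = 1, High = 2, Critical = 3 }

public sealed record GateFinding(string Fingerprint, Severity Severity);

public static class CiGateSketch
{
    // Returns a process exit code: 0 = pass, 1 = at least one finding that is
    // at or above the failOn severity and not already present in the baseline.
    public static int Evaluate(IEnumerable<GateFinding> findings, ISet<string> baseline,
                               Severity failOn = Severity.High)
    {
        bool blocking = findings.Any(f => f.Severity >= failOn && !baseline.Contains(f.Fingerprint));
        return blocking ? 1 : 0;
    }
}
```
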

---

## 12) Optional enhancements

* **SBOM‑guided scanning**: use SBOM/file inventory to prioritize text/config assets; cache base layers.
* **JWT structural checks**: base64url‑decode header/payload; verify JSON; flag if plausible.
* **Checksum checks**: Luhn for CCNs (if in scope); simple format checks for cloud tokens.
* **Interactive audit**: CLI `--audit` mode to triage and write an “allowlist/baseline”.
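
For the checksum item, a Luhn check is short enough to sketch in full. The class name and the minimum of 12 digits are assumptions; the checksum itself is the standard Luhn algorithm (double every second digit from the right, subtract 9 from results above 9, and require the sum to be divisible by 10).

```csharp
using System;

public static class LuhnSketch
{
    // Validates a digit string with the Luhn checksum; spaces and dashes are ignored.
    public static bool IsValid(string input)
    {
        int sum = 0, digits = 0;
        bool doubleIt = false;   // the rightmost digit is not doubled
        for (int i = input.Length - 1; i >= 0; i--)
        {
            char c = input[i];
            if (c == ' ' || c == '-') continue;
            if (c < '0' || c > '9') return false;

            int d = c - '0';
            if (doubleIt) { d *= 2; if (d > 9) d -= 9; }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }
        return digits >= 12 && sum % 10 == 0;
    }
}
```
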

---

## 13) Minimal “first list” your dev can paste today

**Start with these detectors (high ROI):**

* PEM/OPENSSH private keys
* AWS AKID + secret (context‑aided)
* GitHub PAT, GitLab PAT, NPM, PyPI
* Slack, Stripe, SendGrid, Twilio
* Docker config `auth` field
* DB connection strings (Postgres/MySQL/Mongo/SQLServer)
* JWT
* `.aws/credentials`, `.npmrc`, `.docker/config.json`, `appsettings*.json`, `.env*`, `*.tfstate`, `*kubeconfig*` (path heuristics)
* Entropy (base64/hex/alnum) with context boosts/suppressors

This set covers the most common leak types (keys, platform tokens, connection strings, and credential files) and is a reasonable first milestone before tuning the rest of the catalog.

---

### Final note

This blueprint keeps everything **offline** (no external calls), so it’s safe in CI and reproducible. If you later want to add **credential validation** (e.g., confirm an AWS key via STS), make it opt‑in and heavily rate‑limited.