0) Scope at a glance
Scan surfaces
- Images (static): every file in every layer, plus Dockerfile metadata (ENV/ARG/LABEL, history).
- Runtime (live containers): env vars, process args, mounted volumes (e.g., /run/secrets), logs, selected files created at runtime.
Detection methods
- Deterministic patterns (regex) for known secret types.
- Heuristics: entropy scoring for unknown/random secrets.
- Contextual signals: filename/path, key names, nearby keywords, file type hints.
- Structural checks: e.g., JWT decodable, cloud key prefix/length.
- (Optional) Lightweight validation: local checksum/format (no network calls by default).

Reporting
- JSON (and optionally SARIF) with: where, what rule matched, masked snippet, confidence, severity, layer/container process, and remediation hint.

1) Docker‑aware discovery workflow
A. Images (static, pre‑runtime)
- Obtain filesystem + metadata
  - Prefer the API: use the Docker Engine API (Docker.DotNet, Images.SaveImageAsync) to export the image as a tar stream (the docker save equivalent) and process it as a stream, without unpacking to disk.
  - Parse manifest.json + the image config JSON it references; capture:
    - config.Env (final env),
    - history/created_by for ENV/ARG/RUN strings,
    - labels.
- Scan every layer
  - Stream‑extract each layer tar (e.g., SharpCompress).
  - Track added/modified paths per layer (so you can report: layer N, file X).
  - Text‑only filter: skip clearly binary files (e.g., sample N bytes; if >30% non‑printables, skip or downrank).
- File content & name/path analysis
  - Apply regex detectors (Section 3) and entropy (Section 4).
  - Weigh findings with context (Section 5).
- Dockerfile/History checks
  - Flag secrets in ENV/ARG/RUN strings (e.g., ENV MYSQL_ROOT_PASSWORD=...).
  - Flag deleted‑later files that were present in earlier layers (common leak).
  - Highlight missing .dockerignore patterns when suspicious files (.env, .pem, .tfstate) entered any layer.
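The text‑only filter from the layer scan step could look roughly like the sketch below. The class and method names are illustrative, and two choices are assumptions rather than part of the plan above: bytes >= 0x80 count as printable (so UTF‑8 text is not rejected), and any NUL byte short‑circuits to "binary" (which would also reject UTF‑16 text).

```csharp
using System;

public static class TextSniffer
{
    // Returns true when more than `threshold` of the sampled bytes are non-printable.
    // "Printable" here means tab, LF, CR, or any byte >= 0x20 except DEL (0x7F).
    public static bool LooksBinary(ReadOnlySpan<byte> sample, double threshold = 0.30)
    {
        if (sample.IsEmpty) return false;      // empty files are treated as text
        int nonPrintable = 0;
        foreach (byte b in sample)
        {
            if (b == 0) return true;           // assumption: a NUL byte means binary
            bool printable = b == 9 || b == 10 || b == 13 || (b >= 0x20 && b != 0x7F);
            if (!printable) nonPrintable++;
        }
        return nonPrintable > sample.Length * threshold;
    }
}
```

A caller would read the first few KiB of each tar entry into a buffer and pass buffer.AsSpan(0, bytesRead); files that look binary can be skipped or scanned with a lower confidence weight.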
 
B. Running containers (runtime)
- Enumerate containers and inspect: InspectContainerAsync → Config.Env, HostConfig.Binds, Mounts, image id.
- Env var scan
  - Scan all key=value pairs with the same detectors (regex + entropy + context on the key name).
- Process args
  - docker top, or /proc/<pid>/cmdline via Exec → scan args for --password=..., --api-key=....
- Mounted secret paths
  - Default locations: /run/secrets/*, /var/run/secrets/*, K8s secret volumes, config maps that may contain creds.
  - Retrieve via GetArchiveFromContainerAsync and scan.
- Logs (optional but valuable)
  - Attach/stream logs; scan lines for secret patterns; provide live redaction option.

Note: Memory forensics is possible but heavy; treat as optional/IR-only.
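For the process‑args step, a minimal sketch of parsing a raw /proc/<pid>/cmdline payload is shown here. It relies on the fact that the kernel separates arguments with NUL bytes; the ArgScanner name is hypothetical, and the flag list is borrowed from the runtime rules in Section 7.

```csharp
using System;
using System.Collections.Generic;

public static class ArgScanner
{
    // Flag names taken from the runtime inspection rules; extend as needed.
    static readonly string[] SecretFlags =
        { "--password", "--api-key", "--token", "--secret", "--connection-string" };

    // /proc/<pid>/cmdline separates arguments with NUL bytes.
    public static IEnumerable<(string Flag, string Value)> Scan(string rawCmdline)
    {
        string[] args = rawCmdline.Split('\0', StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < args.Length; i++)
        {
            foreach (string flag in SecretFlags)
            {
                if (args[i].StartsWith(flag + "=", StringComparison.OrdinalIgnoreCase))
                    yield return (flag, args[i][(flag.Length + 1)..]);   // --flag=value
                else if (args[i].Equals(flag, StringComparison.OrdinalIgnoreCase) && i + 1 < args.Length)
                    yield return (flag, args[i + 1]);                    // --flag value
            }
        }
    }
}
```

The returned values should still go through the regex and entropy detectors and be masked before they reach a report; the flag name only supplies context.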
2) High‑value filename/path heuristics (fast wins)
Run these glob/name checks before content scanning to prioritize files:
Generic secret indicators
**/*.env        **/.env*           **/*secret*.*      **/*secr*.* 
**/*credential*.*                 **/*creds*.*       **/*passwd* 
**/password*   **/*token*.*       **/*apikey*.*      **/*api_key*.* 
**/*.pem       **/*.key           **/*.pfx           **/*.p12 
**/*.jks       **/*.keystore      **/id_rsa          **/id_dsa 
**/id_ecdsa    **/id_ed25519      **/private.pem     **/server.key 
**/tls.key     **/jwt*.key
Common app/config
**/appsettings*.json              **/secrets*.json
**/application.{yml,yaml,properties}
**/application-*.{yml,yaml,properties}
**/config.yaml  **/settings.yml   **/settings.py
**/wp-config.php **/config.php     **/settings.php
**/nuget.config  **/settings.xml (Maven)  **/gradle.properties
**/docker-compose*.yml   **/compose*.yml
**/PublishProfiles/*.pubxml
Cloud/CLI creds
**/.aws/credentials  **/.aws/config
**/gcloud/application_default_credentials.json
**/.azure/**         **/doctl/config.yaml     **/.oci/config
**/.docker/config.json  **/.dockercfg
**/.npmrc  **/.yarnrc  **/.pypirc  **/.gem/credentials  **/.netrc
Infra/IaC
**/*.tfstate  **/*.tfvars*   **/kube/config  **/.kube/config  **/*kubeconfig*
**/service-account*.json     **/*-sa.json    **/*-key.json
Orchestrator runtime
/run/secrets/*     /var/run/secrets/*
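One way to apply these name checks before content scanning is sketched below. It uses FileSystemName.MatchesSimpleExpression from System.IO.Enumeration for the * wildcards, matched against the file name only; directory‑qualified entries are handled as path suffixes. The class name and the reduced pattern subset are illustrative assumptions, not a complete port of the lists above.

```csharp
using System;
using System.IO.Enumeration;
using System.Linq;

public static class PathHeuristics
{
    // A small subset of the name patterns above; matched against the file name only.
    static readonly string[] NamePatterns =
    {
        "*.env", ".env*", "*secret*.*", "*credential*.*", "*token*.*",
        "*.pem", "*.key", "*.pfx", "*.p12", "id_rsa", "id_ed25519",
        "*.tfstate", "*kubeconfig*", ".npmrc", ".netrc",
    };

    // Directory-qualified entries need the full path, so they are checked as suffixes.
    static readonly string[] PathSuffixes =
    {
        "/.aws/credentials", "/.docker/config.json", "/.kube/config",
    };

    public static bool IsHighRisk(string path)
    {
        string normalized = path.Replace('\\', '/');
        string name = normalized[(normalized.LastIndexOf('/') + 1)..];
        return NamePatterns.Any(p => FileSystemName.MatchesSimpleExpression(p, name, ignoreCase: true))
            || PathSuffixes.Any(s => normalized.EndsWith(s, StringComparison.OrdinalIgnoreCase));
    }
}
```

A hit here only raises priority and feeds the "suspicious filename/path" weight in scoring; it is not a finding on its own.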
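3) Regex detector catalog (battle‑tested patterns)
Use RegexOptions.Compiled for every pattern. Add RegexOptions.IgnoreCase only to keyword and key‑name patterns; token formats with fixed prefixes (AKIA, ghp_, sk_live_) are case‑sensitive and should be matched that way. Always mask values in reports (e.g., show first 4 + last 4 chars).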
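The masking rule (first 4 + last 4 characters) can be captured in a small helper like this sketch. The Masking class is hypothetical, and fully starring out values of 8 characters or fewer is an added assumption so that short secrets are never partially revealed.

```csharp
using System;

public static class Masking
{
    // Keeps the first and last `keep` characters and replaces the rest with '*'.
    // Values too short to mask meaningfully are fully starred out.
    public static string Mask(string value, int keep = 4)
    {
        if (string.IsNullOrEmpty(value)) return string.Empty;
        if (value.Length <= keep * 2) return new string('*', value.Length);
        return value[..keep] + new string('*', value.Length - keep * 2) + value[^keep..];
    }
}
```

For example, Masking.Mask("AKIAABCDEFGHIJKLWXYZ") yields "AKIA************WXYZ".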
3.1 Private keys / certificates
- OpenSSH private key
  @"-----BEGIN OPENSSH PRIVATE KEY-----"
- Generic PEM private key (PKCS#1, SEC1, PKCS#8, encrypted PKCS#8)
  @"-----BEGIN (?:RSA |DSA |EC |ENCRYPTED )?PRIVATE KEY-----"
- PGP private key (the armor header ends in "KEY BLOCK", so it needs its own pattern)
  @"-----BEGIN PGP PRIVATE KEY BLOCK-----"
(Public keys/certificates are not secrets: BEGIN PUBLIC KEY, BEGIN CERTIFICATE → downrank/ignore.)
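3.2 Cloud: AWS
- Access Key ID
  @"\b(?:AKIA|ASIA|AGPA|AIDA|AROA|AIPA|ANPA)[A-Z0-9]{16}\b"
- Secret Access Key (context‑aided)
  @"(?<![A-Za-z0-9/+])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])"
  Boost only if near aws|secret|access[_-]?key|AWS_SECRET_ACCESS_KEY within ~50 chars. Lookarounds are used instead of \b because \b fails when the value starts or ends with / or +; = is left out of the lookbehind so KEY=value lines still match.
- Credentials file lines
  @"aws_access_key_id\s*=\s*[A-Z0-9]{20}"
  @"aws_secret_access_key\s*=\s*[A-Za-z0-9/\+=]{40}"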
3.3 Cloud: GCP / Google
- API key
  @"\bAIza[0-9A-Za-z\-_]{35}\b"
- Service Account JSON (two‑term signature; both must match)
  @"""type""\s*:\s*""service_account"""
  @"""private_key""\s*:\s*""-----BEGIN PRIVATE KEY-----"
3.4 Cloud: Azure
- Storage connection string
  @"DefaultEndpointsProtocol=https;AccountName=[^;]+;AccountKey=[A-Za-z0-9\+/=]{88};EndpointSuffix=core\.windows\.net"
- SAS token (simplified)
  @"\bsv=\d{4}-\d{2}-\d{2}[^ ]*?&sig=[A-Za-z0-9%/\+=]{40,}\b"
3.5 Dev platforms / SCM
- GitHub PAT
  @"\bgh[prusoa]_[A-Za-z0-9]{36}\b"
- GitLab PAT
  @"\bglpat-[A-Za-z0-9\-_]{20,}\b"
- NPM token
  - in .npmrc: @"//registry\.npmjs\.org/:_authToken=\s*(npm_[A-Za-z0-9]{36})"
  - raw form: @"\bnpm_[A-Za-z0-9]{36}\b"
- PyPI token
  @"\bpypi-AgEIcHlwaS5vcmc[A-Za-z0-9\-_]{50,}\b"
3.6 Messaging / SaaS
- Slack tokens (broad)
  @"\bxox[a-z]-[A-Za-z0-9-]{8,}-[A-Za-z0-9-]{8,}-[A-Za-z0-9-]{8,}(?:-[A-Za-z0-9-]{8,})?\b"
- Stripe (24 is a minimum; newer keys are longer, so do not cap the length)
  @"\bsk_(?:live|test)_[0-9a-zA-Z]{24,}\b"
- SendGrid
  @"\bSG\.[A-Za-z0-9\-_]{16,32}\.[A-Za-z0-9\-_]{16,64}\b"
- Mailgun
  @"\bkey-[0-9a-zA-Z]{32}\b"
- Twilio
  - SID: @"\bAC[0-9a-f]{32}\b"
  - Auth token (context‑aided): @"\b[0-9a-f]{32}\b" near twilio|auth[_-]?token
- Discord bot
  @"\b[A-Za-z\d]{24}\.[A-Za-z\d\-_]{6}\.[A-Za-z\d\-_]{27}\b"
3.7 Database / service connection strings
- PostgreSQL
  @"\bpostgres(?:ql)?://[^:\s]+:[^@\s]+@[^/\s]+"
- MySQL
  @"\bmysql://[^:\s]+:[^@\s]+@[^/\s]+"
- MongoDB
  @"\bmongodb(?:\+srv)?://[^:\s]+:[^@\s]+@[^/\s]+"
- SQL Server (ADO.NET)
  @"\bData Source=[^;]+;Initial Catalog=[^;]+;User ID=[^;]+;Password=[^;]+;"
- Redis (only URLs that embed a password; the username part may be empty; rediss:// is the TLS scheme)
  @"\bredis(?:s|\+ssl)?://[^:@\s]*:[^@\s]+@[^/\s]+"
- Basic auth in URL (generic)
  @"[a-zA-Z][a-zA-Z0-9+\-.]*://[^:/\s]+:[^@/\s]+@[^/\s]+"
3.8 Docker / CLI auth artifacts
- Docker config.json auth
  @"""auth""\s*:\s*""[A-Za-z0-9\+/=]{20,}"""
- .netrc auth
  @"(?mi)^machine\s+\S+\s+login\s+\S+\s+password\s+\S+"
3.9 Tokens / JWT
- JWT (structural)
  @"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b"
3.10 Build tools / package managers
- NuGet
  - cleartext: @"<add\s+key=""ClearTextPassword""\s+value=""[^""]+"""
  - encoded (not cleartext, but still a secret): @"<add\s+key=""Password""\s+value=""[^""]+"""
- Maven settings.xml
  @"<server>\s*<id>[^<]+</id>\s*<username>[^<]+</username>\s*<password>[^<]+</password>"
- Gradle
  @"(?i)\bsigning\.password\s*=\s*.+"
Keep regexes modular; associate each with:
{ Id, Name, Pattern, Severity, Examples, RecommendedRemediation }.
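The context‑aided entries (AWS secret key, Twilio auth token) pair a loose value pattern with a keyword that must appear nearby. A rough sketch of that idea for the AWS case follows; the class name, the lookaround form of the candidate pattern, and the symmetric 50‑character window are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContextAidedDetector
{
    // Candidate: a 40-char run from the AWS secret alphabet, not embedded in a longer run.
    // '=' is excluded from the lookbehind so that KEY=value lines still match.
    static readonly Regex Candidate = new(
        @"(?<![A-Za-z0-9/+])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])", RegexOptions.Compiled);

    // Keywords from Section 3.2; case-insensitive because key names vary in casing.
    static readonly Regex Keyword = new(
        @"aws|secret|access[_-]?key", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Yields candidates that have a keyword within `window` characters on either side.
    public static IEnumerable<string> Find(string text, int window = 50)
    {
        foreach (Match m in Candidate.Matches(text))
        {
            int start = Math.Max(0, m.Index - window);
            int end = Math.Min(text.Length, m.Index + m.Length + window);
            string before = text[start..m.Index];
            string after = text[(m.Index + m.Length)..end];
            if (Keyword.IsMatch(before) || Keyword.IsMatch(after))
                yield return m.Value;
        }
    }
}
```

Without the keyword requirement this pattern would also fire on 40‑character SHA‑1 hex digests and similar noise, which is why it is treated as a "soft" match in scoring.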
4) Entropy detector (catches “unknown” secrets)
Why: Many org‑specific tokens won’t match known regexes.
Implementation
- Extract candidate tokens by character class:
  - base64/base64url: [A-Za-z0-9/_\-\+=]{20,}
  - hex: [A-Fa-f0-9]{32,}
  - general mixed: [A-Za-z0-9]{24,}
- Compute Shannon entropy per candidate. Use alphabet‑aware thresholds:
  - base64/url: ≥ 4.0 bits/char & length ≥ 24
  - hex: ≥ 3.0 bits/char & length ≥ 32
  - alnum: ≥ 4.0 bits/char & length ≥ 24
- Context boosts (raise confidence) if within 64 chars of:
  password|passwd|pwd|secret|token|apikey|api_key|api-key|client[_-]?secret|private[_-]?key|connectionstring|conn[_-]?str|bearer
- Context suppressors (lower confidence/ignore):
  - File/path contains: example|sample|test|fixture|dummy
  - Surrounding line contains: REDACTED|<redacted>|changeme
  - Known non‑secret blocks: BEGIN PUBLIC KEY, BEGIN CERTIFICATE
- Cap N findings per file (e.g., 50) to avoid log floods.
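To see why the thresholds differ per alphabet, it helps to write the per‑character Shannon entropy out. In the notation below (introduced here for illustration), n is the token length, n_c is the number of times character c occurs in it, and k is the alphabet size:

```latex
H(s) = -\sum_{c \in s} \frac{n_c}{n} \log_2 \frac{n_c}{n},
\qquad
0 \le H(s) \le \min\bigl(\log_2 k,\ \log_2 n\bigr)
```

The upper bound explains the numbers: a hex token can never exceed log2(16) = 4 bits/char, so its threshold has to sit below 4.0, while base64 (k = 64) allows up to 6 bits/char. Length matters as well: a 24‑character token cannot exceed log2(24) ≈ 4.58 bits/char regardless of alphabet, so a 4.0 threshold at length 24 is already fairly strict.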
 
5) Scoring & de‑duping
Combine signals into a confidence score (sum the weights that apply, then clamp the result to the 0..1 range):
- +0.9 Regex “hard” match (e.g., OpenSSH private key)
- +0.7 Regex “soft” match (e.g., AWS secret 40‑char near keyword)
- +0.4 Entropy pass
- +0.2 Suspicious filename/path
- -0.5 Suppressor keyword/file
- +0.2 Structural check passes (e.g., JWT decodes)

Severity
- Critical: private keys, cloud root creds, Docker auth, DB creds in URLs, verified JWT signing keys.
- High: API tokens (GitHub/GitLab/Slack/Stripe), secrets in ENV/ARG history.
- Medium: high‑entropy candidates with strong context.
- Low: weak context/entropy only, or likely sample values.

De‑dupe same value across files/layers/envs; keep a single canonical record with occurrence list.
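A possible shape for the de‑dupe step is sketched below: findings are keyed by a SHA‑256 hash of the secret value, so the index never holds the raw value, and each key carries its list of occurrences. The Occurrence and FindingIndex types are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public sealed record Occurrence(string Type, string Where);

public sealed class FindingIndex
{
    // Keyed by SHA-256 of the secret value so the raw value is not stored in the index.
    private readonly Dictionary<string, List<Occurrence>> _byHash = new();

    private static string Fingerprint(string secretValue) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(secretValue)));

    public void Add(string secretValue, Occurrence occurrence)
    {
        string key = Fingerprint(secretValue);
        if (!_byHash.TryGetValue(key, out var list))
            _byHash[key] = list = new List<Occurrence>();
        list.Add(occurrence);
    }

    public int DistinctSecrets => _byHash.Count;

    public IReadOnlyList<Occurrence> OccurrencesOf(string secretValue) =>
        _byHash.TryGetValue(Fingerprint(secretValue), out var list)
            ? list
            : Array.Empty<Occurrence>();
}
```

The same fingerprint can double as the stable identifier used for baselining in CI, since it does not change when a secret moves between layers or files.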
6) Docker‑specific checks you must implement
- ENV/ARG leakage in history
  Parse config.History[].CreatedBy or docker history --no-trunc. Flag any ENV/ARG with suspicious key names or values matching detectors.
- Deleted‑later files
  If a file existed in an earlier layer and got deleted later (common .env mishap), still flag it and report layer + instruction that introduced it.
- .dockerignore advisory
  If high‑risk files (.env, .pem, .tfstate, credentials) entered the build context once, suggest .dockerignore entries.
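Deleted‑later files can be detected from the layer tars themselves, because a deletion is recorded as a whiteout entry: a file whose name is the deleted file's name prefixed with .wh. in the same directory. The sketch below works on plain lists of entry paths per layer; the WhiteoutTracker name is hypothetical, and directory whiteouts and opaque markers (.wh..wh..opq) are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;

public static class WhiteoutTracker
{
    const string WhiteoutPrefix = ".wh.";

    // layers: tar entry paths per layer, ordered from base layer to top layer.
    // Returns the files that an earlier layer added and a later layer removed.
    public static List<(string Path, int AddedIn, int DeletedIn)> FindDeleted(
        IReadOnlyList<IReadOnlyList<string>> layers)
    {
        var firstSeen = new Dictionary<string, int>(StringComparer.Ordinal);
        var deleted = new List<(string Path, int AddedIn, int DeletedIn)>();
        for (int i = 0; i < layers.Count; i++)
        {
            foreach (string entry in layers[i])
            {
                int slash = entry.LastIndexOf('/');
                string dir = entry[..(slash + 1)];
                string name = entry[(slash + 1)..];
                if (name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
                {
                    // "app/.wh..env" marks "app/.env" as deleted in this layer.
                    string target = dir + name[WhiteoutPrefix.Length..];
                    if (firstSeen.Remove(target, out int addedIn))
                        deleted.Add((target, addedIn, i));
                }
                else
                {
                    firstSeen.TryAdd(entry, i);
                }
            }
        }
        return deleted;
    }
}
```

The AddedIn index can then be mapped to the matching non‑empty history entry to report the instruction that introduced the file.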
7) Runtime inspection rules
- Environment
  - Scan all Env pairs; boost hits for keys containing: PASSWORD|PASS|PWD|SECRET|TOKEN|KEY|CLIENT_SECRET|SAS|CONNECTIONSTRING
- Process args
  - Flag --password, --api-key, --token, --secret, --connection-string.
- Mounted secrets
  - Enumerate /run/secrets/*, /var/run/secrets/* (Swarm/K8s).
  - Ensure permissions are restrictive; still scan contents (apps sometimes copy them elsewhere).
- Logs
  - Tail & scan. Provide optional redaction pipeline.
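The environment rule can be sketched as follows, assuming env entries arrive as "KEY=VALUE" strings (the form Docker reports in Config.Env). The EnvScanner name is hypothetical. The key pattern is a substring match, so it will also hit names like KEYBOARD_LAYOUT or BYPASS_CACHE, which is one reason it should only boost a detector hit rather than count as a finding by itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EnvScanner
{
    // Key-name fragments from the rule above; matched case-insensitively anywhere in the key.
    static readonly Regex SensitiveKey = new(
        "PASSWORD|PASS|PWD|SECRET|TOKEN|KEY|CLIENT_SECRET|SAS|CONNECTIONSTRING",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Splits each entry at the first '='; the value itself may contain more '=' characters.
    public static IEnumerable<(string Key, string Value, bool Boost)> Scan(IEnumerable<string> env)
    {
        foreach (string entry in env)
        {
            int eq = entry.IndexOf('=');
            if (eq <= 0) continue;                 // skip malformed or key-less entries
            string key = entry[..eq];
            string value = entry[(eq + 1)..];
            if (value.Length == 0) continue;       // nothing to scan
            yield return (key, value, SensitiveKey.IsMatch(key));
        }
    }
}
```

Each returned value is then passed through the regex and entropy detectors, with Boost feeding the context weight.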
 
 
8) Reporting format (JSON)
Example JSON for one finding:
{
  "detectorId": "aws.accessKeyId",
  "name": "AWS Access Key ID",
  "severity": "HIGH",
  "confidence": 0.92,
  "valueSample": "AKIA************WXYZ",
  "locations": [
    {
      "type": "image-layer-file",
      "image": "repo/app:1.4.2",
      "layerDigest": "sha256:...abc",
      "path": "/app/.env",
      "line": 12
    },
    {
      "type": "container-env",
      "containerId": "f3e9d...",
      "envKey": "AWS_ACCESS_KEY_ID"
    }
  ],
  "context": {
    "filePathScore": 0.2,
    "regexMatch": true,
    "entropy": null,
    "nearbyKeywords": ["AWS_ACCESS_KEY_ID"]
  },
  "remediation": "Remove from image; inject via secrets manager or runtime mount; rotate the key."
}
Optionally also emit SARIF to plug into code‑scanning dashboards.
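A compact way to produce that JSON shape from C# is a pair of records serialized with System.Text.Json, as in this sketch. The record and property names are chosen to mirror the example above and are otherwise assumptions; the context block is omitted to keep the example short.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record Location(string Type, string? Image = null, string? LayerDigest = null,
    string? Path = null, int? Line = null, string? ContainerId = null, string? EnvKey = null);

public sealed record Finding(string DetectorId, string Name, string Severity, double Confidence,
    string ValueSample, IReadOnlyList<Location> Locations, string Remediation);

public static class ReportWriter
{
    static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,              // DetectorId -> detectorId
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,   // drop unused location fields
        WriteIndented = true,
    };

    public static string ToJson(Finding finding) => JsonSerializer.Serialize(finding, Options);
}
```

Ignoring null properties lets one Location record cover both the image-layer-file and container-env shapes without emitting empty fields.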
9) C# implementation sketch
Project layout
SecretsScanner/
  Core/
    IDetector.cs                 // interface: Detect(stream|text, path, context) -> Findings
    RegexDetector.cs             // holds Pattern, Hints, Confidence rules
    EntropyDetector.cs           // Shannon entropy
    JwtDetector.cs               // structural decoding check
    FileClassifier.cs            // text/binary check, ext-based hints
    Scoring.cs                   // combine signals; severity
    PathsHeuristics.cs           // globs & filename rules
    ReportModel.cs               // JSON schema / SARIF
  Docker/
    ImageReader.cs               // reads image tars, layers via Docker.DotNet or stream
    HistoryParser.cs             // extracts ENV/ARG from history
    ContainerInspector.cs        // env, args, mounts, logs (Docker.DotNet)
  Catalog/
    RegexCatalog.cs              // patterns (section 3), per-detector metadata
    Keywords.cs                  // boost/suppress lists
  Cli/
    Program.cs                   // options: image, container, path; json output; fail-on
C# snippets (illustrative)
Regex catalog
using System.Text.RegularExpressions;

public static class RegexCatalog
{
    // Token patterns are case-sensitive on purpose, so IgnoreCase is not set here.
    public static readonly (string Id, string Name, Regex Rx, string Severity, string Hint)[] Rules =
    {
        ("pem.openssh", "OpenSSH Private Key",
            new Regex(@"-----BEGIN OPENSSH PRIVATE KEY-----", RegexOptions.Compiled),
            "CRITICAL", "Remove private keys from images; use mounts or vault."),
        // PGP keys use a "PRIVATE KEY BLOCK" header and need a separate rule.
        ("pem.private", "PEM Private Key",
            new Regex(@"-----BEGIN (?:RSA |DSA |EC |ENCRYPTED )?PRIVATE KEY-----", RegexOptions.Compiled),
            "CRITICAL", "Remove private keys; rotate credentials."),
        ("aws.akid", "AWS Access Key ID",
            new Regex(@"\b(?:AKIA|ASIA|AGPA|AIDA|AROA|AIPA|ANPA)[A-Z0-9]{16}\b", RegexOptions.Compiled),
            "HIGH", "Rotate; use IAM roles/STS; remove from code/config."),
        ("github.pat", "GitHub Personal Access Token",
            new Regex(@"\bgh[prusoa]_[A-Za-z0-9]{36}\b", RegexOptions.Compiled),
            "HIGH", "Revoke PAT; use fine-grained tokens; remove from image."),
        // ... add remaining patterns from Section 3
    };
}
Entropy
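using System;

public static class Entropy
{
    // Shannon entropy in bits per character, counting only characters found in `alphabet`.
    // Assumes an ASCII alphabet; characters >= 256 are skipped to stay inside the counts buffer.
    public static double Shannon(ReadOnlySpan<char> s, ReadOnlySpan<char> alphabet)
    {
        Span<int> counts = stackalloc int[256];
        counts.Clear(); // do not rely on stackalloc memory being zeroed
        int n = 0;
        foreach (var ch in s)
        {
            if (ch < 256 && alphabet.IndexOf(ch) >= 0) { counts[ch]++; n++; }
        }
        if (n == 0) return 0.0;
        double H = 0.0;
        for (int i = 0; i < counts.Length; i++)
        {
            if (counts[i] == 0) continue;
            double p = counts[i] / (double)n;
            H -= p * Math.Log(p, 2);
        }
        return H;
    }
}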
Candidate extraction (simplified)
// Minimal candidate type; Kind is "b64" or "hex".
public sealed record Candidate(string Value, string Kind, string Line);

static readonly Regex Base64Token = new(@"[A-Za-z0-9/_\-\+=]{20,}", RegexOptions.Compiled);
static readonly Regex HexOnly     = new(@"^[A-Fa-f0-9]{32,}$", RegexOptions.Compiled);

// Every hex run also matches the base64 class, so match once and classify,
// instead of yielding the same token twice under two kinds.
IEnumerable<Candidate> ExtractCandidates(string line)
{
    foreach (Match m in Base64Token.Matches(line))
        yield return new Candidate(m.Value, HexOnly.IsMatch(m.Value) ? "hex" : "b64", line);
}
Scoring
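// Signals gathered for one candidate; the fields mirror the weights in Section 5.
public sealed record DetectionSignals(
    bool RegexHard, bool RegexSoft, bool EntropyHit,
    bool SuspiciousPath, bool StructuralOk, bool Suppressor);

static double Score(DetectionSignals s)
{
    double score = 0;
    if (s.RegexHard) score += 0.9;
    if (s.RegexSoft) score += 0.7;
    if (s.EntropyHit) score += 0.4;
    if (s.SuspiciousPath) score += 0.2;
    if (s.StructuralOk) score += 0.2;
    if (s.Suppressor) score -= 0.5;
    return Math.Clamp(score, 0, 1); // sums above 1 or below 0 are clamped
}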
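Docker (Docker.DotNet; method names as in recent releases, so verify them against the version you pin)
- Images: Images.GetImageHistoryAsync, Images.SaveImageAsync + tar unpack.
- Containers: Containers.InspectContainerAsync, Containers.ListProcessesAsync (docker top), Exec.ExecCreateContainerAsync + Exec.StartAndAttachContainerExecAsync, Containers.GetArchiveFromContainerAsync, Containers.GetContainerLogsAsync.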
10) False‑positive control & hygiene
- Ignore lists: file globs (test/**, **/*.example.*), value lists (REDACTED, example, dummy, changeme).
- Public materials: downrank matches inside BEGIN PUBLIC KEY/BEGIN CERTIFICATE.
- Thresholds: tune entropy and minimum lengths to your codebase; keep per‑detector knobs in config.
- Masking: never print full values; restrict access to scanner logs.
- Rate‑limits: cap per‑file matches; cap per‑container to avoid spam.
 
11) CI/CD and policy
- Build step: after docker build, run image scan; fail on High/Critical (configurable).
- Pre‑deploy: scan the running containers' env/args/mounts (read‑only).
- Baselining: allow a first pass to baseline known leftovers, then block any new secrets.
- Rotation: auto‑emit per‑type remediation (e.g., rotate PAT, revoke AWS AK/SK, move to secret manager).
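The fail‑on threshold and the baseline can be combined into one gate that returns the process exit code for the CI step. This is a minimal sketch; the Severity enum, the PolicyGate name, and the use of a per‑finding fingerprint string (for instance a hash of the secret value) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Severity { Low, Medium, High, Critical }

public static class PolicyGate
{
    // Returns an exit code: 0 = pass, 1 = at least one finding at or above
    // `failOn` that is not already recorded in the baseline.
    public static int Evaluate(
        IEnumerable<(string Fingerprint, Severity Severity)> findings,
        ISet<string> baseline,
        Severity failOn = Severity.High)
    {
        bool blocked = findings.Any(f => f.Severity >= failOn && !baseline.Contains(f.Fingerprint));
        return blocked ? 1 : 0;
    }
}
```

The CLI would end with something like Environment.Exit(PolicyGate.Evaluate(findings, baseline)), so that a new High or Critical finding fails the build while baselined leftovers do not.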
 
12) Optional enhancements
- SBOM‑guided scanning: use SBOM/file inventory to prioritize text/config assets; cache base layers.
- JWT structural checks: base64url‑decode header/payload; verify JSON; flag if plausible.
- Checksum checks: Luhn for CCNs (if in scope); simple format checks for cloud tokens.
- Interactive audit: CLI --audit mode to triage and write an “allowlist/baseline”.
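The JWT structural check could be implemented along these lines: split on dots, base64url‑decode the first two parts, and accept only if both parse as JSON objects. The JwtStructure name is hypothetical, and the signature is deliberately not verified, so a positive result only means "shaped like a JWT".

```csharp
using System;
using System.Text.Json;

public static class JwtStructure
{
    // True when the token has three dot-separated parts and the first two
    // base64url-decode to JSON objects. The signature is NOT verified.
    public static bool LooksLikeJwt(string token)
    {
        string[] parts = token.Split('.');
        if (parts.Length != 3) return false;
        return IsJsonObject(parts[0]) && IsJsonObject(parts[1]);
    }

    static bool IsJsonObject(string base64Url)
    {
        try
        {
            // Convert base64url to standard base64 and restore the stripped padding.
            string b64 = base64Url.Replace('-', '+').Replace('_', '/');
            b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
            using JsonDocument doc = JsonDocument.Parse(Convert.FromBase64String(b64));
            return doc.RootElement.ValueKind == JsonValueKind.Object;
        }
        catch (Exception e) when (e is FormatException || e is JsonException)
        {
            return false;   // not valid base64 or not valid JSON
        }
    }
}
```

Running this only on strings that already matched the JWT regex keeps the cost low and turns a regex hit into the "structural check passes" signal used in scoring.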
13) Minimal “first list” your dev can paste today
Start with these detectors (high ROI):
- PEM/OPENSSH private keys
- AWS AKID + secret (context‑aided)
- GitHub PAT, GitLab PAT, NPM, PyPI
- Slack, Stripe, SendGrid, Twilio
- Docker config auth field
- DB connection strings (Postgres/MySQL/Mongo/SQLServer)
- JWT
- .aws/credentials, .npmrc, .docker/config.json, appsettings*.json, .env*, *.tfstate, *kubeconfig* (path heuristics)
- Entropy (base64/hex/alnum) with context boosts/suppressors

That set should cover the most common kinds of leaks found in images and container environments.
Final note
This blueprint keeps everything offline (no external calls), so it’s safe in CI and reproducible. If you later want to add credential validation (e.g., confirm an AWS key via STS), make it opt‑in and heavily rate‑limited.
If you want, I can package these regexes and the scaffolding into a starter C# repo with a CLI (scan image <ref> | scan container <id> | scan path <dir>) and JSON output.