
Continuous Audit as a Service

Twenty-four codebases. Six tool classes. Monthly cadence. Findings classified against each project’s stated objectives. Reported — not just collected — with no human in the loop.

Why "audit" stopped meaning what it used to mean

Twenty years ago, audit meant: an external firm visits once a year, asks for screenshots and spreadsheets, and writes a report. Five years ago, it meant: a SaaS like Vanta or Drata pulls API evidence continuously and surfaces gaps to a security team. Today — for an MSP — it has to mean something more: every codebase you ship is monitored continuously, every finding is tied to a specific objective, and every regression is reported as it happens.

The shift is driven by the same regulatory and operational pressures that show up across this paper series. EU AI Act, NIS2, DORA, SEC Climate, NIST AI RMF, TRAIGA, ELVIS, SOC 2 with AI controls. Buyers expect evidence on demand, not annually. Auditors expect retention measured in years, not weeks.

And almost every existing audit-as-a-service product is sold direct to a single customer. Vanta, Drata, Comp AI, Delve. They are excellent for a single startup proving SOC 2. They are not built for an MSP that needs to run the same posture across twenty-four of its own products and an arbitrary number of client portfolios, with multi-tenant evidence segregation and cross-portfolio benchmarking.

What we built — and run on ourselves

AiT Audit is a Cloud Run job that fires on the first of every month. In approximately 45 minutes, it does the following for every active codebase under our management:

  1. Acquires a project lock via AiT Coord, our internal multi-agent orchestrator. If a deploy or migration is mid-flight, the audit skips that project this cycle and logs the conflict to the agent-runs table.
  2. Clones the repo shallowly using a fine-grained PAT scoped to the relevant repo set.
  3. Runs seven scanners in parallel: Semgrep (SAST with our centralized rule pack covering OWASP Top 10, Next.js, Clerk RLS, multi-tenant invariants), OSV-Scanner (dependencies), Gitleaks (secrets), Checkov (IaC + GitHub Actions), Knip (unused exports/files/deps), Spectral (OpenAPI lint), and a drift detector that compares the project’s lint/SAST/tsconfig against a canonical guardrails registry.
  4. Runs DAST on deployed URLs for repos that expose them: Nuclei templates, Lighthouse CI, axe-core a11y.
  5. Classifies every finding against the project’s docs/objectives.md using an Anthropic Haiku classifier with prompt caching. Each finding gets a label: threatens, aligns, adjacent, or out-of-scope. This is the move that turns a scan report from "200 things you might fix" into "the 3 findings that actively threaten your stated quarterly objective."
  6. Persists everything to a Supabase database with row-level security scoped per project: projects, audit_runs, findings, guardrail_violations, market_signals, agent_runs.
  7. Renders an IG-branded Markdown report per project, posts a Block Kit message to the audit Slack channel, sends an email digest, and updates the Trust Portal “Audit” tab.
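
The steps above can be sketched as a single pipeline: take the lock, fan the scanners out in parallel, classify each finding, and hand the results off for persistence. This is an illustrative sketch, not AiT Audit's actual code; `acquireLock`, `runScanner`-style scanner functions, and `classify` are hypothetical names standing in for the AiT Coord, scanner, and Haiku-classifier integrations.

```typescript
// Illustrative sketch of the monthly audit flow. All names here are
// assumptions for the example, not the real AiT Audit internals.

type Alignment = "threatens" | "aligns" | "adjacent" | "out-of-scope";

interface Finding {
  scanner: string;
  severity: "low" | "medium" | "high" | "critical";
  message: string;
  alignment?: Alignment;
}

type Scanner = (repoDir: string) => Promise<Finding[]>;

// Step 1: skip the project this cycle if another agent holds the lock.
// The real implementation calls AiT Coord; here we always succeed.
async function acquireLock(project: string): Promise<boolean> {
  return true;
}

// Steps 3-5: run every scanner in parallel, flatten the results, then
// classify each finding against the project's objectives.
async function auditProject(
  project: string,
  repoDir: string,
  scanners: Scanner[],
  classify: (f: Finding) => Promise<Alignment>,
): Promise<Finding[]> {
  if (!(await acquireLock(project))) return []; // conflict gets logged

  const results = await Promise.allSettled(scanners.map((s) => s(repoDir)));
  const findings = results.flatMap((r) =>
    r.status === "fulfilled" ? r.value : [],
  );
  for (const f of findings) f.alignment = await classify(f);
  return findings; // step 6 would persist these to Supabase
}
```

`Promise.allSettled` rather than `Promise.all` matters here: one crashing scanner should not discard the other six scanners' findings for that cycle.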

A companion weekly cron pulls market signals: Hacker News /best filtered by relevance keywords, GitHub trending by topic, NVD CVE feed for our stack components, awesome-list diffs. Each signal is classified against project objectives the same way findings are. The Monday morning digest tells the team what changed in the world that touches what we’re building.
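
The relevance-keyword filter on those feeds can be as simple as a case-insensitive substring match over titles. A minimal sketch, assuming a keyword list and signal shape that are not the production config:

```typescript
// Minimal sketch of the weekly signal filter: keep only items whose
// title matches a relevance keyword, case-insensitively. The keyword
// list and the Signal shape are assumptions for this example.

interface Signal {
  title: string;
  url: string;
}

function filterSignals(items: Signal[], keywords: string[]): Signal[] {
  const needles = keywords.map((k) => k.toLowerCase());
  return items.filter((item) => {
    const title = item.title.toLowerCase();
    return needles.some((k) => title.includes(k));
  });
}
```

Surviving signals then go through the same objective-alignment classifier as audit findings.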

In production today

The first market-intel run wrote 29 real signals to the database in approximately three minutes — including a Hacker News story titled "Securing a DoD contractor: finding a multi-tenant authorization vulnerability" that mapped directly to objectives in two of our security products. The system is not theoretical. It found something useful on its first run.

Why "objective alignment" is the move

Every code-quality and security tool produces too many findings. Semgrep alone, on a non-trivial codebase, will surface hundreds of issues. The traditional response is severity filtering: ignore Low, look at High and Critical. This is fine and we do it. But severity is upstream-defined; it does not know what the project is for.

Reading a project’s objectives.md and asking "does this finding threaten Tier-1 objective X" produces a different ranking. A Medium-severity finding on the auth boundary of a multi-tenant SaaS is more important than a High-severity finding in a script that runs once a quarter. The classifier captures that priority. The team gets a pre-triaged list. The reviewer’s time goes to decisions, not rediscovery.
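
One way to make "alignment outranks severity" concrete is a two-band score where any threatens-labeled finding sorts above any non-threatening one, with severity breaking ties inside a band. The weights below are invented for the sketch; they are not the classifier's actual scoring:

```typescript
// Illustrative ranking: objective alignment dominates raw severity, so
// a Medium "threatens" finding sorts above a High "out-of-scope" one.
// The weights are made up for this example.

type Severity = "low" | "medium" | "high" | "critical";
type Alignment = "threatens" | "aligns" | "adjacent" | "out-of-scope";

const severityWeight: Record<Severity, number> = {
  low: 1,
  medium: 2,
  high: 3,
  critical: 4,
};

const alignmentWeight: Record<Alignment, number> = {
  "out-of-scope": 0,
  adjacent: 1,
  aligns: 2,
  threatens: 10,
};

// Multiplying the alignment weight by 10 keeps severity (max 4) from
// ever lifting a finding out of its alignment band.
function triageScore(severity: Severity, alignment: Alignment): number {
  return alignmentWeight[alignment] * 10 + severityWeight[severity];
}
```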

The classifier also does the opposite work. A High-severity finding that’s flagged as out-of-scope — "this Dockerfile lint warning fires on a build script that has never run in production" — gets surfaced separately so the team can either acknowledge it or correct the misclassification. False-positive triage is a normal step, not a tax.

What the centralized guardrails do

Every project pulls its lint, SAST, secret-scan, IaC, and OpenAPI rules from a single canonical registry. The audit’s drift detector hashes each project’s actual config file against the canonical version. Three drift kinds are recognized: missing (config doesn’t exist where it should), overridden (config differs and a justified override is documented), and stale-version (config differs and there’s no justification — the project drifted from canonical without anyone deciding to).
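
The hash-and-classify step described above can be sketched in a few lines. `hasJustifiedOverride` is a hypothetical stand-in for however the real system checks that an override is documented:

```typescript
import { createHash } from "node:crypto";

// Sketch of the drift classifier: hash the project's config against the
// canonical version, then bucket the result into the three drift kinds.
// `hasJustifiedOverride` is an assumed stand-in for the documented-
// override check.

type Drift = "none" | "missing" | "overridden" | "stale-version";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function classifyDrift(
  projectConfig: string | null, // null means the file is absent
  canonicalConfig: string,
  hasJustifiedOverride: boolean,
): Drift {
  if (projectConfig === null) return "missing";
  if (sha256(projectConfig) === sha256(canonicalConfig)) return "none";
  return hasJustifiedOverride ? "overridden" : "stale-version";
}
```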

Stale-version drift is its own finding category. The team gets visibility into where guardrails are eroding without being shouted at; the path back to canonical is one PR and a sync script.

What this looks like in practice

A client signs an MSP agreement that includes "Audit-as-a-Service" as a line item. We bring our 24-codebase posture and apply it to their 8-repo portfolio. Within the first month they receive the same deliverables we run against our own portfolio: per-repo branded reports, objective-aligned findings, Slack and email digests, and a live Trust Portal tab.

The MSP is doing a hundred dollars of work per repo per month and saving the client thirty thousand dollars of audit-prep labor at year-end. The deliverable is a continuously-current trust posture, not a once-a-year scramble.

Why pure-play audit SaaS doesn’t cover this

Vanta, Drata, Comp AI, and Delve are excellent direct-to-startup products. But they assume a single tenant proving a single compliance program: they are not designed for an MSP running the same posture across its own products and an arbitrary number of client portfolios, and they offer neither multi-tenant evidence segregation nor cross-portfolio benchmarking.

Where this fits

AiT Audit is the operational backbone of the AI portfolio. AiT Coord arbitrates the audits. AiT SOC Sentinel monitors the agent activity that performs them. AiT AI Gateway governs the LLM calls that classify findings. The Trust Portal exposes the results. Every paper in this series ties back to the audit posture; you can read this paper as the engine and the others as the surfaces it powers.


What does your monthly audit look like?

If the answer is "we audit annually" or "Vanta does it" — that’s the conversation. 30-minute call, we walk through Audit-as-a-Service end to end.

We reply within 24 hours.