Automation Handles the 90%. Humans Handle the 10% That Matters.
Human-in-the-Loop Compliance
Full automation produces false confidence. Manual processes do not scale. The boundary between what should be automated and what requires human judgment is not arbitrary. Evidence collection, configuration monitoring, and cross-framework mapping are mechanical. Risk acceptance, policy exceptions, control inheritance confirmation, and governance decisions require human authority. A platform that automates everything is dangerous. A platform that automates nothing is unusable. The right answer is a platform that knows the difference.
Automate what you should. Govern what you must.
Compliance involves two fundamentally different categories of work. Mechanical work follows deterministic rules: collect this configuration, compare it to this baseline, record the result. Judgment work requires context, authority, and accountability: accept this risk, approve this exception, confirm this mapping, sign this attestation. Conflating these categories produces platforms that either automate judgment (creating liability) or require manual effort for mechanical tasks (creating bottlenecks). The human-in-the-loop model draws a precise line between the two.
Not everything in compliance can be automated, and not everything should be manual. The distinction is not about technical capability. Modern systems can automate nearly any decision if you let them. The distinction is about accountability. When an organization accepts a risk by operating with a known control gap, that acceptance carries legal and contractual weight. An authorizing official signs the risk acceptance. Their name is on the document. Their judgment is the basis for the decision. No automated system should make that decision on a human's behalf, because no automated system bears the consequence of getting it wrong. Similarly, when an organization grants a policy exception that allows a system to deviate from a security baseline, that exception must be justified, time-bounded, reviewed, and approved by someone with the authority to accept the associated risk. Automating that approval removes the accountability that makes the exception governable.
The mechanical side of compliance is equally clear. Collecting a configuration snapshot from a running system is a deterministic operation. Comparing that snapshot against a security baseline produces a binary result: the configuration matches or it does not. Recording the result with a timestamp, source system identifier, and integrity hash is a data operation. Repeating this collection on a defined schedule is a scheduling operation. None of these steps require judgment. They require reliability, consistency, and tamper-evident storage. Performing them manually introduces human error, scheduling drift, and evidence gaps. An engineer who collects configuration evidence quarterly will occasionally miss the collection window. The evidence ages. The compliance posture degrades without anyone noticing until the next assessment preparation cycle reveals the gap. Automating mechanical evidence collection does not remove humans from the process. It removes humans from the parts of the process where they add error instead of value.
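To make that collect-compare-record chain concrete, here is a minimal sketch in Python. The function name, field names, and the SSH-hardening baseline are hypothetical illustrations, not the platform's actual schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def record_evidence(source_id: str, snapshot: dict, baseline: dict) -> dict:
    """Deterministic collect-compare-record step: data operations only, no judgment."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return {
        "source_system": source_id,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "integrity_hash": hashlib.sha256(payload).hexdigest(),
        # Binary result: every baseline key must match the observed value.
        "matches_baseline": all(snapshot.get(k) == v for k, v in baseline.items()),
        "snapshot": snapshot,
    }

# Illustrative usage with a hypothetical SSH hardening baseline.
baseline = {"PasswordAuthentication": "no", "PermitRootLogin": "no"}
observed = {"PasswordAuthentication": "yes", "PermitRootLogin": "no"}
print(record_evidence("prod-web-01", observed, baseline)["matches_baseline"])  # False
```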
The boundary between mechanical and judgment work is not always obvious. Consider vulnerability management. Detecting a vulnerability is mechanical: a scanner identifies a known CVE in a deployed component. Assessing the vulnerability's applicability to your environment involves judgment: does the vulnerable code path execute in your deployment configuration? Is the vulnerability exploitable given your network segmentation? Prioritizing remediation involves more judgment: does this vulnerability affect a system that handles regulated data where the risk is higher, or an internal development environment where the blast radius is contained? Deciding to accept the risk of delayed remediation because a compensating control mitigates the exposure is a governance decision that requires human authority. Each step in the vulnerability management lifecycle crosses the automation boundary at a different point. A platform that treats the entire lifecycle as either fully automated or fully manual misses the nuance that determines whether the output is trustworthy.
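One way to make that boundary explicit is to tag each lifecycle step as mechanical or judgment, so automation can never progress past a judgment step on its own. A rough sketch, with step names invented for illustration:

```python
from enum import Enum

class WorkType(Enum):
    MECHANICAL = "mechanical"   # automatable: deterministic input -> recorded output
    JUDGMENT = "judgment"       # requires a named human with authority

# Illustrative classification of the vulnerability lifecycle described above;
# where the boundary falls is an organizational decision, not a fixed rule.
VULN_LIFECYCLE = [
    ("detect_cve", WorkType.MECHANICAL),
    ("assess_applicability", WorkType.JUDGMENT),
    ("prioritize_remediation", WorkType.JUDGMENT),
    ("accept_delayed_remediation_risk", WorkType.JUDGMENT),
]

def next_human_decision(lifecycle):
    """Return the first step that cannot proceed without human authorization."""
    return next((step for step, kind in lifecycle if kind is WorkType.JUDGMENT), None)

print(next_human_decision(VULN_LIFECYCLE))  # assess_applicability
```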
Platforms that automate everything create a specific failure mode: false confidence backed by automated assertions that no human reviewed. The system reports 95% compliance. The dashboard is green. The executive team reports the number to the board. No one examined whether the automated assessment correctly interpreted each control. No one verified whether the evidence the system collected actually demonstrates what the control requires. No one reviewed whether the cross-framework mappings the system applied are accurate for the organization's specific implementation. The 95% number is a computation, not a judgment. When the assessor arrives and probes the controls behind that number, they find configurations that technically satisfy the automated check but do not satisfy the control's intent. They find evidence that demonstrates existence but not effectiveness. They find mappings that are structurally correct but contextually wrong because the organization's implementation differs from the standard pattern the automation assumed.
Manual compliance processes fail differently. They fail through exhaustion. A compliance team of three people managing CMMC Level 2 across two systems must track 220 control implementations, collect evidence for each on a recurring schedule, write and maintain narratives that describe the current state of each control, manage POA&M items for open gaps, coordinate remediation with engineering teams who have competing priorities, and prepare evidence packages for the assessor. Each task is individually manageable. The aggregate volume is not. Evidence collection falls behind schedule. Narratives describe last quarter's architecture. POA&M items age without progress because the compliance team lacks the authority to compel engineering resources. The team spends its time on mechanical evidence collection because that is the most visible deliverable, leaving no bandwidth for the judgment work that actually determines compliance outcomes: evaluating whether controls are operating effectively, identifying cross-control dependencies that create systemic risk, and making governance decisions about acceptable risk levels.
Both failure modes converge on the same outcome: the organization enters an assessment with an inaccurate understanding of its own compliance posture. The fully automated organization believes it is compliant because the dashboard says so. The fully manual organization knows it is behind but cannot quantify how far behind because it lacks the bandwidth to assess its own state. The assessor discovers the truth in both cases, and the discovery happens at the most expensive possible moment: during the assessment itself. The cost is not just the failed assessment. It is the months of preparation that produced an inaccurate picture, the remediation cycle that must follow, and the organizational credibility lost when the reported posture does not match the observed posture. The problem is not automation versus manual. The problem is drawing the line in the wrong place, or not drawing it at all.
Evidence collection, vulnerability scanning, configuration monitoring, and baseline comparison are mechanical operations. They follow deterministic rules. The input is observable infrastructure state. The output is a recorded artifact with provenance metadata. No step in the chain requires interpretation, risk judgment, or organizational authority. These operations consume the majority of a compliance team's time in manual workflows: scheduling evidence collection windows, reminding engineers to export configurations, chasing missing screenshots, reformatting artifacts into the evidence repository's required structure, and verifying that timestamps fall within the freshness window. Every hour spent on mechanical collection is an hour not spent on governance decisions, narrative review, or risk assessment. The mechanical work does not require the compliance team's expertise. It requires their calendar, and that calendar is finite.
Doing this work manually consumes all available capacity because the volume scales with the number of controls, systems, and evidence freshness requirements. A single CMMC Level 2 system with 110 practices requires evidence for each practice, refreshed on the cadence the framework or the organization's policy dictates. Two systems double the volume. Adding FedRAMP to the compliance portfolio adds 325 controls with their own evidence requirements, some overlapping with CMMC but requiring separate freshness windows or format requirements. A three-person compliance team that spends 60% of its time on mechanical evidence collection has 40% remaining for the judgment work that determines whether the organization is actually compliant. That 40% is further divided between narrative writing, gap analysis, remediation coordination, and assessment preparation. The bandwidth available for actual governance, the work that distinguishes a compliant organization from one that merely has evidence, approaches zero.
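A back-of-envelope model of that arithmetic, using the figures above and ignoring the partial overlap between frameworks (so the total is an upper bound):

```python
# Rough capacity model using the figures above (illustrative, ignores framework overlap).
cmmc_practices, systems = 110, 2
fedramp_controls = 325
evidence_obligations = cmmc_practices * systems + fedramp_controls   # 545 recurring obligations

team_hours_per_week = 3 * 40                    # three-person compliance team
mechanical_share = 0.60                         # time consumed by mechanical collection
judgment_hours = team_hours_per_week * (1 - mechanical_share)

print(evidence_obligations, judgment_hours)     # 545 48.0
```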
Sentinel operates as a unified collection engine with four operational profiles: discovery, evidence collection, assessment, and monitoring. All four profiles share the same credential store, retry logic, rate limiting, and error handling. The discovery profile enumerates resources, configurations, and relationships. The evidence collection profile captures configuration snapshots with full provenance: source system identifier, collection timestamp, SHA-256 integrity hash, and OpenTelemetry trace ID. The assessment profile compares collected configurations against applicable baselines and records the compliance determination. The monitoring profile watches for changes and triggers re-collection when drift is detected. Vanguard adds 14 scanner types for code-level analysis: multi-language SAST, secret detection, dependency analysis, container scanning, DAST, STIG verification, linting, code quality metrics, coverage analysis, fuzzing, API security testing, license compliance, IaC scanning, and SBOM generation. Every mechanical operation across both capabilities runs on automated schedules without human intervention. The compliance team's time shifts entirely to the judgment work that the platform cannot and should not perform on their behalf.
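A structural sketch of how four profiles might share one engine's scaffolding; the class names and simplified retry logic are assumptions for illustration, not Sentinel's actual design.

```python
from abc import ABC, abstractmethod

class CollectionEngine(ABC):
    """Shared scaffolding every profile reuses: credential store, retries, rate limiting (sketched)."""

    def __init__(self, credentials: dict, max_retries: int = 3):
        self.credentials = credentials
        self.max_retries = max_retries

    def run(self, target: str) -> dict:
        last_error = None
        for _ in range(self.max_retries):
            try:
                return self.execute(target)           # shared retry path for all four profiles
            except ConnectionError as exc:
                last_error = exc                       # a real engine would also back off and rate-limit
        raise RuntimeError(f"collection failed for {target}") from last_error

    @abstractmethod
    def execute(self, target: str) -> dict: ...

class DiscoveryProfile(CollectionEngine):
    def execute(self, target: str) -> dict:
        return {"profile": "discovery", "target": target, "resources": []}

class EvidenceCollectionProfile(CollectionEngine):
    def execute(self, target: str) -> dict:
        return {"profile": "evidence", "target": target, "provenance": {"source_system": target}}

# Assessment and monitoring profiles would subclass the same engine in this sketch.
print(DiscoveryProfile(credentials={}).run("prod-web-01"))
```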
Risk acceptance is a governance decision that carries legal and contractual weight. When an organization operates with a known control gap, it accepts the associated risk. The authorizing official signs the risk acceptance, and their signature attests that they understand the gap, evaluated the potential impact, considered compensating controls, and determined that the residual risk is acceptable given operational requirements. This decision cannot be automated because no automated system bears the consequence of the decision. The authorizing official can be named in a False Claims Act proceeding. The automated system cannot. Policy exceptions follow the same pattern: a specific system or component is allowed to deviate from a security baseline under defined conditions, for a bounded duration, with a compensating control in place. The exception must be justified, reviewed, and approved by someone with the organizational authority to accept the deviation. Categorization decisions determine which data types a system handles, which directly determines which controls apply and at what rigor. Categorizing a system as handling only Federal Contract Information when it actually processes Controlled Unclassified Information changes the entire compliance baseline. These decisions shape the compliance program. They require human accountability.
Automating judgment work produces compliance without accountability. A platform that automatically accepts risk when a control gap persists for a defined period has eliminated the governance check that makes risk acceptance meaningful. A platform that automatically categorizes data based on keyword matching has removed the human determination that regulatory frameworks require. A platform that automatically approves policy exceptions when compensating controls are detected has substituted a mechanical check for the organizational judgment that assessors will evaluate. The assessor does not ask whether a compensating control exists. The assessor asks whether the authorizing official evaluated the compensating control's adequacy, documented the rationale, defined a remediation timeline, and accepted the residual risk with full understanding of the implications. Automated approval produces a record. It does not produce the judgment that the record is supposed to represent.
Rampart enforces human decisions at control gates. Every control that requires judgment, whether risk acceptance, exception approval, categorization confirmation, or inheritance determination, includes a gate that blocks automated progression until a human with the appropriate role provides the required decision. Artificer presents context for each judgment decision: the current posture data from Sentinel, risk indicators derived from the three-dimensional scoring model (implementation, effectiveness, and evidence currency), and historical patterns from prior decisions on related controls. The human decision-maker receives everything the platform knows, organized for the specific decision they need to make. When they make the decision, Rampart records it as an immutable event: the decision type, the decision-maker's user ID, the session ID, the OpenTelemetry trace ID, a SHA-256 hash of the decision payload, the timestamp, and the stated rationale. Every judgment decision has a name attached to it, a reason recorded for it, and a cryptographic proof that it was not altered after the fact.
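A minimal sketch of what such an immutable decision record could look like, assuming hypothetical field names; the actual gate, storage, and signing machinery are omitted.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionEvent:
    """Immutable record of one human judgment decision (field names illustrative)."""
    decision_type: str          # e.g. "RISK_ACCEPTANCE", "EXCEPTION_APPROVAL"
    decided_by: str             # user ID of the accountable human
    session_id: str
    trace_id: str               # OpenTelemetry trace ID for the surrounding workflow
    rationale: str
    payload_hash: str
    decided_at: str

def seal_decision(decision_type: str, decided_by: str, session_id: str,
                  trace_id: str, rationale: str, payload: dict) -> DecisionEvent:
    """Hash the decision payload at the moment of decision so later tampering is detectable."""
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return DecisionEvent(decision_type, decided_by, session_id, trace_id,
                         rationale, digest, datetime.now(timezone.utc).isoformat())

event = seal_decision("RISK_ACCEPTANCE", "j.rivera", "sess-42", "trace-7f3a",
                      "Compensating control limits exposure until the Q3 patch window.",
                      {"control": "SI-2", "gap": "delayed patching"})
print(asdict(event)["payload_hash"][:16])
```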
Escalation is not an error state. It is the designed behavior for every decision that crosses the automation boundary. When the platform detects a condition that requires human judgment, it generates an escalation event with full context: the specific control affected, the specific condition that triggered the escalation, the current evidence state, the applicable baseline requirement, the identity of the person or process that caused the condition, and the response options available. The escalation routes to the control owner, not to a generic compliance inbox. The control owner receives the context required to make an informed decision without conducting their own investigation. Escalation by design compresses the decision cycle from days or weeks of back-and-forth investigation to a single informed action.
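As a sketch of the idea, an escalation might carry a context bundle like the following and route on control ownership; the field names, control ID, and addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EscalationEvent:
    """Context bundle attached to every escalation (illustrative fields)."""
    control_id: str             # the specific control affected
    trigger: str                # the condition that crossed the automation boundary
    evidence_state: str         # e.g. "current", "stale", "missing"
    baseline_requirement: str
    caused_by: str              # person or process that produced the condition
    response_options: list[str]

CONTROL_OWNERS = {"AC-2": "owner.iam@example.com"}   # hypothetical ownership model

def route(event: EscalationEvent) -> str:
    """Route to the control owner, not a shared compliance inbox."""
    return CONTROL_OWNERS.get(event.control_id, "governance-escalation@example.com")

evt = EscalationEvent("AC-2", "orphaned account detected", "current",
                      "disable accounts within 35 days of inactivity",
                      "hr-offboarding-feed", ["disable", "accept with rationale", "defer"])
print(route(evt))
```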
Traditional compliance tools fail at escalation in two directions. Some escalate everything: every configuration deviation, every evidence gap, every baseline mismatch generates an alert that routes to the compliance team. The volume overwhelms the team's capacity to triage. Critical findings are buried under routine deviations. The team develops alert fatigue and begins ignoring notifications, which means genuine governance decisions age without attention. Other tools escalate nothing: deviations are silently recorded in a log that no one reviews until assessment preparation begins. The compliance team discovers six months of accumulated drift in a two-week preparation window and cannot remediate in time. Neither approach serves the purpose of escalation, which is to ensure that the right human makes the right decision at the right time with the right information.
Sentinel implements escalation through AutomationPolicy, a per-rule configuration that defines how each detection type is handled. Three escalation modes govern the response: auto-apply for safe, well-defined remediations within approved change windows (a configuration restored to its baseline value, for example); approval-gate for changes that carry operational risk and require human confirmation before execution; and manual-only for policy decisions that cannot be pre-authorized regardless of the context. The thresholds between modes are configurable by the organization based on control family, severity, data sensitivity, and operational tempo. Citadel routes each escalation to the appropriate human based on the control ownership model defined in Rampart. The escalation includes the full context: what changed, what the compliance impact is, what the response options are, and what precedent exists from prior decisions on similar conditions. When an escalation ages past its defined response window without action, Sentinel re-escalates to the next level in the notification chain. No compliance decision ages silently. Every escalation either receives a human response or triggers a higher-level notification until it does.
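A rough sketch of how per-rule escalation modes and time-based re-escalation could be expressed; the rule names, notification chain, and windows are illustrative, not Sentinel's actual AutomationPolicy schema.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

class EscalationMode(Enum):
    AUTO_APPLY = "auto_apply"         # safe, pre-approved remediation within a change window
    APPROVAL_GATE = "approval_gate"   # execute only after human confirmation
    MANUAL_ONLY = "manual_only"       # never pre-authorized; always a human decision

# Hypothetical per-rule policy table; thresholds and rule names are organization-defined.
AUTOMATION_POLICY = {
    "restore_baseline_config": EscalationMode.AUTO_APPLY,
    "rotate_exposed_credential": EscalationMode.APPROVAL_GATE,
    "accept_control_deviation": EscalationMode.MANUAL_ONLY,
}

NOTIFICATION_CHAIN = ["control_owner", "compliance_lead", "authorizing_official"]

def escalation_level(raised_at: datetime, response_window: timedelta) -> int:
    """Move one level up the chain each time the response window elapses without a human action."""
    overdue_windows = int((datetime.now(timezone.utc) - raised_at) / response_window)
    return min(overdue_windows, len(NOTIFICATION_CHAIN) - 1)

raised = datetime.now(timezone.utc) - timedelta(hours=30)
print(AUTOMATION_POLICY["accept_control_deviation"].value)                  # manual_only
print(NOTIFICATION_CHAIN[escalation_level(raised, timedelta(hours=24))])    # compliance_lead
```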
Overlay operations modify the control baseline for a system. Adding a DISA STIG overlay creates new implementation requirements on top of the base framework controls. Removing an overlay removes protections. Modifying a control's parameters changes what the assessor will evaluate. Each overlay operation is one of four types: ADD (introduce new controls or requirements), MODIFY (change parameters or implementation guidance for existing controls), REMOVE (mark controls as not applicable with documented justification), and PARAMETER (adjust specific configuration values within a control's implementation). None of these operations should happen without deliberate human authorization. Each changes the compliance obligations that govern a system, and each must be justified, reviewed, and traceable to the person who approved the change.
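A sketch of how these operation types and the unapproved-until-signed rule could be modeled; the control ID, names, and fields are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class OverlayOperation(Enum):
    ADD = "add"              # introduce new controls or requirements
    MODIFY = "modify"        # change parameters or implementation guidance
    REMOVE = "remove"        # mark controls not applicable, with documented justification
    PARAMETER = "parameter"  # adjust specific configuration values within a control

@dataclass(frozen=True)
class OverlayChange:
    """One baseline change; none may take effect without a named approver (illustrative)."""
    operation: OverlayOperation
    control_id: str
    justification: str
    requested_by: str
    approved_by: str | None = None   # stays None until a human with authority signs off

change = OverlayChange(OverlayOperation.REMOVE, "SC-28", "No data at rest in this enclave.", "a.chen")
print(change.approved_by is None)    # True: the change is not yet authorized
```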
Approval bottlenecks stall remediation when the approval workflow routes every decision through a single authority regardless of severity or impact. A configuration restoration that returns a system to its documented baseline should not require the same approval chain as a decision to remove a control from the assessment scope. Yet many compliance workflows treat all changes uniformly, routing every remediation action through the same approval queue. The result is a backlog where low-risk, well-understood remediations wait behind complex governance decisions. Engineers cannot remediate known drift because the approval has not been processed. The compliance posture degrades while the approval queue grows. The bottleneck is not the approval requirement itself. It is the failure to differentiate between approvals that require senior governance judgment and approvals that require operational confirmation.
Rampart enforces Manager-approval gates for all overlay operations. Each ADD, MODIFY, REMOVE, and PARAMETER operation generates an approval request with full impact analysis: which controls are affected, how the assessment score changes, which evidence gaps the change creates or resolves, and the rationale provided by the requestor. Artificer mapping confirmations follow the same approval pattern: when Artificer proposes that a control maps to another framework through the AI_SUGGESTED strategy, the mapping requires explicit human acceptance before Rampart activates it for cross-framework scoring and evidence sharing. Every approval is immutably logged with the approver's user ID, the session ID, the OpenTelemetry trace ID, a SHA-256 hash of the approval payload, and the timestamp. When an assessor asks who approved a specific overlay modification or mapping relationship, the organization produces the exact approval record with the approver's identity, the decision timestamp, and the stated justification. No approval is implicit. No authorization is assumed. Every governance action has a name, a time, and a reason attached to it.
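As an illustration of the mapping gate specifically, a suggested mapping might stay inert until a named human accepts it; the class, control identifiers, and names here are hypothetical, with the hash and trace logging omitted.

```python
from dataclasses import dataclass

@dataclass
class MappingProposal:
    """A cross-framework mapping produced by the AI_SUGGESTED strategy (illustrative fields)."""
    source_control: str
    target_control: str
    strategy: str = "AI_SUGGESTED"
    accepted_by: str | None = None   # set only by an explicit human confirmation

def active_for_scoring(proposal: MappingProposal) -> bool:
    """A suggested mapping never feeds cross-framework scoring until a human accepts it."""
    return proposal.accepted_by is not None

proposal = MappingProposal("CMMC AC.L2-3.1.1", "FedRAMP AC-2")
assert not active_for_scoring(proposal)   # blocked: suggestion only
proposal.accepted_by = "m.okafor"         # explicit confirmation, logged with hash and trace ID elsewhere
assert active_for_scoring(proposal)
```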
Every human decision in the platform generates a signal that improves future intelligence. When a reviewer confirms a cross-framework mapping, that confirmation strengthens the strategy that produced it. When a reviewer rejects a mapping, the rejection and its rationale inform future suggestions. When a control owner responds to an escalation by accepting a deviation with a risk justification, the platform records the decision pattern: this type of deviation, in this infrastructure context, was accepted with this rationale. When a similar condition occurs in the future, the intelligence layer references the precedent in the escalation context. It does not pre-approve the decision. It provides the next decision-maker with relevant organizational history so they can make a more informed judgment. The learning is contextual, not prescriptive. It improves the quality of what the platform presents. It never reduces the number of decisions that require human authorization.
Most compliance tools discard judgment. They record the outcome (approved, rejected, deferred) but not the reasoning, the context, or the precedent. When a similar decision arises six months later, the team has no institutional memory of how they handled it before, why they chose that path, or what the outcome was. The decision is made fresh each time, often by a different person, often with a different result. Inconsistent governance decisions create risk: if the same type of deviation is accepted for one system and rejected for another without a documented reason, the assessor will question the organization's governance maturity. If a mapping was confirmed by one reviewer and rejected by another for the same control pair, the cross-framework scoring becomes unreliable. Without decision memory, the compliance program cannot learn from its own history.
Artificer operates through a three-tool architecture: query (retrieve posture data, evidence status, mapping relationships, and historical decisions), act (propose narrative drafts, suggest mappings, generate remediation guidance), and visualize (produce compliance dashboards, evidence chains, and decision audit trails). Human decisions feed back through all three tools. When a confirmed AI_SUGGESTED mapping demonstrates accuracy over multiple assessment cycles, its fidelity graduates from AI_SUGGESTED to PUBLISHED in Rampart's mapping registry, reflecting the organization's validated interpretation. Narrative modifications teach the generation model the organization's preferred language patterns, evidence citation style, and level of implementation detail. Quality criteria for narratives are machine-verifiable: the narrative must reference specific infrastructure components by identifier, must cite evidence artifacts that exist and are within their freshness window defined by Sentinel's evidence expiration policies, and must use the framework's required terminology. Every learning signal operates within the human-in-the-loop boundary. The platform gets better at presenting information. Humans retain full authority over every decision.
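A sketch of how those machine-verifiable narrative criteria might be checked, assuming a hypothetical evidence-ID convention (EV-…) and component naming pattern:

```python
import re
from datetime import datetime, timezone

def narrative_checks(narrative: str, evidence_index: dict, required_terms: list[str]) -> dict:
    """Machine-verifiable narrative criteria sketched from the text; names are illustrative."""
    now = datetime.now(timezone.utc)
    cited = re.findall(r"EV-\d+", narrative)           # hypothetical evidence-ID convention
    return {
        "references_component_ids": bool(re.search(r"\b(i-[0-9a-f]+|vm-\w+)\b", narrative)),
        "cited_evidence_exists": all(e in evidence_index for e in cited),
        "cited_evidence_fresh": all(
            e in evidence_index and evidence_index[e]["expires_at"] > now for e in cited
        ),
        "uses_required_terminology": all(term.lower() in narrative.lower() for term in required_terms),
    }

evidence_index = {"EV-1042": {"expires_at": datetime(2999, 1, 1, tzinfo=timezone.utc)}}
draft = "Access to vm-web-01 is restricted per least privilege; see evidence EV-1042."
print(narrative_checks(draft, evidence_index, ["least privilege"]))
```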
The human-in-the-loop model is not a philosophical position. It is a structural requirement driven by how compliance assessments actually work. Assessors do not evaluate whether your platform collected evidence automatically. They evaluate whether the evidence demonstrates control effectiveness. Assessors do not evaluate whether your cross-framework mappings were computed algorithmically. They evaluate whether the mappings are accurate for your specific implementation. Assessors do not evaluate whether your risk acceptances were processed through an approval workflow. They evaluate whether the right person, with the right authority, made an informed decision with adequate justification. The automation handles the volume. The human governance handles the accountability. Both are necessary. Neither is sufficient alone.
The operational benefit is measurable. Evidence collection runs continuously without human scheduling, eliminating the evidence staleness that plagues manual workflows. Narrative drafts generate from observed state in minutes rather than the hours or days that manual narrative writing requires. Cross-framework scores compute in real time rather than the weeks that manual cross-referencing consumes. The compliance team's time shifts from mechanical evidence collection to judgment work: reviewing escalations, confirming mappings, approving narratives, making governance decisions. The team's output shifts from spreadsheet maintenance to compliance governance. The total effort decreases because the mechanical 90% is automated. The quality of the remaining 10% increases because the humans spend their time on the decisions that actually determine compliance outcomes rather than on the data collection that supports those decisions.
The transformation is provable. Every automated action in Sentinel carries a provenance record: what the platform did, when, based on which policy, triggered by which event, with a SHA-256 integrity hash and OpenTelemetry trace ID. Every human decision in Rampart carries a governance record: who decided, when, with what justification, in response to which escalation, cryptographically sealed at the moment of decision. The two record types are distinct and auditable. An assessor reviewing the evidence chain traces any compliance claim through both layers. The automated layer proves that evidence was collected reliably, consistently, and with cryptographic integrity. The human layer proves that governance decisions were made by authorized individuals with adequate context and documented rationale. Citadel presents both layers in a unified view: the action queue shows pending human decisions, the event stream shows completed automated actions, and the governance log shows the full decision history. The platform does not hide the automation behind a human facade or the human decisions behind an automated process. It makes both layers transparent, traceable, and verifiable. That is the proof an assessor requires.
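Sketched as data shapes, the two record types stay structurally distinct; every field name below is illustrative rather than the platform's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AutomatedActionRecord:
    """Provenance for the mechanical layer: what the platform did, when, and under which policy."""
    action: str
    performed_at: str
    policy_id: str
    triggered_by_event: str
    integrity_hash: str
    trace_id: str

@dataclass(frozen=True)
class HumanDecisionRecord:
    """Governance for the judgment layer: who decided, with what rationale, sealed at decision time."""
    decision: str
    decided_by: str
    decided_at: str
    rationale: str
    escalation_id: str
    payload_hash: str

# An assessor traces a compliance claim through both layers: the automated records prove reliable,
# hash-verified collection; the human records prove that an authorized person made each judgment call.
```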
Something is being forged.
The full platform is under active development. Reach out to learn more or get early access.