"Keep a human in the loop" is the most repeated advice in AI-augmented research. It is also the most incomplete. The loop has positions. A reviewer who sees everything reviews nothing; a reviewer who sees the wrong stage approves failures that were invisible from where they sat. The position of the human is the design.
This essay is the technical companion to my piece on automation complacency. That one argued review gates have to demand work, not approval. This one is about the question that determines whether the work matters: where the gates go.
The version with zero gates
The thematic analysis pipeline I ship has six stages. The first version had no gates worth the name. The model produced themes, the researcher approved them, and the work shipped. The output looked rigorous: comprehensive theme lists, confident synthesis, quotes in all the right places.
It wasn't. The rigor was cosmetic, and what taught me that wasn't a catastrophic failure — it was the slow realization that I couldn't say where the analysis would fail if it failed, which meant I couldn't say it hadn't. The fix wasn't a better model. It was an architecture that put the researcher's attention exactly where the pipeline cannot compensate for itself.
The pipeline follows the spine of reflexive thematic analysis [Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101]. The gate placements follow their phase boundaries: after theme construction, after review, after defining. The pipeline is the method made executable, not a replacement for it.
familiarize → code → synthesize ▣ gate 1: coverage → review ▣ gate 2: faithfulness → insights ▣ gate 3: decisions → report
Six stages. Three end in a gate where the researcher does work the model cannot do. Three don't. Choosing which three was the most consequential design decision in the system.
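A minimal sketch of that layout, to make "position" concrete. The stage and gate names come from the diagram; `run_pipeline`, `GateBlocked`, and the convention that a gate must return an artifact (anything empty counts as a bare approval) are assumptions for illustration, not the shipped implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Stage order from the diagram; these names are illustrative, not the author's code.
STAGES: List[str] = ["familiarize", "code", "synthesize", "review", "insights", "report"]

# Hypothetical gate map: the stage a gate follows -> the blindness it guards.
GATES: Dict[str, str] = {
    "synthesize": "coverage",
    "review": "faithfulness",
    "insights": "decisions",
}


class GateBlocked(Exception):
    """Raised when a gate comes back with no artifact, i.e. a bare approval."""


def run_pipeline(
    stage_fns: Dict[str, Callable[[Any], Any]],
    gate_fns: Dict[str, Callable[[Any], Any]],
    data: Any,
) -> Tuple[Any, Dict[str, Any]]:
    """Run every stage in order and stop at each gate until it yields an artifact."""
    artifacts: Dict[str, Any] = {}
    for stage in STAGES:
        data = stage_fns[stage](data)
        kind = GATES.get(stage)
        if kind is None:
            continue  # ungated stage: its failures show up in later artifacts
        artifact = gate_fns[kind](data)
        if not artifact:
            raise GateBlocked(f"{kind} gate after '{stage}' returned no artifact")
        artifacts[kind] = artifact
    return data, artifacts


# Usage with stub stages that only record their own names.
stubs = {s: (lambda d, s=s: d + [s]) for s in STAGES}
gates = {k: (lambda d, k=k: f"{k} artifact at {d[-1]}") for k in GATES.values()}
trace, arts = run_pipeline(stubs, gates, [])
```

Keeping the gates in a map keyed by stage means moving a gate is a one-line change, and the map itself states where the design assumes the system is blind.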
Why not gate everything
The tempting answer is to put a human after every stage. It's also wrong, and the reason connects back to the complacency essay: reviewer attention is the scarcest resource in the system. Six gates means six approvals a day, and by Friday they are being given on autopilot. A gate earns its place only where the model is structurally blind: where the failure mode is invisible from inside the pipeline, not merely possible.
Familiarization, coding, and report composition don't meet that bar. Their failures are visible downstream, in artifacts the researcher will already be holding. The three positions that do meet it each guard a different kind of blindness.
Gate 1, after theme synthesis: the coverage audit
The synthesis stage takes the full set of coded data and proposes candidate themes. In this pipeline it works from a co-occurrence matrix that gives the model visibility into 100% of the corpus, up from 36% under the sampling approach the first version used. [Internal pipeline figures. Sampling was the original compromise to fit context windows; the matrix representation removed it. The point stands independent of the exact numbers: mechanical coverage is now cheap, and that's precisely what makes it misleading.] The output looks comprehensive, and mechanically it is.
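The essay doesn't say how the matrix is built. One plausible sketch, assuming each coded segment carries a set of code labels and that two codes co-occur when they are applied to the same segment: every segment contributes, and the matrix grows with the number of codes rather than the number of transcripts, which is one way a whole corpus could fit where a sample used to. The function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def cooccurrence_matrix(segments: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Count, for every pair of codes, the segments that carry both.

    Assumption: co-occurrence means two codes applied to the same segment.
    The diagonal holds each code's own segment count, so every coded
    segment in the corpus contributes to the matrix.
    """
    counts: Counter = Counter()
    for codes in segments:
        ordered = sorted(codes)
        for code in ordered:
            counts[(code, code)] += 1
        for a, b in combinations(ordered, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    labels = sorted({code for codes in segments for code in codes})
    matrix = [[counts[(a, b)] for b in labels] for a in labels]
    return labels, matrix


# Usage: each set is the codes applied to one segment (illustrative data).
segments = [
    {"setup_friction", "integrations"},
    {"integrations", "workaround"},
    {"setup_friction", "integrations", "workaround"},
    {"praise"},
]
labels, matrix = cooccurrence_matrix(segments)
for label, row in zip(labels, matrix):
    print(f"{label:>15} {row}")
```

The matrix is complete in exactly the mechanical sense: it records which codes travel together, and says nothing about whose language never got coded in the first place.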
But mechanical coverage and analytical coverage are not the same thing. A theme can touch every transcript and still systematically under-represent the participants whose language doesn't fit the dominant frame — the non-native speaker whose frustration reads as politeness, the power user whose workaround vocabulary never matches the feature names. The model cannot flag this, because the flag would require the model to know what it does not know.
So the gate at this position asks one thing, and it is not "approve these themes." It asks: identify which participants or segments your themes under-represent. Name the contradictions the synthesis smoothed over. The cognitive task is a coverage audit. It can only happen here — before this point there are no themes to audit; after the review stage, the structure has already hardened around whatever the synthesis missed.
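Gate 1's output can be pinned down as an artifact instead of a checkbox. In this hypothetical sketch, `CoverageAudit` passes only if it names under-represented participants (each with a reason), names contradictions, or gives a written rationale for finding none. The `unquoted_participants` helper, assuming each theme records the participant IDs behind its quotes, catches only the crude case where someone is never quoted at all; the cases described above, where a participant is quoted but flattened, stay with the reviewer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CoverageAudit:
    """Hypothetical Gate 1 artifact: what the themes miss, not whether they are good."""

    under_represented: Dict[str, str] = field(default_factory=dict)  # participant or segment -> why
    contradictions: List[str] = field(default_factory=list)  # tensions the synthesis smoothed over
    none_found_rationale: str = ""  # must be filled in if both of the above are empty

    def problems(self) -> List[str]:
        """Reasons this audit cannot pass the gate yet (an empty list means it can)."""
        issues: List[str] = []
        if not (self.under_represented or self.contradictions or self.none_found_rationale.strip()):
            issues.append("empty audit: name who is under-represented or explain why no one is")
        for who, why in self.under_represented.items():
            if not why.strip():
                issues.append(f"no reason given for {who}")
        return issues


def unquoted_participants(themes: Dict[str, List[str]], participants: List[str]) -> List[str]:
    """Participants who supply no quote to any theme: raw material for the audit, not the audit."""
    quoted = {p for owners in themes.values() for p in owners}
    return [p for p in participants if p not in quoted]


# Usage: theme name -> participant IDs behind its quotes (illustrative data).
themes = {"T1 setup friction": ["P01", "P02", "P02"], "T2 flexibility": ["P03"]}
print(unquoted_participants(themes, ["P01", "P02", "P03", "P04"]))  # ['P04']
audit = CoverageAudit(under_represented={"P04": "non-native speaker; frustration coded as neutral"})
print(audit.problems())  # []
```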
Gate 2, after theme review: the faithfulness check
The review stage applies the reviewer's decisions to the preliminary themes. The mechanic matters: the reviewer writes decisions — "merge T2 and T5," "rename T1," "split T4" — and Python applies them, re-runs quote retrieval, and produces a clean revised themes object. The reviewer's logic and the pipeline's implementation are two different things, and the gap between them is where this failure mode lives.
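The mechanic in the paragraph above can be sketched as follows. The decision format is an assumption: the essay quotes decisions as prose ("merge T2 and T5"), and this sketch uses tuples so Python can apply them without parsing. `Theme`, `Segment`, `apply_decisions`, and `retrieve_quotes` are hypothetical names, and quote retrieval here is a simple code-overlap match, not whatever retrieval the real pipeline runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical types and decision format; not the author's implementation.


@dataclass
class Theme:
    name: str
    codes: Set[str]
    quotes: List[str] = field(default_factory=list)


@dataclass
class Segment:
    participant: str
    text: str
    codes: Set[str]


def apply_decisions(themes: Dict[str, Theme], decisions: List[tuple]) -> Dict[str, Theme]:
    """Apply reviewer decisions in order to a copy; the input themes are left untouched."""
    out = {tid: Theme(t.name, set(t.codes)) for tid, t in themes.items()}
    for op, *args in decisions:
        if op == "merge":  # ("merge", keep_id, absorbed_id)
            keep, absorbed = args
            out[keep].codes |= out.pop(absorbed).codes
        elif op == "rename":  # ("rename", theme_id, new_name)
            tid, new_name = args
            out[tid].name = new_name
        elif op == "split":  # ("split", theme_id, {new_id: (name, codes)})
            tid, parts = args
            old = out.pop(tid)
            assigned = set().union(*(codes for _, codes in parts.values()))
            if assigned != old.codes:
                raise ValueError(f"split of {tid} must cover exactly its original codes")
            for new_id, (name, codes) in parts.items():
                out[new_id] = Theme(name, set(codes))
        else:
            raise ValueError(f"unknown decision: {op!r}")
    return out


def retrieve_quotes(themes: Dict[str, Theme], segments: List[Segment]) -> Dict[str, Theme]:
    """Re-attach quotes: a segment supports a theme if they share at least one code."""
    for theme in themes.values():
        theme.quotes = [f"{s.participant}: {s.text}" for s in segments if s.codes & theme.codes]
    return themes


# Usage with illustrative themes, decisions, and segments.
themes = {
    "T1": Theme("Setup pain", {"setup_friction"}),
    "T2": Theme("Integrations", {"integrations"}),
    "T4": Theme("Workarounds", {"workaround", "export_hack"}),
    "T5": Theme("Stack mismatch", {"stack_mismatch"}),
}
decisions = [
    ("merge", "T2", "T5"),
    ("rename", "T1", "Setup abandonment"),
    ("split", "T4", {"T4a": ("Workarounds", {"workaround"}), "T4b": ("Export hacks", {"export_hack"})}),
]
segments = [
    Segment("P01", "I gave up at the integrations step", {"integrations", "setup_friction"}),
    Segment("P04", "It assumes a stack we don't have", {"stack_mismatch"}),
    Segment("P02", "I export to CSV and fix it there", {"export_hack"}),
]
revised = retrieve_quotes(apply_decisions(themes, decisions), segments)
```

Nothing in this code can tell whether a merge was wise. It applies what it is told, which is why drift in the decisions themselves is invisible to it.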
Reviewer decisions can be reasonable in isolation and incoherent in aggregate. A merge that tightened the structure on Monday might, by Thursday's pass, be the merge that lost the deviant case — each step locally sensible, the sum quietly unfaithful. The pipeline cannot catch this. The pipeline is what implemented the decisions.
The gate here asks: walk each revised theme back to the coded data. If the revision makes the theme tidier but less faithful to what people said, reject it. Tidier is easy to approve. Faithful takes reading. The placement makes reading the default — and it has to sit after review, not before, because the thing being audited is the accumulated effect of the reviewer's own choices. This is the gate where the researcher checks themselves, not the model.
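One way to make Gate 2 demand reading rather than approval, sketched under assumptions: decisions are kept in the reviewer's own wording, and each verdict must cite the IDs of the coded segments re-read to reach it. `Verdict` and `check_faithfulness_artifact` are hypothetical. Only decisions judged faithful, with evidence, go forward; the rest are rejected or block the gate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Verdict:
    """Hypothetical Gate 2 record for one reviewer decision."""

    decision: str  # the reviewer's own wording, e.g. "merge T2 and T5"
    faithful: bool  # False means: tidier, but less true to what people said
    evidence: List[str] = field(default_factory=list)  # IDs of coded segments re-read
    note: str = ""


def check_faithfulness_artifact(
    decisions: List[str], verdicts: List[Verdict]
) -> Tuple[List[str], List[str]]:
    """Return (decisions to keep, problems that block the gate)."""
    by_decision: Dict[str, Verdict] = {v.decision: v for v in verdicts}
    keep: List[str] = []
    problems: List[str] = []
    for decision in decisions:
        verdict = by_decision.get(decision)
        if verdict is None:
            problems.append(f"no verdict for '{decision}'")
        elif not verdict.evidence:
            problems.append(f"verdict for '{decision}' cites no coded data")
        elif verdict.faithful:
            keep.append(decision)
        # faithful=False with evidence: a considered rejection, so the decision is dropped


    return keep, problems


# Usage with illustrative segment IDs.
decisions = ["merge T2 and T5", "rename T1", "split T4"]
verdicts = [
    Verdict("merge T2 and T5", faithful=False, evidence=["P04-s12"],
            note="the merge absorbs the one participant whose setup never failed"),
    Verdict("rename T1", faithful=True, evidence=["P01-s03", "P02-s07"]),
]
keep, problems = check_faithfulness_artifact(decisions, verdicts)
print(keep)      # ['rename T1']
print(problems)  # ["no verdict for 'split T4'"]
```

The evidence check cannot prove anyone read the transcripts. It only makes a verdict that cites nothing fail loudly, which moves the default from approving to reading.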
Gate 3, after insight generation: the decision test
The insight stage takes final themes and writes them up in the language of product decisions. This is where the prose gets confident. Models are very good at confident prose. They are far less good at noticing when a sentence describes a finding without changing what anyone would do.
The gate asks: for each insight, write one sentence naming the product decision it should change. Cut the ones that don't survive. A hypothetical pair shows the shape. "Users value flexibility in their workflow" — true-sounding, decision-free, a paraphrase of the theme it came from. "Users abandon setup at the integrations step because the defaults assume a stack they don't have — reorder it or lose them" — that sentence is exposed to being wrong, which is exactly what makes it worth shipping. You either have a decision-ready sentence or you have a paraphrase wearing one's clothes.
This is the gate that catches the most output, and its position is non-negotiable. Earlier, there are no insights to test against decisions. Later, the report is already drafted around them, and cutting a hollow insight means unwriting a narrative — which, under deadline, nobody does.
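Gate 3's artifact is the easiest to make concrete. In this hypothetical sketch, each insight carries a decision sentence written by the researcher at the gate, and any insight without one is cut before the report is drafted. The sentence has to come from a person; the code only enforces that an insight without one never reaches the report.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Insight:
    text: str
    # Hypothetical field, written by the researcher at the gate:
    # the product decision this insight should change.
    decision: Optional[str] = None


def decision_test(insights: List[Insight]) -> Tuple[List[Insight], List[Insight]]:
    """Split insights into (kept, cut); only those with a decision sentence survive."""
    kept: List[Insight] = []
    cut: List[Insight] = []
    for insight in insights:
        if insight.decision and insight.decision.strip():
            kept.append(insight)
        else:
            cut.append(insight)
    return kept, cut


# Usage with the pair from the text; the decision wording is illustrative.
insights = [
    Insight("Users value flexibility in their workflow"),
    Insight(
        "Users abandon setup at the integrations step because the defaults assume a stack they don't have",
        decision="Reorder setup so the integrations step no longer blocks users whose stack differs from the defaults",
    ),
]
kept, cut = decision_test(insights)
print(len(kept), len(cut))  # 1 1
```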
Three jobs, not three approvals
Notice what the three gates have in common: none of them asks "is this good?" Each demands a specific artifact — an under-representation list, a faithfulness verdict per revision, a decision sentence per insight — aimed at a specific blindness. Coverage, faithfulness, decisions. The model can't see the first because it doesn't know what it doesn't know; it can't see the second because it executed the drift; it can't see the third because confidence is its native register.
That's what "position is the design" means in practice. The gates aren't checkpoints sprinkled over a pipeline for safety theater. Their placement is the theory of where this kind of system fails — and a team that can articulate its own three blindnesses has done most of the architectural work already, whatever pipeline they end up building.
The dek of this essay says the position decides what the researcher becomes, and I mean that literally. Put the human at the end and they become an approver — complacency with a title. Put them at the three points of structural blindness and they become the thing no model replaces: the person who knows what the system cannot know about itself.
The first version of my pipeline had no gates, and its output looked rigorous. The current version interrupts a researcher three times per study, demands real work at each stop, and is slower for it. Every team that has used both prefers the slower one. Rigor you can locate beats speed you can't trust — that's the trade, and the architecture is how you buy it.
Which gate would your team build first? If you're working that question, designing AI into qualitative workflows without giving up the judgment that makes them research, that's exactly the work I do.
Expanded from a piece first published on LinkedIn, April 2026.