BOSTON — A growing number of school districts and universities are quietly retiring the AI-detection tools they had deployed during the past two academic cycles, citing false-positive rates that have, in too many cases, affected the academic records of students whose work was authentically their own.

The retirements are not being announced with the visibility that accompanied the original deployments. Most districts are simply not renewing their licences and, in public communications with families, are framing the change as a shift in instructional approach rather than an acknowledgment that the tools did not perform as expected.

What the data showed

According to the published research available, the performance of the most widely deployed detection tools has fallen well short of vendors' marketing claims. False-positive rates in the most rigorous studies have ranged from 6 percent to 14 percent depending on the tool and the writing context, with notably higher rates for non-native English speakers and for students with certain neurodivergent writing patterns.

That disparate impact was what ultimately drove most of the rollbacks. Several districts that had accepted the overall false-positive rates as a tolerable cost reconsidered once the data showed the errors falling disproportionately on those groups of students.

What is replacing the tools

The approaches replacing the detection tools are instructional rather than technical. The most common is a shift toward in-class writing exercises, oral examination components, and process-based assessment, reducing the reliance on out-of-class written submissions for the highest-stakes evaluations.

Whether these approaches are sustainable at scale is a question the next several academic cycles will test. Process-based assessment is considerably more labour-intensive than the assignment-and-submission model it is replacing, and teacher time is already a binding constraint in most districts.

The vendor response

Vendors of the affected tools have argued, with varying degrees of public visibility, that detection performance is improving and that the rollbacks reflect a temporary limitation in the underlying technology that is being addressed. On the available data, the argument has not persuaded the most credible academic researchers in the area.

Several vendors have begun to pivot their offerings toward integrity-coaching tools that focus on supporting students' writing process rather than on identifying generated content. Whether the pivot will preserve those vendors' business models is an open question.

What does and does not change

What changes is the immediate practice in the districts that are rolling back the tools. The underlying question — how to sustain meaningful written assessment in an environment where students have access to generative AI tools at home — is not addressed by removing detection tools.

The deeper question is the subject of a longer institutional conversation that the rollbacks have, if anything, accelerated. The conversation is uncomfortable but no longer avoidable.