AI Is Already Inside the Federal Courts. Legal Purism Is Not a Governance Strategy

Denial Is Not Discipline

Executive Summary

Artificial intelligence is no longer a speculative issue for the federal courts. It is already present in judicial workflow. A 2026 random-sample survey of federal judges reports that more than 60% of responding judges used at least one AI tool in judicial work, with legal research as the leading reported use case and document review next. The same study found reported AI use by others in chambers as well. Those findings should not be overstated; the authors identify meaningful limitations, including a 22.3% response rate and possible self-selection and non-response bias. But they do establish the threshold fact that now matters most: AI is already inside federal judicial chambers.

That fact changes the terms of the debate. The central institutional question is no longer whether AI should enter the federal courts. It is whether the judiciary will govern a technology already affecting research, document handling, drafting-adjacent work, and chambers operations with the rigor adjudication requires. On that question, the present record shows a judiciary in transition rather than one operating under a settled control structure. The survey reports uneven policy and training: 45.5% of responding judges said court administration had not provided AI training, 15.7% were unsure whether training had been offered, and 24.1% reported having no official AI policy. If judges who discouraged but did not formally prohibit AI use are also treated as operating without formal policy, the paper states that figure rises to 41.7%.

The federal judiciary’s own institutional response confirms that this is now a governance problem, not a hypothetical one. The Administrative Office reported in 2025 that it had established an advisory AI Task Force and that the task force developed interim guidance for the Judiciary. That guidance directed judiciary users to review and independently verify AI-generated work product, cautioned against delegating core judicial functions to AI, and encouraged courts to define the tasks for which locally approved AI tools may be used. The federal judiciary’s September 2025 Strategic Plan separately called for establishment of an AI governance framework to guide responsible adoption and manage risks presented by new and evolving AI technologies. Courts have also begun responding more concretely through disclosure requirements, verification rules, and sanction frameworks aimed at fabricated or incorrect AI-assisted submissions.

These developments show a judiciary moving unevenly but unmistakably from reaction to administration. The real institutional problem is not AI in the abstract. It is AI inside adjudicative systems without a sufficiently mature framework of supervision, verification, training, disclosure, and task limitation. The naysayers are right that hallucinations, fabricated authorities, and erosion of professional judgment are serious risks. But once AI is already in use, resistance without governance is not rigor. It is drift. The federal courts will not preserve legitimacy by pretending AI is absent from chambers and litigation practice. They will preserve legitimacy, if at all, by governing a technology already inside the system with the same discipline they demand from everyone else.

I. The Debate the Purists Already Lost

The legal purists are arguing a question the federal courts have already moved beyond. Artificial intelligence is no longer standing outside the judicial system as a speculative threat, waiting for the profession to decide whether it belongs. It is already inside federal judicial chambers, already present in judicial workflow, and already forcing administrative, procedural, and disciplinary responses. A 2026 random-sample survey of federal judges reports that more than 60% of responding judges used at least one AI tool in judicial work. Legal research emerged as the leading reported use case, document review followed, and judges also reported AI use by others in chambers. That does not describe a future problem. It describes an existing institutional condition.

The purist instinct nevertheless has an obvious appeal. Courts are among the few institutions that still claim legitimacy through reasoned process rather than naked force. They insist on traceable authority, adversarial testing, disciplined justification, and accountable human judgment. A technology capable of producing plausible but false analysis, fabricated citations, synthetic confidence, and polished distortion appears, at first encounter, to be fundamentally at odds with adjudication. The concern is not irrational. In a legal culture trained to equate rigor with control, suspicion toward AI has real intellectual and professional force.

But force of instinct is not the same thing as adequacy of response. The problem with the purist position is not that it is morally unserious. The problem is that it is operationally late. By the time much of the profession settled into the view that AI should be treated as an external menace to be resisted at the courthouse door, the federal judiciary had already entered a transition period. The survey does not show saturation, and it should not be exaggerated into one. The authors themselves identify important limitations, including a 22.3% response rate and possible self-selection and non-response bias. But even with those limits, the threshold fact remains. AI is no longer hypothetical in federal judicial work.

That point is often obscured because the debate is framed in absolutist terms. If AI is not yet universal, some assume it remains marginal. If only 22.4% of responding judges reported weekly or daily use, some take comfort in the thought that the system remains essentially unchanged. If 38.4% reported never using any listed AI tool in their work, others infer that enough professional resistance could still arrest the transition. Those inferences are mistaken. Institutional consequence does not begin at saturation. In courts, it begins much earlier. A tool need not dominate judicial work to alter it. It is enough that the tool enters the upstream functions that shape what authorities are surfaced, what facts are foregrounded, what analytic routes are pursued, and what drafts first look like before final human judgment is expressed. A technology that enters research, review, summary, and workflow organization has already entered the environment in which adjudication takes shape.

This is why the usual public caricature of the problem is too shallow. The survey does not support the claim that federal judges are broadly handing decisions to machines. It reports only limited use in direct decision-making categories. But the more serious institutional issue was never confined to whether a judge would allow a chatbot to resolve a case. The harder question is whether AI alters how legal problems are framed before the formal act of judgment ever occurs. Courts do not simply decide. They research, filter, compare, summarize, organize, and frame before they decide. That is where institutional vulnerability lies. A system built on human adjudication can still be materially affected by machine assistance long before any judge would describe the machine as having “made” a decision.

The judiciary’s own conduct confirms that the issue has already moved from theory to administration. The Administrative Office reported in 2025 that it had established an advisory AI Task Force and that the task force developed interim guidance for the Judiciary during that year. The guidance directed judiciary users to review and independently verify AI-generated work product, cautioned against delegating core judicial functions to AI, encouraged courts to define the tasks for which locally approved AI tools may be used, and advised courts to consider disclosure and confidentiality concerns. The federal judiciary’s September 2025 Strategic Plan separately called for establishment of an AI governance framework to guide responsible adoption and manage the risks presented by new and evolving AI technologies. Once the judiciary’s own administrative arm has moved from observation to guidance, professional denial stops looking principled and starts looking unserious.

The same is true at the court level. Judge Michael Baylson did not issue his 2023 standing order in the Eastern District of Pennsylvania because AI was an imaginary concern. The District of Kansas did not issue its district-wide standing order in January 2026 because fabricated citations and AI-assisted inaccuracies were merely fashionable talking points. Those orders emerged because federal courts were already confronting the effect of AI on filing practice, source reliability, and procedural integrity. Disclosure requirements, certification requirements, and sanction warnings are not evidence of speculative anxiety. They are evidence of a system already trying to control a technology that has entered real litigation practice.

There is a deeper professional failure underneath the rhetorical one. Lawyers often confuse denunciation with discipline. The profession assumes that once it has articulated a sound normative objection, it has substantially discharged its institutional responsibility. But institutions do not run on objections. They run on procedures, rules, supervision, enforcement, and incentives. A lawyer may be entirely correct that AI hallucinates, fabricates, overstates, and compresses nuance into smooth error. Yet if that same lawyer resists written policies, approved-use categories, verification duties, disclosure rules, training obligations, and supervisory controls, that lawyer has not defended adjudication. The lawyer has merely left the field open to informal adaptation.

That is the point the purists have not confronted. Their skepticism may be justified, but skepticism is not governance. Professional disapproval is not a control structure. A judiciary that has already entered a period of uneven AI use cannot preserve legitimacy by pretending the issue remains premature. Once the technology is already present in chambers, already visible in filing practice, already prompting administrative guidance, and already producing local procedural responses, the only serious question is how it will be governed.

The profession can still decide what kind of control structure it wants. It can still decide whether AI use in federal courts will be supervised, verified, and transparently bounded, or whether it will remain patchy, informal, and reactive. What it can no longer plausibly claim is that the question remains external to adjudication. That argument has already been overtaken by the record.

Artificial intelligence is already inside the federal courts. The only serious question left is whether the judiciary intends to govern that fact or merely endure it.

II. The Threshold Fact: AI Is Already in Federal Judicial Work

The most important fact in this debate is now the simplest one. Artificial intelligence is already being used in federal judicial work. That proposition no longer rests on anecdote, conference chatter, or isolated scandal. It rests on a published survey of federal judges and on the federal judiciary’s own administrative response. A 2026 random-sample survey reports that more than 60% of responding judges used at least one AI tool in judicial work. The same study reports that 22.4% used AI weekly or daily, 19.6% monthly, 19.6% rarely, and 38.4% never used any of the listed AI tools in their work. Those numbers do not describe universal adoption. They do describe a judiciary that can no longer plausibly be characterized as untouched by AI.

That matters because the profession has been tempted to treat anything short of saturation as institutional insignificance. That is not how institutional change works, especially in courts. A technology does not need to become ubiquitous before it becomes consequential. It needs only to enter enough of the workflow that the old description of the institution ceases to be accurate. That point has already been reached. A judiciary in which a meaningful share of judges use AI at least occasionally, a smaller but still substantial group use it with some regularity, and chambers personnel are also reported to be using it is no longer confronting AI solely from the outside. It is already in a transition period.

The survey’s credibility lies partly in its restraint. The authors drew a stratified random sample of 502 federal bankruptcy, magistrate, district court, and court of appeals judges from a population of 1,738 current federal judges and received 112 responses, for a 22.3% response rate. They identified the standard limitations that attend survey work of this kind, including self-selection bias, non-response bias, and the possibility that judges with stronger views or greater engagement on AI may have been more likely to respond. They also cautioned that the court of appeals subset was especially thin and should be treated as anecdotal rather than representative. That methodological caution is not a weakness; it is part of that restraint, and the source of the study’s credibility. The paper does not pretend to prove more than it can. Its contribution is narrower and more useful: it establishes that AI use in federal judicial work is real, measurable, and institutionally relevant.

The distribution of reported use is itself revealing. The categories do not merely separate users from nonusers. They map the contours of transition. A substantial minority remains outside current use. Another group appears to be experimenting only occasionally. A smaller but still meaningful group has moved into recurring use. That matters because transition is precisely the stage at which governance failures become most likely. When a technology is truly external, formal controls may seem premature. When it is fully integrated, controls are often forced into existence by necessity. The unstable period is the middle one—when use is real enough to matter, uneven enough to be underdefined, and sufficiently dispersed that institutions can still pretend the change is not yet fundamental. That is where the federal judiciary appears to be.

The survey also defeats a second simplification: the idea that AI in the courts is basically synonymous with free-form chatbot experimentation. Judges reported greater use of legal-specific AI tools than general-purpose systems and more frequent use of those legal-specific tools. Westlaw AI-Assisted or Deep Research was the most-used reported tool, followed by ChatGPT, with CoCounsel also showing notable use. Several general-purpose tools barely registered at all. That pattern is institutionally important because it suggests that judicial AI use is entering through familiar legal-research infrastructure as much as, and perhaps more than, through open-ended consumer products. AI in the federal courts is therefore not just a story about judges experimenting with public chatbots. It is also a story about machine assistance being integrated into professional platforms the judiciary already treats as part of ordinary research practice.

That distinction matters because technologies imported through legacy infrastructure are often normalized faster than those that present themselves as radical breaks from prior method. A judge who would recoil from drafting an order with a public chatbot may feel considerably less hesitation about using AI-assisted research inside a platform already associated with citators, databases, and searchable authority. The institutional significance of that shift is substantial. AI can enter judicial work under the appearance of continuity rather than disruption. It can be experienced not as abandonment of professional method, but as enhancement of tools the profession already trusts.

That does not make the development benign. It makes it more consequential. A legal-specific platform may feel safer than a general-purpose chatbot, but familiarity is not reliability. A professionally branded interface does not eliminate fabricated authority, misleading summaries, shallow synthesis, or false confidence. In some respects, those risks may become harder to police when they arrive through trusted infrastructure, because the user’s skepticism may be lower. The relevant point is not that legal-specific AI is inherently safe. It is that the judiciary’s adoption pattern shows AI entering ordinary workflow through channels that are easier to normalize and therefore harder to resist through abstract denunciation alone.

The federal judiciary’s own official conduct points in the same direction. The Administrative Office reported in its 2025 Annual Report that an advisory AI Task Force had been established and that interim guidance for the Judiciary was developed during that year. The guidance permitted use and experimentation while directing judiciary users to review and independently verify AI-generated work product, cautioning against delegation of core judicial functions, and encouraging courts to define approved tasks and consider disclosure and confidentiality issues. The federal judiciary’s Strategic Plan separately called for establishment of an AI governance framework to guide responsible adoption and manage the risks of new and evolving AI technologies. Those are not the actions of an institution still deciding whether AI matters. They are the actions of an institution that has already concluded it does.

This is why the threshold fact is larger than the survey alone. The empirical finding and the administrative response reinforce each other. The survey shows reported use inside chambers and judicial work. The judiciary’s own official materials show that AI has already become an object of internal governance, guidance, and strategic planning. Together, they foreclose the old posture in which lawyers speak as though the issue remains safely theoretical. Once a judiciary both uses a technology and begins building internal structures to manage it, the question is no longer whether that technology is relevant to adjudication. It is already relevant.

The survey’s reported use cases make that point harder to evade. Legal research was the most common reported judicial use case. Document review followed. Broader categories included reviewing, searching, and analyzing documents, as well as drafting and editing. Those functions are not peripheral to adjudication. They are upstream determinants of what the adjudicator sees, how the record is cognitively organized, and which authorities receive early emphasis. A profession that measures AI’s significance only by asking whether the judge made the decision is asking the question too late in the workflow. Decisions are built from research paths, document framing, chronology, and synthesis. If AI enters there, it has already entered the ecology of judgment.

That is the point at which the debate must become more exacting. It is no longer enough to say that judges should remain careful. Care is not a governance framework. It is no longer enough to say that human judgment must remain central. That proposition is true, but incomplete, unless the institution also specifies what counts as impermissible delegation, what counts as acceptable assistance, who bears the burden of verification, and how chambers supervision is supposed to work in practice.

The threshold fact, then, is double. Artificial intelligence is already being used in federal judicial work, and the federal judiciary itself has already acknowledged that reality through internal guidance and strategic planning. From that point forward, the question of whether AI belongs in the federal courts becomes secondary. The harder and more serious question is what institutional conditions should govern a technology already inside the system.

III. The Wrong Nightmare: Workflow, Not the Robo-Judge

One of the easiest mistakes in the public debate over AI and courts is to focus on the most theatrical possibility instead of the most operationally important one. The dramatic fear is the fantasy of the robo-judge: a machine replacing deliberation, substituting synthetic output for human reasoning, and perhaps even rendering judgment directly. That image is rhetorically useful because it is obviously intolerable. It allows lawyers to reject AI in its most extreme form and feel that the institution has thereby been defended. But it is the wrong place to look if the goal is to understand what is actually happening inside federal chambers. The principal issue is not spectacular automation of judgment. It is quieter integration into the workflow from which judgment is built.

The survey makes that clear. The leading reported use case for judges was legal research. Reviewing documents followed. When related uses were grouped into broader categories, judges reported notable AI use for reviewing, searching, and analyzing documents; for legal research; and for drafting and editing. By contrast, only a very small share reported using AI to make decisions, and a somewhat larger but still limited share reported using AI to inform decisions. Those distinctions matter. They show that the technology is entering judicial work first through support functions rather than through explicit transfer of ultimate decisional authority. That fact is often misread as reassuring. It is not. It is the reason the problem has to be taken seriously.

In courts, support functions are not peripheral. They are the architecture of perception. Legal research determines which doctrines are surfaced and which remain invisible. Document review determines what factual mass becomes cognitively available. Summaries and chronologies affect perceived significance, causal sequence, and narrative salience. Drafting support can influence tone, framing, emphasis, and the early organization of legal analysis. A judicial process built from distorted inputs may remain formally human at the moment of final decision while becoming substantively warped much earlier in the chain. The line between support and judgment is real, but it is not impermeable. That is why arguments that focus only on whether a machine “made the decision” are too blunt to capture the actual institutional risk.

This is the central point the legal purists tend to miss. Courts do not begin with judgment. They arrive at judgment through layers of preliminary work. Someone identifies the issue. Someone gathers the cases. Someone decides which facts matter first. Someone organizes the record, compresses a chronology, prepares a bench memorandum, or shapes the first draft of a research path. Even when the judge is the sole ultimate decision maker, the decision emerges from a production process. If AI enters that process at the front end, it may influence adjudication without ever being described as deciding anything. The technology’s institutional significance therefore lies less in outright substitution than in mediation.

The survey’s qualitative responses reinforce that point because they show the range of entry points. Some judges described using AI as a tool like a treatise before beginning research. Others described broad-question orientation, document reading, summary assistance, or word-choice help. One judge reported using ChatGPT to prepare a first draft of a CLE outline rather than case-related work. Another used Copilot to prepare for a talk involving a foreign jurist. At the same time, one chambers anecdote described a law clerk who used AI to write a memo that cited ten fake cases out of eleven. Taken together, those examples matter because they show that AI does not arrive only when someone decides to generate an opinion. It arrives through orientation, organization, citation checking, administrative drafting, speech preparation, and curiosity-driven comparison. It enters through the low-friction points in professional workflow.

That breadth is what makes the governance problem difficult. A narrow prohibition aimed only at overt adjudicative delegation may leave untouched a wide field of machine-influenced tasks that still shape adjudication. Conversely, a broad prohibition on all AI use may quickly prove unrealistic in an environment where legal-specific research tools already integrate AI functions into ordinary workflow. The problem is therefore not susceptible to slogans. It requires classification. Which tasks are sufficiently peripheral that they may be allowed under verification rules? Which tasks are sufficiently close to adjudicative judgment that they should be tightly cabined or prohibited? Which uses require approval? Which require disclosure? Which require supervisory awareness? Which demand formal training before use? Those are not philosophical questions dressed up as administrative detail. They are the core of the institutional problem.
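To see what that classification exercise entails, it helps to sketch it in miniature. The sketch below is purely illustrative, rendered in Python only for concreteness: the task names, tiers, and assignments are invented assumptions, not a reading of the interim guidance or of any court’s actual policy.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    """Hypothetical permission tiers for AI-assisted chambers tasks."""
    PROHIBITED = "prohibited"           # no AI assistance permitted
    RESTRICTED = "restricted"           # prior approval, verification, and disclosure
    VERIFIED = "permitted-if-verified"  # allowed, but output must be independently checked
    PERMITTED = "permitted"             # low-risk administrative use

@dataclass(frozen=True)
class TaskRule:
    tier: Tier
    requires_training: bool    # must the user complete training before use?
    requires_disclosure: bool  # must use be disclosed upward within chambers?

# Illustrative assignments only; a real matrix would be contested line by line.
POLICY = {
    "draft_operative_adjudicative_text":    TaskRule(Tier.PROHIBITED, True, True),
    "summarize_contested_or_sealed_record": TaskRule(Tier.RESTRICTED, True, True),
    "legal_research_orientation":           TaskRule(Tier.VERIFIED, True, False),
    "document_search_and_review":           TaskRule(Tier.VERIFIED, True, False),
    "administrative_drafting":              TaskRule(Tier.PERMITTED, False, False),
}

def rule_for(task: str) -> TaskRule:
    """Unknown or unclassified tasks default to the most restrictive treatment."""
    return POLICY.get(task, TaskRule(Tier.PROHIBITED, True, True))
```

The substance of any real matrix would be argued over, as it should be. The institutional point is only that a classification of this kind is explicit, reviewable, and enforceable in a way that unwritten chamber custom is not.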

This is also why the federal judiciary’s interim guidance, at least in concept, matters so much. Its significance does not lie only in the warning against delegating core judicial functions to AI. Its significance lies in the recognition that courts must identify the tasks for which approved AI tools may be used and that users must independently review and verify AI-generated content. That is a workflow-centered response. It implicitly acknowledges that the real question is not simply whether final judicial judgment remains human. The real question is how far machine assistance may extend into the material from which human judgment is formed. A tool used for preliminary administrative support presents one level of risk. A tool used to summarize a complex record presents another. A tool used to frame research paths, extract legal themes, or draft operative adjudicative text presents still another. The location of the technology inside the workflow is therefore not incidental. It is the key variable.

The role of chambers personnel makes the issue even more complicated. Judicial work is collaborative, even when judicial authority is singular. Clerks and staff help research issues, organize records, prepare chronologies, and shape the early structure of analysis. The survey reports that judges saw AI use by others in chambers at higher rates than they reported for themselves in several categories, including legal research. The authors also note that judges may underreport others’ use because they may not know all the ways in which chambers personnel have begun experimenting with or integrating AI tools. That observation is institutionally significant. It means that AI’s entry into workflow may be partly opaque even within the chamber itself. A judge may maintain a principled skepticism while a clerk uses AI for preliminary issue spotting. A chambers policy may exist only as an assumption rather than a writing. Oversight may be sincere and still incomplete. That is how shadow procedure develops: not through a single dramatic institutional announcement, but through small acts of tolerated convenience that accumulate without clear supervision.

Once chambers are viewed as real institutions rather than as abstractions, the inadequacy of the standard debate becomes obvious. The relevant question is not simply whether the judge personally used AI. The relevant question is whether AI has begun influencing the chain of work product from which formally human judgment later emerges. That is a much harder question because it implicates supervision, attribution, and internal accountability rather than merely personal ethics. It also explains why the public fixation on the extreme nightmare—the machine judge—can be so misleading. Institutions are often changed less by spectacular substitution than by modest, repeated incursions into routine workflow.

The federal court orders that have emerged in response to AI misuse reflect the same understanding. They do not focus solely on whether AI displaced a human decision maker. They focus on whether AI was used in preparing a filing and whether the resulting content was verified. That approach is telling. It recognizes that the institution’s vulnerability lies not only in overt substitution of machine for lawyer or judge, but in unnoticed contamination of the materials courts receive and rely upon. Quoted passages, paraphrases, legal analysis, procedural histories, and factual summaries can all be shaped by AI without anyone claiming that the machine rendered judgment. Yet each of those elements can affect adjudication if not properly checked.

This is why “support function” is too comforting a phrase. In ordinary organizations, support work may be operationally important without being normatively central. In courts, support work is often where the legal and factual universe of a case first takes shape. That is why legal research is not just a back-office activity. It is one of the mechanisms by which law becomes visible to the decision maker. That is why document review is not merely clerical. It is one of the mechanisms by which fact becomes legible. That is why summarization is not just convenience. It is one of the mechanisms by which complexity is compressed into priority. Once AI enters those functions, it enters the preconditions of judgment.

The practical implication is uncomfortable but unavoidable. AI’s most important effect on courts may not come from any spectacular attempt to automate judicial decision making. It may come from its quieter integration into the daily mechanics of research, summary, organization, and drafting. It may come from changes in how issues are first framed, how records are first processed, and how authority is first surfaced. That is precisely why legal purism is inadequate as an institutional response. It condemns the nightmare and ignores the workflow.

The more serious professional task is therefore not to repeat, in ever firmer tones, that judges must remain human. Of course they must. The harder task is to identify where machine assistance has already entered the production of judicial work and to decide, function by function, what the institution will allow, what it will supervise, and what it will forbid. Until that question is answered, the debate will remain rhetorically intense and operationally underdeveloped. And that is exactly the condition in which a technology like this is most likely to outrun the people responsible for controlling it.

IV. Why the Judiciary Prefers Legal-Specific AI

A recurring weakness in public discussion of artificial intelligence and courts is the tendency to treat “AI” as though it were a single undifferentiated object. That habit is analytically lazy and institutionally misleading. The survey does not describe a judiciary engaging AI in one uniform way. It describes a judiciary that is already distinguishing among tools, whether or not the profession has fully absorbed the significance of that distinction. Judges reported using legal-specific AI tools more often than general-purpose systems and using them more frequently. Westlaw AI-Assisted or Deep Research was the most-used reported tool. CoCounsel also showed substantial use. ChatGPT ranked high enough to matter, but many other general-purpose tools registered only minimal or rare use. Those differences are not incidental. They show the route by which AI is becoming institutional rather than merely fashionable.

That route matters because institutions do not absorb new technologies in the abstract. They absorb them through channels of existing trust. Legal-specific AI tools arrive embedded in platforms already associated with searchable databases, citators, precedent retrieval, document comparison, and ordinary legal workflow. They do not present themselves, at least to the user, as a dramatic departure from professional method. They present themselves as an extension of it. To a judge, the move from traditional research to AI-assisted research inside a familiar legal platform may feel less like an epistemic risk and more like a feature enhancement. The interface is known. The vendor is known. The broader research environment is known. The user is therefore less likely to experience the encounter as a leap into machine reasoning and more likely to experience it as an incremental improvement in ordinary professional tools.

That perception may be partly justified and partly illusory, but its institutional force is undeniable. Courts are conservative institutions in the structural sense. They value continuity, reliability, bounded discretion, and recognizable procedure. They do not generally embrace novelty simply because it is novel. A legal-specific AI tool, even when technically dependent on the same broad family of machine-learning methods as consumer-facing generative systems, appears narrower, more disciplined, and more professionally domesticated. It is framed as research assistance, document review, or citation support rather than open-ended generation. That framing is not merely a matter of marketing language. It shapes the user’s sense of what the tool is for and therefore the range of uses the user experiences as legitimate.

This helps explain why legal-specific tools may pass beneath the profession’s ideological defenses more easily than general-purpose systems. It is relatively easy for lawyers and judges to recoil from the image of “ChatGPT in the courtroom.” That phrase carries the cultural baggage of open-ended prompt-and-response generation, public consumer use, and highly publicized hallucinations. It sounds disruptive. It sounds unserious. It sounds like a challenge to the hierarchy of legal method. By contrast, AI-enhanced research inside a platform the profession has used for decades sounds familiar, even when the underlying issues of verification and reliability remain unresolved. The same profession that bristles at generalized AI rhetoric may accept machine-assisted research if it arrives inside an established legal brand. That is not hypocrisy. It is institutional selection.

The survey therefore reveals something important not only about tools, but about judicial psychology. Judges appear to be gravitating toward AI tools that fit preexisting workflow norms. That does not mean they are embracing AI enthusiastically or uncritically. It means their adoption is being filtered through institutional habits of trust. A legal-specific AI tool is more likely to be perceived as an aid to the existing research function than as an invitation to abandon legal judgment. The effect is subtle but powerful. A profession that might resist overt technological disruption can still accept technological infusion when it is packaged as continuity.

None of this means legal-specific AI deserves automatic confidence. The opposite conclusion would be naive. A legal wrapper does not eliminate fabricated authority, flattening of nuance, skewed summaries, poor analogies, or false confidence. In some respects, errors inside familiar infrastructure may be more dangerous precisely because they are more likely to pass without instinctive distrust. A user who approaches a general chatbot warily may approach an AI-assisted legal platform with professionalism’s version of relaxed guard. That is an institutional vulnerability. The appearance of continuity can lower the threshold of skepticism at the very moment when skepticism is still required.

That is why the more serious question is not whether a tool is consumer-facing or legal-specific, but what role the tool is playing inside the workflow and under what controls. A research feature embedded in a legal database may be appropriate for some preliminary purposes and unacceptable for others. It may be useful for orienting a broad issue, surfacing a first set of authorities, or accelerating a document search, yet still unreliable for generating propositions of law without source verification or for compressing a contested record into a summary that later shapes judicial analysis. A general-purpose model, by contrast, may be wholly unsuitable for record citation or legal synthesis while still being tolerable for non-case-related tasks such as preparing a speech outline, testing a phrase, or organizing administrative content. The branding distinction therefore matters, but it is not dispositive. It is the beginning of the inquiry, not the end of it.

This is one reason task-based governance is more important than tool-based rhetoric. A system that simply announces “legal AI good, consumer AI bad” will fail because it mistakes branding for control. The federal judiciary’s own administrative posture, at least in concept, points in the right direction by emphasizing approved tasks and independent verification rather than assuming that any particular label resolves the institutional risk. That is a more serious approach. It recognizes that the permissibility of machine assistance depends on what is being asked of the tool, what kind of material is involved, what human verification follows, and who bears responsibility for the result. A court deciding whether a chambers user may employ AI for research orientation is confronting a different question from whether AI may be used to summarize sealed material, generate operative adjudicative text, or frame disputed facts. Governance that cannot distinguish among those tasks is not governance at all.

The preference for legal-specific AI also complicates the legal purist’s rhetorical position in a more fundamental way. It is easy to condemn a disruptive consumer technology. It is harder to condemn an enhancement to the research infrastructure on which the profession already relies. Once AI arrives through ordinary legal tools, the profession loses the comfort of treating the issue as external contamination. The question stops being whether courts should let a strange machine into the room and becomes whether the profession is willing to examine how its own ordinary tools are changing. That shift matters because it forces the conversation away from theatrical panic and toward institutional detail. It requires the bench and bar to decide where the line between assistance and distortion actually lies.

There is another important lesson in the survey’s pattern of adoption. Judicial preference for legal-specific AI suggests that the future of AI in the federal courts, at least in the near term, is unlikely to arrive as a single dramatic transformation. The federal judiciary is not likely to become an “AI judiciary” by public declaration. More likely, it will become increasingly AI-inflected through a series of incremental adjustments inside research, review, and administrative systems already familiar to the bench and bar. That form of adoption is slower, quieter, and less visible. It is also harder to police. Technologies that arrive as revolution attract scrutiny. Technologies that arrive as optimization often do not.

That is precisely why serious oversight has to begin here. The profession must resist two opposite temptations at once. It must resist the naive assumption that legal-specific AI is inherently trustworthy because it sits inside a familiar professional ecosystem. And it must resist the equally simplistic assumption that AI governance can be accomplished by condemning a few well-known consumer platforms while ignoring machine assistance embedded in legal workflow. The actual problem is subtler and more institutional. The federal judiciary is already being shaped by the integration of machine assistance into tools the profession experiences as normal. If that integration is allowed to proceed under the cover of continuity, then the courts will adapt to AI without ever squarely deciding the terms on which they meant to do so.

That is the deeper significance of judicial preference for legal-specific tools. It is not merely a product choice. It is a governance warning. It shows that AI will not challenge the federal courts only from the outside, as a visible and disruptive force that courts can denounce in dramatic language. It will also reshape judicial work from within, through familiar platforms, trusted workflows, and seemingly modest improvements in efficiency. The more familiar the delivery mechanism, the more disciplined the oversight must be. Otherwise, what presents itself as incremental modernization may become unexamined institutional dependence.

V. Chambers, Clerks, and the Reality of Distributed Judicial Work

One of the most persistent distortions in the current debate over AI and the federal courts is the fiction of solitary judging. The debate is often framed as though the judge were the exclusive site of judicial work and as though the legitimacy question could therefore be resolved simply by asking whether the judge personally typed a prompt, personally relied on a model, or personally delegated a cognitive task to a machine. That framing is neat. It is also false. Federal adjudication does not emerge from a sealed chamber of solitary thought. It emerges from institutional production. Chambers operate through layered collaboration among judges, law clerks, staff attorneys, judicial assistants, and, in some settings, interns or other support personnel. Opinions, orders, bench memoranda, record summaries, legal chronologies, and preliminary research pathways often take shape inside that collaborative structure before the judge refines, rejects, or adopts the resulting work. Any account of AI in the judiciary that ignores this reality is not merely incomplete. It is structurally misleading.

The survey makes the point impossible to avoid. Judges reported that other individuals in their chambers used AI for legal research at 39.8%, which is higher than the percentage of judges who reported using AI for legal research themselves. More broadly, the paper reports somewhat greater AI use by chambers personnel than by judges across most categories. The authors also note something especially important: judges may be more likely to underreport than overreport chambers use, because they may not know all the ways in which others in chambers have begun using AI. Those observations matter because they show that AI use inside the judiciary is not reducible to judges’ personal habits. It is embedded in the collaborative environment through which judicial work is actually produced.

That point has serious institutional consequences. The first is that the legitimacy debate must shift from individual ethics to supervisory architecture. A judge may sincerely believe that AI should play no role in substantive adjudication. But if the judge does not know whether a clerk used AI to summarize a record, organize a chronology, identify a cluster of authorities, or generate a first-pass issue map, then the judge’s personal principle may not correspond to chambers reality. That mismatch is not trivial. Courts derive legitimacy from the integrity of their method, not merely from the sincerity of individual judicial belief. Once chambers work is distributed, the question is no longer simply what the judge thinks ought to happen. The question is what actually happens inside the production process that feeds judicial decision making.

The second consequence is that disclosure becomes much more difficult than the public version of the debate tends to acknowledge. Most external discussions assume a direct chain: a lawyer uses AI, the lawyer files the document, the court evaluates the filing. Chambers disrupt that simplicity. If a clerk uses AI internally to organize research or summarize a record, what disclosure, if any, is required? Must the judge know? Must the parties know? Does the answer change if the clerk’s AI-assisted work materially influences an order, a bench memorandum, or a later public filing? These are not easy questions, and they cannot be answered by slogans about transparency alone. But the existence of difficulty does not excuse avoidance. The rise of AI in chambers means that disclosure doctrine, where relevant, must be rethought through the lens of distributed labor rather than personal authorship alone.

The third consequence is that training and approval structures become mandatory rather than optional. A chambers environment that allows, tolerates, or simply ignores AI use by clerks and staff without written expectations is not practicing prudent restraint. It is inviting unsupervised experimentation in one of the most legitimacy-sensitive settings in the legal system. That point is especially important because chambers often function through trust, informality, and inherited custom. Those features can be strengths in ordinary institutional life. In the AI context, they can become liabilities. Informal assumptions about what clerks “probably would not do” are not substitutes for protocols. Trust is not verification. Custom is not governance.

The survey’s qualitative responses bring this problem into focus. Some judges reported that they did not know whether others in chambers used AI at all. Others reported that clerks had used AI to create presentations or draft remarks for talks. One response described a law clerk who, after writing a memo, used AI out of curiosity to draft a version on the same issue and discovered that ten of the eleven cited cases were fake. The significance of that anecdote lies not in any claim of widespread breakdown. It lies in the banality of the entry point. Nothing in the story required a formal decision that chambers would begin relying on AI. Nothing in it required institutional approval, a policy memorandum, or even bad faith. It required only curiosity, convenience, a question, and a tool that was already available. That is how technological drift often begins inside institutions: not with dramatic adoption, but with local experimentation that appears too small to count as structural change until the pattern has already taken hold.

This is where legal purism becomes especially weak as a governing framework. A theory centered on personal abstention by judges does almost nothing to address distributed judicial labor. Even if a judge never personally opens an AI tool, chambers may still be affected by AI-mediated research, machine-influenced record organization, or preliminary drafting support. Formal authorship by the judge does not erase the institutional significance of those earlier stages. In some respects, the legitimacy risk increases when the judge retains final authorship while the upstream production chain becomes partially machine-influenced but only partially visible. The institution may continue describing itself in traditional terms while its internal methods quietly shift underneath that description.

There is also a generational and structural asymmetry that makes this harder. Law clerks, staff attorneys, and younger lawyers are often more likely than senior judges to encounter, experiment with, or casually normalize emerging technologies. That is not an accusation. It is a predictable feature of professional life. The people closest to time pressure, drafting pressure, and research pressure are often the people most likely to adopt tools that promise speed. They may also be more technically fluent, more curious, or more comfortable operating in hybrid digital workflows. None of that makes them reckless. But it does mean that institutional exposure to AI may arise through subordinate channels before it becomes legible at the level of formal judicial policy. A governance model that focuses only on what judges themselves do will therefore miss a significant part of the actual risk profile.

The federal judiciary’s interim guidance, although not framed specifically as a chambers-supervision document, implicitly recognizes this problem. It emphasizes that judiciary users and their approvers remain accountable for work performed with AI assistance. That is not the language of mere individual choice. It is the language of supervision. It presumes that work product may be generated or shaped by one person and approved by another, and that responsibility follows both use and oversight. That is an important institutional insight. A viable governance regime cannot assume that every chambers user possesses equal caution, equal training, and equal understanding of machine limitations. Nor can it assume that informal chambers culture will reliably police a technology whose main attraction is precisely that it saves time at the earliest stages of work. If the institution is serious, it must determine what subordinate personnel may do, what approval is required, what categories of work remain off-limits, and how outputs must be verified before they become part of the chamber’s working product.

This distributed-work reality also exposes a broader problem with how judicial legitimacy is often discussed. There is a tendency to speak as though legitimacy attaches primarily to the final authored text and the judge’s final signature. That is only part of the truth. Legitimacy also depends on the integrity of the process that produced the text. A court order is not made legitimate simply because a judge ultimately reviews and signs it. It is made legitimate, in part, because the system claims that the work beneath the signature was generated through methods consistent with law’s demands for rigor, traceability, and human accountability. If chambers increasingly incorporate AI at earlier stages without settled rules, that claim becomes harder to sustain in its old form.

The practical response cannot be a mere warning. Serious supervision in this setting requires more than a chamber head telling clerks to “be careful” or “not overuse” AI. It requires written protocols. It requires explicit instruction on approved and prohibited uses. It requires mandatory verification standards for any AI-influenced research or drafting support. It requires confidentiality safeguards. It requires clarity about when AI use must be disclosed upward within chambers. It may also require recordkeeping norms in some settings, especially where AI contributed to work that later shaped filed or operative judicial text. Without that architecture, chambers risk becoming environments in which machine influence is both real and only partly visible. That is precisely the kind of institutional condition in which confidence erodes slowly at first and then all at once when a visible failure forces the hidden practice into view.
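A second minimal sketch suggests what mandatory verification and recordkeeping might mean in operation. It is again wholly hypothetical: the record fields and checks are invented for illustration and are not the interim guidance’s actual terms. It models only the general premise, noted above, that accountability follows both use and approval.

```python
from dataclasses import dataclass

@dataclass
class AiUseRecord:
    """Hypothetical record attached to any AI-assisted chambers work product."""
    task: str                         # e.g., "legal_research_orientation"
    tool: str                         # e.g., a locally approved research platform
    citations_verified: bool = False  # every cited authority checked against the source
    quotes_verified: bool = False     # every quotation checked against the source
    approver: str = ""                # the accountable human who reviewed the output

def approve(record: AiUseRecord) -> None:
    """Block AI-assisted material from entering chambers work product until the
    verification duties are affirmatively satisfied and a named human has
    accepted responsibility for the result."""
    missing = []
    if not record.citations_verified:
        missing.append("citation verification")
    if not record.quotes_verified:
        missing.append("quote verification")
    if not record.approver:
        missing.append("a named approver")
    if missing:
        raise ValueError("not approved; missing: " + ", ".join(missing))

# Example: this call succeeds only because every duty is discharged
# and an approver is named; omit any field and it raises instead.
approve(AiUseRecord(
    task="legal_research_orientation",
    tool="locally_approved_platform",
    citations_verified=True,
    quotes_verified=True,
    approver="supervising judge",
))
```

The details are placeholders. What the sketch captures is the structural idea: verification is not a mood of carefulness but a gate that work product must pass through, with a named human on the other side of it.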

The courts have long understood that legitimacy depends not just on what the judge believes, but on how the institution produces its work. Chambers have always been part of that story, even when public discussion preferred to simplify them away. AI simply makes the old truth harder to ignore. The federal judiciary is not confronting a world in which a few judges may choose to use or avoid a new tool in isolation. It is confronting a world in which machine assistance can move through the collaborative channels of chambers practice faster than the formal language of judicial responsibility has yet caught up. Until that institutional fact is addressed directly, the conversation about AI in courts will remain rhetorically intense but structurally shallow.

VI. The Governance Gap: Use Has Outpaced Formal Policy

The most revealing problem in the current record is not merely that artificial intelligence has entered federal judicial work. It is that formal governance has not kept pace with that entry. The survey shows a judiciary operating under a patchwork of permission, discouragement, prohibition, and silence. Roughly one in three judges reported that they permitted, or permitted and encouraged, AI use in chambers. At the same time, a significant portion reported formally prohibiting AI use, another group reported discouraging but not formally prohibiting it, and a substantial share reported having no official AI policy at all. If judges who discourage but do not formally prohibit use are treated as effectively operating without formal policy, the portion functioning outside a formal rule structure becomes even larger. That is not a mature institutional framework. It is a transition environment in which use is real, expectations are uneven, and governance remains underdeveloped.

Patchwork is understandable at the beginning of institutional change. It is not a defensible resting point for courts. Courts do not ordinarily tolerate major procedural issues being governed by atmosphere, habit, or unspoken assumption. They impose filing rules, scheduling orders, preservation duties, disclosure obligations, certification requirements, evidentiary foundations, and sanctions because procedure is one of the principal ways law disciplines power. Yet on AI, a technology capable of affecting research, drafting, record handling, filing practice, and evidentiary disputes, much of the federal judicial posture remains chamber-specific, judge-specific, or locally improvised. That is not a minor administrative lag. It is a governance gap.

The seriousness of that gap becomes clearer when measured against the judiciary’s own institutional instincts in every other domain. Courts do not tell litigants to “be careful” with evidence and stop there. They define admissibility rules. They do not respond to unreliable discovery conduct with generalized reminders about professionalism alone. They create obligations, deadlines, and sanctions. They do not treat ambiguity as a virtue when the integrity of adjudication is at stake. They reduce discretion by articulating procedure. That is one of the defining habits of legal institutions. The problem with AI is not that the judiciary lacks a procedural tradition. It is that it has not yet translated that tradition into a sufficiently coherent control architecture for this technology.

This is not because the issue has gone unnoticed. Quite the opposite. The federal judiciary’s own institutional materials effectively concede that present arrangements are unsettled. A call for an AI governance framework is, by its nature, an acknowledgment that the existing landscape is fragmented. Interim guidance is not the language of institutional completion. It is the language of temporary management while something more durable is still being built. That distinction matters. Institutions issue temporary guideposts when operational reality has already outrun formal structure. In other words, the judiciary’s own administrative posture confirms the central point: AI use has advanced faster than policy.

The court-level responses illustrate the same problem from another angle. Local standing orders requiring disclosure, verification, and certification are serious measures. Sanctions for fabricated authority are serious measures. Judicial warnings about false factual and legal statements generated through AI are serious measures. But these interventions, however important, are not the same thing as a coherent governance regime. They are reactive, partial, and often limited to discrete domains such as litigant submissions. They do not settle what chambers personnel may use AI for. They do not fully define supervision duties. They do not create a comprehensive training architecture. They do not resolve the relationship between machine assistance and internal judicial workflow. They address breakdown. They do not yet provide a full architecture for prevention.

That is why it is necessary to distinguish between control and governance. A standing order aimed at fabricated citations is a form of control. A sanction for AI-assisted misrepresentation is a form of control. A local certification requirement is a form of control. Those tools matter, and they should not be minimized. But governance is broader. Governance means the institution has defined what uses are approved, what uses are prohibited, what uses require disclosure, what verification is mandatory, who supervises whom, what training is required, and what consequences attach to noncompliance. Governance does not begin only when something goes wrong. It exists to shape conduct before the failure occurs.

The absence of that fuller structure creates a particularly dangerous kind of institutional ambiguity: responsible-seeming informality. A judge discourages AI use without reducing the expectation to writing. A chamber permits limited use for “research only” without defining the boundaries of research. A clerk uses an AI-assisted feature embedded in a familiar legal platform and assumes ordinary professional caution is enough. A court warns lawyers against fabricated citations but provides little chamber-specific education on internal use. Each of these arrangements can sound reasonable in isolation. None of them sounds reckless. Yet together they create an environment in which machine assistance is present, incentives toward convenience are real, oversight is uneven, and accountability remains diffuse. Informality begins to look responsible precisely because the institution has not yet forced itself to specify what responsibility requires.

That is one reason ideological resistance can worsen the very problem it claims to oppose. The legal purist often imagines that denunciation of AI creates institutional caution. In reality, denunciation without policy may simply push use into ambiguous zones of tolerated but underdefined practice. If the profession is unwilling to articulate clear rules because doing so feels like legitimating the technology, then use does not disappear. It migrates. It moves into informal workflows, individualized understandings, vague warnings, and local custom. That is a far more dangerous outcome for courts than open acknowledgment paired with disciplined control. Ambiguity is not a protective state. It is the environment in which inconsistent practice hardens into hidden norm.

The governance gap also magnifies every other AI-related risk. Hallucinations become harder to catch when no uniform verification expectation exists. Confidentiality risks become harder to manage when approved-task boundaries are undefined. Skill atrophy becomes harder to identify when chambers use is informal and unevenly supervised. Fabricated citations are more likely to reach the court when lawyers and judges operate under mixed signals about what tools may be used and what checking is required. Uneven policy also creates a legitimacy problem of its own. A judiciary that insists on procedural rigor from everyone else should not appear content to govern its own encounter with AI through scattered local responses and unspoken chamber culture.

There is a second institutional cost as well: unequal practice. In a patchwork system, the role AI plays in federal adjudication may depend too heavily on local assignment, chamber custom, judicial temperament, or professional culture within a particular office. One chamber may have a clear written protocol. Another may rely on verbal assumptions. One district may require certifications and warn of sanctions. Another may say very little. One judge may treat AI-assisted research as tolerable if verified. Another may discourage it without defining the point at which discouragement becomes a rule. That kind of unevenness may be survivable in a short transition period. It is much harder to justify as a stable condition in a federal judicial system that claims fidelity to equal procedure and disciplined administration.

The governance gap is therefore not just one issue among many. It is the condition under which every other issue becomes harder to control. It is the reason the debate cannot remain trapped at the level of moral posture. The real institutional failure is no longer that artificial intelligence exists in the federal courts. The real failure is that the courts have entered the AI era without yet completing the work of governing it. Until that changes, every warning about hallucinations, fabricated authority, confidentiality, and legitimacy will remain at least partly reactive. And reactive systems, especially in courts, are almost always late.

VII. The Training Deficit and the Myth of Responsible Informality

The survey’s training findings expose a problem deeper than uneven adoption. They reveal a judiciary in which use has begun before education has been systematized. According to the paper, 45.5% of responding judges said court administration had not provided training on AI tools, and another 15.7% said they were not sure whether such training had been offered. Among the judges who did report that training had been provided, most attended. That result matters for two reasons at once. It confirms that training remains underprovided, and it suggests that the obstacle is not simply judicial resistance. There appears to be demand for serious instruction. What is missing is a consistent institutional structure for delivering it. A court system that permits meaningful AI use without a training architecture is not practicing controlled modernization. It is improvising under the appearance of professionalism.

Improvisation is especially dangerous in legal institutions because legal culture tends to overestimate the transferability of general analytical skill. Judges and lawyers are trained readers. They are trained skeptics. They are trained source-checkers. From those strengths, it is easy to infer that they can safely experiment with AI so long as they remain cautious. But caution is not a substitute for instruction. It is not a curriculum, not a protocol, and not a method for distinguishing one class of risk from another. General professional competence does not tell a chambers user what kinds of hallucination are most common, how a legal-specific AI tool differs from a general-purpose model, when a summarization feature may flatten adversarial nuance, why a polished paraphrase may silently distort a holding, or how retrieval and prompt structure can affect output. Training addresses those questions. Without it, even highly capable professionals can mistake fluency for accuracy and convenience for competence.

The survey’s own qualitative material makes that danger visible. Judges expressed concern about hallucinations, “zombie cases,” and skill atrophy. One chambers anecdote described a clerk who discovered that ten of the eleven cases cited in an AI-generated memo were fake. That anecdote matters not because it proves widespread collapse, but because it captures the basic hazard of the technology. AI output often looks professionally usable before it has been professionally verified. It arrives wearing the surface markers of competence: structure, tone, fluency, and confidence. That is precisely why training cannot be treated as an optional supplement to otherwise sound professional judgment. The risk lies in the gap between polish and trustworthiness.

A court system that ignores that gap is likely to fall back on what might be called responsible informality. The idea is simple and superficially appealing: prudent, educated professionals can be trusted to explore cautiously, verify what matters, and develop good habits through experience. This model flatters the institution. It preserves autonomy, avoids bureaucracy, and allows actors to reassure themselves that professionalism alone will absorb the risk. But responsible informality is the wrong governance model for a technology whose attraction lies in reducing friction. AI saves time. It lowers the cost of producing first-pass work. It rewards convenience. In any setting shaped by workload and deadlines, convenience presses against scrutiny. The more a tool seems to accelerate familiar tasks, the greater the temptation to treat verification as selective rather than comprehensive. A training deficit therefore does not leave the institution in a neutral state. It creates the conditions in which convenience is more likely to outrun judgment.

That dynamic is especially acute in chambers. Judicial work is often deadline-driven, research-heavy, and delegation-dependent. Clerks and staff are under pressure to organize records, identify authorities, summarize briefs, and distill large volumes of material into workable form. AI is attractive in precisely those circumstances. It promises compression. It promises orientation. It promises speed. The danger is that users operating without formal instruction may fail to appreciate how easily machine-generated shortcuts can distort the very material they are trying to clarify. A summary may omit the adversarial caveat that changes the weight of a fact. A research-oriented output may surface plausible but secondary authorities while neglecting controlling law. A chronology generated from contested submissions may carry an unexamined narrative bias into later internal analysis. These are not spectacular failures in the sanctions sense. They are workflow distortions, and they are harder to detect precisely because they often appear helpful.

There is also a second category of training need that discussions of AI misuse often understate. The risk in courts is not limited to fabricated legal authority. It includes confidentiality, data handling, record integrity, and confusion about the line between assistance and delegation. A chambers user who pastes sensitive material into an unapproved system may create a problem entirely different from false citations. A clerk who uses AI to generate a chronology from contested facts may not fabricate a single authority and still distort the structure of the case. A judge who uses an AI tool to orient himself to an unfamiliar technical subject may come away with a subtly skewed understanding before ever turning to primary authority. None of these failures fits neatly into the headline category of “hallucinated cases,” but all of them implicate adjudicative integrity. Training is what allows users to see that these are not separate curiosities. They are part of the same institutional risk profile.

This is also why verification language in administrative guidance cannot do all the work by itself. Saying that users must independently review and verify AI-generated content is correct but incomplete. Verification is not a magic word. It is a practice, and competent practice has to be taught. What counts as verification of a legal proposition? What counts as verification of a factual summary? Is checking whether a cited case exists enough, or must the user also verify the quoted proposition, the procedural posture, the factual analogy, and the current validity of the authority? How should a chambers user verify a synthesized statement that compresses multiple authorities into one polished sentence? What about a chronology assembled from mixed record materials? Verification sounds straightforward only when the output is simple. In real workflow, the difficult cases are precisely the ones in which the AI produces something too smooth to force the user to notice what must still be checked.

Training also has an accountability dimension. Courts insist, correctly, that lawyers remain responsible for the filings they submit regardless of how drafts were generated. That principle appears in modern standing orders addressing AI-assisted submissions. But a judiciary that expects external actors to understand verification, candor, and accuracy obligations cannot rely on ad hoc internal learning for its own personnel. Otherwise the institution risks a posture of asymmetric rigor: warning lawyers about a technology the courts themselves have not yet systematically taught their own users to manage. That is not a sustainable position for an institution that claims authority through disciplined process.

The survey result that most judges attended training when it was offered is therefore more important than it first appears. It suggests that the present deficit should not be explained away as a natural byproduct of judicial conservatism. The problem may not be reluctance to learn so much as institutional underprovision. That matters because it changes the governance question. If the federal judiciary already has an audience for serious instruction, then the failure to provide it consistently becomes harder to justify. A judiciary that recognizes AI as significant enough to warrant an advisory task force and interim guidance should also recognize that the efficacy of any such guidance depends on whether users are taught how to operationalize it.

A serious training architecture would have to be judiciary-specific rather than vendor-specific. It would need to cover at least several distinct domains. It would need to address source verification and how to confirm not only the existence of authority but the accuracy of propositions attributed to it. It would need to address hallucination and fabrication risk, including the difference between confidently expressed error and transparent uncertainty. It would need to address confidentiality and data handling, especially for chambers personnel working with sensitive materials. It would need to address task-based boundaries so that users understand not only what they may do, but why certain uses are more dangerous than others. It would need to address chambers supervision and approval structures so that judges, clerks, and staff understand how responsibility travels through collaborative work. And it would need to address disclosure or certification obligations where AI influences material that reaches a court filing or an operative judicial product. A training program that simply repeats “be careful” or “always verify” does not address those institutional needs. It converts responsibility into a slogan.

There is a final point, and it is not merely pedagogical. Courts cannot preserve legitimacy by assuming that professionalism will absorb technological risk on its own. Professionalism is not self-executing. It depends on infrastructure. In this context, that means training. Without training, even a formally cautious judiciary remains vulnerable to the very kind of drift it claims to oppose. Legal purism often presents itself as a defense of standards. But standards that are not taught, specified, and operationalized are not standards in any meaningful sense. They are aspirations, and aspirations do not govern institutions.

VIII. The Purists Are Right About the Risks—and Wrong About the Response

The legal purists deserve a cleaner hearing than the more breathless advocates of legal technology often give them. Their core concern is not irrational, reactionary, or unserious. Courts are not ordinary workplaces. They are institutions that issue authoritative judgments in the name of law. A technology capable of inventing cases, misstating holdings, flattening factual nuance, compressing legal complexity into smooth error, and presenting all of it in the tone of professional competence is not a neutral efficiency aid in that setting. It is a risk multiplier. If the profession is uneasy, it has reason to be uneasy.

The survey itself confirms that the concern is grounded. Judges reported apprehension about hallucinations, “zombie cases,” and skill atrophy. The paper also includes the anecdote of a law clerk who used AI to generate a memo and found that ten of the eleven cited cases were fake. That is not a cosmetic defect. It cuts directly at the point where legal reasoning claims seriousness: fidelity to actual authority. A profession built on precedent, traceable source use, and articulated justification cannot casually absorb a tool that may produce convincing legal form without legal truth. The purists do not need to exaggerate the danger. The record already gives them enough.

The federal courts’ own conduct outside the survey underscores the same reality. Courts have already issued standing orders requiring disclosure of AI use in preparing filings and certification that citations were verified. They have warned that AI-assisted filings can contain fabricated or incorrect legal authority. Federal appellate courts have treated fabricated AI-generated citations as a live sanctions problem rather than a thought experiment. The question is no longer whether AI can corrupt legal work. It plainly can. The question is what follows from that fact.

That is where the purists fail. Their mistake is not in diagnosis. It is in remedy. They often speak as though the existence of serious risk itself proves that the proper response must be categorical resistance, moral suspicion, or professional abstention. That conclusion might once have carried more force—at the threshold moment when AI could still plausibly be treated as a technology outside judicial practice, one that might be stopped at the courthouse door by principled resistance. But that moment has passed. Once AI is already inside chambers, already shaping legal research and document review, already prompting standing orders, already generating sanctions disputes, and already forcing administrative and rulemaking responses, the problem is no longer whether the institution may someday confront AI. It is already confronting it. At that point, denunciation without design is not rigor. It is abdication.

The error is subtle because the purists are often right at the level of principle. They are right that judicial legitimacy depends on human accountability. They are right that citation to nonexistent authority is intolerable. They are right that a legal profession built on source-traceable reasoning cannot safely rely on systems that generate polished synthesis without guaranteeing actual source fidelity. They are right that convenience threatens discipline. But they are wrong to assume that the defense of those principles can be achieved by refusing to articulate a governing framework. Refusal to govern does not stop use. It pushes use into ambiguity, informality, and hidden dependence.

Courts already understand this in every other domain of legal risk. They do not respond to unreliable evidence by abolishing the concept of evidence. They create authentication standards, admissibility rules, burdens of proof, and review mechanisms. They do not respond to discovery abuse by abolishing discovery. They impose preservation duties, certification requirements, and sanctions. They do not confront attorney misconduct by lamenting professional decline in the abstract. They define obligations and enforce them. In other words, courts routinely confront danger not with nostalgia but with procedure. The law’s enduring strength has never been purity. It has been control.

That is why the federal judiciary’s emerging response matters. Interim guidance emphasizing independent verification, cautioning against delegation of core judicial functions, and urging task-based approval is not a concession to technological fashion. It is the beginning of what courts do when they recognize that a risk cannot be wished away. The movement toward governance is not evidence that the purists lost the moral argument. It is evidence that the institution has begun to understand a harder truth: uncontrolled use is a bigger threat than acknowledged and bounded use. The purists see the danger. What they do not offer is a viable method for controlling it once it is already present.

There is also an epistemic weakness in the purist position. It often treats AI as though the principal danger were explicit substitution of machine for lawyer or judge. But the more serious danger may be hidden mediation. A legal professional who knows AI is prohibited for overt drafting may still use it to orient research, organize a chronology, summarize a long brief, or test a line of inquiry informally. Those uses may feel too preliminary to count as real reliance. Yet they can shape what authorities are later pursued, what facts are later foregrounded, and what themes are later treated as central. A system of categorical denunciation does not address that problem because it does not identify where the lines are, how those lines will be supervised, or what obligations attach to staying within them. It condemns the visible breach while leaving the quieter mechanisms of institutional drift underdescribed.

That is why the purists’ position is best understood as incomplete rather than wholly wrong. They correctly perceive that AI presents doctrinal, professional, and legitimacy risks. But they stop at condemnation. That is the point at which they cease to be institutionally useful. The judiciary does not need additional moral theater about why fabricated citations are bad. It needs enforceable distinctions between acceptable assistance and unacceptable delegation, between permissible experimentation and careless contamination, between competent verification and reckless reliance. Fear may be justified. But fear alone does not build a governance architecture.

There is an irony here as well. The purists often present themselves as taking the harder line because they refuse compromise. In institutional terms, theirs may be the easier position. It is always simpler to declare that a dangerous technology should stay out than to do the more difficult work of defining task categories, supervision rules, verification duties, disclosure obligations, training requirements, and sanction structures. It is easier to denounce a system than to regulate one. But institutions are not judged by the purity of their rhetoric. They are judged by whether they can govern reality. On that measure, the purists have not offered enough.

The harder and more serious position is therefore not enthusiasm. It is governance. It is the insistence that if AI is already inside the federal courts, then the institution must say where it may be used, where it may not, who is accountable, how outputs must be checked, and what consequences follow when those rules are breached. That is not compromise with risk. It is the only available alternative to unmanaged drift.

IX. Federal Courts Are Already Building Patchwork Controls

The clearest proof that artificial intelligence has become a governance issue in the federal courts is no longer the survey alone. It is the fact that federal judicial actors have already begun building controls in multiple directions at once. Those controls are not unified. They do not cover the same terrain. They do not reflect a single settled theory. But taken together, they show something important: the federal courts have moved beyond generalized anxiety and into procedural response. That transition matters because institutions do not build controls around unreal problems. They build them when practice has advanced far enough that unmanaged discretion is no longer tolerable.

One of the earliest and clearest examples is Judge Michael Baylson’s June 2023 standing order in the Eastern District of Pennsylvania. Its logic is narrow, direct, and institutionally sound. Any attorney or pro se litigant who uses artificial intelligence in preparing a filing must disclose that fact, and the filing must certify that every citation to law or the record has been verified as accurate. The order does not attempt a broad theory of machine assistance. It does not spend time philosophizing about innovation or warning dramatically about existential danger. It targets the immediate problem a court can least afford to tolerate: submissions containing synthetic authority that no human has actually checked. In doing so, it ties AI use back to familiar professional duties. The lawyer remains responsible. The citation must be real. The record reference must be real. The filing cannot become a vehicle for machine-generated fiction under the cover of legal form.

That order is important not only because of what it requires, but because of what it assumes. It assumes that concealment is part of the problem. AI use “in any way” must be disclosed. That feature can be debated. Reasonable minds can differ about whether that scope is overbroad or whether disclosure should attach only to certain categories of use. But as an institutional move, it captures two enduring governance principles that remain difficult to improve upon: candor and verification. The order recognizes that the problem is not merely false output. It is false output entering adjudication without a clear line of accountability.

The District of Kansas’s January 2026 standing order is broader and, in some respects, more system-conscious. It opens by acknowledging that courts are seeing an increasing number of filings containing false statements of fact or law, including fabricated or incorrect legal authority, and that such filings may have been generated using AI. That framing matters because it ties the technology directly to judicial workload and procedural harm. The order does not treat AI misuse as a novelty embarrassment. It treats it as a threat to the integrity of proceedings and a source of waste for both courts and litigants. Most significantly, it states that lawyers and parties, including pro se litigants, remain responsible for the contents of their filings even when AI generates part or all of them. That is a critical governance move. It refuses the idea that technological mediation dilutes professional responsibility.

The Kansas order also goes beyond simple warning. It identifies a menu of possible judicial responses: striking filings, monetary sanctions, disciplinary referrals, disqualification, filing restrictions, and dismissal. It authorizes case-specific sworn statements requiring disclosure of the extent of AI use, the tool used, the sections generated, and verification efforts. This is not merely a cautionary memo against fake cases. It is an operational framework for policing AI-assisted filing practice through responsibility, verification, and remedial discretion. It does not settle every question, but it reflects a court attempting to move from generalized concern to enforceable process.

Appellate sanctions practice reinforces the same trend from a different direction. Once federal appellate courts begin sanctioning or formally addressing AI-related briefing failures, the issue changes character. It is no longer simply a local concern or an item of professional gossip. It becomes part of federal procedural seriousness. Sanctions at the appellate level communicate that AI misuse is not confined to inexperienced filers, fringe cases, or isolated trial-court embarrassments. It has reached a level at which federal appellate courts view fabricated citations and AI-assisted misrepresentation as conduct warranting formal judicial response. That development is significant because appellate sanctions help convert cultural alarm into legal consequence. They teach the bar that the problem is no longer rhetorical.

But sanctions, by their nature, remain reactive. They address breakdown after the defective filing has already been submitted, the court’s time has already been consumed, and the institutional cost has already been incurred. Sanctions matter. They are indispensable as a back-end control. But they are not governance in the full sense. They tell actors what happens when they fail. They do not fully tell chambers, clerks, judges, and litigants what the institution affirmatively permits, expects, supervises, or forbids before failure occurs. That distinction is crucial. A mature control structure requires more than punishment. It requires ex ante architecture.

That is where the federal judiciary’s administrative response enters the picture. Interim guidance produced through the Administrative Office’s AI Task Force adds a different layer of control. It is not principally a public-facing sanction regime. It is an internal administrative scaffold. It is aimed at shaping conduct before breakdown rather than merely punishing misconduct after the fact. By emphasizing independent verification, cautioning against delegation of core judicial functions, and encouraging courts to define approved tasks and consider disclosure, the guidance reflects a preventive rather than purely punitive mindset. In governance terms, that matters a great deal. A system cannot rely on sanctions alone unless it is content to let institutional learning occur mainly through failure.

The evidentiary rules process adds still another dimension. Proposed responses to machine-generated output and suspected deepfakes show that federal concern about AI is not limited to the drafting of filings or the internal operation of chambers. It extends to the admission, evaluation, and authentication of evidence itself. That broadening of focus is revealing. It means the judiciary is not treating AI as a single narrow problem. It is confronting AI across several fronts at once: filing practice, chambers use, internal administration, sanctions doctrine, and evidentiary integrity. That multidirectional response is itself evidence of institutional seriousness. It signals that the judiciary has begun to understand AI not as one issue, but as a cluster of related procedural and epistemic pressures.

And yet, despite all of this activity, the controls remain unmistakably patchwork. Judge-specific standing orders are not district-wide policy. District-wide standing orders are not national rules. Interim administrative guidance is not the same thing as binding Code-of-Conduct language or formal rule text. Proposed evidentiary responses do not answer the problems of chambers supervision or internal task definition. Appellate sanctions punish some categories of misconduct but do not define affirmative best practices for ordinary use. Each development is rational in its own domain. Each solves or attempts to solve a real problem. But together they still reveal the absence of a single settled governance architecture.

Patchwork has costs, and those costs are not merely aesthetic. It creates uneven expectations across districts. It makes litigant obligations contingent on local assignment. It leaves chambers personnel subject to differing degrees of supervision depending on where they work and under whom they serve. It risks generating the misleading impression that AI governance is mainly a problem for careless lawyers rather than a problem implicating the judiciary’s own internal workflow. Most importantly, patchwork invites adaptation through loopholes. Where one court demands disclosure, another may not. Where one chamber has a written policy, another may operate on informal assumptions. Where one district imposes strong sanctions, another may rely on ambient caution without procedural teeth. Institutions governed through uneven controls tend to produce uneven habits, and uneven habits are exactly what courts ordinarily resist in other contexts.

Still, patchwork should not be dismissed as though it were meaningless. It is often the normal form of early institutional response. The point is not that the federal courts have done nothing. The point is that they have already done enough to prove the issue is real, while not yet enough to produce convergence. That is what makes the present moment significant. The federal courts are no longer deciding whether AI requires procedure. They are deciding what kind of procedure, at what level, and under whose authority. That is a much more serious question, and it is the one the profession has to answer next.

X. From Patchwork to Principle: What Judicial AI Governance Should Require

The present landscape suggests that the federal judiciary is moving toward governance by accretion. Standing orders appear in individual courts. Sanctions emerge in response to defective briefing. Interim guidance develops at the administrative center. Committee work proceeds at the rulemaking level. Chamber-specific norms exist in scattered and uneven forms. That pattern is understandable in a transition period. It is not adequate as a destination. A judiciary that derives legitimacy from disciplined process should not remain indefinitely in a state where AI governance depends on a mixture of local innovation, professional instinct, and institutional improvisation. The question is therefore no longer whether some controls exist. They do. The question is what principles should organize them so that the system can move from reactive patchwork to coherent governance.

The first principle must be task distinction. Artificial intelligence cannot be governed intelligently if all uses are treated as though they carry the same institutional meaning. Legal research triage is not the same as opinion drafting. Administrative correspondence is not the same as summarizing a contested record. Preparing a speech outline is not the same as framing dispositive legal analysis. Chambers need a governance structure that begins with categories of function rather than abstract anxiety. Some tasks are peripheral enough that bounded use may be tolerable under verification rules. Some tasks are close enough to adjudicative judgment that they should be tightly cabined or prohibited. The point is not that every possible use can be reduced to a neat chart in advance. The point is that no serious institution should govern a consequential technology by pretending that all uses pose the same risk.

This principle matters because task confusion is one of the main ways informal dependence takes hold. If a chamber allows “AI for research” without defining what research includes, the label becomes elastic enough to swallow far more than intended. Does research include issue spotting? Summarizing a factual record? Drafting a case comparison? Proposing an analytical framework? Organizing deposition excerpts into narrative order? Without task distinction, broad permission quietly becomes uncontrolled drift. Governance begins when the institution stops using large abstractions and starts specifying what conduct actually occurs inside workflow.

The second principle is verification as a non-delegable duty. Every serious federal response to AI so far converges on this point for good reason. Machine assistance may be tolerated in certain settings, but it cannot relieve the human actor of responsibility for source accuracy, factual correctness, procedural truthfulness, or fair representation of authority. Verification is not a best practice in this setting. It is the minimum condition of legitimacy. A court cannot claim fidelity to law if it permits machine-generated assertions to circulate through chambers or filings without disciplined human checking. That principle should be universal, not local. It should not depend on whether a judge happens to issue a standing order or whether a particular district has already encountered an embarrassing failure.

But verification has to be understood properly. It is not enough to check whether a case exists. Competent verification requires more than existence. It requires confirming that the authority says what the AI claims it says, that the quotation is accurate, that the procedural posture is correctly represented, that the case remains good law where that matters, and that the analogy being drawn has not been silently distorted in translation. The same is true for factual material. A polished summary of a record must be checked against the actual record, not merely accepted because it sounds balanced. Verification, in other words, is not a ritual word. It is a structured practice, and governance must treat it that way.

The third principle is supervisory accountability. Governance that focuses only on the individual end-user is inadequate because judicial work is distributed. Chambers are collaborative institutions. Clerks and staff may use AI in ways judges do not fully see, especially at the early stages of research and document handling. A judge therefore cannot meaningfully preserve human judicial accountability while allowing subordinate AI use to develop through private discretion and unwritten assumptions. Chambers need written protocols. Judges need to know whether, when, and how AI is being used by clerks and staff. Approved-task boundaries must be communicated and monitored. A governance regime that does not account for distributed judicial labor is not governing the institution that actually exists.

This is also where the old model of chamber culture becomes insufficient. Trust, informality, and professional instinct are often enough to manage ordinary workflow variation. They are not enough for a technology whose appeal lies precisely in how easily it can disappear into routine work. A clerk using AI to orient a research question or compress a record may believe the use is too preliminary to matter. A judge who does not ask may never know it happened. Supervisory accountability exists to close that gap. Not because clerks are untrustworthy, but because legitimacy cannot depend on invisible technological mediation inside a process that still presents itself as wholly traditional.

The fourth principle is role-based disclosure. Disclosure should not be performative, but neither should it be avoided simply because it is awkward or because the line is difficult to draw. The question is contextual: when does AI involvement become significant enough that candor to the court, the parties, or internal judicial supervisors is required? For litigant filings, the answer is clearest. If AI materially shaped a submission in ways that affect verification burdens, legal accuracy, or the court’s ability to rely on the filing, the court may reasonably require disclosure or certification. For chambers, the issue is more delicate but no less real. A judge may reasonably require that clerks disclose AI use upward within the chamber even if no public disclosure to the parties is required. The point is not maximal transparency at all times. The point is to identify the points at which undisclosed use creates accountability gaps the institution should not tolerate.

Role-based disclosure is also preferable to a crude universal rule because it respects the differences between internal and external use. Not every instance of AI assistance should trigger the same disclosure consequence. A clerk using a tool to draft a speech outline does not present the same concerns as a lawyer using AI to generate legal analysis for a filing. A judge using a tool to test wording in a nonadjudicative context does not raise the same institutional stakes as a chambers user summarizing a contested evidentiary record. Governance must therefore ask not simply whether AI was used, but by whom, for what, and with what effect on the integrity of the material later relied upon.

The fifth principle is mandatory training wherever use is permitted. This follows directly from the survey’s training deficit and from the judiciary’s own emphasis on verification. No court should authorize AI-assisted work without ensuring that users understand hallucination risk, fabricated authority, confidentiality constraints, task boundaries, and the difference between apparently polished output and actually reliable work. Training should be judiciary-specific, recurring, and role-sensitive. It should not be outsourced entirely to vendors, whose incentives do not fully align with institutional caution. And it should distinguish among tool types. The risks presented by AI-assisted legal research are not identical to those presented by transcription tools, document-summarization systems, or open-ended generative models. A serious training regime cannot speak in slogans. It must speak in functions.

Mandatory training also serves a legitimacy function. Courts cannot insist that lawyers remain responsible for AI-assisted submissions while treating their own internal training obligations as optional. If the judiciary expects external actors to understand the duties that survive technological mediation, it must build comparable understanding internally. Otherwise it will create the appearance of asymmetric rigor: a court system warning others about risks it has not systematically taught its own users to manage.

The sixth principle is calibrated enforcement. Courts should punish misconduct, not mere technological association. This distinction matters. An enforcement regime that punishes the label “AI use” without regard to function, transparency, or verification will either chill transparent low-risk uses or drive uses underground. Neither result is desirable. The proper targets of enforcement are fabricated authority, reckless verification failures, concealment where disclosure is required, misuse of sensitive data, task-boundary violations, and other conduct that undermines adjudicative integrity. Enforcement must be principled enough to deter abuse without encouraging secrecy as the easiest route to avoiding consequences.

That requires a mature understanding of deterrence. Institutions deter best when actors know not only that sanctions exist, but what conduct triggers them and why. A lawyer or chambers user who understands that the violation lies in fabricated authority or unverified factual representation is more likely to internalize the correct norm than one who hears only that “AI is dangerous.” Governance succeeds when it teaches actors to connect discipline to institutional principle rather than to the cultural stigma of a tool.

The seventh principle is convergence. Not total uniformity immediately, and not elimination of all local variation. But convergence toward a recognizable federal baseline. The federal courts should not allow the acceptable use of AI to depend too heavily on the happenstance of district assignment, chamber culture, or individual judicial preference. Some variation is inevitable and sometimes healthy. But baseline expectations concerning verification, supervisory responsibility, training, disclosure, and categories of prohibited delegation should not remain radically fragmented. Legitimacy suffers when institutions confronting the same technology operate under irreconcilably different assumptions about what professional responsibility requires.

Convergence matters for another reason as well. Patchwork governance encourages loophole thinking. Users learn that what is unacceptable in one setting may pass unnoticed in another. Institutions with sharply uneven control structures tend to breed sharply uneven habits. A judiciary that values equal process should hesitate before permitting such divergence to become the long-term condition of AI use. The goal is not an instant national code that answers every possible question. The goal is a baseline strong enough that local experimentation occurs within recognizable boundaries rather than in procedural free space.

This is the point at which the reader should resist two opposite temptations. One is the demand for total prohibition. The other is the demand for immediate normalization. Neither is realistic and neither is institutionally intelligent. The better path is principled containment: define the tasks, train the users, require verification, supervise chambers, demand candor where it matters, sanction actual misconduct, and move the system toward convergence. That is not capitulation to technology. It is the ordinary legal response to risk.

The law has never managed danger by purity alone. Its most durable achievement has been something more demanding: the translation of principle into procedure. If artificial intelligence is already inside the federal courts, then the question is no longer whether the judiciary can preserve itself by refusing to think about it. The question is whether it can build a control structure worthy of adjudication.

XI. Legitimacy Requires Candor, Not Denial

At bottom, the dispute over AI in the federal courts is not mainly about efficiency. It is about legitimacy. Courts claim a special kind of authority. They do not merely resolve disputes. They purport to do so through reasoned judgment under law. That claim depends on more than correct outcomes. It depends on visible fidelity to process, source integrity, and accountable human agency. Any technology that threatens to blur those elements will naturally provoke anxiety. That anxiety is not technophobic. It is constitutional in spirit. A system that permits coercive state power to rest on synthetic error, hidden machine mediation, or unverifiable legal work risks weakening the very basis on which judicial authority is publicly accepted.

That is why the AI debate cannot be reduced to a dispute about convenience. The real issue is whether the federal courts can preserve confidence in adjudication while integrating a technology associated with fabrication, opacity, and uncertain control. Courts are not technology firms, consulting shops, or ordinary professional offices. They exercise power in the name of law. Their legitimacy therefore depends not only on what they decide, but on how they reach decision. If the process becomes opaque at the very points where authority, fact, and analysis are first assembled, the problem is not merely technical. It is institutional.

But legitimacy is not preserved by pretending that nothing operational has changed. Once artificial intelligence has entered judicial research, document review, filing practice, evidentiary debate, and administrative governance, the idea that courts can protect their authority through silence becomes untenable. Silence is not neutrality. It is institutional opacity. And opacity is itself corrosive when the subject is a technology widely associated with fabricated authority, hidden assistance, and uncertainty about who actually controlled the work. A court system that says little while the technology becomes more integrated does not appear restrained. It appears behind its own reality.

That is why recent federal judicial responses matter even beyond their immediate substance. Task forces, interim guidance, standing orders, sanctions, and rulemaking discussions all reflect a common institutional recognition: AI is no longer a hypothetical issue external to the judiciary. It is already a judicial problem. These measures may be partial and uneven, but they are nonetheless forms of candor. They acknowledge that the institution must respond. And that acknowledgment is itself important, because it aligns the judiciary’s public self-description more closely with its actual operating conditions.

Candor matters for at least three reasons.

First, candor aligns institutional rhetoric with institutional reality. Courts lose credibility when their public posture suggests an unchanged traditional process while their actual workflow has materially shifted. If chambers are using AI in research, summary, or drafting-adjacent tasks, and the judiciary continues to speak as though the problem exists only at the margins or only among careless lawyers, then the institution begins to describe itself inaccurately. That kind of mismatch is dangerous. Public confidence erodes not only when courts make mistakes, but when courts appear unwilling to describe honestly the conditions under which they now work.

Second, candor allows governance to develop before repeated breakdown forces it into existence. Institutions that suppress discussion of operational change often end up regulating only after visible scandal. That is the worst way to govern. It produces reactive policy, moral panic, uneven punishment, and rushed line-drawing under pressure. A candid institution does something different. It identifies the issue before the next failure. It states the risks openly. It defines responsibilities in advance. It tells judges, clerks, lawyers, and litigants what standards will apply before a fabricated citation, distorted summary, or undisclosed use creates a legitimacy event. Candor is therefore not merely transparency for its own sake. It is a condition of timely supervision.

Third, candor enables accountability. A court cannot assess whether standards are being met if it refuses to say when AI use matters, who is responsible for checking it, what uses are acceptable, and what counts as failure. Accountability requires visible norms. Without them, error can always be treated as an isolated lapse rather than the predictable product of underdefined practice. Candor supplies the baseline against which conduct can be judged. It makes responsibility traceable rather than atmospheric.

The alternative is denial, and denial is not a neutral resting state. In institutional life, denial usually produces one of two outcomes. Either the practice continues informally under conditions of uneven supervision, or the practice becomes visible only through scandal when a particularly glaring failure forces acknowledgment. Both outcomes are worse than open governance. Informal continuation breeds drift. It allows actors to adapt quietly, without clear limits, until the institution’s actual practice is farther along than its formal rules admit. Scandal-driven acknowledgment is no better. It invites overcorrection, symbolic punishment, and improvised restrictions that may be more performative than principled. Neither path is consistent with a judiciary committed to reasoned and evenhanded administration.

There is also a democratic dimension to candor that should not be overlooked. The public is entitled to know, at least in broad terms, how the courts are confronting a technology that can affect filings, evidence, judicial workflow, and the production of authoritative legal texts. This does not mean every internal use of a tool must become a public spectacle. It does mean that the judiciary should not rely on mystique. In modern institutions, trust no longer survives on ceremony alone. It depends increasingly on intelligible safeguards. If courts expect lawyers and litigants to disclose or verify AI-assisted work where appropriate, it is hardly unreasonable to expect the judiciary to articulate its own baseline principles for internal use and control.

This is where the legitimacy question becomes sharper than many purists allow. Yes, artificial intelligence poses risks to reasoned human judgment. But unmanaged or undisclosed AI use poses a second and perhaps greater risk: that the judiciary will appear either unable or unwilling to govern a technology already affecting its work. In that circumstance, the problem is no longer just machine unreliability. It is institutional evasiveness. A judiciary that warns others about technological risk while speaking too softly about its own encounter with that risk invites a different kind of doubt—the doubt that it understands the demands of its own legitimacy.

This is also why legal purism, standing alone, underperforms as a legitimacy theory. It treats judicial authority as something protected by refusal alone. But refusal cannot protect authority once the underlying practice persists. If AI continues to enter chambers, filings, and evidentiary disputes despite rhetorical resistance, then legitimacy no longer depends on whether the institution disapproves. It depends on whether the institution governs honestly. An institution that refuses to name the problem after the problem has already arrived does not preserve integrity. It forfeits control.

The question, then, is no longer whether the federal courts can preserve legitimacy by keeping AI outside the system entirely. That possibility has already been overtaken by events. The real question is whether the judiciary will preserve legitimacy by facing the technology directly—by stating where it is present, where it is permitted, where it is prohibited, who is accountable, and how the integrity of adjudication will be protected despite its presence. That is the work candor makes possible. Without candor, legitimacy becomes an unsupported claim. With candor, it at least has a chance to remain a governed one.

XII. Conclusion: Denial Is Not Discipline

The debate over artificial intelligence in the federal courts has moved beyond its opening stage. The issue is no longer whether AI will matter to federal adjudication. It already does. It is present in judicial work. It is present in chambers. It is present in legal research, document handling, filing practice, sanctions disputes, and the judiciary’s own administrative planning. The question is no longer whether the federal courts may someday confront AI as an institutional problem. They are confronting it now.

What makes the present moment significant is not that the issue has been resolved. It plainly has not. The judiciary is in a transitional condition rather than a settled one. A substantial number of judges still report no use of listed AI tools. Training remains uneven. Formal policy is often absent, incomplete, or chamber-specific. Some courts have begun building procedural controls. Others are still operating through local improvisation, verbal custom, or interim guidance. The national rulemaking process is moving, but it is not complete. The federal courts have therefore entered the AI era without yet completing the work of governing it. That is not unusual in institutional life. Practice often arrives before principle. But in courts the lag matters more, because courts claim legitimacy through disciplined method, not through retrospective rationalization.

The legal purists are right about the stakes. Artificial intelligence can fabricate, distort, flatten, and deceive. It can tempt busy professionals to mistake polished output for reliable analysis. It can encourage a style of legal work in which fluent synthesis arrives before disciplined engagement with actual sources. In adjudicative systems, those are not ordinary efficiency tradeoffs. They are legitimacy risks. A court that relies, even indirectly, on synthetic error does not merely make a technical mistake. It compromises the very conditions under which its judgments claim authority.

But the purists remain wrong about the response. Denial cannot solve a problem already inside the institution. It cannot train judges, clerks, or staff. It cannot define task boundaries. It cannot protect confidentiality. It cannot tell chambers when AI use becomes significant enough to require supervision or disclosure. It cannot distinguish between acceptable assistance and impermissible delegation. It cannot create a verification culture. It cannot build a rule structure. It can only leave those questions unresolved while actual use continues. That is not institutional rigor. It is a refusal to govern.

The proper response is therefore governance. Not rhetorical panic. Not abstract technological enthusiasm. Governance. That means written policies rather than ambient expectations. It means task-based distinctions rather than undifferentiated slogans. It means verification as a non-delegable duty. It means supervisory accountability inside chambers. It means role-sensitive disclosure where candor is necessary. It means judiciary-specific training rather than vague appeals to caution. It means sanctions for actual misconduct rather than symbolic hostility to software labels. It means movement toward baseline federal expectations even if some local variation remains unavoidable.

This is not a concession to technology. It is the ordinary legal response to risk. Law has never preserved its seriousness by merely repeating its values in louder language. It has preserved seriousness by translating values into procedure. When evidence is unreliable, the law builds admissibility rules. When lawyers overreach, the law imposes duties and sanctions. When institutions confront a new source of distortion, the law’s task is not to sentimentalize older methods. It is to decide what structures are now necessary to preserve the integrity of judgment. AI is no exception.

That is why the question before the federal courts is now narrower and harder than many commentators admit. The question is not whether the judiciary can restore a world in which this technology remained wholly outside the courthouse doors. That moment has passed. The question is whether the judiciary can now govern a technology already in its midst with enough seriousness, candor, and procedural discipline to preserve confidence in adjudication. If it can, then AI will become one more risk managed through law’s familiar tools of supervision, verification, and accountability. If it cannot, then the problem will not be technology alone. The problem will be that the courts failed to impose on themselves the same rigor they demand from everyone else.

Artificial intelligence is already inside the federal courts. That is not prophecy. It is the present tense. The real measure of the judiciary will not be whether it managed to keep AI entirely outside the system. The real measure will be whether it can meet the technology with the one thing courts are supposed to know how to provide: disciplined control. Denial is not discipline. In this setting, only governance is.

 

Reader Supplement

To support this analysis, I have added two companion resources below.

First, a Slide Deck that distills the core legal framework and institutional patterns discussed in this piece. It is designed for readers who prefer a structured, visual walkthrough of the argument and for those who wish to reference or share the material in presentations or discussion.

Second, a Deep-Dive Podcast that expands on the analysis in conversational form. The podcast explores the historical context, legal doctrine, and real-world consequences in greater depth, including areas that benefit from narrative explanation rather than footnotes.

These materials are intended to supplement—not replace—the written analysis. Each offers a different way to engage with the same underlying record, depending on how you prefer to read, listen, or review complex legal issues.

About the Author

Eric Sanders is the owner and president of The Sanders Firm, P.C., a New York-based law firm concentrating on civil rights and high-stakes litigation. A retired NYPD officer, Eric brings a unique, “inside-the-gate” perspective to the intersection of law enforcement and constitutional accountability.

Over a career spanning more than twenty years, he has counseled thousands of clients in complex matters involving police use of force, sexual harassment, and systemic discrimination. Eric graduated with high honors from Adelphi University before earning his Juris Doctor from St. John’s University School of Law. He is licensed to practice in New York State and the Federal Courts for the Eastern, Northern, and Southern Districts of New York.

A recipient of the NAACP—New York Branch Dr. Benjamin L. Hooks “Keeper of the Flame” Award and the St. John’s University School of Law BLSA Alumni Service Award, Eric is recognized as a leading voice in the fight for evidence-based policing and fiscal accountability in public institutions.
