Executive Summary
Artificial intelligence is no longer a speculative issue for the federal courts. It is already present in judicial workflow. A 2026 random-sample survey of federal judges reports that more than 60% of responding judges used at least one AI tool in judicial work, with legal research as the leading reported use case and document review next. The same study found reported AI use by others in chambers as well. Those findings should not be overstated, and the authors identify meaningful limitations, including a 22.3% response rate and possible self-selection and non-response bias. But they do establish the threshold fact that now matters most: AI is already inside federal judicial chambers.
That fact changes the terms of the debate. The central institutional question is no longer whether AI should enter the federal courts. It is whether the judiciary will govern a technology already affecting research, document handling, drafting-adjacent work, and chambers operations with the rigor adjudication requires. On that question, the present record shows a judiciary in transition rather than one operating under a settled control structure. The survey reports uneven policy and training: 45.5% of responding judges said court administration had not provided AI training, 15.7% were unsure whether training had been offered, and 24.1% reported having no official AI policy. If judges who discouraged but did not formally prohibit AI use are also treated as operating without formal policy, the paper states that figure rises to 41.7%.
The federal judiciary’s own institutional response confirms that this is now a governance problem, not a hypothetical one. The Administrative Office of the U.S. Courts reported in 2025 that it had established an advisory AI Task Force and that the task force developed interim guidance for the Judiciary. That guidance directed judiciary users to review and independently verify AI-generated work product, cautioned against delegating core judicial functions to AI, and encouraged courts to define the tasks for which locally approved AI tools may be used. The federal judiciary’s September 2025 Strategic Plan separately called for establishment of an AI governance framework to guide responsible adoption and manage risks presented by new and evolving AI technologies. Courts have also begun responding more concretely through disclosure requirements, verification rules, and sanction frameworks aimed at fabricated or incorrect AI-assisted submissions.
These developments show a judiciary moving unevenly but unmistakably from reaction to administration. The real institutional problem is not AI in the abstract. It is AI inside adjudicative systems without a sufficiently mature framework of supervision, verification, training, disclosure, and task limitation. The naysayers are right that hallucinations, fabricated authorities, and erosion of professional judgment are serious risks. But once AI is already in use, resistance without governance is not rigor. It is drift. The federal courts will not preserve legitimacy by pretending AI is absent from chambers and litigation practice. They will preserve legitimacy, if at all, by governing a technology already inside the system with the same discipline they demand from everyone else.
I. The Debate the Purists Already Lost
The legal purists are arguing a question the federal courts have already moved beyond. Artificial intelligence is no longer standing outside the judicial system as a speculative threat, waiting for the profession to decide whether it belongs. It is already inside federal judicial chambers, already present in judicial workflow, and already forcing administrative, procedural, and disciplinary responses. A 2026 random-sample survey of federal judges reports that more than 60% of responding judges used at least one AI tool in judicial work. Legal research emerged as the leading reported use case, document review followed, and judges also reported AI use by others in chambers. That does not describe a future problem. It describes an existing institutional condition.
The purist instinct nevertheless has an obvious appeal. Courts are among the few institutions that still claim legitimacy through reasoned process rather than naked force. They insist on traceable authority, adversarial testing, disciplined justification, and accountable human judgment. A technology capable of producing plausible but false analysis, fabricated citations, synthetic confidence, and polished distortion appears, at first encounter, to be fundamentally at odds with adjudication. The concern is not irrational. In a legal culture trained to equate rigor with control, suspicion toward AI has real intellectual and professional force.
But force of instinct is not the same thing as adequacy of response. The problem with the purist position is not that it is morally unserious. The problem is that it is operationally late. By the time much of the profession settled into the view that AI should be treated as an external menace to be resisted at the courthouse door, the federal judiciary had already entered a transition period. The survey does not show saturation, and it should not be exaggerated into one. The authors themselves identify important limitations, including a 22.3% response rate and possible self-selection and non-response bias. But even with those limits, the threshold fact remains. AI is no longer hypothetical in federal judicial work.
That point is often obscured because the debate is framed in absolutist terms. If AI is not yet universal, some assume it remains marginal. If only 22.4% of responding judges reported weekly or daily use, some take comfort in the thought that the system remains essentially unchanged. If 38.4% reported never using any listed AI tool in their work, others infer that enough professional resistance could still arrest the transition. Those inferences are mistaken. Institutional consequence does not begin at saturation. In courts, it begins much earlier. A tool need not dominate judicial work to alter it. It is enough that the tool enters the upstream functions that shape what authorities are surfaced, what facts are foregrounded, what analytic routes are pursued, and what drafts first look like before final human judgment is expressed. A technology that enters research, review, summary, and workflow organization has already entered the environment in which adjudication takes shape.
This is why the usual public caricature of the problem is too shallow. The survey does not support the claim that federal judges are broadly handing decisions to machines. It reports only limited use in direct decision-making categories. But the more serious institutional issue was never confined to whether a judge would allow a chatbot to resolve a case. The harder question is whether AI alters how legal problems are framed before the formal act of judgment ever occurs. Courts do not simply decide. They research, filter, compare, summarize, organize, and frame before they decide. That is where institutional vulnerability lies. A system built on human adjudication can still be materially affected by machine assistance long before any judge would describe the machine as having “made” a decision.
The judiciary’s own conduct confirms that the issue has already moved from theory to administration. The Administrative Office reported in 2025 that it had established an advisory AI Task Force and that the task force developed interim guidance for the Judiciary during that year. The guidance directed judiciary users to review and independently verify AI-generated work product, cautioned against delegating core judicial functions to AI, encouraged courts to define the tasks for which locally approved AI tools may be used, and advised courts to consider disclosure and confidentiality concerns. The federal judiciary’s September 2025 Strategic Plan separately called for establishment of an AI governance framework to guide responsible adoption and manage the risks presented by new and evolving AI technologies. Once the judiciary’s own administrative arm has moved from observation to guidance, professional denial stops looking principled and starts looking unserious.
The same is true at the court level. Judge Michael Baylson did not issue his 2023 standing order in the Eastern District of Pennsylvania because AI was an imaginary concern. The District of Kansas did not issue its district-wide standing order in January 2026 because fabricated citations and AI-assisted inaccuracies were merely fashionable talking points. Those orders emerged because federal courts were already confronting the effect of AI on filing practice, source reliability, and procedural integrity. Disclosure requirements, certification requirements, and sanction warnings are not evidence of speculative anxiety. They are evidence of a system already trying to control a technology that has entered real litigation practice.
There is a deeper professional failure underneath the rhetorical one. Lawyers often confuse denunciation with discipline. The profession assumes that once it has articulated a sound normative objection, it has substantially discharged its institutional responsibility. But institutions do not run on objections. They run on procedures, rules, supervision, enforcement, and incentives. A lawyer may be entirely correct that AI hallucinates, fabricates, overstates, and compresses nuance into smooth error. Yet if that same lawyer resists written policies, approved-use categories, verification duties, disclosure rules, training obligations, and supervisory controls, that lawyer has not defended adjudication. The lawyer has merely left the field open to informal adaptation.
That is the point the purists have not confronted. Their skepticism may be justified, but skepticism is not governance. Professional disapproval is not a control structure. A judiciary that has already entered a period of uneven AI use cannot preserve legitimacy by pretending the issue remains premature. Once the technology is already present in chambers, already visible in filing practice, already prompting administrative guidance, and already producing local procedural responses, the only serious question is how it will be governed.
The profession can still decide what kind of control structure it wants. It can still decide whether AI use in federal courts will be supervised, verified, and transparently bounded, or whether it will remain patchy, informal, and reactive. What it can no longer plausibly claim is that the question remains external to adjudication. That argument has already been overtaken by the record.
Artificial intelligence is already inside the federal courts. The only serious question left is whether the judiciary intends to govern that fact or merely endure it.
II. The Threshold Fact: AI Is Already in Federal Judicial Work
One of the easiest mistakes in the public debate over AI and courts is to focus on the most theatrical possibility instead of the most operationally important one. The dramatic fear is the fantasy of the robo-judge: a machine replacing deliberation, substituting synthetic output for human reasoning, and perhaps even rendering judgment directly. That image is rhetorically useful because it is obviously intolerable. It allows lawyers to reject AI in its most extreme form and feel that the institution has thereby been defended. But it is the wrong place to look if the goal is to understand what is actually happening inside federal chambers. The principal issue is not spectacular automation of judgment. It is quieter integration into the workflow from which judgment is built.
The survey makes that clear. The leading reported use case for judges was legal research. Reviewing documents followed. When related uses were grouped into broader categories, judges reported notable AI use for reviewing, searching, and analyzing documents; for legal research; and for drafting and editing. By contrast, only a very small share reported using AI to make decisions, and a somewhat larger but still limited share reported using AI to inform decisions. Those distinctions matter. They show that the technology is entering judicial work first through support functions rather than through explicit transfer of ultimate decisional authority. That fact is often misread as reassuring. It is not. It is the reason the problem has to be taken seriously.
In courts, support functions are not peripheral. They are the architecture of perception. Legal research determines which doctrines are surfaced and which remain invisible. Document review determines what factual mass becomes cognitively available. Summaries and chronologies affect perceived significance, causal sequence, and narrative salience. Drafting support can influence tone, framing, emphasis, and the early organization of legal analysis. A judicial process built from distorted inputs may remain formally human at the moment of final decision while becoming substantively warped much earlier in the chain. The line between support and judgment is real, but it is not impermeable. That is why arguments that focus only on whether a machine “made the decision” are too blunt to capture the actual institutional risk.
This is the central point the legal purists tend to miss. Courts do not begin with judgment. They arrive at judgment through layers of preliminary work. Someone identifies the issue. Someone gathers the cases. Someone decides which facts matter first. Someone organizes the record, compresses a chronology, prepares a bench memorandum, or shapes the first draft of a research path. Even when the judge is the sole ultimate decision maker, the decision emerges from a production process. If AI enters that process at the front end, it may influence adjudication without ever being described as deciding anything. The technology’s institutional significance therefore lies less in outright substitution than in mediation.
The survey’s qualitative responses reinforce that point because they show the range of entry points. Some judges described using AI as a tool like a treatise before beginning research. Others described broad-question orientation, document reading, summary assistance, or word-choice help. One judge reported using ChatGPT to prepare a first draft of a CLE outline rather than case-related work. Another used Copilot to prepare for a talk involving a foreign jurist. At the same time, one chambers anecdote described a law clerk who used AI to write a memo that cited ten fake cases out of eleven. Taken together, those examples matter because they show that AI does not arrive only when someone decides to generate an opinion. It arrives through orientation, organization, citation checking, administrative drafting, speech preparation, and curiosity-driven comparison. It enters through the low-friction points in professional workflow.
That breadth is what makes the governance problem difficult. A narrow prohibition aimed only at overt adjudicative delegation may leave untouched a wide field of machine-influenced tasks that still shape adjudication. Conversely, a broad prohibition on all AI use may quickly prove unrealistic in an environment where legal-specific research tools already integrate AI functions into ordinary workflow. The problem is therefore not susceptible to slogans. It requires classification. Which tasks are sufficiently peripheral that they may be allowed under verification rules? Which tasks are sufficiently close to adjudicative judgment that they should be tightly cabined or prohibited? Which uses require approval? Which require disclosure? Which require supervisory awareness? Which demand formal training before use? Those are not philosophical questions dressed up as administrative detail. They are the core of the institutional problem.
This is also why the federal judiciary’s interim guidance, at least in concept, matters so much. Its significance does not lie only in the warning against delegating core judicial functions to AI. Its significance lies in the recognition that courts must identify the tasks for which approved AI tools may be used and that users must independently review and verify AI-generated content. That is a workflow-centered response. It implicitly acknowledges that the real question is not simply whether final judicial judgment remains human. The real question is how far machine assistance may extend into the material from which human judgment is formed. A tool used for preliminary administrative support presents one level of risk. A tool used to summarize a complex record presents another. A tool used to frame research paths, extract legal themes, or draft operative adjudicative text presents still another. The location of the technology inside the workflow is therefore not incidental. It is the key variable.
The role of chambers personnel makes the issue even more complicated. Judicial work is collaborative, even when judicial authority is singular. Clerks and staff help research issues, organize records, prepare chronologies, and shape the early structure of analysis. The survey reports that judges saw AI use by others in chambers at higher rates than they reported for themselves in several categories, including legal research. The authors also note that judges may underreport others’ use because they may not know all the ways in which chambers personnel have begun experimenting with or integrating AI tools. That observation is institutionally significant. It means that AI’s entry into workflow may be partly opaque even within the chamber itself. A judge may maintain a principled skepticism while a clerk uses AI for preliminary issue spotting. A chambers policy may exist only as an assumption rather than a writing. Oversight may be sincere and still incomplete. That is how shadow procedure develops: not through a single dramatic institutional announcement, but through small acts of tolerated convenience that accumulate without clear supervision.
Once chambers are viewed as real institutions rather than as abstractions, the inadequacy of the standard debate becomes obvious. The relevant question is not simply whether the judge personally used AI. The relevant question is whether AI has begun influencing the chain of work product from which formally human judgment later emerges. That is a much harder question because it implicates supervision, attribution, and internal accountability rather than merely personal ethics. It also explains why the public fixation on the extreme nightmare—the machine judge—can be so misleading. Institutions are often changed less by spectacular substitution than by modest, repeated incursions into routine workflow.
The federal court orders that have emerged in response to AI misuse reflect the same understanding. They do not focus solely on whether AI displaced a human decision maker. They focus on whether AI was used in preparing a filing and whether the resulting content was verified. That approach is telling. It recognizes that the institution’s vulnerability lies not only in overt substitution of machine for lawyer or judge, but in unnoticed contamination of the materials courts receive and rely upon. Quoted passages, paraphrases, legal analysis, procedural histories, and factual summaries can all be shaped by AI without anyone claiming that the machine rendered judgment. Yet each of those elements can affect adjudication if not properly checked.
This is why “support function” is too comforting a phrase. In ordinary organizations, support work may be operationally important without being normatively central. In courts, support work is often where the legal and factual universe of a case first takes shape. That is why legal research is not just a back-office activity. It is one of the mechanisms by which law becomes visible to the decision maker. That is why document review is not merely clerical. It is one of the mechanisms by which fact becomes legible. That is why summarization is not just convenience. It is one of the mechanisms by which complexity is compressed into priority. Once AI enters those functions, it enters the preconditions of judgment.
The practical implication is uncomfortable but unavoidable. AI’s most important effect on courts may not come from any spectacular attempt to automate judicial decision making. It may come from its quieter integration into the daily mechanics of research, summary, organization, and drafting. It may come from changes in how issues are first framed, how records are first processed, and how authority is first surfaced. That is precisely why legal purism is inadequate as an institutional response. It condemns the nightmare and ignores the workflow.
The more serious professional task is therefore not to repeat, in ever firmer tones, that judges must remain human. Of course they must. The harder task is to identify where machine assistance has already entered the production of judicial work and to decide, function by function, what the institution will allow, what it will supervise, and what it will forbid. Until that question is answered, the debate will remain rhetorically intense and operationally underdeveloped. And that is exactly the condition in which a technology like this is most likely to outrun the people responsible for controlling it.
IV. Why the Judiciary Prefers Legal-Specific AI
A recurring weakness in public discussion of artificial intelligence and courts is the tendency to treat “AI” as though it were a single undifferentiated object. That habit is analytically lazy and institutionally misleading. The survey does not describe a judiciary engaging AI in one uniform way. It describes a judiciary that is already distinguishing among tools, whether or not the profession has fully absorbed the significance of that distinction. Judges reported using legal-specific AI tools more often than general-purpose systems and using them more frequently. Westlaw AI-Assisted or Deep Research was the most-used reported tool. CoCounsel also showed substantial use. ChatGPT ranked high enough to matter, but many other general-purpose tools registered only minimal or rare use. Those differences are not incidental. They show the route by which AI is becoming institutional rather than merely fashionable.
That route matters because institutions do not absorb new technologies in the abstract. They absorb them through channels of existing trust. Legal-specific AI tools arrive embedded in platforms already associated with searchable databases, citators, precedent retrieval, document comparison, and ordinary legal workflow. They do not present themselves, at least to the user, as a dramatic departure from professional method. They present themselves as an extension of it. To a judge, the move from traditional research to AI-assisted research inside a familiar legal platform may feel less like an epistemic risk and more like a feature enhancement. The interface is known. The vendor is known. The broader research environment is known. The user is therefore less likely to experience the encounter as a leap into machine reasoning and more likely to experience it as an incremental improvement in ordinary professional tools.
That perception may be partly justified and partly illusory, but its institutional force is undeniable. Courts are conservative institutions in the structural sense. They value continuity, reliability, bounded discretion, and recognizable procedure. They do not generally embrace novelty simply because it is novel. A legal-specific AI tool, even when technically dependent on the same broad family of machine-learning methods as consumer-facing generative systems, appears narrower, more disciplined, and more professionally domesticated. It is framed as research assistance, document review, or citation support rather than open-ended generation. That framing is not merely a matter of marketing language. It shapes the user’s sense of what the tool is for and therefore the range of uses the user experiences as legitimate.
This helps explain why legal-specific tools may pass beneath the profession’s ideological defenses more easily than general-purpose systems. It is relatively easy for lawyers and judges to recoil from the image of “ChatGPT in the courtroom.” That phrase carries the cultural baggage of open-ended prompt-and-response generation, public consumer use, and highly publicized hallucinations. It sounds disruptive. It sounds unserious. It sounds like a challenge to the hierarchy of legal method. By contrast, AI-enhanced research inside a platform the profession has used for decades sounds familiar, even when the underlying issues of verification and reliability remain unresolved. The same profession that bristles at generalized AI rhetoric may accept machine-assisted research if it arrives inside an established legal brand. That is not hypocrisy. It is institutional selection.
The survey therefore reveals something important not only about tools, but about judicial psychology. Judges appear to be gravitating toward AI tools that fit preexisting workflow norms. That does not mean they are embracing AI enthusiastically or uncritically. It means their adoption is being filtered through institutional habits of trust. A legal-specific AI tool is more likely to be perceived as an aid to the existing research function than as an invitation to abandon legal judgment. The effect is subtle but powerful. A profession that might resist overt technological disruption can still accept technological infusion when it is packaged as continuity.
None of this means legal-specific AI deserves automatic confidence. The opposite conclusion would be naive. A legal wrapper does not eliminate fabricated authority, flattening of nuance, skewed summaries, poor analogies, or false confidence. In some respects, errors inside familiar infrastructure may be more dangerous precisely because they are more likely to pass without instinctive distrust. A user who approaches a general chatbot warily may approach an AI-assisted legal platform with professionalism’s version of relaxed guard. That is an institutional vulnerability. The appearance of continuity can lower the threshold of skepticism at the very moment when skepticism is still required.
That is why the more serious question is not whether a tool is consumer-facing or legal-specific, but what role the tool is playing inside the workflow and under what controls. A research feature embedded in a legal database may be appropriate for some preliminary purposes and unacceptable for others. It may be useful for orienting a broad issue, surfacing a first set of authorities, or accelerating a document search, yet still unreliable for generating propositions of law without source verification or for compressing a contested record into a summary that later shapes judicial analysis. A general-purpose model, by contrast, may be wholly unsuitable for record citation or legal synthesis while still being tolerable for non-case-related tasks such as preparing a speech outline, testing a phrase, or organizing administrative content. The branding distinction therefore matters, but it is not dispositive. It is the beginning of the inquiry, not the end of it.
This is one reason task-based governance is more important than tool-based rhetoric. A system that simply announces “legal AI good, consumer AI bad” will fail because it mistakes branding for control. The federal judiciary’s own administrative posture, at least in concept, points in the right direction by emphasizing approved tasks and independent verification rather than assuming that any particular label resolves the institutional risk. That is a more serious approach. It recognizes that the permissibility of machine assistance depends on what is being asked of the tool, what kind of material is involved, what human verification follows, and who bears responsibility for the result. A court deciding whether a chambers user may employ AI for research orientation is confronting a different question from whether AI may be used to summarize sealed material, generate operative adjudicative text, or frame disputed facts. Governance that cannot distinguish among those tasks is not governance at all.
The preference for legal-specific AI also complicates the legal purist’s rhetorical position in a more fundamental way. It is easy to condemn a disruptive consumer technology. It is harder to condemn an enhancement to the research infrastructure on which the profession already relies. Once AI arrives through ordinary legal tools, the profession loses the comfort of treating the issue as external contamination. The question stops being whether courts should let a strange machine into the room and becomes whether the profession is willing to examine how its own ordinary tools are changing. That shift matters because it forces the conversation away from theatrical panic and toward institutional detail. It requires the bench and bar to decide where the line between assistance and distortion actually lies.
There is another important lesson in the survey’s pattern of adoption. Judicial preference for legal-specific AI suggests that the future of AI in the federal courts, at least in the near term, is unlikely to arrive as a single dramatic transformation. The federal judiciary is not likely to become an “AI judiciary” by public declaration. More likely, it will become increasingly AI-inflected through a series of incremental adjustments inside research, review, and administrative systems already familiar to the bench and bar. That form of adoption is slower, quieter, and less visible. It is also harder to police. Technologies that arrive as revolution attract scrutiny. Technologies that arrive as optimization often do not.
That is precisely why serious oversight has to begin here. The profession must resist two opposite temptations at once. It must resist the naive assumption that legal-specific AI is inherently trustworthy because it sits inside a familiar professional ecosystem. And it must resist the equally simplistic assumption that AI governance can be accomplished by condemning a few well-known consumer platforms while ignoring machine assistance embedded in legal workflow. The actual problem is subtler and more institutional. The federal judiciary is already being shaped by the integration of machine assistance into tools the profession experiences as normal. If that integration is allowed to proceed under the cover of continuity, then the courts will adapt to AI without ever squarely deciding the terms on which they meant to do so.
That is the deeper significance of judicial preference for legal-specific tools. It is not merely a product choice. It is a governance warning. It shows that AI will not challenge the federal courts only from the outside, as a visible and disruptive force that courts can denounce in dramatic language. It will also reshape judicial work from within, through familiar platforms, trusted workflows, and seemingly modest improvements in efficiency. The more familiar the delivery mechanism, the more disciplined the oversight must be. Otherwise, what presents itself as incremental modernization may become unexamined institutional dependence.
V. Chambers, Clerks, and the Reality of Distributed Judicial Work
One of the most persistent distortions in the current debate over AI and the federal courts is the fiction of solitary judging. The debate is often framed as though the judge were the exclusive site of judicial work and as though the legitimacy question could therefore be resolved simply by asking whether the judge personally typed a prompt, personally relied on a model, or personally delegated a cognitive task to a machine. That framing is neat. It is also false. Federal adjudication does not emerge from a sealed chamber of solitary thought. It emerges from institutional production. Chambers operate through layered collaboration among judges, law clerks, staff attorneys, judicial assistants, and, in some settings, interns or other support personnel. Opinions, orders, bench memoranda, record summaries, legal chronologies, and preliminary research pathways often take shape inside that collaborative structure before the judge refines, rejects, or adopts the resulting work. Any account of AI in the judiciary that ignores this reality is not merely incomplete. It is structurally misleading.
The survey makes the point impossible to avoid. Judges reported that other individuals in their chambers used AI for legal research at 39.8%, which is higher than the percentage of judges who reported using AI for legal research themselves. More broadly, the paper reports somewhat greater AI use by chambers personnel than by judges across most categories. The authors also note something especially important: judges may be more likely to underreport than overreport chambers use, because they may not know all the ways in which others in chambers have begun using AI. Those observations matter because they show that AI use inside the judiciary is not reducible to judges’ personal habits. It is embedded in the collaborative environment through which judicial work is actually produced.
That point has serious institutional consequences. The first is that the legitimacy debate must shift from individual ethics to supervisory architecture. A judge may sincerely believe that AI should play no role in substantive adjudication. But if the judge does not know whether a clerk used AI to summarize a record, organize a chronology, identify a cluster of authorities, or generate a first-pass issue map, then the judge’s personal principle may not correspond to chambers reality. That mismatch is not trivial. Courts derive legitimacy from the integrity of their method, not merely from the sincerity of individual judicial belief. Once chambers work is distributed, the question is no longer simply what the judge thinks ought to happen. The question is what actually happens inside the production process that feeds judicial decision making.
The second consequence is that disclosure becomes much more difficult than the public version of the debate tends to acknowledge. Most external discussions assume a direct chain: a lawyer uses AI, the lawyer files the document, the court evaluates the filing. Chambers disrupt that simplicity. If a clerk uses AI internally to organize research or summarize a record, what disclosure, if any, is required? Must the judge know? Must the parties know? Does the answer change if the clerk’s AI-assisted work materially influences an order, a bench memorandum, or a later public filing? These are not easy questions, and they cannot be answered by slogans about transparency alone. But the existence of difficulty does not excuse avoidance. The rise of AI in chambers means that disclosure doctrine, where relevant, must be rethought through the lens of distributed labor rather than personal authorship alone.
The third consequence is that training and approval structures become mandatory rather than optional. A chambers environment that allows, tolerates, or simply ignores AI use by clerks and staff without written expectations is not practicing prudent restraint. It is inviting unsupervised experimentation in one of the most legitimacy-sensitive settings in the legal system. That point is especially important because chambers often function through trust, informality, and inherited custom. Those features can be strengths in ordinary institutional life. In the AI context, they can become liabilities. Informal assumptions about what clerks “probably would not do” are not substitutes for protocols. Trust is not verification. Custom is not governance.
The survey’s qualitative responses bring this problem into focus. Some judges reported that they did not know whether others in chambers used AI at all. Others reported that clerks had used AI to create presentations or draft remarks for talks. One response described a law clerk who, after writing a memo, used AI out of curiosity to draft a version on the same issue and discovered that ten of the eleven cited cases were fake. The significance of that anecdote lies not in any claim of widespread breakdown. It lies in the banality of the entry point. Nothing in the story required a formal decision that chambers would begin relying on AI. Nothing in it required institutional approval, a policy memorandum, or even bad faith. It required only curiosity, convenience, a question, and a tool that was already available. That is how technological drift often begins inside institutions: not with dramatic adoption, but with local experimentation that appears too small to count as structural change until the pattern has already taken hold.
This is where legal purism becomes especially weak as a governing framework. A theory centered on personal abstention by judges does almost nothing to address distributed judicial labor. Even if a judge never personally opens an AI tool, chambers may still be affected by AI-mediated research, machine-influenced record organization, or preliminary drafting support. Formal authorship by the judge does not erase the institutional significance of those earlier stages. In some respects, the legitimacy risk increases when the judge retains final authorship while the upstream production chain becomes partially machine-influenced but only partially visible. The institution may continue describing itself in traditional terms while its internal methods quietly shift underneath that description.
There is also a generational and structural asymmetry that makes this harder. Law clerks, staff attorneys, and younger lawyers are often more likely than senior judges to encounter, experiment with, or casually normalize emerging technologies. That is not an accusation. It is a predictable feature of professional life. The people closest to time pressure, drafting pressure, and research pressure are often the people most likely to adopt tools that promise speed. They may also be more technically fluent, more curious, or more comfortable operating in hybrid digital workflows. None of that makes them reckless. But it does mean that institutional exposure to AI may arise through subordinate channels before it becomes legible at the level of formal judicial policy. A governance model that focuses only on what judges themselves do will therefore miss a significant part of the actual risk profile.
The federal judiciary’s interim guidance, although not framed specifically as a chambers-supervision document, implicitly recognizes this problem. It emphasizes that judiciary users and their approvers remain accountable for work performed with AI assistance. That is not the language of mere individual choice. It is the language of supervision. It presumes that work product may be generated or shaped by one person and approved by another, and that responsibility follows both use and oversight. That is an important institutional insight. A viable governance regime cannot assume that every chambers user possesses equal caution, equal training, and equal understanding of machine limitations. Nor can it assume that informal chambers culture will reliably police a technology whose main attraction is precisely that it saves time at the earliest stages of work. If the institution is serious, it must determine what subordinate personnel may do, what approval is required, what categories of work remain off-limits, and how outputs must be verified before they become part of the chamber’s working product.
This distributed-work reality also exposes a broader problem with how judicial legitimacy is often discussed. There is a tendency to speak as though legitimacy attaches primarily to the final authored text and the judge’s final signature. That is only part of the truth. Legitimacy also depends on the integrity of the process that produced the text. A court order is not made legitimate simply because a judge ultimately reviews and signs it. It is made legitimate, in part, because the system claims that the work beneath the signature was generated through methods consistent with law’s demands for rigor, traceability, and human accountability. If chambers increasingly incorporate AI at earlier stages without settled rules, that claim becomes harder to sustain in its old form.
The practical response cannot be a mere warning. Serious supervision in this setting requires more than a judge telling clerks to “be careful” or “not overuse” AI. It requires written protocols. It requires explicit instruction on approved and prohibited uses. It requires mandatory verification standards for any AI-influenced research or drafting support. It requires confidentiality safeguards. It requires clarity about when AI use must be disclosed upward within chambers. It may also require recordkeeping norms in some settings, especially where AI contributed to work that later shaped filed or operative judicial text. Without that architecture, chambers risk becoming environments in which machine influence is both real and only partly visible. That is precisely the kind of institutional condition in which confidence erodes slowly at first and then all at once when a visible failure forces the hidden practice into view.
The courts have long understood that legitimacy depends not just on what the judge believes, but on how the institution produces its work. Chambers have always been part of that story, even when public discussion preferred to simplify them away. AI simply makes the old truth harder to ignore. The federal judiciary is not confronting a world in which a few judges may choose to use or avoid a new tool in isolation. It is confronting a world in which machine assistance can move through the collaborative channels of chambers practice faster than the formal language of judicial responsibility has yet caught up. Until that institutional fact is addressed directly, the conversation about AI in courts will remain rhetorically intense but structurally shallow.
VI. The Governance Gap: Use Has Outpaced Formal Policy
The most revealing problem in the current record is not merely that artificial intelligence has entered federal judicial work. It is that formal governance has not kept pace with that entry. The survey shows a judiciary operating under a patchwork of permission, discouragement, prohibition, and silence. One in three judges reported permitting, or permitting and encouraging, AI use in chambers. At the same time, a significant portion reported formally prohibiting AI use, another group reported discouraging but not formally prohibiting it, and a substantial share reported having no official AI policy at all. If judges who discourage but do not formally prohibit use are treated as effectively operating without formal policy, the portion functioning outside a formal rule structure becomes even larger. That is not a mature institutional framework. It is a transition environment in which use is real, expectations are uneven, and governance remains underdeveloped.
Patchwork is understandable at the beginning of institutional change. It is not a defensible resting point for courts. Courts do not ordinarily tolerate major procedural issues being governed by atmosphere, habit, or unspoken assumption. They impose filing rules, scheduling orders, preservation duties, disclosure obligations, certification requirements, evidentiary foundations, and sanctions because procedure is one of the principal ways law disciplines power. Yet on AI, a technology capable of affecting research, drafting, record handling, filing practice, and evidentiary disputes, much of the federal judicial posture remains chamber-specific, judge-specific, or locally improvised. That is not a minor administrative lag. It is a governance gap.
The seriousness of that gap becomes clearer when measured against the judiciary’s own institutional instincts in every other domain. Courts do not tell litigants to “be careful” with evidence and stop there. They define admissibility rules. They do not respond to unreliable discovery conduct with generalized reminders about professionalism alone. They create obligations, deadlines, and sanctions. They do not treat ambiguity as a virtue when the integrity of adjudication is at stake. They reduce discretion by articulating procedure. That is one of the defining habits of legal institutions. The problem with AI is not that the judiciary lacks a procedural tradition. It is that it has not yet translated that tradition into a sufficiently coherent control architecture for this technology.
This is not because the issue has gone unnoticed. Quite the opposite. The federal judiciary’s own institutional materials effectively concede that present arrangements are unsettled. A call for an AI governance framework is, by its nature, an acknowledgment that the existing landscape is fragmented. Interim guidance is not the language of institutional completion. It is the language of temporary management while something more durable is still being built. That distinction matters. Institutions issue temporary guideposts when operational reality has already outrun formal structure. In other words, the judiciary’s own administrative posture confirms the central point: AI use has advanced faster than policy.
The court-level responses illustrate the same problem from another angle. Local standing orders requiring disclosure, verification, and certification are serious measures. Sanctions for fabricated authority are serious measures. Judicial warnings about false factual and legal statements generated through AI are serious measures. But these interventions, however important, are not the same thing as a coherent governance regime. They are reactive, partial, and often limited to discrete domains such as litigant submissions. They do not settle what chambers personnel may use AI for. They do not fully define supervision duties. They do not create a comprehensive training architecture. They do not resolve the relationship between machine assistance and internal judicial workflow. They address breakdown. They do not yet provide a full architecture for prevention.
That is why it is necessary to distinguish between control and governance. A standing order aimed at fabricated citations is a form of control. A sanction for AI-assisted misrepresentation is a form of control. A local certification requirement is a form of control. Those tools matter, and they should not be minimized. But governance is broader. Governance means the institution has defined what uses are approved, what uses are prohibited, what uses require disclosure, what verification is mandatory, who supervises whom, what training is required, and what consequences attach to noncompliance. Governance does not begin only when something goes wrong. It exists to shape conduct before the failure occurs.
The absence of that fuller structure creates a particularly dangerous kind of institutional ambiguity: responsible-seeming informality. A judge discourages AI use without reducing the expectation to writing. A chamber permits limited use for “research only” without defining the boundaries of research. A clerk uses an AI-assisted feature embedded in a familiar legal platform and assumes ordinary professional caution is enough. A court warns lawyers against fabricated citations but provides little chamber-specific education on internal use. Each of these arrangements can sound reasonable in isolation. None of them sounds reckless. Yet together they create an environment in which machine assistance is present, incentives toward convenience are real, oversight is uneven, and accountability remains diffuse. Informality begins to look responsible precisely because the institution has not yet forced itself to specify what responsibility requires.
That is one reason ideological resistance can worsen the very problem it claims to oppose. The legal purist often imagines that denunciation of AI creates institutional caution. In reality, denunciation without policy may simply push use into ambiguous zones of tolerated but underdefined practice. If the profession is unwilling to articulate clear rules because doing so feels like legitimating the technology, then use does not disappear. It migrates. It moves into informal workflows, individualized understandings, vague warnings, and local custom. That is a far more dangerous outcome for courts than open acknowledgment paired with disciplined control. Ambiguity is not a protective state. It is the environment in which inconsistent practice hardens into hidden norm.
The governance gap also magnifies every other AI-related risk. Hallucinations become harder to catch when no uniform verification expectation exists. Confidentiality risks become harder to manage when approved-task boundaries are undefined. Skill atrophy becomes harder to identify when chambers use is informal and unevenly supervised. Fabricated citations are more likely to reach the court when lawyers and judges operate under mixed signals about what tools may be used and what checking is required. Uneven policy also creates a legitimacy problem of its own. A judiciary that insists on procedural rigor from everyone else should not appear content to govern its own encounter with AI through scattered local responses and unspoken chamber culture.
There is a second institutional cost as well: unequal practice. In a patchwork system, the role AI plays in federal adjudication may depend too heavily on local assignment, chamber custom, judicial temperament, or professional culture within a particular office. One chamber may have a clear written protocol. Another may rely on verbal assumptions. One district may require certifications and warn of sanctions. Another may say very little. One judge may treat AI-assisted research as tolerable if verified. Another may discourage it without defining the point at which discouragement becomes a rule. That kind of unevenness may be survivable in a short transition period. It is much harder to justify as a stable condition in a federal judicial system that claims fidelity to equal procedure and disciplined administration.
The governance gap is therefore not just one issue among many. It is the condition under which every other issue becomes harder to control. It is the reason the debate cannot remain trapped at the level of moral posture. The real institutional failure is no longer that artificial intelligence exists in the federal courts. The real failure is that the courts have entered the AI era without yet completing the work of governing it. Until that changes, every warning about hallucinations, fabricated authority, confidentiality, and legitimacy will remain at least partly reactive. And reactive systems, especially in courts, are almost always late.
Reader Supplement
To support this analysis, I have added two companion resources below.
First, a Slide Deck that distills the core legal framework and institutional patterns discussed in this piece. It is designed for readers who prefer a structured, visual walkthrough of the argument and for those who wish to reference or share the material in presentations or discussion.
Second, a Deep-Dive Podcast that expands on the analysis in conversational form. The podcast explores the historical context, legal doctrine, and real-world consequences in greater depth, including areas that benefit from narrative explanation rather than footnotes.
These materials are intended to supplement—not replace—the written analysis. Each offers a different way to engage with the same underlying record, depending on how you prefer to read, listen, or review complex legal issues.
About the Author
Eric Sanders is the owner and president of The Sanders Firm, P.C., a New York-based law firm concentrating on civil rights and high-stakes litigation. A retired NYPD officer, Eric brings a unique, “inside-the-gate” perspective to the intersection of law enforcement and constitutional accountability.
Over a career spanning more than twenty years, he has counseled thousands of clients in complex matters involving police use of force, sexual harassment, and systemic discrimination. Eric graduated with high honors from Adelphi University before earning his Juris Doctor from St. John’s University School of Law. He is licensed to practice in New York State and the Federal Courts for the Eastern, Northern, and Southern Districts of New York.
A recipient of the NAACP—New York Branch Dr. Benjamin L. Hooks “Keeper of the Flame” Award and the St. John’s University School of Law BLSA Alumni Service Award, Eric is recognized as a leading voice in the fight for evidence-based policing and fiscal accountability in public institutions.
