Opinion: Air Marshal (Ret’d) John Harvey AM asks a critical question as militaries around the world accelerate their adoption of artificial intelligence: Can machines become repositories of military wisdom, or does effective command still depend on uniquely human judgement?
Military institutions have always relied on written doctrine to compress hard-won experience into portable guidance for an uncertain future.
They are now adopting a different kind of knowledge repository – the large language model (LLM). The two share a structural objective, which makes the comparison tempting; but their internal mechanics are fundamentally different, and it is in the differences that the lessons for command lie.
The comparison matters because militaries are increasingly inclined to treat LLMs as repositories of institutional knowledge, and the points at which doctrine and an LLM align – and part – offer a practical test of where such a model can assist decision making and where it cannot.
The shared act of compression
Both doctrine and the LLM are exercises in radical compression. Doctrine condenses a vast, chaotic record of campaigns and after-action review into a structured volume a single officer can carry.
Crucially, it is not merely distilled experience: it is experience that has passed through institutional review, legal scrutiny and command endorsement before it is issued, and it carries authority because that process is known and deliberate.
The Australian Defence Force defines doctrine as the fundamental principles guiding the actions of forces, adding the decisive qualification that it is “authoritative but requires judgement in application”.
Doctrine is descriptive, not prescriptive; it tells the force how to think about fighting and relies on the commander to adapt that thinking to the case at hand.
It is not confined to any single level of war but spans the strategic, operational and tactical levels – from the principles that shape national military strategy through the design of campaigns to the conduct of a unit in contact – and the comparison that follows holds across all three.
An LLM compresses through a mathematical pipeline instead – condensing billions of semantic relationships into a file of weights – and arrives at a superficially similar product through an entirely dissimilar process.
Where the analogy breaks
The similarities are real, but the analogy breaks down across five domains. The first two concern trust and accountability, the next two concern reasoning and operational behaviour, and the last concerns the human’s own relationship to the machine’s output.
Readability: Doctrine is written by people in explicit language; a commander can trace a doctrinal claim to public debate, committee papers and specific lessons.
It resembles a classical expert system whose rules can be inspected. An LLM’s reasoning is distributed across an opaque matrix of weights that resists reconstruction, and the “reasoning” traces some models emit are better understood as post-hoc rationalisations than as faithful records.
Provenance: Doctrine carries a visible pedigree – who wrote it, who approved it, when it was revised and what authority it commands. An LLM offers none: it is impossible to say which sources shaped a recommendation or what evidence underpins it. For doctrine, authority is traceable; for an LLM, it is asserted.
Causation: Doctrine is written to articulate why particular actions produce particular outcomes, constrained throughout by law and ethics. An LLM has no accountable model of intent or truth; it produces a recommendation because that sequence of words is statistically probable. Its fluency is not evidence of understanding.
Failure mode: Doctrine’s characteristic failure is institutional rigidity – texts that lag a changing reality. The LLM’s is hallucination – the fluent fabrication of plausible untruths.
A rigid manual can stall an army; a hallucinating model can misdirect it. The two also diverge in currency: doctrine is revised deliberately, whereas a model is frozen unintentionally at the moment training stops, and even when connected to current databases, its underlying judgement remains shaped by the assumptions embedded during training.
What the machine does to the human
The fifth divergence is the most consequential. Doctrine demands interpretation, and that demand is a safeguard. It is also the substrate of professional military education – the material through which officers are formed over years of training and command. Working a decision out of doctrine exercises a trained faculty that doctrine itself helped create, and the effort reconstructs the context and makes the judgement the commander’s own.
An LLM removes that labour and has no part in the formation that makes it possible. It can return a digested yes-or-no answer without the user having reconstructed any of the conditions it supposedly accounts for.
Its natural affordance runs towards that digested answer, and with it comes the pull of automation bias: the tendency to overtrust a fluent machine output. The risk is not that the machine decides, but that the human ratifies a decision they never had to understand, and so never exercises the judgement their profession was built to develop.
None of this is inherent to the technology. A well-designed assistant can do the opposite – surfacing assumptions, presenting alternatives and pressing the user to justify a course of action rather than supplying one. Used that way, an LLM can sharpen command judgement rather than supplant it.
The danger lies not in AI assistance as such, but in treating the machine’s synthesis as a substitute for the intellectual labour of command.
The indispensability of judgement
Both systems are repositories of distilled history, each constrained by its inputs; neither can evaluate the present moment. Because an LLM is a statistical reflection of historical text, it cannot substitute for command judgement, though it can accelerate analysis, summarise intelligence and draft orders as a capable staff partner.
The unmet design problem is to keep human judgement load-bearing rather than nominal. The distinction that matters is between an LLM treated as an oracle, whose fluent output is accepted as truth, and an LLM treated as a bounded instrument inside a governance envelope – a deterministic layer that fixes what the model may assert, where its outputs must defer to authoritative data, and at which points a human must own the decision.
Doctrine already encodes this discipline for human staff: authoritative yet explicitly deferring to judgement. The task now is to build the machine equivalent of that deference – not a model trusted to know the rules, but an architecture in which the rules sit outside the model and constrain it.
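As an illustration only, the governance envelope described above can be sketched in a few lines of Python. Everything named here is a hypothetical chosen for the example, not part of any real system or of the author's paper: the topic names, the AUTHORITATIVE table standing in for a system of record, and the govern function itself. The point is simply that the rules live in ordinary deterministic code outside the model, and the model's output is treated as a claim to be checked.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical system of record; authoritative data always outranks the model.
AUTHORITATIVE = {"fuel_state_pct": 62}

# Hypothetical topics the model is permitted to make assertions about.
ALLOWED_TOPICS = {"fuel_state_pct", "weather_summary"}

# Hypothetical decision points that a named human must own.
HUMAN_DECISION_TOPICS = {"commit_force"}


@dataclass
class Claim:
    """One assertion or recommendation extracted from model output."""
    topic: str
    value: object


@dataclass
class Ruling:
    """What the deterministic layer decided to do with a claim."""
    status: str  # "accepted", "overridden", "needs_human" or "rejected"
    value: object = None
    approver: Optional[str] = None


def govern(claim: Claim, approver: Optional[str] = None) -> Ruling:
    # 1. At a decision point the model's output is advice only;
    #    nothing proceeds until a named human takes ownership.
    if claim.topic in HUMAN_DECISION_TOPICS:
        if approver is None:
            return Ruling("needs_human", claim.value)
        return Ruling("accepted", claim.value, approver)

    # 2. Assertions outside the permitted topics are rejected outright.
    if claim.topic not in ALLOWED_TOPICS:
        return Ruling("rejected")

    # 3. Where authoritative data exists and disagrees, the data wins.
    if claim.topic in AUTHORITATIVE and AUTHORITATIVE[claim.topic] != claim.value:
        return Ruling("overridden", AUTHORITATIVE[claim.topic])

    return Ruling("accepted", claim.value)
```

In this sketch a call such as govern(Claim("commit_force", True)) returns a "needs_human" ruling until an approver is supplied, so the record of who owned the decision is kept by the envelope rather than inferred from the model.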
This conclusion is not reached in isolation.
Allied defence organisations are converging on the same place: doctrine writers using LLMs insist that humans review every line; analysts note that models drift out of date far faster than doctrine is revised; and defence-specific evaluation against rules-of-engagement prompts has found systematic misinterpretation of the rules governing force.
The aye-aye of naval obedience was always given by someone accountable for the order. The challenge of the AI transition is to ensure that accountability does not quietly dissolve into the weights of the model but remains with an identifiable human decision maker.
A full version of the paper is available here.
John Harvey is a former Air Marshal in the Royal Australian Air Force and has a PhD in computer science from UNSW Canberra. His postings have included Chief Capability Development Group, F-35 project manager, director Military Strategy and director Air Power Studies Centre.