You Cannot Offload Understanding

What cognitive science says about knowledge, AI, and learning

cognition · AI · education · metacognition

Published: March 5, 2026

The claim that AI makes knowledge less important conflates knowledge with information. Knowledge integrated into a human mind restructures how that mind perceives and reasons. That capacity cannot be offloaded.

A recurring claim in discussions about the future of education goes something like this: because AI makes knowledge universally accessible, having knowledge will become less important. Therefore, education should pivot toward “future skills” like creativity and critical thinking. The argument is wrong, and it is wrong in ways that matter for how we think about education.

The confusion starts with a conflation of two very different things: knowledge integrated into a human mind, and information available in an external system. What you have learned does not sit in memory waiting to be recalled. It shapes what you notice, how you interpret it, and what you can do with it. Information in an AI system does none of that until a person retrieves it and makes sense of it. Retrieving information may be easy. Making sense of that information is the hard part, and it depends on what you already know. The claim that AI renders internal knowledge redundant assumes that the mind is a filing cabinet. Decades of research on expertise and knowledge organisation have shown otherwise.

What expertise actually looks like

Expertise research has demonstrated that experts do not simply know more than novices. They have richly organised knowledge structures, typically called schemas, that allow them to recognise patterns and chunk information into meaningful configurations that novices miss entirely. The classic demonstration comes from Chase and Simon (1973), who showed that chess masters could reconstruct meaningful board positions from brief exposure far better than novices, but showed no advantage for random arrangements. What set experts apart was not memory capacity, but the ability to see meaningful structure, which depends entirely on deep domain knowledge.

Chi, Feltovich, and Glaser (1981) found the same pattern in physics: experts categorised problems by underlying principles (e.g. conservation of energy), while novices categorised by surface features (e.g. “problems with inclined planes”). The pattern generalises. An experienced researcher examining a new study immediately notices that the control group differs from the treatment group in ways that could explain the result. A student reads the same study, sees a positive result, and concludes the treatment works. In each case, the expert’s perception is organised by a structural understanding of the domain. No external tool can reorganise someone’s perception for them.

Knowledge is also hierarchically organised. We do not merely learn individual concepts; we learn what Kemp, Perfors, and Tenenbaum (2007) call overhypotheses: higher-order regularities that constrain how new concepts in a domain can be acquired. A child learning their first words grasps not just individual labels but the principle that object names pick out shapes rather than colours or textures. This higher-order regularity narrows the hypothesis space for every subsequent word, which is why vocabulary acquisition accelerates so dramatically after the first dozen words.
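
To make the idea concrete, here is a minimal sketch of an overhypothesis being learned, assuming toy probabilities and invented hypothesis names rather than the hierarchical Bayesian model Kemp, Perfors, and Tenenbaum actually use. A learner keeps beliefs over two higher-order hypotheses, that object names track shape or that they track colour; each labelled example updates that higher-order belief, which then constrains how every subsequent label is interpreted.

```python
# Toy Bayesian learner: belief over two higher-order hypotheses about
# how object labels work. All numbers are illustrative, not fitted.
priors = {"names_track_shape": 0.5, "names_track_colour": 0.5}

# Likelihood of one labelled example in which objects sharing a label
# share their shape but not their colour, under each hypothesis.
likelihood = {"names_track_shape": 0.9, "names_track_colour": 0.1}

def update(beliefs, lik):
    """One step of Bayes' rule over the higher-order hypotheses."""
    unnorm = {h: beliefs[h] * lik[h] for h in beliefs}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

beliefs = dict(priors)
for word in ["ball", "cup", "dax"]:  # three early, shape-consistent words
    beliefs = update(beliefs, likelihood)
    print(word, {h: round(p, 3) for h, p in beliefs.items()})

# After a handful of consistent examples the learner strongly expects the
# NEXT novel label to pick out a shape too: the overhypothesis now narrows
# the hypothesis space for every subsequent word.
```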

The same hierarchical structure appears wherever expertise develops. A researcher who has internalised the general logic of confound control does not merely know a list of specific confounds. They have acquired a higher-order expectation that any observed association might be produced by an uncontrolled variable, and this generates predictions about where to look in any new study, regardless of the domain. The most cognitively valuable knowledge is the most abstract, because it constrains the hypothesis space at the level below, operating across every instance rather than just one. You can look up a specific confound, but you cannot look up the behavioural disposition to expect confounds.

Cognition as inference

The “knowledge is less important” framing implicitly models cognition as search: you need a fact, you look it up, you apply it. But the mind does not work by searching a database. It works by maintaining a generative model of the world, one that generates predictions, supports causal and counterfactual reasoning, and revises itself when those predictions fail (Tenenbaum et al. 2011). Knowledge, in the sense that matters for cognition, is this model.1 It is integrated into the agent’s perception-action loop, shaping what they notice and what they can imagine.

Schemas, the knowledge structures that expertise research has documented, can be understood as generative models. The chess master’s advantage in Chase and Simon (1973) is exactly this: a model of meaningful board positions that makes random arrangements no more memorable than they are for anyone else. In physics, the deep categorisation Chi, Feltovich, and Glaser (1981) documented reflects structured knowledge of causal relationships. And the experienced researcher who spots confounds is running a causal model of how studies produce their results.

Learning, in this framework, is driven by prediction error: the discrepancy between what the model expected and what actually occurred. That signal only exists if there was an expectation in the first place, which requires prior knowledge. Without prior knowledge there are no expectations, and without expectations there is nothing to be surprised by. Knowledge does not merely bias inference toward one answer. It determines the hypothesis space: the set of possible explanations a person can even entertain. An experienced researcher and a student reading the same unexpected result can entertain very different explanations, because the researcher’s understanding of methodology and the domain lets them consider interpretations the student would never arrive at.

Knowledge also makes inference tractable. Human minds are bounded systems, and strong prior knowledge narrows the search space so that a bounded agent can reach good answers within its actual cognitive budget (Lieder and Griffiths 2020). Even an AI-augmented human remains bounded: formulating useful queries, interpreting responses, and integrating results with ongoing reasoning all benefit from strong priors.
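
A toy illustration of the tractability point, under assumptions of my own rather than anything from Lieder and Griffiths: if a bounded agent can only examine a handful of candidate explanations, a prior that ranks candidates well finds the true one within budget far more often than an uninformed search does.

```python
import random

random.seed(0)

N_HYPOTHESES = 100   # size of the space of candidate explanations
BUDGET = 5           # how many candidates a bounded agent can actually examine
TRIALS = 10_000

def informed_position():
    # A strong prior tends to rank the true hypothesis near the top.
    # Assume (purely for illustration) it lands uniformly in the top 10.
    return random.randint(0, 9)

def uninformed_position():
    # With no prior, the truth is equally likely to sit anywhere.
    return random.randint(0, N_HYPOTHESES - 1)

def success_rate(position_of_truth):
    hits = sum(position_of_truth() < BUDGET for _ in range(TRIALS))
    return hits / TRIALS

print("informed prior:  ", success_rate(informed_position))    # ~0.5
print("uninformed prior:", success_rate(uninformed_position))  # ~0.05
```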

What a generative model does

What does it mean, concretely, for knowledge to function as a generative model? Consider an experienced researcher reading a new empirical study. Before any conscious evaluation, they see the study differently from how a student sees it. Sample composition, the choice of outcome measure, which covariates were and were not controlled, the plausibility of the proposed mechanism: these stand out as methodological decisions with consequences, because the researcher already has a model of how studies produce their results. The student, reading the same paper, sees a significant p-value and a confident abstract. But the researcher’s model does more than organise what is on the page. It can simulate: if the authors had included an active control condition, would the effect survive? What if they had used a behavioural measure instead of self-report? Would a different analytic strategy change the conclusion? This counterfactual reasoning requires a generative model rich enough to run forward, producing predictions about data that were never collected (Gerstenberg 2024).
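
One way to see what running a model forward means is a toy causal model of a study, with entirely hypothetical variable names and effect sizes: the outcome depends on the treatment, on an uncontrolled expectancy confound, and on noise. Because the model is generative, it can also simulate the counterfactual study that used an active control, data that were never collected.

```python
import random
from statistics import mean

random.seed(1)

def simulate_study(active_control: bool, n: int = 500) -> float:
    """Toy generative model of a study (hypothetical parameters).

    outcome = true_effect * treated + expectancy + noise
    In the original design only the treatment group has raised expectations
    (an uncontrolled confound); an active control raises expectations in
    both groups, removing the confound's contribution to the difference.
    """
    true_effect = 0.2
    expectancy_effect = 0.5
    treated, control = [], []
    for _ in range(n):
        treated.append(true_effect + expectancy_effect + random.gauss(0, 1))
        expectancy = expectancy_effect if active_control else 0.0
        control.append(expectancy + random.gauss(0, 1))
    return mean(treated) - mean(control)

print("observed effect, passive control:", round(simulate_study(False), 2))  # ~0.7
print("counterfactual, active control:  ", round(simulate_study(True), 2))   # ~0.2
```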

When the results contradict what the model predicted, the prediction error carries a specific learning signal: the discrepancy between the expected pattern and what actually turned up in the data. But that signal only exists if there was an expectation. An expert’s concentrated expectations make deviations diagnostic; the prediction was specific, so its failure points somewhere specific. A novice’s expectations, spread thinly across many possibilities, generate no such signal. A strong prior produces specific, falsifiable predictions; a weak prior is close to unfalsifiable, because almost any outcome is compatible with it.
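
The asymmetry can be put in one line of information theory: the learning signal an outcome carries is its surprisal, -log2 p(outcome), under the reader's predictive distribution. The numbers below are illustrative only; a concentrated prediction that fails is highly surprising, while a diffuse prediction is mildly surprised by everything and therefore diagnostic of nothing.

```python
from math import log2

# Hypothetical predictive distributions over how a replication might turn out.
expert = {"replicates": 0.90, "fails_to_replicate": 0.08, "reverses": 0.02}
novice = {"replicates": 0.40, "fails_to_replicate": 0.35, "reverses": 0.25}

def surprisal(predictive, outcome):
    """Bits of surprise carried by the observed outcome."""
    return -log2(predictive[outcome])

observed = "reverses"  # the unexpected result actually obtained
print("expert surprisal:", round(surprisal(expert, observed), 2), "bits")  # ~5.6
print("novice surprisal:", round(surprisal(novice, observed), 2), "bits")  # ~2.0
```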

This researcher was once a student. The model they now run was built through years of designing studies that had holes in them, getting reviewer feedback that exposed those holes, and slowly assembling an understanding of how methodological choices propagate into results. A student who offloads the evaluation of studies to an AI receives a competent-sounding critique but skips the process that would have built the model. The pattern holds across all knowledge domains. What kind of learning builds generative models, and what kind short-circuits the process?

Why the effort matters

A natural response to the availability of AI is to treat the effort of acquiring knowledge as a cost to be minimised. If the answer is one prompt away, why spend hours working through the material yourself? Research on desirable difficulties (R. Bjork and Bjork 1992; E. L. Bjork and Bjork 2011) shows that effort matters: testing beats rereading, and generating an answer beats passive reception. The harder path produces more robust learning because how you engage with the material determines how well it is encoded.

There is a deeper reason effort matters.2 The brain does not solve each new problem from scratch. It builds on accumulated experience, developing pattern-recognition that makes future judgements faster and more accurate. Each time a researcher evaluates a study, that experience sharpens the system that will make the next evaluation quicker and better calibrated. The desirable difficulty is not just building a better model; it is training the fast judgement that makes the model usable under real-world time pressure.

But AI offloading raises a more fundamental question: whether the relevant cognitive operations happen at all. When a student offloads cognitive work to AI, they get the output but skip the process that would have produced learning. The student gets the deliverable without building the competence necessary to produce it. This is the distinction between performance and learning (Grinschgl, Papenmeier, and Meyerhoff 2021). AI tools can reliably improve performance (the quality of what gets produced in the moment) while simultaneously undermining learning (the construction of the model that would support future independent performance). The two effects are not contradictory. Performance depends on what is currently available to the system, including external tools; learning depends on what has been processed through the mind’s own machinery.

This follows directly from bypassing the cognitive operations that give rise to learning. Struggling with a problem, hitting dead ends, revising your mental model of how the pieces fit: that process is not incidental to learning. It is a sign that the relevant cognitive work is happening. The researcher’s model was built precisely through the experience of being wrong in informative ways.

You need knowledge to evaluate knowledge

AI-assisted work has a metacognitive problem. Effective use of AI-generated content requires judging whether that content is accurate and complete. This judgement depends on domain knowledge, and the dependence creates a circularity that is difficult to escape. R. A. Bjork (1994) documented how learners systematically confuse familiarity with understanding and fluency with competence. Metacognitive monitoring, the ability to judge the accuracy of your own judgements and detect your own errors (Fleming and Daw 2017), depends on domain knowledge; it is not a free-standing skill you can apply to any content regardless of what you know.

Applied to AI, this creates a troubling asymmetry. A student reads an AI-generated explanation of something they know well and immediately notices where the explanation simplifies too much or gets something subtly wrong. They can evaluate because their model generates expectations that the text might violate. Now the same student reads an AI-generated explanation of something they know nothing about. It reads fluently, seems coherent, and arrives with the same confidence regardless of whether it is accurate. The student has no basis for surprise, no expectation the answer could have violated. The gap between what they understand and what they think they understand is invisible to the student. Far from making knowledge less important, AI tools make the metacognitive capacities that depend on knowledge more important.

You cannot separate knowing from thinking

There is a tempting compromise here: concede that domain knowledge matters for perception and evaluation, but argue that what education should really prioritise is flexible thinking, the “higher-order” capacities that knowledge merely serves. This treats knowing and thinking as separable mental faculties. They are not.

One of the most distinctive features of human intelligence is what cognitive scientists call compositional generalisation: the capacity to recombine known elements in novel configurations to handle situations never previously encountered (see e.g. Lake et al. 2017; Summerfield 2022). A child who has learned “purple” and “square” can immediately think about a purple square without ever having seen one. A researcher who understands the logic of controlling for confounds can reason about threats to validity in a study they have never seen.

Compositional generalisation requires structured representations to compose over. You cannot recombine elements you have not represented as separable, abstract components. The physicist in Chi, Feltovich, and Glaser (1981) who categorises problems by conservation of energy has decomposed the domain into abstract primitives that transfer across superficially different situations. The novice who categorises by “inclined plane” is bound to surface features; there is nothing to recombine. A researcher who understands why randomisation matters has an abstract structural component (the logic of causal identification) that transfers to any empirical question. A student who learned “use a t-test for two groups” has a procedure, not a transferable principle. Reasoning is the flexible deployment of structured knowledge.
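
A minimal contrast between structured and unstructured representations, as a sketch with made-up primitives rather than a model from the cited work: when colour and shape are represented as separable components, any novel combination is immediately interpretable; a lookup table of whole exemplars has nothing to recombine.

```python
from dataclasses import dataclass

# Structured representation: colour and shape are separable primitives.
COLOURS = {"red", "purple", "green"}
SHAPES = {"circle", "square", "triangle"}

@dataclass(frozen=True)
class Obj:
    colour: str
    shape: str

def describe(obj: Obj) -> str:
    # Because the primitives are explicit, ANY combination is interpretable,
    # including ones never encountered before.
    assert obj.colour in COLOURS and obj.shape in SHAPES
    return f"a {obj.colour} {obj.shape}"

# Unstructured representation: a lookup table of whole exemplars seen so far.
seen_exemplars = {"red circle", "green triangle"}

novel = Obj("purple", "square")           # never seen before
print(describe(novel))                    # "a purple square": composition works
print("purple square" in seen_exemplars)  # False: nothing to recombine
```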

This closes an important escape route. Someone might accept that knowledge matters but still argue that education should focus on “domain-general” capacities like critical thinking and problem solving, developed independently of any particular subject matter and then applied across domains. Willingham (2008) has argued persuasively that this gets the relationship backwards: what looks like domain-general critical thinking is actually the flexible deployment of domain knowledge. Critical thinking about a claim in molecular biology requires knowing molecular biology well enough to spot when something doesn’t add up and to see what a good experiment would look like. The same applies to evaluating a statistical analysis or a legal argument. “Critical thinking” without domain knowledge mimics the forms of evaluation without the substance that would make them reliable (Tricot and Sweller 2014).

What “future skills” advocates actually want, when pressed, is transfer: the ability to apply competence in situations that were not explicitly taught. Transfer is arguably the most important goal of education, but it is a product of abstraction, and abstraction requires deep structural understanding of a domain (Barnett and Ceci 2002). Far transfer (applying what you learned in one context to a genuinely different one) is rare, and when it occurs it is built on representations that have been decomposed into abstract, recombinable components.

The logic of offloading

None of this implies that externalising cognitive work is inherently harmful. Cognitive offloading, the use of physical actions or external tools to alter information-processing demands, is a normal and pervasive feature of human cognition (Risko and Gilbert 2016). We write things down, draw diagrams, count on our fingers, use calculators. When does offloading help and when does it do damage?

Offloading makes sense when the person already possesses the relevant knowledge but faces limits on what they can hold in mind at once. A surgeon using a checklist frees attentional resources for judgement. An expert programmer using a linter offloads routine error detection to focus on architecture. In both cases, the underlying competence is intact; the tool relieves a performance constraint. This is the extended mind as Clark and Chalmers (1998) intended it: cognitive processes genuinely distributed across brain and environment, but with the internal contribution doing the work that matters.

Performance bottlenecks and learning bottlenecks respond to offloading differently. Offloading helps when it relieves a performance constraint (working memory limits, attentional capacity) while preserving the underlying competence. It becomes harmful when the cognitive effort being avoided was itself the mechanism that would have built competence. A calculator extends the mathematician’s mind. The same calculator in the hands of someone who does not understand division just produces outputs the person cannot evaluate.

The empirical evidence confirms this. Grinschgl, Papenmeier, and Meyerhoff (2021) found that cognitive offloading boosts immediate performance but diminishes subsequent memory for the offloaded information. The gain in the moment comes at the cost of learning that would have occurred through internal processing. Hu, Luo, and Fleming (2019) showed that metamemory (awareness of what one does and does not know) plays a mediating role: people who offload habitually may also lose the metacognitive signal that tells them what they still need to learn. The offloading does not merely bypass one instance of encoding. It can erode the self-monitoring that would have prompted further study.

What matters is whether knowledge is represented in a way that supports flexible recombination (Summerfield 2022). When someone already has this structure, offloading is straightforward: externalising intermediate results while working through a problem frees capacity for the compositional operations that matter. Offloading that prevents this structure from forming in the first place is where the damage occurs. If a student never works through the reasoning behind an analysis, the abstract structural components that would have supported transfer to new problems are never acquired. The question educators should be asking is: does the student have the internal structure that makes looking things up useful?

But what about LLMs?

A critic might accept the generative-model account and still object: large language models are themselves generative models. Consulting an LLM is less like consulting a database than like consulting a system that generates predictions and produces outputs that can be genuinely surprising. Hasn’t the filing-cabinet objection been dissolved?

The objection is partly right. LLMs are not filing cabinets, and there is genuine debate about the extent to which they acquire internal world models through training on language (see e.g. Li et al. 2023). But the question is whose model it is: a researcher’s model predicts how a study’s design will shape its results and revises when those predictions fail, and whatever predictive capacities an LLM may have, they remain the LLM’s, not the user’s.

What matters is whether the model is integrated into the agent’s perception-action loop or merely consultable by the agent. The LLM’s predictions do not restructure the user’s perception or refine their causal understanding of the domain, and passively consuming its output, however fluent, does not build the internal model that would let the user perceive and reason independently. The outputs may be excellent; the user’s own inferential machinery remains unchanged unless they do something cognitively active with what they receive.

This is why the performance-learning asymmetry survives even when the external system is sophisticated. The LLM can generate the analysis and the explanation. But it cannot run the model inside the student’s head that would have allowed those cognitive operations to become their own. The sophistication of the tool merely changes what can be offloaded.

The stronger objection

There is a version of the argument that this analysis does not touch, and it deserves an answer. The claim is that the entire cognitive loop, from retrieval through comprehension and evaluation to action, can be automated. If one AI system produces information and another consumes and acts on it, the human is no longer in the loop at all. But that is an argument that humans matter less, not that knowledge matters less for humans. If it turns out to be right, this entire essay is addressing the wrong question.

But something important follows from control theory, and it applies as long as humans remain in any oversight role. Craik (1943) argued that organisms anticipate events by running small-scale internal models of external reality. Conant and Ashby (1970) proved a related result formally: every good regulator of a system must be a model of that system. Anyone overseeing an AI system is functioning as a controller. Effective control requires a generative model rich enough to predict the system’s behaviour, detect its failures, and know when to intervene.

This inverts the automation objection. As AI systems become more powerful, the controller’s model must become richer, not simpler. More capable systems produce fewer obvious errors, but the errors they do produce are harder to detect. A controller without domain knowledge can catch gross failures but not subtle ones. The argument for internal knowledge therefore does not weaken as AI advances. Whether the focus remains on understanding the domain directly or shifts toward understanding it well enough to regulate AI systems operating within it, the knowledge required is at least equally demanding. Either way, it requires integrated, model-based knowledge, schemas rich enough to predict where the system will fail.
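
A toy illustration of the regulator point, with made-up dynamics and gains: a controller tries to hold a simple system near zero. If its internal model matches the true dynamics it can cancel them and is left only with noise; with a poor model, systematic error remains. The theorem itself does not depend on these particular numbers.

```python
import random
from statistics import mean

random.seed(2)

TRUE_DYNAMICS = 0.9  # x[t+1] = TRUE_DYNAMICS * x[t] + u[t] + noise

def run(controller_model: float, steps: int = 2000) -> float:
    """Regulate x toward 0 using an internal model of the dynamics.

    The controller picks u[t] = -controller_model * x[t]: it can only
    cancel as much of the dynamics as its model captures.
    """
    x, errors = 1.0, []
    for _ in range(steps):
        u = -controller_model * x
        x = TRUE_DYNAMICS * x + u + random.gauss(0, 0.1)
        errors.append(x * x)
    return mean(errors)

print("accurate internal model:", round(run(0.9), 4))  # ~0.01, the noise floor
print("poor internal model:    ", round(run(0.0), 4))  # several times larger
```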

What does this mean for teaching?

None of this argues against AI tools in education. The argument is for building the generative models that make those tools usable. AI does, however, change something about the value of different types of knowledge. The common move of distinguishing “mere rote recall” from “higher-order thinking” and then dismissing the former is too hasty. Much of what looks like rote memorisation is actually cognitive infrastructure. A child who has automated the times table has freed working memory for algebra and mathematical reasoning. A person who knows the chronology of the French Revolution is not hoarding trivia; that chronology provides a temporal scaffold for causal explanations. Memorised knowledge and conceptual understanding are not opposed, because memorisation often provides the raw material for understanding.

What does shift is the return on different types of knowledge. Genuinely isolated facts, the kind that serve no further cognitive function, were arguably never the most important educational outcome. Conceptual understanding, the ability to frame problems and integrate information across domains, becomes more important when AI handles routine lookup and generation. But the path to conceptual understanding runs through a large body of well-organised, readily accessible knowledge.

The computational framework also suggests principles for when AI interaction could support model-building. An AI system that generates predictions for the learner to evaluate, rather than evaluations for the learner to consume, exercises the learner’s own predictive model. An AI that provides targeted feedback on the learner’s reasoning, highlighting where their expectations diverge from the evidence, can sharpen the error signal beyond what unassisted study would provide. The key criterion is whether the interaction runs the learner’s model or bypasses it. Passive consumption of AI-generated answers bypasses it. Using AI to get feedback on your own reasoning, or to generate cases that test your understanding, does not.

The implication for education is that we should be more deliberate about which knowledge we prioritise and how we help students build it. “Teach concepts instead of facts” recreates the false separation this essay has argued against. The goal is to prioritise the knowledge that is most productive for building generative models: the facts, principles, and structural relationships that participate in the most causal models and support the widest range of inference.

AI is a powerful external resource, but it is not a substitute for the internal structures that make external resources useful. A world in which people have less knowledge is a world in which people are less capable of using the tools available to them, however sophisticated those tools become.

References

Barnett, Susan M., and Stephen J. Ceci. 2002. “When and Where Do We Apply What We Learn? A Taxonomy for Far Transfer.” Psychological Bulletin 128 (4): 612–37. https://doi.org/10.1037/0033-2909.128.4.612.
Bjork, Elizabeth Ligon, and Robert A. Bjork. 2011. “Making Things Hard on Yourself, but in a Good Way: Creating Desirable Difficulties to Enhance Learning.” In Psychology and the Real World: Essays Illustrating Fundamental Contributions to Society, 56–64. New York, NY, US: Worth Publishers.
Bjork, Robert A. 1994. “Memory and Metamemory Considerations in the Training of Human Beings.” In Metacognition: Knowing about Knowing, 185–205. Cambridge, MA, US: The MIT Press. https://doi.org/10.7551/mitpress/4561.001.0001.
Bjork, Robert, and Elizabeth Bjork. 1992. “A New Theory of Disuse and an Old Theory of Stimulus Fluctuation.” In Essays in Honor of William K. Estes, Vol. 1: From Learning Theory to Connectionist Theory, 35–67.
Chase, William G., and Herbert A. Simon. 1973. “Perception in Chess.” Cognitive Psychology 4 (1): 55–81. https://doi.org/10.1016/0010-0285(73)90004-2.
Chi, Michelene T. H., Paul J. Feltovich, and Robert Glaser. 1981. “Categorization and Representation of Physics Problems by Experts and Novices.” Cognitive Science 5 (2): 121–52. https://www.sciencedirect.com/science/article/pii/S0364021381800298.
Clark, Andy, and David Chalmers. 1998. “The Extended Mind.” Analysis 58 (1): 7–19. https://www.jstor.org/stable/3328150.
Conant, Roger C., and W. Ross Ashby. 1970. “Every Good Regulator of a System Must Be a Model of That System.” International Journal of Systems Science 1 (2): 89–97. https://doi.org/10.1080/00207727008920220.
Craik, K. J. W. 1943. The Nature of Explanation. Cambridge: Cambridge University Press.
Dasgupta, Ishita, Eric Schulz, Joshua B. Tenenbaum, and Samuel J. Gershman. 2020. “A Theory of Learning to Infer.” Psychological Review 127 (3): 412–41. https://doi.org/10.1037/rev0000178.
Fleming, Stephen M., and Nathaniel D. Daw. 2017. “Self-Evaluation of Decision-Making: A General Bayesian Framework for Metacognitive Computation.” Psychological Review 124 (1): 91–114. https://doi.org/10.1037/rev0000045.
Gershman, Samuel J., and Noah D. Goodman. 2014. “Amortized Inference in Probabilistic Reasoning.” In Proceedings of the Annual Meeting of the Cognitive Science Society. https://www.semanticscholar.org/paper/Amortized-Inference-in-Probabilistic-Reasoning-Gershman-Goodman/93f5a28d16e04334fcb71cb62d0fd9b1c68883bb.
Gerstenberg, Tobias. 2024. “Counterfactual Simulation in Causal Cognition.” Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2024.04.012.
Griffiths, Thomas L., Nick Chater, Charles Kemp, Amy Perfors, and Joshua B. Tenenbaum. 2010. “Probabilistic Models of Cognition: Exploring Representations and Inductive Biases.” Trends in Cognitive Sciences 14 (8): 357–64. https://doi.org/10.1016/j.tics.2010.05.004.
Grinschgl, Sandra, Frank Papenmeier, and Hauke S. Meyerhoff. 2021. “Consequences of Cognitive Offloading: Boosting Performance but Diminishing Memory.” Quarterly Journal of Experimental Psychology 74 (9): 1477–96. https://doi.org/10.1177/17470218211008060.
Hu, Xiao, Liang Luo, and Stephen M. Fleming. 2019. “A Role for Metamemory in Cognitive Offloading.” Cognition 193 (December): 104012. https://doi.org/10.1016/j.cognition.2019.104012.
Kemp, Charles, Andrew Perfors, and Joshua B. Tenenbaum. 2007. “Learning Overhypotheses with Hierarchical Bayesian Models.” Developmental Science 10 (3): 307–21. https://doi.org/10.1111/j.1467-7687.2007.00585.x.
Lake, Brenden M., Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. 2017. “Building Machines That Learn and Think Like People.” Behavioral and Brain Sciences 40: e253. https://doi.org/10.1017/S0140525X16001837.
Li, Kenneth, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. 2023. “Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task.” arXiv preprint. http://arxiv.org/abs/2210.13382.
Lieder, Falk, and Thomas L. Griffiths. 2020. “Resource-Rational Analysis: Understanding Human Cognition as the Optimal Use of Limited Computational Resources.” Behavioral and Brain Sciences 43: e1. https://doi.org/10.1017/S0140525X1900061X.
Risko, Evan F., and Sam J. Gilbert. 2016. “Cognitive Offloading.” Trends in Cognitive Sciences 20 (9): 676–88. https://doi.org/10.1016/j.tics.2016.07.002.
Summerfield, Christopher. 2022. Natural General Intelligence: How Understanding the Brain Can Help Us Build AI. Oxford University Press. https://doi.org/10.1093/oso/9780192843883.001.0001.
Tenenbaum, Joshua B., Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman. 2011. “How to Grow a Mind: Statistics, Structure, and Abstraction.” Science 331 (6022): 1279–85. https://doi.org/10.1126/science.1192788.
Tricot, André, and John Sweller. 2014. “Domain-Specific Knowledge and Why Teaching Generic Skills Does Not Work.” Educational Psychology Review 26 (2): 265–83. https://doi.org/10.1007/s10648-013-9243-1.
Willingham, Daniel T. 2008. “Critical Thinking: Why Is It So Hard to Teach?” Arts Education Policy Review 109 (4): 21–32. https://doi.org/10.3200/AEPR.109.4.21-32.

Footnotes

  1. “Generative model” is a claim at Marr’s computational level (Griffiths et al. 2010). It characterises the problem the cognitive system is solving (inferring latent structure from observed data) without specifying the neural algorithms that implement the solution. The claim is that the functional role of knowledge is best understood as a model that generates predictions, not that the brain literally runs a probabilistic program.↩︎

  2. Computational work on amortized inference (Gershman and Goodman 2014; Dasgupta et al. 2020) provides a formal account of this process: the brain learns recognition routines, trained by accumulated experience, that approximate good inferences rapidly.↩︎
