In our latest guest blog, Mike Goves of the Oxfordshire Teaching School Hub & River Learning Trust begins to explore the mechanisms and research behind artificial intelligence models and highlights how the TDT values can support us in being critically selective in our use of AI in education.

Good advice typically involves starting with ‘why’: defining the purpose before devoting resources of any kind to action. Because this blog is about artificial intelligence (AI), though, I think it should actually start with ‘how’. Let me explain.

With so much hype around AI globally, you’d be forgiven for suffering cognitive overload from the sheer volume of information, publications, new developments, and even apps (like the GPT Store, an equivalent of the App Store but for ChatGPT) in circulation. But shouldn’t we know a bit about how these tools actually work before launching into using any of them? I don’t know exactly how a car or a Wi-Fi router works these days, but that doesn’t really matter so long as they do their job. And that’s the point. With AI, I do want to know something about how it works, to help me figure out whether it is doing a job I want or need done well. Frankly, I can ask it anything, but how do I know if its response is any good?

When given a query or task (called a prompt), the AI chops it into small chunks called tokens (to make sense of the input) and generates what it predicts is the most likely response, based on patterns learned from a HUGE number of examples in its training data. It’s like yelling a question in Trafalgar Square (very loudly, to all of London) and getting an average response from millions of people. You can then follow the response up, like a conversation. For brevity, when I say AI here I mean large language models (LLMs) such as ChatGPT, a type of generative AI now in increasingly widespread use.
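To make the ‘chopping up’ step a little more concrete, here is a minimal Python sketch. It assumes a deliberately simplified word-level tokenizer: real LLMs use learned subword tokenizers (such as byte-pair encoding), so `tokenize` and `encode` below are hypothetical illustrations, not how ChatGPT actually does it.

```python
import re


def tokenize(text: str) -> list:
    """Split text into lower-case words and punctuation marks.

    A simplified stand-in: real LLMs use learned subword tokenizers,
    not a single regular expression.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode(tokens: list, vocab: dict) -> list:
    """Map each token to an integer ID, adding unseen tokens to the vocabulary."""
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # give a new token the next free ID
        ids.append(vocab[tok])
    return ids


vocab = {}
prompt = "How does a car work? How does AI work?"
tokens = tokenize(prompt)
ids = encode(tokens, vocab)
print(tokens)
print(ids)
```

Repeated words such as ‘how’ and ‘work’ get the same ID each time, and it is sequences of IDs like these, rather than the words themselves, that a model actually processes.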


Let’s look a little deeper.

“The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.” Does ‘they’ refer to the councilmen or the demonstrators? That depends on whether the sentence uses ‘feared’ or ‘advocated’. This seems obvious. But to a computer, this is a trickier test. At least it was.

This example comes from the Winograd Schema Challenge, devised by Hector Levesque in 2012 as a better way to evaluate AI than the much-critiqued Turing Test. The test is named after Terry Winograd, a Stanford professor of Computer Science whose early work on natural language understanding supplied this example. Since the late 2010s, large language models have been able to handle these types of sentences reliably.

A newer evaluation challenge called ‘EvoGrad’ now exists to test common-sense reasoning further. At the time of writing, GPT-3.5 solves EvoGrad challenges 65.0% of the time, compared to 92.8% for humans (for details, see Sun & Emami, 2024).
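Percentages like these are accuracy scores: the share of test sentences where the referent chosen for ‘they’ matches the one humans pick. The sketch below encodes the councilmen example as a ‘twin pair’ and scores a naive solver. `WinogradItem`, `accuracy` and `first_noun_solver` are hypothetical names chosen for illustration; they are not taken from the Winograd or EvoGrad datasets, whose real formats and scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class WinogradItem:
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str  # the candidate a human reader would pick


# The example from the text: switching one word flips the correct answer.
TEMPLATE = "The city councilmen refused the demonstrators a permit because they {} violence."
CANDIDATES = ("the councilmen", "the demonstrators")
ITEMS = [
    WinogradItem(TEMPLATE.format("feared"), "they", CANDIDATES, "the councilmen"),
    WinogradItem(TEMPLATE.format("advocated"), "they", CANDIDATES, "the demonstrators"),
]


def accuracy(solver: Callable[[WinogradItem], str], items: List[WinogradItem]) -> float:
    """Percentage of items where the solver picks the expected referent."""
    correct = sum(solver(item) == item.answer for item in items)
    return 100 * correct / len(items)


def first_noun_solver(item: WinogradItem) -> str:
    """A shortcut that ignores meaning and always picks the first noun phrase."""
    return item.candidates[0]


print(accuracy(first_noun_solver, ITEMS))  # 50.0
```

Because the two sentences differ by a single word but need opposite answers, any shortcut that ignores that word gets exactly one of each pair right. That is what makes these pairs a test of common sense rather than surface pattern matching.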


Why does this matter, and how does it relate to education? 

On one hand, generative AI is simply predictive text on steroids, using very long strings of words (or other inputs, like images) to predict what the next item (called a token) should be. Its strength comes from enormous datasets, clear prompts, and our ability to give it feedback. Over time, its responses can look fantastic, as the output converges on something we find useful.
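The ‘predictive text’ idea can be sketched in a few lines of Python. This is a toy under heavy simplifying assumptions: it counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. `train_bigrams` and `predict_next` are hypothetical helpers, not any product’s API.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1  # one more sighting of `nxt` after `current`
    return follows


def predict_next(follows: dict, word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if it never appeared."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A tiny, made-up 'training set'.
corpus = (
    "the students learn best when the students feel safe "
    "and the teacher knows the students well"
)
model = train_bigrams(corpus)
print(predict_next(model, "the"))   # students (3 of the 4 words seen after 'the')
print(predict_next(model, "feel"))  # safe
```

An LLM replaces these simple counts with a neural network that weighs thousands of preceding tokens, but the job is the same: score every possible next token and pick (or sample) a likely one.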

Indeed, using AI (from copilots to Google & Microsoft integrations) can provide tremendous productivity gains, support expertise development, and save time. I use Consensus to help with academic research, and Perplexity as a conversation partner, because it shows how it understood the prompt, provides sources with its answers so I can do my own fact and reliability checking, and suggests follow-up prompts I can choose from to explore further. Specifically for education, Mollick & Mollick (2023) detail seven uses for ChatGPT, suggesting how each could work in practice and stating the relevant pitfalls and risks (see Table 1):


They helpfully show their prompts, too, even colour-coding them against five criteria (see key below). Here’s an interesting AI Coach example for leaders, illustrating the importance of a pre-mortem before implementing a plan:


In a similar vein, TDT’s ‘Teacherverse’ will undoubtedly prove useful for many teachers. It uses a scenario-based conversational approach to simulate what they would do in certain situations, giving plausible feedback from multiple perspectives (e.g. students, coach). In doing so, teachers can build their mental models in as close to an authentic environment as is practicably possible.

On the other hand, AI is not – and can never be – actually human. It is simply not true to assume that a deep learning ‘neural network’ IS a network of biological neurons. It’s a model, and models are approximations. In this case, the approximation goes from a single neuron:

Image 1: Modelling a Neuron (source)


To a deep learning network:

Image 2: Modelling a Network of Neurons (source)


To be clear, in modelling neurons, we’re mimicking their output, not replicating their function. 
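Here is roughly what ‘mimicking the output’ means in practice, as a minimal Python sketch. An artificial neuron is just arithmetic: multiply each input by a weight, add them up with a bias, and squash the total through an activation function (the sigmoid is assumed here as one common choice). The weights below are invented for illustration; a real deep learning network stacks many more layers with millions or billions of weights, adjusted automatically during training.

```python
import math


def sigmoid(x: float) -> float:
    """Activation function: squashes any number into the range 0 to 1."""
    return 1 / (1 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus a bias, through the activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# A tiny network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
# The weights are made up; training would normally set them.
x = [1.0, 0.0]
hidden = layer(x, weight_rows=[[2.0, -1.0], [-1.0, 2.0]], biases=[0.0, 0.0])
output = neuron(hidden, weights=[1.5, -1.5], bias=0.0)
print(hidden, output)
```

Nothing here resembles the electrochemical signalling of a real neuron; the code simply turns numbers into other numbers, which is exactly the gap between a model and the biology it is named after.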

Language is at the heart of all this. Language potentially carries more human information per bit than any other form of data (Bzdok et al., 2024). This is a startling claim. But why might that be? Well, my hypothesis is that the processes we use to generate responses – human cognition – are largely (some argue entirely) emotion-based. As Mary Helen Immordino-Yang said in a 2016 New York Times interview, “It is literally neurobiologically impossible to build memories, engage complex thoughts or make meaningful decisions without emotion.” She expands on this in numerous research studies and in her book, ‘Emotions, Learning, and the Brain’.

This is one reason AI suffers from confabulation, or ‘hallucination’. Besides being designed to be conversational (rather than quoting factual information back verbatim, which would be classed as ‘overfitting’), it has to work with our use of language, which doesn’t follow strict logic rules you can learn, like some sort of game. Ludwig Wittgenstein found this out when devoting his life to the logic and philosophy of language. This makes sense when we realise that words are expressions of sentiment. After all, why else would so many different interpretations routinely arise in response to exactly the same language? And that’s before we even get to how something is said.

Equally, attempts to code emotions will never BE emotions. We can just mimic sentiment to the point we are satisfied it’s close enough to be useful. More controversially, we can’t code ethics for similar reasons. What is ‘good’ or ‘right’ to one person can vary to another. How do you represent in code whether it’s ok for an autonomous car to avoid a nail in the road at the expense of hitting a cat? What if it’s your cat, a stray, or very old? Insert any moral dilemma. We could go on – infinitely – which is the point. We’d need infinite inputs and still disagree on the ‘right’ course of action.

We mustn’t forget the importance of human connection, or try to replace it. In a world that is, in many ways, smaller than ever, we find significant challenges with belonging and connectedness. We are a social species. People need people. The culture we work in matters. Productivity gains are great, but we would be wise to avoid overly reductionist computational approaches to learning and building relationships. Our nuanced use of emotion is one of our strengths.

This is why TDT’s values are so helpful and pertinent. In being smart with AI, we can understand what it’s actually doing, which guides us in how we should both value and challenge its output. With heart, we are reminded that human cognition and computational thinking are not the same. People have desires. People feel. We behave with moral and civic virtues shaped by those of others in a dynamic society. It is the combination of these that allows for humility: always learning and curious, but ready to adapt and engage with complexity, all the while staying anchored in what it means to be human.



  • Bzdok et al. (2024). Data science opportunities of large language models for neuroscience and biomedicine. Neuron, S0896-6273(24)00042-4. doi: 10.1016/j.neuron.2024.01.016. Epub ahead of print. PMID: 38340718. Available at:
  • Immordino-Yang, M. H. (2016). Emotions, learning, and the brain: Exploring the educational implications of affective neuroscience. W. W. Norton & Company.
  • Immordino-Yang, M. H. (2016). To help students learn, engage the emotions. The New York Times. Available at: [accessed 28 February 2024].
  • Mollick, E., & Mollick, L. (2023). Assigning AI: Seven approaches for students, with prompts. Wharton School of the University of Pennsylvania & Wharton Interactive.
  • Sun & Emami (2024). EvoGrad: A dynamic take on the Winograd Schema Challenge with human adversaries. University of Montreal/Mila, Montreal, Canada & Brock University, Saint Catharines, Canada. arXiv:2402.13372v2.
  • Beyond asking AI itself how it works, guides like this could help you learn more.


Mike Goves qualified as a science teacher in 2005 and has since taught in both the maintained and independent sectors; he was recognised as a Top Overseas Teacher by the Ministry of Education, Singapore, in 2011. As a researcher, his expertise lies in neuroscience, cognitive psychology, and artificial intelligence. In 2023, he completed a master’s degree in the cognitive science of expertise development. He leads professional development for River Learning Trust, is the NPQ Lead for Oxfordshire Teaching School Hub and is a national judge for the Teaching Awards Trust.