
The Curious Kid Inside the Machine: How LLMs Understand Without Truly Knowing

In the last piece, we met four characters living inside an LLM. Number four was The Curious Kid. I said it was the best one. I did not explain why. This is the explanation.

Imagine a child who, from the age of three, reads every book in every library in the world. Not just the children's section. All of them. History, science, medicine, poetry, cooking, engineering, law, football, music, space. Every conversation ever written down. Every story. Every instruction manual. Every letter, email, and Wikipedia article that humans have ever put into words.

By the time this child is ready to talk to you, they have read more than any human being could read in a hundred lifetimes.

Now here is what this child does well. You show them a sentence that stops halfway through, and they can complete it... because they have seen ten thousand sentences that started the same way. You ask them about a topic they have never spoken about before, and they can connect it to six other topics they have read about. You explain a problem, and they find a pattern that matches something from an entirely different subject.

This child is incredibly useful to talk to. They make connections you would never have thought of. They explain things in five different ways until one works for you. They ask questions that make you realise you had been thinking about something the wrong way.

But here is something important about this child. They have never left the library.

When I was working out of a small office in Pune for six months, we had an intern who memorised 400 pages of technical documentation but didn't know how to turn on the coffee machine. It was exactly like that.

They have read about pain but never felt it. They have read about kindness but never chosen it. They have read about lying, but honestly, they have no particular reason to avoid it, and no real sense of why humans care so much about it. They can describe the taste of a mango in four paragraphs because they have read those paragraphs... but they have never tasted one. They know how the word "true" functions in a sentence, but they have no real grasp of what "true" means out here in the physical world.

They are not pretending to understand. They genuinely understand everything they have read. That is different from understanding everything that is real.

Pattern recognition is not the same as understanding. But the distance between them is smaller than we assumed.

Wait, I am getting ahead of myself.

And there is one more thing about how this child actually learned. They did not learn by memorising definitions. They learned by seeing which words came after which other words, millions of times, until they could predict the next one. They learned by noticing patterns. Not rules. Patterns.

"The sky is ___." They have seen that sentence end with "blue" more times than with any other word. So they say "blue." They did not look up the definition of sky. They noticed which word fills that gap most of the time, and they supplied it.

That is the mechanism. Not memory. Not logic. Pattern, then prediction.
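
The fill-the-gap idea can be sketched in a few lines of Python. The sentences below are invented for illustration, and a real model learns probabilities over tokens rather than keeping a literal tally, so treat this as a toy picture and not as how any particular LLM is built.

```python
from collections import Counter

# A tiny, invented "library" of sentences the child has read.
sentences = [
    "the sky is blue",
    "the sky is blue",
    "the sky is grey",
    "the sky is blue",
    "the sky is clear",
]

def complete(prefix, corpus):
    """Return the word that most often followed `prefix` in the corpus."""
    endings = Counter()
    for sentence in corpus:
        if sentence.startswith(prefix + " "):
            rest = sentence[len(prefix) + 1:].split()
            if rest:
                endings[rest[0]] += 1
    if not endings:
        return None  # the child has never seen a sentence that starts this way
    return endings.most_common(1)[0][0]

print(complete("the sky is", sentences))  # blue
```

Nothing here knows what a sky is. It only knows which word showed up most often in that position.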

I am not sure why I keep coming back to this. Maybe it does not matter.

The mapping

"This is not a metaphor. The child in the library is not like an LLM. The child in the library is how an LLM works. The mechanism is the same. Scale it up by about one trillion, and you have the machine."

Now let us go through it precisely. Without jargon.

An LLM... a Large Language Model... is trained on an enormous amount of text. We are talking about a large share of the public internet, millions of books, and countless conversations; essentially a huge slice of what humans have written down and made available digitally. The training dataset for a modern LLM contains more words than a human being could read in thousands of lifetimes. That is the library.

What is a token?

A token is a small piece of language; not always a full word. "Curious" might be one token. "ly" might be another. "New York" might be two tokens or one, depending on the tokenizer and on how the phrase appears. The model does not read letters or words the way you do. It reads these chunks. A single sentence might be 15 tokens. A paragraph might be 80.
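
To make the chunking concrete, here is a minimal sketch of a tokenizer. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical) and a greedy longest-match rule; real tokenizers learn vocabularies of tens of thousands of pieces from data and often use different splitting rules.

```python
# Hypothetical vocabulary, invented for this example.
VOCAB = {"curious", "ly", "new", " york", "the", " kid"}

def tokenize(text, vocab=VOCAB):
    """Split `text` into vocabulary pieces, always taking the longest match."""
    text = text.lower()
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit one character as a token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("curiously"))                        # ['curious', 'ly']
print(tokenize("New York"))                         # ['new', ' york']
print(tokenize("New York", VOCAB | {"new york"}))   # ['new york']
```

The last two lines show why "New York" can be two tokens or one: it depends entirely on which pieces the vocabulary happens to contain.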

The model does not learn definitions. It learns patterns between tokens. It sees that the token "water" is very often followed by "is", "flows", "bottle", "boils." It sees that "the sky" is very often followed by "is blue" or "turned dark" or "lit up." It sees these patterns trillions of times until it can predict, remarkably well, which token is most likely to come next given everything that came before.
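
A rough way to picture "patterns between tokens" is a table of which token follows which, turned into probabilities. The corpus below is made up, and this sketch only looks one token back, whereas a real LLM conditions on everything that came before.

```python
from collections import Counter, defaultdict

# Invented mini-corpus, already split into tokens.
corpus = (
    "water is wet . water flows downhill . "
    "water is clear . the water bottle is full ."
).split()

# For each token, count which tokens were seen immediately after it.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def next_token_probs(token):
    """Turn the raw follower counts for `token` into probabilities."""
    counts = followers.get(token, Counter())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

print(next_token_probs("water"))
# {'is': 0.5, 'flows': 0.25, 'bottle': 0.25}
```

No definition of water appears anywhere. The table only records what tends to come next.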

This is what an LLM is doing when it answers your question. It is not looking up an answer in a database. It is not thinking through logic step by step the way you might solve a maths problem. It is predicting, one token at a time, what the most likely and appropriate next token is, given the entire conversation so far. The output you read is the result of hundreds or thousands of tiny predictions chained together.
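
The chaining can be sketched as a loop: predict one token, append it, predict again. The table `NEXT` and its numbers are hypothetical, and for brevity this toy looks only at the last token and always picks the top candidate. A real model scores candidates against the whole conversation and usually samples rather than always taking the maximum.

```python
# Hypothetical next-token scores, invented for this example.
NEXT = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"sky": 0.6, "kid": 0.4},
    "sky": {"is": 0.8, "turned": 0.2},
    "is": {"blue": 0.7, "dark": 0.3},
    "blue": {"<end>": 1.0},
}

def generate(max_tokens=10):
    """Build a sentence one token at a time, always taking the likeliest next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT.get(tokens[-1])
        if not candidates:
            break  # nothing known to follow this token
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break
        tokens.append(best)
    return tokens[1:]  # drop the start marker

print(" ".join(generate()))  # the sky is blue
```

Each pass through the loop is one tiny prediction. The sentence is just what is left when the loop stops.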

What is attention? (the Transformer part)

When you are in a classroom and the teacher is talking, you do not pay equal attention to every word. You pay more attention to the words that are relevant to what you are currently trying to understand. If the teacher says "the most important thing is..." your brain tunes in more sharply. This selective focus is called "attention."

Transformers, the architecture that powers most modern LLMs, give the model a similar ability. When predicting the next token, the model gives different amounts of attention to different parts of the conversation, depending on what is most relevant right now. This is what allows it to keep track of a long conversation without losing the thread.
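
For the curious, the "different amounts of attention" can be sketched as numbers. The version below follows the usual scaled dot-product recipe: compare a query vector with one key vector per token, then pass the scores through a softmax so they sum to 1. The two-dimensional vectors are made up for illustration; in a real Transformer they are learned, much larger, and the resulting weights are then used to blend a third set of "value" vectors.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores between one query and each key, then softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Invented vectors: one key per token the model can look back at.
tokens = ["the", "most", "important", "thing"]
keys = [[0.1, 0.0], [0.5, 0.2], [2.0, 1.0], [1.0, 0.5]]
query = [1.0, 1.0]  # stands for "what am I trying to predict right now?"

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token:>10}: {weight:.2f}")
```

With these made-up vectors, "important" receives the largest weight, which is the classroom effect in miniature.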

Now here is the part that connects back to the child in the library. The LLM has, in effect, read the whole library. It has picked up patterns between ideas across millions of subjects. So when you ask it to connect your engineering problem to something from biology, or explain physics through cricket, or find the flaw in your business argument... it can do that. Genuinely. The connections are real; not guessed.

But it does not know what is true in the world outside the text. It knows what is commonly written. It knows what fits the pattern. If something false was written down enough times, it learned the false thing. If something important is almost never written about, it does not know about it. Its "knowledge" is the world as described in text; not the world as it actually is.
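
The "written down enough times" problem is easy to reproduce in miniature. The counts below are invented: a well-known myth appears seven times and the correction three times. A counter that only knows frequency will repeat the myth.

```python
from collections import Counter

# Invented corpus: a popular myth written more often than the correction.
claims = (
    ["we only use ten percent of our brains"] * 7
    + ["we use virtually all of our brains"] * 3
)

def most_written(prefix, corpus):
    """Return the most frequently written claim starting with `prefix`, with its count."""
    counts = Counter(c for c in corpus if c.startswith(prefix))
    return counts.most_common(1)[0] if counts else None

print(most_written("we", claims))
# ('we only use ten percent of our brains', 7)
```

Nothing in the code checks the claim against the world. It cannot, because all it has is the text.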

This is why, in the previous piece, we met IKIA... the character who answers with complete confidence even when the answer is wrong. IKIA exists because the model predicts the most likely answer; not the true answer. And sometimes those are different things. Now you know why. The same mechanism that makes The Curious Kid brilliant is the mechanism that makes IKIA dangerous. They are not two different settings. They are the same engine running in two different situations.

The child in the library is the most useful person in the room; as long as you remember they have never left the library. Their connections are real. Their patterns are accurate. Their predictions are often right. And sometimes they will tell you something with complete confidence that is simply not true; because it fit the pattern, and they have no way of knowing the difference.

The Curious Kid is not pretending to be smart. It is genuinely doing something impressive. What it cannot do is tell you when impressive and correct are not the same thing. That part is still your job. :)
