What an LLM Actually Is
Strip away the magic. An LLM is a probability machine for the next token.
A Large Language Model is, at its core, a single function with a single job: given some text, predict the next token.
That's it. Everything else — chat, code generation, reasoning, translation, agents — is built on top of that one primitive, repeated thousands of times in a loop.
The next-token game
Imagine playing a game where someone shows you a sentence with the last word missing, and you have to guess it. After billions of rounds, you'd get freakishly good. You'd notice that "The cat sat on the ___" is almost always "mat" or "floor." You'd notice that "France's capital is ___" is "Paris." You'd notice that the word after "function" in JavaScript code is usually a name, followed by an opening parenthesis.
LLMs play that game at a scale humans can't comprehend: trillions of tokens of training text, with up to hundreds of billions of parameters tuning their guesses. The model doesn't "know" anything in the way a database knows facts. It has internalised the statistical shape of human language so well that simulating an answer and knowing an answer become hard to tell apart.
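To make the guessing game concrete, here is a minimal sketch that counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. It is only an illustration: the corpus and the function names `train_bigram` and `next_word_probs` are invented for this example, and a real LLM learns far richer patterns with a neural network rather than a lookup table of counts.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probs(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn raw follower counts into a probability distribution."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

# Made-up training text, split on spaces for simplicity.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the floor . "
    "the dog sat on the mat ."
)
model = train_bigram(corpus)
probs = next_word_probs(model, "the")

# Print the candidates for the word after "the", most likely first.
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

The toy model has no idea what a cat or a mat is. It only knows which words tended to follow "the" in the text it saw, which is the same principle at a vastly smaller scale.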
Text → Tokens → Token IDs
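Before the model sees any text, a tokenizer splits it into tokens: whole words, word fragments, or punctuation, usually with any leading space attached. Each token (including its leading space) then becomes a single integer ID the model can look up.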
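The text-to-IDs pipeline can be sketched with a toy tokenizer. Everything here is an assumption made for illustration: the seven-entry `VOCAB` is hand-written, and the greedy longest-match rule is a simplification. Production tokenizers typically learn tens of thousands of subword entries from data using schemes such as byte-pair encoding.

```python
# Hypothetical hand-written vocabulary; note the leading spaces inside tokens.
VOCAB = {"The": 0, " cat": 1, " sat": 2, " on": 3, " the": 4, " mat": 5, ".": 6}

def tokenize(text: str) -> list[str]:
    """Split text into vocabulary tokens using greedy longest-match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i:]!r}")
    return tokens

def encode(text: str) -> list[int]:
    """Map each token to its integer ID."""
    return [VOCAB[token] for token in tokenize(text)]

text = "The cat sat on the mat."
tokens = tokenize(text)
ids = encode(text)
print(tokens)
print(ids)
```

Joining the tokens back together reproduces the original string exactly, which is why the leading space travels with the token instead of being thrown away.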
Why this matters
Once you accept that LLMs are next-token predictors, three things stop being surprising:
- Hallucinations — if the next plausible-sounding token is wrong, the model says it anyway. There is no fact-checker inside.
- Sensitivity to phrasing — small wording changes shift the probability distribution and so the output.
- Chain-of-thought helps — writing intermediate steps gives the model more context tokens to condition on, making the next prediction better.
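The first two points can be illustrated with a small sketch of how raw scores become a probability distribution. The scores below are made up, standing in for what a model might assign to three candidate tokens after two nearly identical prompts. The softmax function itself is the standard way to turn scores into probabilities, but the candidate names and numbers are purely hypothetical.

```python
import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {token: math.exp(s - peak) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

# Made-up scores for the next token under two slightly different phrasings.
scores_a = {"Paris": 2.0, "Lyon": 1.8, "Nice": 0.5}
scores_b = {"Paris": 1.7, "Lyon": 1.8, "Nice": 0.5}

probs_a = softmax(scores_a)
probs_b = softmax(scores_b)

# The top candidate is whichever token has the highest probability.
top_a = max(probs_a, key=probs_a.get)
top_b = max(probs_b, key=probs_b.get)
print(top_a, top_b)
```

A small shift in one score is enough to change which token comes out on top, and nothing in this computation checks whether the winner is true. It only checks which candidate scored highest.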
The chat illusion
ChatGPT, Claude, and Gemini look like conversations, but underneath, every "reply" is generated one token at a time. The model reads your message plus everything said so far, picks a next token from its predicted distribution (often, but not always, the most likely one), appends it, then repeats until it produces a stop signal or hits a length limit. Stop generating, and the illusion of "thinking" collapses back into pure prediction.
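The loop behind a chat reply can be sketched in a few lines. This is a toy under stated assumptions: `TOY_MODEL`, `next_token_probs`, and the `<end>` marker are hypothetical stand-ins, and the toy looks only at the last token, whereas a real LLM conditions on the entire context.

```python
import random

END = "<end>"  # hypothetical stop marker

# Hypothetical stand-in for a trained model: last token -> next-token distribution.
TOY_MODEL = {
    "Assistant:": {"Hi": 1.0},
    "Hi": {"there": 0.7, "!": 0.3},
    "there": {"!": 1.0},
    "!": {END: 1.0},
}

def next_token_probs(context: list[str]) -> dict[str, float]:
    """Toy predictor: uses only the last token of the context."""
    return TOY_MODEL.get(context[-1], {END: 1.0})

def generate(prompt: list[str], max_tokens: int = 20, seed: int = 0) -> list[str]:
    """Predict a token, append it to the context, and repeat."""
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        # Sample one token in proportion to its probability.
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if token == END:
            break
        context.append(token)  # the new token becomes part of the input
    return context[len(prompt):]

reply = generate(["User:", "Hello", "Assistant:"])
print(" ".join(reply))
```

The "conversation" is just a growing list of tokens. Each pass through the loop feeds the model's own previous output back in as input.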
This is the foundation. Every advanced topic you'll see (RAG, agents, fine-tuning) builds on this single mechanic.