How an LLM works: 3D pipeline visualization

Tokenization, embedding, self-attention and output in one interactive demo. Watch what happens inside a Large Language Model, step by step.

A Large Language Model does not “understand” text in the human sense. It transforms text into sequences of numbers, projects them into high-dimensional spaces, computes relationships between every token and every other token, then generates the most probable next token. This demo shows exactly these six stages, on a real input.

Each step of the pipeline is visualized in 3D: spheres represent tokens, lines represent attention relationships, colors encode the identity of each text fragment. The side panel explains the math behind what you see.

The six pipeline stages

From raw text to the generated token, the model performs six distinct transformations. Each one adds or refines information. Understanding them means understanding why models behave the way they do.

① Tokenization

The text is split into fragments called tokens. Each token becomes a numeric ID in the model's vocabulary. The model never sees words: it sees integers.
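A minimal sketch of this idea, assuming a toy whitespace tokenizer (real LLMs use subword algorithms such as BPE, and the `vocab` dictionary here is hypothetical — a trained model ships with a fixed vocabulary):

```python
# Toy tokenizer: maps whitespace-separated fragments to integer IDs.
# A real LLM uses a fixed subword vocabulary (e.g. BPE), not words.
vocab = {}  # hypothetical vocabulary, built on the fly for illustration

def tokenize(text):
    ids = []
    for fragment in text.lower().split():
        if fragment not in vocab:
            vocab[fragment] = len(vocab)  # assign the next free ID
        ids.append(vocab[fragment])
    return ids

# A repeated fragment reuses its ID: "the" appears twice as 0
print(tokenize("the dog bites the man"))  # → [0, 1, 2, 0, 3]
```

Note how the model's view of the sentence is purely the integer list: any information not captured by the ID sequence is invisible to it.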

② Embedding

Each ID is projected into a dense high-dimensional vector. Semantically similar tokens end up in nearby regions of the space. This is where meaning becomes geometry.
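Mechanically, the embedding step is just a table lookup: each ID selects one row of a learned matrix. A minimal sketch with random (untrained) vectors and made-up dimensions:

```python
import random

random.seed(0)
vocab_size, d_model = 10, 4          # tiny, illustrative dimensions
# Embedding table: one dense vector per vocabulary ID.
# In a real model these rows are learned during training.
embedding = [[random.uniform(-1, 1) for _ in range(d_model)]
             for _ in range(vocab_size)]

def embed(token_ids):
    return [embedding[i] for i in token_ids]  # plain row lookup

vectors = embed([0, 1, 2])
print(len(vectors), len(vectors[0]))  # → 3 4 (3 tokens, 4 dims each)
```

In a trained model, training pushes the rows for semantically similar tokens close together, which is what makes "meaning becomes geometry" literal.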

③ Positional Encoding

The Transformer processes all tokens in parallel. To preserve order, it adds a position vector to each embedding. Without this, "dog bites man" and "man bites dog" would be identical.
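One standard way to build these position vectors is the sinusoidal scheme from the original Transformer, sketched here (many modern models use learned or rotary encodings instead):

```python
import math

def positional_encoding(pos, d_model):
    # Sinusoidal encoding: even indices use sine, odd indices use
    # cosine, at geometrically spaced wavelengths, so every position
    # gets a unique, smoothly varying vector.
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Position 0 is always [0, 1, 0, 1, ...] since sin(0)=0 and cos(0)=1
print(positional_encoding(0, 4))  # → [0.0, 1.0, 0.0, 1.0]
```

This vector is added element-wise to the token's embedding, so the same token at position 0 and position 7 enters attention with different representations.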

④ Self-Attention

Each token queries all the others: how relevant are you to me? The answers become weights that modulate the final representation. This is the heart of the Transformer.
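The query/weight mechanism can be sketched as scaled dot-product attention (a single head, with random weight matrices standing in for trained ones):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product attention: each token's query is compared
    # against every token's key; the resulting weights mix the values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                   # 3 tokens, d_model = 4
W = [rng.normal(size=(4, 4)) for _ in range(3)]  # untrained Wq, Wk, Wv
out, w = self_attention(X, *W)
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

The `weights` matrix is exactly what the demo draws as lines between spheres: entry (i, j) is how much token i attends to token j.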

⑤ Feed-Forward

After attention, each token passes through a two-layer dense network. It introduces non-linearity and increases representational capacity. This attention-plus-feed-forward block is repeated for every layer in the stack.
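The two-layer network can be sketched as an expand-then-project transformation applied to each token independently (dimensions here are illustrative; in practice the hidden size is often about 4× the model dimension):

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise FFN: expand to a wider hidden space, apply a ReLU
    # non-linearity, then project back to the model dimension.
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU non-linearity
    return hidden @ W2 + b2

rng = np.random.default_rng(1)
d_model, d_ff = 4, 16                     # d_ff typically ~4x d_model
x = rng.normal(size=(3, d_model))         # one vector per token
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)  # → (3, 4), shape preserved
```

Because the same weights apply at every position, the FFN refines each token's representation without mixing tokens — the mixing already happened in attention.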

⑥ Output

The final vector is projected onto the vocabulary, producing a score for every possible token. Softmax converts these scores into probabilities. Under greedy decoding, the token with the highest probability is selected; with sampling, tokens are drawn in proportion to their probability.
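The softmax-plus-greedy-pick step, sketched over a hypothetical 4-token vocabulary:

```python
import math

def softmax(logits):
    # Subtracting the max before exp() keeps the computation
    # numerically stable without changing the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores over 4 tokens
probs = softmax(logits)
# Greedy decoding picks the index of the highest probability
print(max(range(len(probs)), key=probs.__getitem__))  # → 0
```

Note that softmax preserves the ranking of the scores, so greedy decoding could skip it entirely; the probabilities matter when sampling.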

What you learn from this demo

  • Why an LLM reads numbers, not words, and what this implies for model limitations
  • How self-attention lets the model "understand context" without explicit memory
  • Why positional encoding is essential: without order information, sentences containing the same words would produce the same representation
  • How the output probability distribution explains the non-deterministic behavior of models
  • What changes between low temperature (deterministic) and high temperature (creative) sampling
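The temperature effect in the last point can be sketched directly: dividing the logits by a temperature before softmax flattens (T > 1) or sharpens (T < 1) the distribution. Logit values here are made up for illustration.

```python
import math, random

def sample(logits, temperature=1.0):
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution before sampling from it.
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]

random.seed(42)
logits = [4.0, 1.0, 0.5]
low = [sample(logits, 0.1) for _ in range(20)]   # near-deterministic
high = [sample(logits, 5.0) for _ in range(20)]  # much more varied
print(set(low), len(set(high)) > 1)
```

At T = 0.1 the top token dominates so strongly that all 20 draws return it; at T = 5.0 the three options become nearly equiprobable, so the output varies run to run.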

How to use the demo

Type any text in the top bar and press PROCESSA (Process). The (simulated) model tokenizes the input and computes embeddings, attention and output. Navigate through the six stages using the Prec (previous) and Succ (next) buttons, or use Auto-Play for the automatic sequence.

The right panel shows the mathematical explanation for each stage. The center column rotates the 3D scene to show spatial relationships. Try different inputs and observe how positions, connections and probabilities change.

Want to apply these concepts in your career?

Understanding how LLMs work internally sets you apart as a developer. The Software Architecture course teaches you to make solid technical decisions, even when working with AI.

Try the demo

Enter a text, select a stage and watch the pipeline in 3D. Use Auto-Play for the full sequence or navigate manually stage by stage.
