Glossary
Token
🔡 What is a token in an LLM?
A token is a unit of text used by language models to "read" and "understand" words. A token is not necessarily a full word — it's a chunk of text (a word, a syllable, or even a character).
🔧 How is text split into tokens?
LLMs use a tokenizer, a tool that breaks raw text into smaller pieces according to statistical rules learned from data (most modern models use subword schemes such as byte-pair encoding). For example:
| Text | Tokens |
| --- | --- |
| "Bonjour tout le monde" | "Bonjour", " tout", " le", " monde" → 4 tokens |
| "anticonstitutionnellement" | "anti", "const", "itution", "nel", "lement" → ~5 tokens |
| "42" | "42" → 1 token |
| "😊" (emoji) | "😊" → 1 token |
The way text is tokenized depends on the model (GPT, Claude, etc.) and its specific tokenizer.
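To see this in practice, here is a minimal sketch using OpenAI's open-source tiktoken library. It assumes the cl100k_base encoding (the one used by GPT-4-class models); Claude and other models ship their own tokenizers, so the splits and counts will differ.

```python
# Minimal sketch: inspect how tiktoken splits text into tokens.
# Assumes the cl100k_base encoding (GPT-4-class models); other
# tokenizers will produce different splits and counts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Bonjour tout le monde", "anticonstitutionnellement", "42", "😊"]:
    ids = enc.encode(text)
    # Decode each token id on its own; a token can be a partial
    # UTF-8 sequence, so we show the raw bytes of each piece.
    pieces = [enc.decode_single_token_bytes(i) for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```

Running this prints each string's token count alongside its byte pieces, which is an easy way to sanity-check counts for your own text (the splits in the table above are illustrative and vary by tokenizer).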
💰 What does this have to do with LLM pricing?
💸 The cost is calculated based on the number of tokens used.
Each call to an LLM is billed according to:
- Input tokens: everything you send (instructions, text, history…)
- Output tokens: everything the model sends back
🔁 Total cost = (input tokens × input rate) + (output tokens × output rate).
📦 Example with GPT-4-turbo (June 2025, OpenAI public pricing):
| Model | Price per 1,000 input tokens | Price per 1,000 output tokens |
| --- | --- | --- |
| GPT-4-turbo | $0.01 | $0.03 |
| GPT-3.5-turbo | $0.001 | $0.002 |
📊 Real-world example:
Scenario:
You send a prompt + chat history totaling 800 tokens, and the model replies with 1,200 tokens.
Calculation:
- 800 input tokens × $0.01 / 1,000 → $0.008
- 1,200 output tokens × $0.03 / 1,000 → $0.036
➡️ Total = $0.044 for this request
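The same arithmetic as a tiny helper function (a sketch using the GPT-4-turbo rates quoted in the table above; swap in your model's current pricing):

```python
# Sketch of the per-request cost calculation. The rates are the
# GPT-4-turbo figures quoted above; check current pricing before use.
INPUT_RATE = 0.01 / 1000   # $ per input token
OUTPUT_RATE = 0.03 / 1000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: input and output are billed separately."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(request_cost(800, 1200))  # ~$0.044, matching the example above
```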
🧠 Why does it matter?
- Cost optimization: Managing your token usage helps control expenses.
- Quality control: more tokens ≠ better answers; long replies can be padded and vague, while concise ones can be just as effective.
- Context window limits: GPT-4-turbo accepts up to 128,000 tokens (~300 pages), but older or free models are much more limited.
🛠️ Best practices for managing tokens
- Write clear and concise prompts.
- Avoid redundant context or repeated examples.
- Limit overly long outputs unless necessary (e.g., a summary ≠ a full transcript).
- Monitor token count using available tools (e.g., OpenAI Tokenizer).
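As a concrete example of that last point, here is a hedged sketch of a pre-flight check built on tiktoken: count the prompt's tokens and confirm the request (prompt plus the output budget you reserve) fits the context window. The cl100k_base encoding and the 128,000-token limit are GPT-4-turbo assumptions; adjust both for your model.

```python
# Sketch: verify a prompt fits the model's context window before sending.
# cl100k_base and the 128,000-token limit are GPT-4-turbo assumptions.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_LIMIT = 128_000

def fits_in_context(prompt: str, max_output_tokens: int = 1_000) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return len(enc.encode(prompt)) + max_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("Summarize this report in three bullet points."))  # True
```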