OpenAI Tokenizer & Tiktoken
Interactive Tool: https://platform.openai.com/tokenizer
Python Library: https://github.com/openai/tiktoken
~4 chars per token (English average)
¾ of an English word ≈ 1 token
o200k is the encoding for the GPT-5.4 family
Both tools are free to use
What is this resource?
This entry covers two closely related tools that address one of the most commonly misunderstood fundamentals of building with LLM APIs: token management. The first is the OpenAI Tokenizer, an interactive web tool where you can paste any text and instantly see how the model breaks it into tokens — displayed with color-coded highlighting, a total token count, and the numerical token IDs. The second is tiktoken, an open-source Python library from OpenAI that performs the same tokenization programmatically, letting you count tokens in your code before making an API call.
Understanding tokens is not optional for anyone building applications on top of LLM APIs. Every API provider charges by the token, enforces context window limits measured in tokens, and structures its responses around token usage. Without understanding what a token is and how your text maps to tokens, you cannot accurately predict costs, prevent context window overflow errors, or implement efficient conversation history management. These two tools — the visual tokenizer for building intuition and tiktoken for programmatic control — are the practical solution to those problems.
What's in it?
A token is the basic unit that language models read and generate. It is not a word, a character, or a syllable; it is a chunk of text determined by the model's vocabulary, called an encoding. Common English words are often single tokens ("the" = 1 token, "cat" = 1 token), but less common words get split across multiple tokens ("tokenization" = 2 tokens: "token" + "ization", though exact splits vary by encoding). Punctuation, numbers, and code are tokenized differently from prose, often at higher token counts per character. Non-English text is typically tokenized less efficiently: a sentence in Japanese or Arabic may use 2–5× as many tokens as an English sentence with the same meaning.
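You can see these splits for yourself with a few lines of tiktoken. A minimal sketch, assuming the o200k_base encoding; the sample strings are illustrative and the exact splits will differ between encodings:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

# Show how each sample splits into tokens; counts vary by encoding.
for sample in ["the cat", "tokenization", "x = foo_bar(42)"]:
    ids = enc.encode(sample)
    # A single token can be a partial UTF-8 sequence, so inspect raw bytes.
    pieces = [enc.decode_single_token_bytes(t) for t in ids]
    print(f"{sample!r}: {len(ids)} tokens -> {pieces}")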
The interactive Tokenizer tool at platform.openai.com/tokenizer lets you toggle between different encodings and see exactly how the same text tokenizes differently across models. There are two encodings you'll use in practice: o200k_base — used by all modern models including the entire GPT-5.4 family (gpt-5.4, gpt-5.4-mini, gpt-5.4-nano), GPT-5.3, as well as the older GPT-4o and GPT-4.1 models — and cl100k_base, used only by legacy models like GPT-4, GPT-3.5-turbo, and text-embedding-ada-002. If you're building anything in 2026, you're using o200k_base. The o200k encoding has a larger vocabulary (200k tokens vs 100k) and tokenizes non-English text, code, and whitespace more efficiently than its predecessor.
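If you'd rather verify the efficiency difference programmatically than in the web tool, run identical text through both encodings. A quick sketch; the exact counts depend on the input you choose:

import tiktoken

text = "Tokenizer efficiency differs across encodings, especially for code and non-English text."
for name in ("o200k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")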
The tiktoken library on GitHub is the programmatic equivalent. After pip install tiktoken, you load an encoding by name, call encode() on any string, and get back a list of integer token IDs. The length of that list is the token count. You can also use tiktoken.encoding_for_model("gpt-4.1") to automatically get the right encoding for a specific model — tiktoken maps model names to their encoding internally. The README explains the API and includes ready-made examples for the most common use cases: counting tokens in a plain string, counting tokens in a full chat conversation (which includes overhead tokens the message format adds), and truncating text to fit within a budget.
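Truncating to a budget is the case developers most often get wrong by cutting on characters instead of tokens. One way to do it, as a minimal sketch; truncate_to_budget is an illustrative helper name, not part of tiktoken's API:

import tiktoken

def truncate_to_budget(text, max_tokens, encoding_name="o200k_base"):
    """Hard-truncate text so it fits within max_tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    ids = enc.encode(text)
    if len(ids) <= max_tokens:
        return text
    # Slicing tokens can cut mid-character; decode() substitutes a
    # replacement character for any broken bytes rather than raising.
    return enc.decode(ids[:max_tokens])

print(truncate_to_budget("A long document. " * 500, max_tokens=50))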
How is it relevant to your purpose?
Token management is the first thing that surprises developers who are new to building with LLM APIs. Most people intuitively expect to be charged per message or per word, but APIs charge by token, and the token count for a single request includes far more than just the user's most recent input. It includes the entire conversation history, the system prompt, and the model's response. In a chatbot with 20 back-and-forth turns, by the 20th turn you might be sending 5,000+ tokens per request even if each individual message is short — because all 19 previous turns are re-sent as context.
Without a token counter, this accumulation is invisible until you see an unexpectedly high API bill or — worse — get a context_length_exceeded error at runtime because the conversation grew longer than the model's context window allows. Tiktoken gives you the ability to check token counts before every API call, implement a sliding window that trims the oldest messages when the count gets too high, and log token usage per request so you can profile and optimize your application. For a developer building any kind of conversational AI feature, these aren't optional optimizations — they're architectural necessities.
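A sliding window can be as small as the sketch below. The trim_to_window name and the keep-the-system-prompt policy are illustrative design choices, not a standard recipe; the overhead constants mirror the chat-counting example at the end of this entry:

import tiktoken

def trim_to_window(messages, max_tokens, encoding_name="o200k_base"):
    """Drop the oldest non-system messages until the conversation fits."""
    enc = tiktoken.get_encoding(encoding_name)

    def total_tokens(msgs):
        # ~3 overhead tokens per message, plus 3 to prime the reply
        return 3 + sum(3 + len(enc.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    while total_tokens(trimmed) > max_tokens and len(trimmed) > 1:
        # Keep the system prompt at index 0; evict the oldest turn after it.
        trimmed.pop(1 if trimmed[0]["role"] == "system" else 0)
    return trimmed

This drops one message at a time; in practice you may prefer to drop user/assistant turns in pairs so the remaining history stays coherent.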
Real cost example (2025 pricing)
A GPT-5.4 system prompt of 500 words is roughly 650 tokens. If your chatbot sends that on every request and you get 10,000 daily users each sending 5 messages, that's 32.5 million tokens per day from the system prompt alone, about $81/day at GPT-5.4 input pricing ($2.50/1M). Switch to GPT-5.4 mini and it drops to $24/day. Enable OpenAI's prompt caching (50% discount on cached tokens) and it falls to ~$12/day. Drop to GPT-5.4 nano ($0.20/1M) and it's about $6.50/day, or roughly $3.25 with caching. Tiktoken makes these calculations trivial: use it to run the math before you deploy, not after you see the bill.
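The arithmetic above fits in a few lines. The prices are this section's example figures per million input tokens, not a live price list, so plug in current rates before trusting the output:

# Back-of-envelope daily cost of re-sending the system prompt.
prompt_tokens = 650                      # ~500-word system prompt (see above)
requests_per_day = 10_000 * 5            # 10k daily users x 5 messages each
daily_tokens = prompt_tokens * requests_per_day  # 32,500,000

# Example input prices ($ per 1M tokens) taken from this section.
for model, per_million in [("gpt-5.4", 2.50), ("gpt-5.4-nano", 0.20)]:
    cost = daily_tokens / 1_000_000 * per_million
    print(f"{model}: ${cost:,.2f}/day")  # $81.25 and $6.50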
Recommended Watch
Let's Build the GPT Tokenizer — Andrej Karpathy
A deep dive into how BPE tokenization actually works, built from scratch by Andrej Karpathy (former OpenAI). Goes far beyond how to use tokenization — explains why it works the way it does. Essential viewing for any serious AI developer.
Counting Tokens with Tiktoken
Install with pip install tiktoken. The second example shows a production-ready function for counting tokens in a full chat conversation, including the per-message overhead.
import tiktoken
# --- Basic token counting ---
enc = tiktoken.get_encoding("o200k_base") # GPT-5.4 family, GPT-4o, and all current models
# Use cl100k_base only for older models: GPT-4, GPT-3.5-turbo, ada-002
text = "Hello! How many tokens does this sentence use?"
tokens = enc.encode(text)
print(f"Token count: {len(tokens)}") # 10
print(f"Token IDs: {tokens}")
# --- Count tokens for a full chat conversation (incl. overhead) ---
def count_chat_tokens(messages, model="gpt-5.4"):
    """Returns total token count for a messages[] array."""
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # every message adds ~3 tokens of overhead
    total = 3  # every reply is primed with 3 tokens
    for msg in messages:
        total += tokens_per_message
        for value in msg.values():
            total += len(enc.encode(value))
    return total

# Example usage before making an API call
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "Machine learning is a subset of AI..."},
    {"role": "user", "content": "Can you give me a code example?"},
]
print(f"Total tokens before API call: {count_chat_tokens(conversation)}")