OpenAI Tokenizer & Tiktoken
Interactive Tool: https://platform.openai.com/tokenizer
Python Library: https://github.com/openai/tiktoken
~4 chars per token (English average)
¾ of an English word ≈ 1 token
o200k is the encoding for the GPT-5.4 family
Both tools are free to use
What is this resource?
This entry covers two closely related tools that address one of the most commonly misunderstood fundamentals of building with LLM APIs: token management. The first is the OpenAI Tokenizer, an interactive web tool where you can paste any text and instantly see how the model breaks it into tokens — displayed with color-coded highlighting, a total token count, and the numerical token IDs. The second is tiktoken, an open-source Python library from OpenAI that performs the same tokenization programmatically, letting you count tokens in your code before making an API call.
Understanding tokens is not optional for anyone building applications on top of LLM APIs. Every API provider charges by the token, enforces context window limits measured in tokens, and structures its responses around token usage. Without understanding what a token is and how your text maps to tokens, you cannot accurately predict costs, prevent context window overflow errors, or implement efficient conversation history management. These two tools — the visual tokenizer for building intuition and tiktoken for programmatic control — are the practical solution to those problems.
What's in it?
A token is the basic unit that language models read and generate. It is not a word, a character, or a syllable; it is a chunk of text determined by the model's vocabulary, called an encoding. Common English words are often single tokens ("the" = 1 token, "cat" = 1 token), but less common words get split across multiple tokens ("tokenization" = 2 tokens: "token" + "ization", though exact splits vary by encoding). Punctuation, numbers, and code are tokenized differently from prose, often at higher token counts per character. Non-English text is typically tokenized less efficiently: a sentence in Japanese or Arabic may use 2–5× as many tokens as an English sentence with the same meaning.
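You can see these splits for yourself with a few lines of tiktoken. A minimal sketch, assuming the o200k_base encoding; the sample strings are illustrative and the exact splits will differ between encodings:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

# Show how each sample splits into tokens; counts vary by encoding.
for sample in ["the cat", "tokenization", "x = foo_bar(42)"]:
    ids = enc.encode(sample)
    # A single token can be a partial UTF-8 sequence, so inspect raw bytes.
    pieces = [enc.decode_single_token_bytes(t) for t in ids]
    print(f"{sample!r}: {len(ids)} tokens -> {pieces}")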
The interactive Tokenizer tool at platform.openai.com/tokenizer lets you toggle between different encodings and see exactly how the same text tokenizes differently across models. There are two encodings you'll use in practice: o200k_base — used by all modern models including the entire GPT-5.4 family (gpt-5.4, gpt-5.4-mini, gpt-5.4-nano), GPT-5.3, as well as the older GPT-4o and GPT-4.1 models — and cl100k_base, used only by legacy models like GPT-4, GPT-3.5-turbo, and text-embedding-ada-002. If you're building anything in 2026, you're using o200k_base. The o200k encoding has a larger vocabulary (200k tokens vs 100k) and tokenizes non-English text, code, and whitespace more efficiently than its predecessor.
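If you'd rather verify the efficiency difference programmatically than in the web tool, run identical text through both encodings. A quick sketch; the exact counts depend on the input you choose:

import tiktoken

text = "Tokenizer efficiency differs across encodings, especially for code and non-English text."
for name in ("o200k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")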
The tiktoken library on GitHub is the programmatic equivalent. After pip install tiktoken, you load an encoding by name, call encode() on any string, and get back a list of integer token IDs. The length of that list is the token count. You can also use tiktoken.encoding_for_model("gpt-4.1") to automatically get the right encoding for a specific model — tiktoken maps model names to their encoding internally. The README explains the API and includes ready-made examples for the most common use cases: counting tokens in a plain string, counting tokens in a full chat conversation (which includes overhead tokens the message format adds), and truncating text to fit within a budget.
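Truncating to a budget is the case developers most often get wrong by cutting on characters instead of tokens. One way to do it, as a minimal sketch; truncate_to_budget is an illustrative helper name, not part of tiktoken's API:

import tiktoken

def truncate_to_budget(text, max_tokens, encoding_name="o200k_base"):
    """Hard-truncate text so it fits within max_tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    ids = enc.encode(text)
    if len(ids) <= max_tokens:
        return text
    # Slicing tokens can cut mid-character; decode() substitutes a
    # replacement character for any broken bytes rather than raising.
    return enc.decode(ids[:max_tokens])

print(truncate_to_budget("A long document. " * 500, max_tokens=50))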
How is it relevant to your purpose?
Token management is the first thing that surprises developers who are new to building with LLM APIs. Most people intuitively expect to be charged per message or per word, but APIs charge by token, and the token count for a single request includes far more than just the user's most recent input. It includes the entire conversation history, the system prompt, and the model's response. In a chatbot with 20 back-and-forth turns, by the 20th turn you might be sending 5,000+ tokens per request even if each individual message is short — because all 19 previous turns are re-sent as context.
Without a token counter, this accumulation is invisible until you see an unexpectedly high API bill or — worse — get a context_length_exceeded error at runtime because the conversation grew longer than the model's context window allows. Tiktoken gives you the ability to check token counts before every API call, implement a sliding window that trims the oldest messages when the count gets too high, and log token usage per request so you can profile and optimize your application. For a developer building any kind of conversational AI feature, these aren't optional optimizations — they're architectural necessities.
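A sliding window can be as small as the sketch below. The trim_to_window name and the keep-the-system-prompt policy are illustrative design choices, not a standard recipe; the overhead constants mirror the chat-counting example at the end of this entry:

import tiktoken

def trim_to_window(messages, max_tokens, encoding_name="o200k_base"):
    """Drop the oldest non-system messages until the conversation fits."""
    enc = tiktoken.get_encoding(encoding_name)

    def total_tokens(msgs):
        # ~3 overhead tokens per message, plus 3 to prime the reply
        return 3 + sum(3 + len(enc.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    while total_tokens(trimmed) > max_tokens and len(trimmed) > 1:
        # Keep the system prompt at index 0; evict the oldest turn after it.
        trimmed.pop(1 if trimmed[0]["role"] == "system" else 0)
    return trimmed

This drops one message at a time; in practice you may prefer to drop user/assistant turns in pairs so the remaining history stays coherent.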
Real cost example (2025 pricing)
A GPT-5.4 system prompt of 500 words is roughly 650 tokens. If your chatbot sends that on every request and you get 10,000 daily users each sending 5 messages, that's 32.5 million tokens per day from the system prompt alone, about $81/day at GPT-5.4 input pricing ($2.50/1M). Switch to GPT-5.4 mini and it drops to $24/day. Enable OpenAI's prompt caching (50% discount on cached tokens) and it falls to ~$12/day. Drop to GPT-5.4 nano ($0.20/1M) and it's about $6.50/day, or roughly $3.25 with caching. Tiktoken makes these calculations trivial: use it to run the math before you deploy, not after you see the bill.
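The arithmetic above fits in a few lines. The prices are this section's example figures per million input tokens, not a live price list, so plug in current rates before trusting the output:

# Back-of-envelope daily cost of re-sending the system prompt.
prompt_tokens = 650                      # ~500-word system prompt (see above)
requests_per_day = 10_000 * 5            # 10k daily users x 5 messages each
daily_tokens = prompt_tokens * requests_per_day  # 32,500,000

# Example input prices ($ per 1M tokens) taken from this section.
for model, per_million in [("gpt-5.4", 2.50), ("gpt-5.4-nano", 0.20)]:
    cost = daily_tokens / 1_000_000 * per_million
    print(f"{model}: ${cost:,.2f}/day")  # $81.25 and $6.50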
Recommended Watch
Let's Build the GPT Tokenizer — Andrej Karpathy
A deep dive into how BPE tokenization actually works, built from scratch by Andrej Karpathy (former OpenAI). Goes far beyond how to use tokenization — explains why it works the way it does. Essential viewing for any serious AI developer.
Counting Tokens with Tiktoken
Install with pip install tiktoken. The second example shows a production-ready function for counting tokens in a full chat conversation, including the per-message overhead.
import tiktoken
# --- Basic token counting ---
enc = tiktoken.get_encoding("o200k_base") # GPT-5.4 family, GPT-4o, and all current models
# Use cl100k_base only for older models: GPT-4, GPT-3.5-turbo, ada-002
text = "Hello! How many tokens does this sentence use?"
tokens = enc.encode(text)
print(f"Token count: {len(tokens)}") # 10
print(f"Token IDs: {tokens}")
# --- Count tokens for a full chat conversation (incl. overhead) ---
def count_chat_tokens(messages, model="gpt-5.4"):
    """Returns total token count for a messages[] array."""
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # every message adds ~3 tokens of overhead
    total = 3  # every reply is primed with 3 tokens
    for msg in messages:
        total += tokens_per_message
        for value in msg.values():
            total += len(enc.encode(value))
    return total

# Example usage before making an API call
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "Machine learning is a subset of AI..."},
    {"role": "user", "content": "Can you give me a code example?"},
]
print(f"Total tokens before API call: {count_chat_tokens(conversation)}")