Why Did My AI Content Bill Suddenly Drop by Half?
Last quarter I opened my Anthropic billing dashboard and stared at it for a minute. My spend on Claude had fallen by almost half, even though I was producing more client content than the month before. I had not switched models or written shorter prompts. The only thing I had changed was turning on prompt caching in my content pipeline. That one setting did more for my margins than any pricing tweak I made all year.
If you run a Webflow practice and lean on AI to draft, edit, or research content, your token bill is quietly becoming a real line item. I work with founders who now spend more on AI than they do on hosting. The reason most of them overpay is simple: they send the same long instructions, brand guidelines, and reference material to the model on every single request, and they pay full price for it every time.
In this article I want to walk through what prompt caching is, how much it actually saves, and how I wired it into a Webflow content workflow without building a backend. I will share the real numbers I see and the mistakes that waste the savings.
What Is Prompt Caching, Really?
Prompt caching lets an AI model store the unchanging part of your prompt so it does not reprocess it on every request. You send your brand voice, style rules, and reference docs once, the model caches them, and follow-up requests reuse that cached context at a fraction of the cost. It is a discount for repetition.
Here is the mental model I use. Every prompt I send Claude has two parts. The first part is stable: my client's brand voice guide, the article rules, the examples of good output. The second part changes: the specific topic I want drafted today. Without caching, I pay full input price for both parts every time. With caching, the stable part gets stored and billed at a steep discount on every reuse.
This matters most for the exact pattern a Webflow content practice runs on. I draft dozens of pieces against the same instructions. That stable block might be three thousand tokens of rules and examples. Multiply that by every request in a month and you can see where the money goes.
How Much Does Prompt Caching Actually Save?
The savings are larger than most people expect. According to Anthropic's Claude API documentation, cache read tokens are billed at 0.1 times the base input price, which is a 90 percent discount on the cached portion. The engineering team at ProjectDiscovery reported in 2026 that prompt caching cut their total large language model costs by 59 percent, climbing to 70 percent after they tuned it.
There is a small upfront cost. Anthropic's docs note that a five minute cache write costs 1.25 times the base input price, and a one hour write costs 2 times. So the first request that creates the cache costs a bit more than normal. But it pays for itself after a single cache hit, because every later read is 90 percent cheaper.
For my own practice, the math is plain. If my stable instruction block is most of each prompt and I reuse it across an entire batch of articles, I am paying full price once and a tenth of the price every time after. That is the difference I saw on my Anthropic bill.
How Does Prompt Caching Work With a Webflow Content Workflow?
Caching fits a Webflow content workflow because that work is repetitive by design. You produce many pages against one set of rules, so you can cache the rules and only pay full price for the part that changes. The trick is structuring your prompt so the stable content always comes first.
When I brief Claude to draft a Webflow blog post, I put the client brand voice, the SEO rules, and three example articles at the top of the prompt and mark that section as cacheable. The specific topic, the target keyword, and the internal links go at the bottom. The model caches everything above the line and reuses it for the next nine articles in the batch.
This is the same principle behind how I think about an AI model router for content costs, which I covered in my guide on whether you should use an AI model router to cut Webflow content costs. Caching and routing solve the same problem from two directions: routing picks a cheaper model, while caching makes your chosen model cheaper to run.
What Should You Put in the Cached Part of Your Prompt?
Cache anything that stays the same across many requests and is large enough to matter. Brand voice guides, writing rules, JSON output schemas, and reference examples are ideal. Keep the cached block at the start and put the changing instruction, like today's topic, at the very end so the cache stays intact.
I cache four things for every client. The first is the brand voice document. The second is my standing list of content rules, the same rules that keep tone and structure consistent. The third is a JSON schema that defines the output shape, which connects directly to why JSON Schema has become the new brief format for Webflow AI workflows. The fourth is two or three example articles that show the model what good looks like.
What you must not do is shuffle that block. Caches match on an exact prefix. If you change one word near the top, the cache misses and you pay full price again. I learned to treat the cached block as frozen and edit only the tail of the prompt.
Does Prompt Caching Work the Same Way Across Claude, GPT, and Gemini?
No, the three big providers handle caching differently, and the differences affect cost. Anthropic's Claude requires you to mark cache breakpoints explicitly. OpenAI's GPT models cache automatically once a prompt passes a token threshold. Google's Gemini offers both implicit and explicit context caching. The savings are real on all three, but the controls differ.
I use Claude for most client drafting, so I rely on explicit breakpoints, which gives me precise control over what gets cached. When I run GPT-5.5 for a second opinion, caching happens on its own for long prompts, so I just make sure the stable content sits at the front. The practical lesson is the same everywhere: stable first, variable last.
But Does the Cache Expire Too Fast to Be Useful?
This is the objection I hear most, and it is fair. Short caches do expire quickly. Anthropic's default cache lives about five minutes, with a one hour option at a higher write cost. But a Webflow content batch is exactly the kind of dense, back-to-back work that fits inside that window.
When I sit down to produce a batch of articles, I am sending requests every minute or two. Each request refreshes the cache window, so the stable block stays warm for the whole session. The cache only goes cold when I walk away. For spread-out work, I use the one hour cache and still come out ahead because the write premium is tiny next to the read savings.
How Do You Set This Up Without Writing a Backend?
You do not need a server. The simplest setup is a single script that holds your cached block and accepts the changing topic as input. I run mine through Claude Code, which talks to the Anthropic API directly, so there is no backend to host or maintain. Tools like Cursor and the Claude desktop app can do similar work.
My setup is a small script with the brand voice, rules, and examples baked in as the cached prefix. I pass the day's topic as a command line argument. The script sends the request, the cache does its job, and the draft comes back. Connecting this to Webflow is the same job I described in my walkthrough on building an MCP content pipeline from Notion to Webflow, where the draft flows straight into the CMS through the Webflow MCP server.
How Do You Know If Caching Is Actually Working?
Check the usage object in the API response. Every Claude and GPT response reports token counts, including how many were cache reads versus fresh input. If your cache read count is high and your fresh input count is low after the first request, caching is working. If every request shows full input tokens, your prefix is changing and the cache is missing.
I watch two numbers. The first is the cache read token count, which should be large on every request after the first. The second is my weekly spend in the provider dashboard, which is the number that actually pays my rent. When ProjectDiscovery reported their 59 percent cut, they got there by watching these same signals and fixing prefix drift.
How to Start Using Prompt Caching This Week
Start by finding the repetition in your own prompts. Open the last ten prompts you sent an AI model for client work and highlight the parts that were identical every time. That identical block is your cache candidate. Next, restructure one prompt so that block sits at the top and your changing instruction sits at the bottom. Then turn caching on for your provider and run a small batch while watching the cache read tokens in the response.
For the cost side of this decision, my piece on whether you should use an AI model router to cut Webflow content costs pairs well with caching, and if you want the output to flow straight into your site, my guide on building an MCP content pipeline from Notion to Webflow shows the plumbing. Together they turn a messy AI bill into a predictable one.
If you want help wiring caching into your own Webflow content workflow, I am happy to walk through my setup and show you the real numbers. Let's chat.
Get your website crafted professionally
Let's create a stunning website that drive great results for your business
Read more blogs
Get in Touch
This form help clarify important questions in advance.
Please be as precise as possible as it will save our time.