Chrome 148 went stable on May 5, 2026, and the headline feature for any B2B SaaS site is the Prompt API. It exposes Google's on-device Gemini Nano model to any website with no API key, no token cost, and no data leaving the user's browser. Across the 70-plus Webflow projects I have shipped through Phoenix Studio, the same two blockers have kept AI features off marketing pages. API key handling on a static site is awkward, and per-request token economics turn a free trial demo into a budget conversation by week three. The Chrome 148 Prompt API removes both blockers in Chrome, while leaving Safari, Firefox, and older Chrome installs without on-device AI. In this piece I walk through what shifts when on-device AI is real, what to ship today, and where the fallback path still matters.
What is the Chrome 148 Prompt API?
The Chrome 148 Prompt API is a browser-native JavaScript interface that lets any website prompt Google's on-device Gemini Nano model. It accepts text, image, and audio inputs and supports JSON schema-constrained output. It shipped to stable on May 5, 2026 across Android, ChromeOS, Linux, macOS, and Windows. It requires no API key and no per-request tokens, and once the model is on the device, no network round-trip per prompt.
The API is documented in the Chrome 148 release notes and the Chrome Developers blog post New in Chrome 148. The Web Machine Learning Working Group at W3C maintains the broader spec work that the Prompt API draws from. For Webflow Partners, the most important detail is that Chrome downloads the roughly 4 GB Gemini Nano model in the background on first use. After the download, prompts run locally with low latency and zero per-request cost.
How does Gemini Nano run inside the browser?
Gemini Nano is a compact version of Google's Gemini model family designed to run on consumer devices. Chrome 148 ships it as a system-level browser resource that any website can call through the Prompt API. The model uses WebGPU on supported hardware for inference and falls back to CPU paths where WebGPU is unavailable. All processing happens on the user's machine.
The architectural choice matters for B2B SaaS marketing sites because it changes the trust story. Until this week, putting an AI explainer on a pricing page meant sending the visitor's prompt to a third-party API somewhere in the cloud. With on-device AI, the visitor's input never leaves their browser. For B2B audiences that read their privacy policy before signing up, that is a substantive change. The trade-off is feature ceiling. Gemini Nano is small. It is not GPT-5 or Claude Opus. Tasks that require large-context reasoning still need a server-side model.
Why does on-device AI matter for B2B SaaS marketing sites?
On-device AI matters for three reasons: zero per-request cost, zero data egress, and zero API key management on a static site. For a B2B SaaS marketing site running on Webflow, those three properties collapse the cost and security conversation that previously gated every AI feature. Features that were not viable at scale on a paid API become viable on browser-native inference, even at higher unit volume.
The cost shift is the one I see most clearly in client conversations. A B2B SaaS pricing page that draws 5,000 unique visits a week with a 20 percent AI-explainer engagement rate generates 1,000 inference calls per week. At even modest token pricing on a hosted API, that is a real monthly bill. On-device, the bill is zero. The complete instrumentation discipline I described in the Cerebras IPO and inference cost piece still applies. The number you stop spending matters as much as the number you start spending.
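To make that arithmetic concrete, here is a minimal sketch of the estimate. The traffic and engagement figures are the ones from the example above. The tokens per call and the price per million tokens are placeholder assumptions for illustration, not quoted rates from any provider.

```typescript
// Rough hosted-API cost estimate for an AI explainer on a pricing page.
// tokensPerCall and usdPerMillionTokens are illustrative assumptions only.
interface ExplainerTraffic {
  weeklyVisits: number;
  engagementRate: number; // fraction of visitors who use the explainer
}

function weeklyInferenceCalls(traffic: ExplainerTraffic): number {
  return Math.round(traffic.weeklyVisits * traffic.engagementRate);
}

function monthlyHostedCostUsd(
  traffic: ExplainerTraffic,
  tokensPerCall: number,
  usdPerMillionTokens: number,
): number {
  const weeksPerMonth = 52 / 12;
  const monthlyTokens =
    weeklyInferenceCalls(traffic) * weeksPerMonth * tokensPerCall;
  return (monthlyTokens / 1e6) * usdPerMillionTokens;
}

const pricingPage: ExplainerTraffic = { weeklyVisits: 5000, engagementRate: 0.2 };
console.log(weeklyInferenceCalls(pricingPage)); // 1000
// Assumed inputs: 1,500 tokens per call at 10 USD per million tokens.
console.log(monthlyHostedCostUsd(pricingPage, 1500, 10).toFixed(2));
```

Swap in your own token counts and your provider's actual rate. The on-device equivalent of the second function returns zero regardless of volume.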
What can the Prompt API actually do in May 2026?
In May 2026, the Prompt API handles short-context generation, summarization, classification, simple Q&A, image description, and structured output that conforms to a JSON schema. It does not handle long-document reasoning, large-codebase analysis, or tasks that require external tool calls. The practical envelope is roughly the same as a small open-source LLM running locally on a laptop.
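As a sketch of what schema-constrained output looks like in practice, the snippet below classifies a visitor question into one of three intents. It assumes the session accepts a responseConstraint option on prompt, which is how I understand the current Chrome documentation. The StructuredSession interface, the intent labels, and the classifyQuestion helper are my own hypothetical names, so check the signature against the docs before relying on it.

```typescript
// Assumed minimal shape of a Prompt API session; verify against current docs.
interface StructuredSession {
  prompt(input: string, options?: { responseConstraint?: object }): Promise<string>;
}

// Hypothetical intent labels for a marketing-site question classifier.
const INTENTS = ["pricing", "integration", "support"] as const;
type Intent = (typeof INTENTS)[number];

// JSON schema the model output is asked to conform to.
const intentSchema = {
  type: "object",
  properties: { intent: { type: "string", enum: INTENTS } },
  required: ["intent"],
  additionalProperties: false,
};

function isIntent(value: unknown): value is Intent {
  return typeof value === "string" && (INTENTS as readonly string[]).indexOf(value) !== -1;
}

// Returns the parsed intent, or null if the output is not valid JSON
// or does not carry one of the expected labels.
async function classifyQuestion(
  session: StructuredSession,
  question: string,
): Promise<Intent | null> {
  const raw = await session.prompt(`Classify this visitor question: ${question}`, {
    responseConstraint: intentSchema,
  });
  try {
    const parsed = JSON.parse(raw) as { intent?: unknown } | null;
    return parsed !== null && isIntent(parsed.intent) ? parsed.intent : null;
  } catch {
    return null;
  }
}
```

Validating the parsed result even when a schema is supplied is a deliberate choice. The page should not trust any model output blindly, and a null result maps cleanly onto the static fallback.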
For Webflow site features, that envelope covers a lot. A FAQ rewriter, a tone adjuster on form fields, a smart placeholder for a search box, an alt-text suggester, a simple chat affordance scoped to product positioning. All of those fit comfortably. What does not fit is a full conversational agent that reads documentation, follows links, and stitches together multi-step answers. For those features, the server-side path with a hosted API remains the right architecture, and the on-device path is a fallback or a privacy-tier alternative.
How does the Prompt API compare to OpenAI and Claude APIs for site features?
OpenAI and Claude APIs win on capability. Their models are larger, support long context, and handle tool calls. The Prompt API wins on cost, privacy, and latency. For features that fit inside Gemini Nano's envelope, the Prompt API is materially cheaper to run and faster to respond because there is no network round-trip. For features that exceed the envelope, the hosted APIs remain the right choice.
The practical pattern I am moving toward at Phoenix Studio is a tiered approach. Default to the Prompt API where it works. Fall back to a hosted API where it does not. Detect Prompt API support on the client, which in practice means Chrome 148 or later, and serve the appropriate UI path. This is similar to the broader pattern of AI moving onto the client surface that I wrote about earlier this year. The client now has real capability, but the server still owns the heavy work, and the user gets the right experience based on what their browser can do.
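The tiered approach can be sketched as a small decision function. The type names, the two capability flags, and the hostedEndpoint field are hypothetical, chosen only to illustrate the routing. A real decision tree would likely weigh more inputs.

```typescript
// Which path serves a given AI feature for a given visitor.
type Tier = "on-device" | "hosted" | "static";

// Hypothetical description of what the feature needs from a model.
interface FeatureNeeds {
  needsLongContext: boolean;
  needsToolCalls: boolean;
}

// Hypothetical description of what this visitor's setup offers.
interface ClientCapabilities {
  promptApiAvailable: boolean;
  hostedEndpoint?: string; // set only if a server-side path exists
}

function chooseTier(needs: FeatureNeeds, caps: ClientCapabilities): Tier {
  // Features inside the small-model envelope default to the Prompt API.
  const fitsOnDevice = !needs.needsLongContext && !needs.needsToolCalls;
  if (fitsOnDevice && caps.promptApiAvailable) return "on-device";
  // Anything else goes to the hosted API when one is configured.
  if (caps.hostedEndpoint) return "hosted";
  // With neither path available, the visitor gets the static equivalent.
  return "static";
}

console.log(
  chooseTier(
    { needsLongContext: false, needsToolCalls: false },
    { promptApiAvailable: true },
  ),
); // on-device
```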
Should you ship Chrome-only AI features on a production marketing site?
Yes, with a fallback. Chrome's global desktop and mobile share lets a Chrome-only feature reach a meaningful fraction of visitors, but a Safari user, a Firefox user, or a visitor on a Chromebook stuck on an older release will not see it without a graceful degradation path. The right pattern is to detect API availability, render the AI feature on capable browsers, and render the static equivalent elsewhere.
The degradation path is straightforward to build in Webflow custom code. On page load, feature-detect the Prompt API surface, which current Chrome documentation exposes as a global LanguageModel object (earlier previews used window.ai). Show the AI UI block when the API is available and hide it otherwise. The static fallback should answer the user's underlying need without the AI feature, not just say "AI unavailable." If the AI block is a smart FAQ search, the fallback is the regular FAQ list. If the AI block is a tone adjuster on a contact form, the fallback is the plain form. The user always gets a working page. The AI is an enhancement.
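Here is a minimal sketch of that detect-and-toggle step. It assumes the Prompt API is exposed as a global LanguageModel object with an availability() method that resolves to "available" when the model is ready, which is my reading of the current Chrome documentation. If the surface differs in your target version, only promptApiReady needs to change. The Toggleable interface is a stand-in for any element with a hidden property.

```typescript
// Anything with a `hidden` flag, for example an HTMLElement in the browser.
interface Toggleable {
  hidden: boolean;
}

// Resolves true only if the assumed LanguageModel surface exists
// and reports that the model is ready to use.
async function promptApiReady(scope: Record<string, unknown>): Promise<boolean> {
  const lm = scope["LanguageModel"] as
    | { availability?: () => Promise<string> }
    | undefined;
  if (!lm || typeof lm.availability !== "function") return false;
  try {
    return (await lm.availability()) === "available";
  } catch {
    return false;
  }
}

// Shows exactly one of the two blocks so the visitor always has a working page.
async function applyEnhancement(
  scope: Record<string, unknown>,
  aiBlock: Toggleable,
  staticBlock: Toggleable,
): Promise<boolean> {
  const ready = await promptApiReady(scope);
  aiBlock.hidden = !ready;
  staticBlock.hidden = ready;
  return ready;
}
```

In a Webflow embed, you would pass globalThis as the scope and the two elements you created in the Designer as the blocks. Shipping the AI block hidden by default and the static block visible means a script failure still leaves the page usable.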
How do you add the Prompt API to a Webflow site?
Add the Prompt API to a Webflow site through the Page Settings custom code section or a site-wide embed. Feature-detect the Prompt API surface (a global LanguageModel object in current Chrome documentation; earlier previews used window.ai.languageModel), check availability, create a session, and call session.prompt with the user input. Render the response into a target div. The full Chrome Developers documentation provides current API signatures.
The Webflow-specific path is to add the script in the Before Body Tag section of the page where the AI feature lives, target a specific div ID created in the Designer, and bind the input from a Webflow form or a custom input field. The site does not need a CMS field for the AI output. The output is generated client-side at runtime, which means no API quota and no caching strategy. For instrumentation, send a lightweight analytics event for each prompt, successful or failed, so you can read engagement and detect failures over time.
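The session-and-prompt step can be sketched as follows. The OnDeviceFactory interface assumes a create() method that resolves to a session with prompt() and an optional destroy(), which matches my understanding of the LanguageModel object in current Chrome documentation. The interface names and the answerInto helper are hypothetical.

```typescript
// Assumed minimal shapes; confirm against the current Chrome documentation.
interface OnDeviceSession {
  prompt(input: string): Promise<string>;
  destroy?(): void;
}
interface OnDeviceFactory {
  create(): Promise<OnDeviceSession>;
}

// Anything with a textContent property, for example a div from the Designer.
interface TextTarget {
  textContent: string | null;
}

// Creates a session, prompts it once, writes the reply into the target,
// and releases the session whether or not the prompt succeeded.
async function answerInto(
  factory: OnDeviceFactory,
  input: string,
  target: TextTarget,
): Promise<void> {
  const session = await factory.create();
  try {
    // textContent keeps model output from being interpreted as markup.
    target.textContent = await session.prompt(input);
  } finally {
    session.destroy?.();
  }
}
```

In the browser, the factory would be the detected Prompt API object and the target would be the div you look up by its ID. Writing through textContent rather than innerHTML is a deliberate choice, since model output should be treated as untrusted text.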
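For the instrumentation piece, a thin wrapper around the prompt call is usually enough. The sketch below is analytics-agnostic. The TrackFn signature, the "ai_prompt" event name, and the property names are hypothetical, so map them to whatever your analytics tool expects.

```typescript
// Hypothetical analytics hook; adapt to your analytics tool of choice.
type TrackFn = (event: string, props: Record<string, string | number>) => void;

// Runs a prompt, reports outcome and duration, and returns the text
// or null when the page should show its static fallback instead.
async function trackedPrompt(
  run: () => Promise<string>,
  track: TrackFn,
  feature: string,
  now: () => number = Date.now,
): Promise<string | null> {
  const started = now();
  try {
    const text = await run();
    const ok = text.trim().length > 0;
    track("ai_prompt", {
      feature,
      outcome: ok ? "success" : "empty",
      ms: now() - started,
    });
    return ok ? text : null;
  } catch {
    track("ai_prompt", { feature, outcome: "error", ms: now() - started });
    return null;
  }
}
```

Recording failures alongside successes is what makes the counts useful. A success count alone cannot distinguish low engagement from a broken feature.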
Where does the Prompt API break and what fallbacks should you build?
The Prompt API breaks on non-Chromium browsers, on Chrome versions before 148, and on hardware that cannot download or run the Gemini Nano model. It can also break on first use while the model downloads in the background. The fallback should answer the user's underlying need without the AI feature, which means designing the static path first and the AI path as an enhancement.
Two specific failure modes need handling. First, the API surface is present but the model has not finished downloading. Detect this by checking session creation latency and falling back to the static path if creation exceeds a threshold. Second, the prompt itself fails or returns an empty response. Wrap the prompt call in a try-catch and render the static fallback on error. These two failure modes account for nearly all real-world failures I have observed in early testing, and handling them keeps the perceived reliability of the page high even on the first visit before the model is fully resident.
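Both failure modes can be handled in one guarded call. This is a sketch under assumptions: the 3,000 ms default threshold is an arbitrary placeholder rather than a measured value, and the GuardedSession interface and helper names are hypothetical.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Assumed minimal session shape.
interface GuardedSession {
  prompt(input: string): Promise<string>;
}

// Returns the model reply, or the static fallback when session creation
// is too slow, the prompt throws, or the reply is empty.
async function promptOrFallback(
  createSession: () => Promise<GuardedSession>,
  input: string,
  staticFallback: string,
  createTimeoutMs: number = 3000, // placeholder threshold, tune from real data
): Promise<string> {
  try {
    const session = await withTimeout(createSession(), createTimeoutMs);
    const text = await session.prompt(input);
    return text.trim().length > 0 ? text : staticFallback;
  } catch {
    return staticFallback;
  }
}
```

The timeout covers the first failure mode, where the surface exists but the model is still downloading. The try-catch and the empty-reply check cover the second. The caller never sees an exception, only content to render.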
Is the Prompt API safe for privacy-sensitive B2B audiences?
Yes, with caveats. Because prompts run on the user's device, the prompt text never leaves the browser. This is a real privacy improvement over hosted API calls. The caveats are model output quality and the model download itself. The model is downloaded from Google, which means the first visit includes a Google interaction even if subsequent prompts do not.
For B2B audiences in regulated industries, document the on-device nature of the inference in your privacy policy and on the AI feature itself. A short user-facing note like "This response runs locally in your browser" is the right disclosure. For internal AI features inside a B2B SaaS product, the same architecture works and removes a significant compliance line item. The compliance teams I have talked to in the last three months are notably more comfortable with on-device AI than with hosted-API AI for non-trivial use cases. This is a real shift, and it changes which features ship.
When will Safari and Firefox catch up on on-device AI?
Both browsers have on-device AI work underway, but neither has shipped a stable Prompt API equivalent as of May 2026. Apple's Safari roadmap includes WebKit-side machine learning work tied to Apple Intelligence on the device side, and Mozilla has experimental ML APIs behind flags. A realistic timeline for Safari and Firefox parity is six to twelve months from now, possibly longer.
The implication for B2B SaaS sites in the meantime is that the Chrome-only window is real and worth using rather than waiting on. Ship the AI feature with a clean fallback now, capture the engagement and instrumentation data over the next quarter, and refine the feature as Safari and Firefox add their equivalents. By the time cross-browser parity arrives, your Chrome-served feature will have nine months of usage data to inform the cross-browser rollout. Waiting for parity costs the data, and the data is the actual moat. As the AI freshness and citation decay piece argues, content and feature recency compound, and the longer you wait the smaller the compounding advantage.
If you are scoping an AI feature on a Webflow marketing site and want to talk through whether the Prompt API or a hosted API is the right architecture for it, drop me a line and tell me what the feature is trying to do for the visitor. I will share the decision tree I am using at Phoenix Studio this quarter. Let's chat.