There is a version of the AI future where every intelligent feature in every product runs on metered API calls to a handful of hyperscaler models. The unit economics are fine until they aren’t, the data residency story is complicated, and your product roadmap has a quiet dependency on someone else’s pricing page.
Gemma 4, released April 2, 2026 under the Apache 2.0 license, is Google’s answer to that future. It is a family of four open-source multimodal models: E2B and E4B for edge and mobile, a 26B Mixture-of-Experts (MoE) model for balanced workloads, and a 31B dense model for compute-intensive tasks. All four handle text and image input. The edge variants also support audio and video. Context windows run up to 256K tokens, with native support for over 140 languages.
And the whole thing is Apache 2.0. Embed it, fine-tune it, redistribute it, ship it inside your product. No royalties, no usage-based licensing, no call home.
What This Means If You Build Software
For ISVs running on Google Cloud, Gemma 4 opens two distinct opportunities. The first is operational. High-volume AI workloads that currently drain budget on per-token API calls can move to self-hosted Gemma 4 on Vertex AI or on your own infrastructure. The model handles function-calling, structured JSON output, and agentic workflows natively, so the scaffolding you built around other models largely transfers. You get Gemini-class capability without Gemini-class inference costs at scale.
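To make the "scaffolding transfers" point concrete: if you front a self-hosted Gemma 4 with an OpenAI-compatible serving layer (vLLM is one common choice), a structured-output request looks the same as it did against a metered API; only the endpoint and model name change. A minimal sketch, where the internal URL, the model identifier, and the schema are illustrative assumptions rather than confirmed API details:

```python
import json

# Hypothetical endpoint for a self-hosted, OpenAI-compatible server
# (e.g. vLLM) inside your own network. Swapping off a metered API means
# changing this URL and the model name -- the request body stays the same.
BASE_URL = "http://gemma.internal:8000/v1/chat/completions"  # assumed

# A structured-output request: ask the model for JSON matching a schema,
# the same shape your existing scaffolding already produces.
payload = {
    "model": "gemma-4-26b-moe",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Extract the invoice total and currency."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "total": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["total", "currency"],
            },
        },
    },
}

# Serialized and ready for any HTTP client pointed at BASE_URL.
body = json.dumps(payload)
```

The design point is that the migration cost lives almost entirely in operations (serving, autoscaling, GPU capacity), not in application code.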
The second is about what you ship. If your product serves regulated industries (healthcare, finance, defense, government), your customers have been telling you for years that they cannot send sensitive data to a cloud API. Gemma 4 changes that answer. You can now embed a frontier-capable multimodal model directly inside your product, deployed into your customer’s VPC or on-premise environment, fully air-gapped if required. Vertex AI’s Sovereign Cloud compliance and Model Garden fine-tuning toolchain handle the MLOps side, so you don’t have to build it from scratch.
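What "inside the customer’s environment" can look like in practice: a single service definition with weights mounted from local disk and no egress route, so nothing phones home and no per-token billing applies. This is a sketch only; the image name, model path, and flags are assumptions, not a published deployment recipe:

```yaml
# docker-compose.yml -- hypothetical air-gapped serving sketch.
# Weights live on local disk or in an internal registry image;
# no runtime downloads, no outbound calls.
services:
  gemma:
    image: registry.customer.internal/llm/vllm-openai:latest  # assumed
    command:
      - "--model"
      - "/models/gemma-4-26b-moe"   # local weights, no download
      - "--port"
      - "8000"
    volumes:
      - /opt/models:/models:ro      # read-only model mount
    ports:
      - "8000:8000"
    networks:
      - isolated

networks:
  isolated:
    internal: true                  # Docker blocks all outbound traffic
```

The `internal: true` network is the air-gap in miniature: clients inside the customer’s environment can reach the model, and the model can reach nothing.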
The Competitive Reality
Meta Llama 4 is the honest comparison: also multimodal, also capable, and freely downloadable, though under Meta’s community license rather than Apache 2.0. If your engineers are already comfortable with Llama’s ecosystem, that comfort is real and worth acknowledging. The differentiation for GCP-based ISVs is in the deployment story: Gemma 4 on Vertex AI comes with managed fine-tuning, Sovereign Cloud compliance, and Model Garden integration out of the box. Llama 4 on AWS or Azure requires you to assemble that stack yourself.
Microsoft Phi-4 is competitive at smaller sizes but is not multimodal across all variants. Mistral’s open models are strong but lack the native GCP deployment integration that matters when your customers are already in Google Cloud.
If you are an ISV selling into regulated industries, the honest question to ask yourself is: “Which AI features have we held back because of data residency requirements or inference cost?” Gemma 4 is the answer to both at once. Ship it inside your product, in your customer’s environment, with no API dependency and no per-token bill. That is not a small thing.
