The Agentic Digest

Google’s Gemma 4 Puts Frontier Models On A Single GPU

5 min read · models · infrastructure · open-source · security

For engineers, designers & product people. Stay up to date with our free daily digest.

TLDR: Google shrinks frontier models to a single GPU, Microsoft quietly ships its own Foundry models, and Simon Willison ships better tooling for LLM ops.

Google’s Gemma 4 runs “frontier-class” AI on one GPU

As of 2026-04-05, Google’s Gemma 4 is positioned as a frontier-class reasoning model that runs on a single GPU under an Apache 2.0 license. The pitch is simple: you keep sensitive data and agent workloads on infrastructure you already own instead of moving everything to hyperscalers.

For teams building production agents inside regulated or paranoid enterprises, this is a big unlock. Single-GPU deployment means you can prototype serious reasoning systems on commodity boxes, then push them into on-prem clusters without a giant capex request. Earlier Gemma versions have logged roughly 400 million downloads, so there is a real developer base and tooling surface to tap into, although hard benchmarks for Gemma 4 are still missing.

The next few weeks will be about real-world evals: throughput on typical 24 GB cards, latency under agentic workflows, and how Gemma 4 behaves with retrieval-augmented generation (RAG) and tool use. Read more →
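A practical starting point for those evals is to record wall-clock latency and generated-token counts per request against whatever serving stack you deploy Gemma 4 on, then summarize. A minimal sketch of the summary step — the `summarize_latencies` helper and its sample numbers are illustrative, not Gemma 4 benchmarks:

```python
import statistics

def summarize_latencies(latencies_s, tokens_out):
    """Summarize per-request latency and throughput samples.

    latencies_s: wall-clock seconds per request
    tokens_out:  generated tokens per request (same order)
    """
    per_req_tps = [t / l for t, l in zip(tokens_out, latencies_s)]
    ranked = sorted(latencies_s)
    # Nearest-rank p95: index of the 95th-percentile sample.
    p95_idx = max(0, int(round(0.95 * len(ranked))) - 1)
    return {
        "mean_latency_s": statistics.mean(latencies_s),
        "p95_latency_s": ranked[p95_idx],
        "mean_tokens_per_s": statistics.mean(per_req_tps),
    }

# Five hypothetical requests against a local deployment.
stats = summarize_latencies(
    latencies_s=[1.2, 1.5, 0.9, 2.1, 1.3],
    tokens_out=[180, 220, 140, 300, 200],
)
print(stats)
```

Mean tokens per second per request is what batch jobs care about; p95 latency is what interactive agent loops care about, so track both.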


Microsoft ships 3 in-house models on Azure Foundry

On 2026-04-05, Microsoft released three new in-house AI models on the Microsoft Foundry platform, signaling a deliberate step away from full dependence on OpenAI. The models cover voice, image, and text workloads and are exposed through Azure, sitting alongside third-party options.

For anyone building on Azure today, the big shift is strategic rather than purely technical. A renegotiated deal in October already gave Microsoft more independence, and these launches show it is now actively filling key workloads with its own models. That can translate into better pricing leverage, more stable long-term contracts, and features tuned around Microsoft’s cloud primitives, but you also inherit a second model family to evaluate and monitor.

Expect a slow migration pattern: most teams will keep OpenAI for frontier experiments and start routing high-volume, “boring but big” workloads to the new Microsoft models once those have clear benchmarks and incident history. Read more →
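That migration pattern can be made explicit as a routing policy rather than ad-hoc per-team decisions. A toy sketch — the provider names, the `Workload` shape, and the volume threshold are all assumptions for illustration, not Azure Foundry APIs:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    frontier: bool          # needs frontier-level reasoning?
    monthly_requests: int   # rough volume

def route(w: Workload, volume_threshold: int = 1_000_000) -> str:
    """Toy policy: frontier experiments stay on OpenAI; high-volume,
    routine workloads move to the in-house Foundry models."""
    if w.frontier:
        return "openai"
    if w.monthly_requests >= volume_threshold:
        return "foundry"
    return "openai"

print(route(Workload("support-summaries", frontier=False,
                     monthly_requests=5_000_000)))  # foundry
```

Keeping the policy in one function makes the eventual cutover a config change instead of a hunt through every call site.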


Simon Willison’s scan-for-secrets 0.2 improves pre-share scanning

On 2026-04-05, Simon Willison released version 0.2 of scan-for-secrets, a command-line tool that scans files for secrets before you share them publicly. The release streams results as they are found, instead of waiting until the end of a scan, and adds both multi-directory support and a new file-specific option.

For teams that ship agents, prompt templates, or notebooks to clients and GitHub, this kind of pre-flight check is low-friction insurance. Streaming output matters when you point the tool at a messy monorepo or a shared research directory, because people are more likely to fix issues when they see them surface in real time. The new options, including repeatable directory flags and a file flag, make it much easier to wire into pre-commit hooks and CI jobs.
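To see why this class of tool is cheap insurance, here is a minimal sketch of the kind of streaming, regex-based check such scanners perform. This is not scan-for-secrets itself: the two patterns and the `scan_text` generator are illustrative only, and real tools ship far larger rule sets.

```python
import re

# Illustrative patterns only; real scanners have hundreds of rules.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\"?\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def scan_text(text):
    """Yield (rule, line_number, line) for each suspected secret,
    streaming findings as they are encountered rather than batching."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield rule, lineno, line.strip()

sample = 'config = {"api_key": "abcd1234abcd1234abcd"}\nregion = "us-east-1"\n'
for finding in scan_text(sample):
    print(finding)
```

Because it is a generator, findings print as the scan walks the input — the same property that makes the real tool’s new streaming mode useful on large directories.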

Given how often LLM related repos accidentally leak keys, service URLs, and even customer data in logs, this sort of tool is becoming table stakes for any agent platform that syncs artifacts to the cloud. Read more →


Quick Hits

  • scan-for-secrets 0.1.1 Minor follow-up release that documents escaping schemes and simplifies internals by removing a redundant representation path. Useful context if you are embedding scan-for-secrets in your own tooling.

  • Show HN: mailtrim – find what’s actually filling your Gmail inbox Local tool that ranks Gmail senders by storage impact instead of email count and offers confidence-scored bulk delete with a 30-day undo window. Handy if your AI logs and alerts live in Gmail and you are running out of space.

  • langgraph 1.1.5 New release adds richer runtime execution information and remote build support in the langgraph deploy CLI. If you are orchestrating multi-agent LangGraph workflows in production, this helps with debugging and CI integration.

  • Cybersecurity M&A Round-Up: Databricks launches Lakewatch agentic SIEM Databricks introduced Lakewatch, an “agentic SIEM” for defending against malicious AI agents, alongside acquisitions of Antimatter and SiftD.ai. Worth watching if you need observability and security around agent behavior on data lakes.

  • research-llm-apis 2026-04-04 Simon Willison is revisiting his LLM Python library and CLI to better handle evolving HTTP APIs across many model vendors. If you abstract over multiple providers today, his notes are a useful map of the changing surface area.

  • Show HN: Ownscribe – local meeting transcription, summarization and search Ownscribe is a Python CLI for fully local meeting transcription and search. Good fit if you want to feed meeting outputs into internal agents without sending audio to external services.

  • Anthropic to limit using third-party harnesses with Claude subscriptions Anthropic will no longer allow Claude subscription limits to be consumed by tools like OpenClaw unless “extra usage” is enabled. If your internal tools proxy through Claude accounts, budget for separate metering.

  • ollama v0.20.2 Small quality-of-life update that changes the default app home view to open a new chat instead of the model launcher. Nice if you use Ollama as your primary local chat front end.

  • @ai-sdk/[email protected] Patch bumps the Vercel AI SDK dependency to [email protected]. If you ship Svelte based AI front ends, keep versions in sync to avoid subtle type and API drift.

More from the Digest


© 2026 The Agentic Digest