Running Gemma 4 locally with LM Studio and Claude Code

TLDR: Local Gemma 4 is getting very real on both desktop and in-browser, and LangChain dropped a concrete playbook for continual learning in agents.

Running Gemma 4 locally with LM Studio and Claude Code

A new guide shows how to run Google Gemma 4 models entirely locally using LM Studio's new headless CLI plus Anthropic Claude Code as the control plane. The setup uses LM Studio for model hosting and inference, while Claude Code orchestrates prompts, tools, and workflows around the model. The walkthrough covers both chat and agentic patterns, all without calling a hosted API.

For anyone building security sensitive or cost conscious agents, this is a clean blueprint for a local stack that still feels "cloud native." It also demonstrates how far editor based runtimes like Claude Code have come as an agent shell. The tradeoff is that you are limited by your hardware and by LM Studio's current feature surface, so this is not a drop in for high scale production yet.

As of 2026-04-06, this is one of the clearer examples of how to wire modern small frontier models into practical local agents.
Read more →

Gemma Gem runs Gemma 4 as a Chrome extension agent

Gemma Gem is a Chrome extension that embeds Google Gemma 4 2B directly in the browser using WebGPU and exposes it as an overlay agent on every page. The model runs in an offscreen document and gets a suite of tools: read page content, take screenshots, click elements, type, scroll, and execute JavaScript against the current tab.

For agent builders, this is a concrete pattern for a fully local web agent that does not rely on an external API or backend. You are limited to a 2 billion parameter model, so this is best for simple page questions, basic automations, and experiments in tool use, not complex reasoning. The extension also includes a visible thinking mode that surfaces chain of thought, which is helpful for debugging tool calls and planner logic.

As of 2026-04-06, performance will depend heavily on client GPU and WebGPU support, so expect a wide variance across users and devices.
Read more →

LangChain outlines three layers of continual learning for agents

The latest LangChain blog post on continual learning argues that most conversations focus too narrowly on updating model weights and ignores two other layers: the harness and the context. LangChain breaks learning for AI agents into three tiers: the underlying model, the agent harness that wraps it, and the dynamic context such as memories, tools, and knowledge bases.

For people shipping production agents, this framing gives you a roadmap for improving systems over time without retraining. You can evolve your agent harness logic, routing, and evaluators, and you can refine long term memory and retrieval augmented generation (RAG) first, then worry about fine tuning. The post is opinionated but grounded in patterns they are actually seeing in customer deployments.

As of 2026-04-06, there are no new benchmarks here, but it is a useful conceptual guide if you are designing agents to learn safely under real traffic.
Read more →

Quick Hits

Show HN: DocMason – Agent Knowledge Base for local complex office files
An "agent native" knowledge base for complex research across office docs that runs inside Anthropic Claude Code with a "the repo is the app, Codex is the runtime" philosophy. Worth a look if you want local document orchestration for deep technical work.
Google's Gemma 4 Runs Frontier AI On A Single GPU
Overview of the Gemma 4 family with focus on native function calling, structured JSON output, and up to 256k context windows that enable more capable autonomous agents on a single GPU as of 2026-04-06. Also covered by: Yahoo News duplicate entry
AI has arrived in auditing. Are regulators ready?
Financial Times looks at EY's new AI audit platform and the UK Financial Reporting Council's first guidance on generative and agentic AI in audits. If you work on finance agents, this is an early signal of how regulators will frame risk.
scan-for-secrets 0.1
Simon Willison releases a Python tool to scan files for API keys and other secrets before sharing, prompted by publishing Claude Code transcripts. Handy for anyone logging or exporting detailed agent traces.
Eight years of wanting, three months of building with AI
A long form writeup on building syntaqlite, a high fidelity devtool for SQLite, as an agentic engineering project, plus context from Simon Willison on why it is a standout example.
Syntaqlite Playground
A companion note from Simon Willison revisiting syntaqlite via a playground that explores how AI assists in crafting and debugging complex SQL workflows.
Industrial policy for the Intelligence Age
OpenAI outlines a high level, people first industrial policy vision for advanced AI, focused on opportunity, shared prosperity, and resilient institutions. High level but useful background if your work brushes against policy or public sector deployments.

Running Gemma 4 locally with LM Studio and Claude Code