Needle shrinks tool-calling agents to a 26M model
For engineers, designers & product people. Stay up to date with free daily digest.
TLDR: Tiny open-source tool callers, enterprise agent stacks, and smarter document workflows all took notable steps forward as of 2026-05-13.
Needle distills Gemini-style tool calling into a 26M model
Needle is a new 26-million-parameter open-source function-calling model that targets 6,000 tokens per second of prefill and 1,200 tokens per second of decode on consumer hardware, according to Cactus Compute. The authors argue that most agentic experiences reduce to retrieval and tool orchestration, so very large language models are overkill when you only need structured tool selection.
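To get a feel for what those throughput targets would mean per request, here is a back-of-envelope sketch. The two rates are the figures quoted above; the prompt and output sizes are made-up assumptions, and the calculation ignores model load time, tokenization, and any scheduling overhead, so read it as a rough lower bound rather than a measurement.

```python
# Back-of-envelope latency from the throughput targets quoted in the article.
PREFILL_TOKENS_PER_SEC = 6000  # claimed prefill rate
DECODE_TOKENS_PER_SEC = 1200   # claimed decode rate


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Return an estimated end-to-end latency in milliseconds.

    Assumes prefill and decode run back to back at the quoted rates,
    with no model-load, tokenization, or scheduling overhead.
    """
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return (prefill_s + decode_s) * 1000.0


# Hypothetical request: a 600-token prompt (tool schemas plus user query)
# and a 60-token JSON tool call. Both sizes are illustrative assumptions.
latency = estimate_latency_ms(prompt_tokens=600, output_tokens=60)
print(f"{latency:.0f} ms")
```

Under these assumptions, prefill takes about 100 ms and decode about 50 ms, so a single tool call would land around 150 ms if the targets hold on your hardware.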
For agent builders who care about low latency and on-device deployment, Needle hints at a different scaling path: small, specialized controllers on phones and edge devices, with heavier models in the background if needed. There are no public benchmarks against models like GPT-4o mini or Gemini Flash yet, so you should treat the performance claims as promising but early as of 2026-05-13.
Nature validates LingualAI against certified human interpreters
A new Nature study prospectively evaluates LingualAI, an AI-based real-time translation system, against certified human interpreters across 12 translation quality domains. The evaluation scores adequacy of meaning, terminology accuracy, completeness, cultural appropriateness, grammar, and vocabulary, plus voice-related metrics like fluency, clarity, prosody, and pacing, on 5-point Likert scales.
This is one of the more rigorous head-to-head tests of AI simultaneous translation in a clinical context, where mistakes have real consequences. If you are building agentic workflows in healthcare or any regulated environment, this kind of peer-reviewed evidence is what compliance and risk teams will ask for. The full paper breaks down domain-level scores and clinician confidence, so you can see where AI still lags humans as of 2026-05-13.
AWS adds agent-based schema generation for document workflows
Amazon Web Services introduced a multi-document discovery feature for its Intelligent Document Processing (IDP) Accelerator that clusters unknown documents and auto-generates schemas. The system uses visual embeddings to group documents by type, then employs agents to propose field structures that are ready to plug into the IDP Accelerator.
If your agents are stuck on brittle, handwritten parsing logic for invoices, contracts, or forms, this is worth a look. It turns the messy upfront step of understanding a corpus into a semi-automated pipeline, which is especially useful for teams onboarding many customers with heterogeneous document templates. It still lives squarely in the AWS ecosystem and assumes you are fine with Bedrock plus the IDP Accelerator as of 2026-05-13.
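The "group documents by type using embeddings" step can be illustrated with a minimal sketch. This is not the AWS implementation: the greedy threshold clustering, the function names, the 0.9 threshold, and the three-dimensional toy vectors are all assumptions chosen for illustration, standing in for whatever embedding model and clustering method the IDP Accelerator actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_documents(embeddings, threshold=0.9):
    """Greedy clustering: a document joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []  # each cluster is a list of document ids; index 0 is the seed
    for doc_id, vector in embeddings.items():
        for cluster in clusters:
            seed_vector = embeddings[cluster[0]]
            if cosine_similarity(vector, seed_vector) >= threshold:
                cluster.append(doc_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([doc_id])
    return clusters


# Toy "visual embeddings" (hypothetical values); real ones come from a model.
toy_embeddings = {
    "invoice_a.pdf": [0.9, 0.1, 0.0],
    "invoice_b.pdf": [0.8, 0.2, 0.1],
    "contract_a.pdf": [0.1, 0.9, 0.2],
    "contract_b.pdf": [0.0, 0.8, 0.3],
}

clusters = cluster_documents(toy_embeddings)
for cluster in clusters:
    print(cluster)
```

With these toy vectors the two invoices and the two contracts end up in separate groups. In a pipeline like the one described, each group would then be handed to the schema-proposal step.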
Quick Hits
Show HN: Statewright – Visual state machines that make AI agents reliable Visual state machines for structuring agent workflows so you can control transitions instead of letting an LLM freestyle everything. Good fit if your current agents are brittle and hard to debug.
SAP and Anthropic Plan to Bring Claude to SAP Business AI Platform SAP and Anthropic will build agentic workflows that operate across SAP systems via Anthropic’s Model Context Protocol, targeting industries like public sector, healthcare, and utilities.
How Amazon Finance streamlines regulatory inquiries by using generative AI on AWS Amazon Finance teams use Amazon Bedrock plus dedicated knowledge bases to triage and answer regulatory inquiries, a concrete reference for building compliant internal copilots.
Navigating EU AI Act requirements for LLM fine-tuning on Amazon SageMaker AI AWS shows how to track floating-point operations (FLOPs) during fine-tuning with Fine Tuning FLOPs Meter so you can classify risk levels and produce audit artifacts for EU AI Act compliance.
Google brings agentic AI and vibe-coded widgets to Android Google is pushing Gemini Intelligence deeper into Android so agents can complete tasks across apps, browse, fill forms, and even generate widgets from natural language descriptions.
Create Vercel Firewall rules with natural language Vercel’s dashboard now translates natural language descriptions into Web Application Firewall rules so you describe conditions and let an AI agent output structured security policies.
Trusted Sources for Deployment Protection Vercel introduces Trusted Sources that use OpenID Connect tokens so automated systems can access protected deployments without sharing long-lived secrets.
Manage Vercel Firewall in the CLI New vercel firewall commands plus a Vercel Firewall skill let agents and scripts inspect and roll out WAF rules from the command line.
Show HN: Agentic interface for mainframes and COBOL Hypercubic’s Hopper is an agentic development environment for mainframes, providing an AI layer over COBOL workflows and even giving you a sandbox mainframe account to experiment.
AutoScout24 scales engineering with AI-powered workflows AutoScout24 uses OpenAI Codex and ChatGPT to shorten development cycles and expand AI for code review and internal tools.
How NVIDIA engineers and researchers build with Codex NVIDIA teams pair Codex with GPT 5.5 to move from research ideas to runnable experiments and production systems more quickly.
How finance teams use Codex OpenAI outlines finance-specific Codex workflows for monthly business reviews, variance analysis, and model checks using live spreadsheets and models.
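For readers curious about the state-machine idea in the Statewright item above, here is a minimal sketch of constraining an agent to an explicit transition table. It does not use Statewright's API; the state names, the `TRANSITIONS` table, and the `AgentStateMachine` class are all hypothetical and exist only to illustrate the pattern of rejecting transitions the workflow does not allow.

```python
# Allowed transitions for a hypothetical support agent.
# All state names here are illustrative assumptions.
TRANSITIONS = {
    "triage": {"lookup", "escalate"},
    "lookup": {"respond", "escalate"},
    "respond": {"done"},
    "escalate": {"done"},
    "done": set(),
}


class AgentStateMachine:
    """Tracks the current state and rejects transitions not in the table."""

    def __init__(self, start="triage"):
        self.state = start
        self.history = [start]

    def advance(self, proposed):
        """Apply a proposed next state only if the transition table allows it."""
        if proposed not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {proposed}")
        self.state = proposed
        self.history.append(proposed)
        return self.state


machine = AgentStateMachine()
machine.advance("lookup")
machine.advance("respond")
try:
    # A model proposing to jump back to the start is rejected.
    machine.advance("triage")
except ValueError as err:
    print(err)
print(machine.history)
```

The model can still propose the next step, but the table decides whether that step is legal, which makes failures easier to spot and debug than free-form agent loops.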