Nyx targets AI agent failures with autonomous test harness

TLDR: New tools to break your agents before users do, wire your data into Sheets, and push agentic automation onto factory floors.

Nyx launches as autonomous failure-hunting harness for AI agents

Nyx is a new autonomous testing harness for AI agents that focuses on logic bugs, reasoning failures, and red‑team issues that traditional test suites miss. The team built it after repeatedly hitting agent-specific failure modes that manual QA and static benchmarks never surfaced, including instruction-following regressions, prompt injection, jailbreaks, and tool hijacking.

For anyone shipping production agents or tools that call tools, this is trying to be the equivalent of fuzzing plus security testing for LLM workflows. The key idea: use another agent to adaptively probe your system instead of static test cases. As of 2026-04-20 there are no public benchmarks, so you will need to evaluate Nyx by how easily it plugs into your current orchestrator and by the quality of the failure reports.

Schneider and Microsoft pitch agentic “Industrial Copilot” for factories

Schneider Electric and Microsoft are showcasing a next generation “Industrial Copilot” at Hannover Messe that uses specialized AI agents coordinated by an AI orchestrator to automate manufacturing workflows. Schneider Electric claims the Azure AI powered stack collapses tool and team silos into a closed-loop, software-defined workflow, where production changes that once took weeks can be implemented in hours.

For industrial engineers and systems integrators, this is a signal that large vendors are standardizing on agentic patterns, not just single chat interfaces. The orchestrator coordinates design decisions, maintains end-to-end traceability, and adapts to live plant data, at least in the marketing copy. As of 2026-04-20 this is still an announcement, so you should look for concrete APIs, safety controls for automated changes, and reference architectures before betting your next line upgrade on it.

Simon Willison wires Datasette queries straight into Google Sheets

Simon Willison published patterns for using SQL functions in Google Sheets to fetch data directly from a Datasette instance as of 2026-04-20. He shows how to combine the IMPORTDATA function, custom named functions, and Datasette’s query URLs so non-engineers can run live SQL-backed analysis from a spreadsheet.

If you are building internal agents or dashboards on top of Datasette, this gives you a lightweight way to expose vetted queries without standing up a full BI stack. The tradeoff: Sheets latency and rate limits, plus the need to lock down which queries are allowed. For many teams, though, a few named functions in a shared Sheet is enough to get product managers or analysts playing with real data while your agent stack matures.

Claude Token Counter, now with model comparisons Simon Willison updated his Claude Token Counter to compare tokenization across models, highlighting that Anthropic Claude Opus 4.7 uses a different tokenizer that can affect cost estimates and truncation behavior.
Some Way Stations In The AI 2027 Road Map A speculative roadmap piece discusses chained agents generating synthetic data for successor agents, “neuralese,” hive-mind style coordination, and iterated distillation, more as futurism than something you can ship today.
Gemma 4 just replaced my whole local LLM stack MakeUseOf reports that Google Gemma 4 is fast and capable enough on consumer hardware to displace a pile of smaller local models, although the author still prefers major cloud chatbots for high-stakes tasks.
Show HN: Run TRELLIS.2 Image-to-3D generation natively on Apple Silicon A developer ported Microsoft TRELLIS.2, a 4B parameter image-to-3D model, to Apple Silicon by replacing CUDA specific kernels with pure PyTorch implementations so you can experiment with 3D generation locally. (GitHub repo)
Show HN: A lightweight way to make agents talk without paying for API usage A lightweight approach for inter-agent communication that avoids paid API calls, likely interesting if you are prototyping conversational multi-agent systems on a budget.

Nyx targets AI agent failures with autonomous test harness

Nyx launches as autonomous failure-hunting harness for AI agents

Schneider and Microsoft pitch agentic “Industrial Copilot” for factories

Simon Willison wires Datasette queries straight into Google Sheets

Quick Hits

More from the Digest