OpenAI drops GPT-5.5 for complex coding and research

TLDR: OpenAI drops GPT-5.5 and new automation tooling, DeepSeek aims at 1M-token agents, and researchers show an autonomous hacking agent that should light a fire under your security model.

Top Signals

OpenAI launches GPT-5.5 focused on complex tool-using agents

As of 2026-04-24, OpenAI has released GPT-5.5, which it describes as its “smartest model yet,” tuned for complex coding, research, and data workflows that span multiple tools. VentureBeat reports that GPT-5.5 scores 82.7 on Terminal-Bench 2.0, narrowly ahead of Anthropic Claude Mythos Preview at 82.0 (a 0.7-point gap) and well ahead of Anthropic Claude Opus 4.7 at 69.4 and Google Gemini 3.1 Pro at 68.5.

For agent builders, the story is that GPT-5.5 appears optimized for long, tool-heavy sessions: strong FrontierMath scores, solid OSWorld-Verified (78.7), and good performance on CyberGym (81.8) hint at more reliable structured reasoning and environment control. It is not blowing away Mythos across the board, and BrowseComp numbers show that web-browsing quality is still competitive rather than dominant, so you will want to test it against your current stack.

Expect early ecosystem churn as frameworks and gateways wire in GPT-5.5 as a default for coding agents and research copilots. If your workloads are benchmark-sensitive, note that the gap to Mythos is narrow enough that latency, cost, and safety tuning will probably matter more than raw scores.
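
If you want to test GPT-5.5 against your current stack on your own tasks, a small harness that records pass rate, latency, and estimated cost per model is usually enough to start. The sketch below assumes you wrap each provider in a plain Python callable returning `(text, tokens_used)`; the `ModelUnderTest` and `Task` types, the stub models, and the per-token prices are all hypothetical placeholders, not any vendor's SDK or real pricing.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical wrapper: takes a prompt, returns (completion_text, tokens_used).
# Plug a real client in here; nothing below depends on a specific vendor SDK.
CallModel = Callable[[str], tuple[str, int]]


@dataclass
class ModelUnderTest:
    name: str
    call: CallModel
    usd_per_1k_tokens: float  # placeholder price, not a real rate card


@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # True if the model's output is acceptable


def evaluate(model: ModelUnderTest, tasks: list[Task]) -> dict[str, float]:
    """Run every task once and summarize pass rate, mean latency, and cost."""
    latencies: list[float] = []
    passed = 0
    tokens = 0
    for task in tasks:
        start = time.perf_counter()  # wall-clock time, including network for real clients
        output, used = model.call(task.prompt)
        latencies.append(time.perf_counter() - start)
        tokens += used
        if task.check(output):
            passed += 1
    return {
        "pass_rate": passed / len(tasks),
        "mean_latency_s": statistics.mean(latencies),
        "est_cost_usd": tokens / 1000 * model.usd_per_1k_tokens,
    }


if __name__ == "__main__":
    # Stub models so the harness runs offline; swap in real clients.
    tasks = [
        Task("2+2", lambda out: out.strip() == "4"),
        Task("capital of France", lambda out: "Paris" in out),
    ]
    candidates = [
        ModelUnderTest("stub-a", lambda p: ("4" if p == "2+2" else "Paris", 10), 0.01),
        ModelUnderTest("stub-b", lambda p: ("5" if p == "2+2" else "Paris", 20), 0.002),
    ]
    for m in candidates:
        print(m.name, evaluate(m, tasks))
```

Running the same task list through every candidate keeps the comparison apples to apples; with margins as small as 0.7 points, your own pass rate and cost per task are the numbers that should drive the decision.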

Also covered by: VentureBeat on benchmarks.

OpenAI Academy adds plugins, skills, and automations training

As of 2026-04-24, OpenAI has published new OpenAI Academy guides on plugins, skills, and automations in Codex, aimed at wiring models into tools and repeatable workflows. The plugins and skills guide walks through connecting Codex to external services, accessing data, and defining reusable skills for multi-step tasks.

The automations guide focuses on schedules and triggers so Codex can run recurring reports, summaries, and other workflows without manual prompts. For teams already invested in OpenAI’s ecosystem, this is essentially the “official playbook” for turning chat interfaces into durable agents that orchestrate tools and data.
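
Conceptually, an automation pairs a trigger (a schedule or an event) with a workflow. A minimal sketch of that idea, assuming a hypothetical in-process registry, is below; it is not Codex's automation API, which the guides document, and the `Automation`, `ScheduleTrigger`, and `Scheduler` names are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical workflow signature: receives a context dict, returns a result string.
Workflow = Callable[[dict], str]


@dataclass
class ScheduleTrigger:
    every: timedelta  # must be positive
    next_run: datetime

    def due(self, now: datetime) -> bool:
        return now >= self.next_run

    def advance(self, now: datetime) -> None:
        # Jump past `now` so a long outage does not cause a burst of catch-up runs.
        while self.next_run <= now:
            self.next_run += self.every


@dataclass
class Automation:
    name: str
    workflow: Workflow
    schedule: Optional[ScheduleTrigger] = None  # time-based trigger
    on_event: Optional[str] = None              # event-based trigger


class Scheduler:
    def __init__(self) -> None:
        self.automations: list[Automation] = []
        self.log: list[tuple[str, str]] = []

    def register(self, automation: Automation) -> None:
        self.automations.append(automation)

    def tick(self, now: datetime) -> None:
        """Run every scheduled automation that is due at `now`."""
        for a in self.automations:
            if a.schedule is not None and a.schedule.due(now):
                self.log.append((a.name, a.workflow({"time": now})))
                a.schedule.advance(now)

    def emit(self, event: str, payload: dict) -> None:
        """Run every automation subscribed to `event`."""
        for a in self.automations:
            if a.on_event == event:
                self.log.append((a.name, a.workflow(payload)))


if __name__ == "__main__":
    start = datetime(2026, 4, 24, 9, 0)
    s = Scheduler()
    s.register(Automation("daily-summary", lambda ctx: f"summary at {ctx['time']:%H:%M}",
                          schedule=ScheduleTrigger(timedelta(days=1), start)))
    s.register(Automation("pr-review", lambda ctx: f"reviewed PR {ctx['pr']}",
                          on_event="pr_opened"))
    s.tick(start)                        # daily-summary is due and runs
    s.tick(start + timedelta(hours=1))   # not due again until tomorrow
    s.emit("pr_opened", {"pr": 42})      # event trigger fires pr-review
    print(s.log)
```

The point of the split is that the workflow body stays the same whether a clock or an external event kicks it off, which is what makes recurring reports and summaries reusable.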

If you are standardizing on OpenAI for production, these docs likely reflect how OpenAI expects you to structure agent architectures over the next year. The tradeoff is obvious: you get batteries-included workflows, but you lean harder into vendor lock-in compared with framework-agnostic agent stacks.

Also covered by: OpenAI Academy automations guide.

Researchers demo Zealot, an autonomous cloud hacking agent

As of 2026-04-24, researchers have introduced Zealot, an AI hacking agent that autonomously breached a cloud environment and exfiltrated data with minimal human oversight. Zealot uses a supervisor agent that delegates to three specialized sub-agents: one for infrastructure reconnaissance and network mapping, one for web app exploitation and credential extraction, and one for cloud security operations.

The key detail: Zealot was able to scan the network, discover connected virtual machines and services, adapt its strategy, and chain attacks without a prewritten playbook. That looks very similar to how many of you are architecting legitimate supervisor-plus-worker agent systems, which suggests the offensive and defensive stacks will co-evolve quickly.

Security leaders and infra teams running agentic systems should treat this as a near-term design constraint, not a distant future risk. Start threat modeling your own multi-agent setups, shrink cloud blast radii, and assume red teams will have Zealot-like kits within a year.
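
One concrete way to shrink the blast radius of your own supervisor-and-worker designs is to give each sub-agent an explicit tool allowlist and deny everything else at the dispatch layer, logging every attempt. The sketch below assumes hypothetical agent roles and tool names; it illustrates least privilege in a dispatcher, not how Zealot or any particular agent framework is built.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[..., str]


class ToolNotPermitted(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


@dataclass
class Dispatcher:
    tools: dict[str, Tool]
    allowlists: dict[str, frozenset[str]]  # agent role -> permitted tool names
    audit: list[tuple[str, str, bool]] = field(default_factory=list)

    def call(self, agent: str, tool: str, **kwargs) -> str:
        # Unknown agents get an empty allowlist, so the default is deny.
        allowed = tool in self.allowlists.get(agent, frozenset())
        # Record every attempt, including denials, for later review.
        self.audit.append((agent, tool, allowed))
        if not allowed:
            raise ToolNotPermitted(f"{agent} may not call {tool}")
        return self.tools[tool](**kwargs)


if __name__ == "__main__":
    dispatcher = Dispatcher(
        tools={
            "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
            "list_vms": lambda: "vm-1, vm-2",
        },
        allowlists={
            "triage_agent": frozenset({"read_ticket"}),
            "inventory_agent": frozenset({"list_vms"}),
        },
    )
    print(dispatcher.call("triage_agent", "read_ticket", ticket_id=7))
    try:
        dispatcher.call("triage_agent", "list_vms")
    except ToolNotPermitted as err:
        print("denied:", err)
```

A compromised or manipulated sub-agent can then only reach the tools its role needs, and the audit trail shows denied attempts that may signal an agent being steered off task.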

Also covered by: SecurityWeek.

DeepSeek-V4: a million-token context that agents can actually use
DeepSeek released two mixture-of-experts (MoE) checkpoints on Hugging Face, both with a 1M-token context: DeepSeek-V4-Pro at 1.6T total parameters with 49B active, and DeepSeek-V4-Flash at 284B total with 13B active. The focus is long-horizon agent workflows rather than state-of-the-art benchmark scores.
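
For capacity planning, the useful ratio in an MoE checkpoint is the share of parameters active per token: roughly speaking, active parameters drive per-token compute while total parameters drive the memory needed to hold the weights. Using the DeepSeek-V4 figures:

```latex
\begin{aligned}
\text{V4-Pro:}\quad   & \frac{49\,\text{B}}{1.6\,\text{T}} = \frac{49}{1600} \approx 0.031 \;(\approx 3.1\%) \\
\text{V4-Flash:}\quad & \frac{13\,\text{B}}{284\,\text{B}} \approx 0.046 \;(\approx 4.6\%)
\end{aligned}
```

By the same numbers, Pro activates about 3.8 times as many parameters per token as Flash (49/13 ≈ 3.77) while storing about 5.6 times as many weights (1600/284 ≈ 5.63).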

DeepSeek V4 on AI Gateway
Vercel AI Gateway now exposes DeepSeek V4 Pro and Flash with a 1M-token context by default, targeting agentic coding, formal math, long-horizon workflows, and long-form document generation across MCP and popular agent frameworks.

Show HN: Agent Vault – Open-source credential proxy and vault for agents
Infisical launched Agent Vault, an open-source HTTP credential proxy and vault that lets agents call services without ever seeing the raw secrets. Worth a look if you are struggling with per-agent or per-tool credential isolation.
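
The core idea behind a credential proxy is that the agent only ever holds an opaque handle, and the proxy swaps that handle for the real secret on the outbound request, after checking that the calling agent was granted it. A minimal sketch of that substitution step follows, assuming a hypothetical `cred:` handle format and an in-memory vault; it is not Agent Vault's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundRequest:
    url: str
    headers: dict[str, str] = field(default_factory=dict)


class CredentialProxy:
    """Swaps opaque credential handles for real secrets at the proxy boundary."""

    HANDLE_PREFIX = "cred:"  # hypothetical handle format

    def __init__(self, vault: dict[str, str], grants: dict[str, set[str]]):
        self._vault = vault    # handle -> full header value; never shown to agents
        self._grants = grants  # agent id -> handles that agent may use

    def forward(self, agent_id: str, req: OutboundRequest) -> OutboundRequest:
        resolved: dict[str, str] = {}
        for name, value in req.headers.items():
            if value.startswith(self.HANDLE_PREFIX):
                # Substitute the secret only if this agent holds a grant for the handle.
                if value not in self._grants.get(agent_id, set()):
                    raise PermissionError(f"{agent_id} has no grant for {value}")
                value = self._vault[value]
            resolved[name] = value
        # A real proxy would send this upstream; the sketch just returns it.
        return OutboundRequest(req.url, resolved)


if __name__ == "__main__":
    proxy = CredentialProxy(
        vault={"cred:github": "Bearer example-secret"},
        grants={"refactor-agent": {"cred:github"}},
    )
    req = OutboundRequest("https://api.example.com/repos", {"Authorization": "cred:github"})
    print(proxy.forward("refactor-agent", req).headers)
```

Because the secret only exists on the proxy side, leaking the agent's context or logs exposes a handle that is useless outside the proxy's grant table.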

Show HN: Run coding agents in microVM sandboxes instead of your host machine
SuperHQ runs coding agents inside isolated microVM sandboxes with per-agent Debian environments, tmpfs overlays, and diff-based change approval, so agents never touch your host directly. API keys stay outside the sandbox, which is appealing for high-risk refactoring tasks.
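
Diff-based approval boils down to showing a reviewer exactly what the sandboxed agent changed before anything is copied back to the host. A minimal sketch using Python's standard `difflib`, assuming the sandbox's file contents have already been read into memory (how SuperHQ actually stages and applies changes is not described here, and `apply_if_approved` is a hypothetical helper):

```python
import difflib
from typing import Callable


def review_diff(path: str, original: str, proposed: str) -> str:
    """Return a unified diff of one file: host version vs. sandbox version."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile=f"host/{path}",
            tofile=f"sandbox/{path}",
        )
    )


def apply_if_approved(
    host_files: dict[str, str],
    changes: dict[str, str],
    approve: Callable[[str, str], bool],
) -> dict[str, str]:
    """Copy back only files whose non-empty diff the reviewer approves."""
    updated = dict(host_files)  # never mutate the host snapshot in place
    for path, proposed in changes.items():
        diff = review_diff(path, host_files.get(path, ""), proposed)
        if diff and approve(path, diff):
            updated[path] = proposed
    return updated


if __name__ == "__main__":
    host = {"app.py": "x = 1\n"}
    sandbox = {"app.py": "x = 2\n", "secrets.env": "TOKEN=placeholder\n"}
    # A human would normally decide; this example policy rejects new .env files.
    print(apply_if_approved(host, sandbox, lambda path, diff: not path.endswith(".env")))
```

In practice `approve` would display the diff and wait for a human decision; keeping it as a plain callable makes it easy to layer automatic policies (such as rejecting secret files) in front of that review.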

Amazon SageMaker AI now supports optimized generative AI inference recommendations
Amazon SageMaker AI adds automated inference configuration recommendations with validated performance metrics, so platform teams can tune generative workloads without hand-rolling every deployment profile.

How to Use Transformers.js in a Chrome Extension
Hugging Face shares practical lessons from building a Gemma 4 E2B-powered Chrome extension with Transformers.js, including Manifest V3 runtime constraints, model loading strategies, and messaging patterns for local AI features.

Here’s how our TPUs power increasingly demanding AI workloads
Google walks through the evolution of its Tensor Processing Units, noting that the latest generation can hit 121 exaflops with double the bandwidth of previous chips, which matters if you are planning large-scale training or inference capacity.

Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture
A single-page interactive explainer derived from Andrej Karpathy’s intro lecture that you can use to onboard new teammates to transformer internals or sanity-check your own mental model.