ACM

AI agents fail 63% of the time on complex tasks. Patronus AI says its new ‘living’ training worlds can fix that.

Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture Tuesday that it says represents a fundamental shift in how AI agents learn to perform complex tasks. The technology, which the company calls “Generative Simulators,” creates adaptive simulation environments that continuously …

Why Google’s new Interactions API is such a big deal for AI developers

For the last two years, the fundamental unit of generative AI development has been the “completion.” You send a text prompt to a model, it sends text back, and the transaction ends. If you want to continue the conversation, you have to send the entire history back to the model again. This “stateless” architecture—embodied by …
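
Here is a minimal Python sketch of the "stateless" pattern. The `fake_model` function is a hypothetical stand-in for any completion endpoint. This is not Google's Interactions API or any vendor's SDK. The client owns the whole conversation and resends all of it on every call, because the model keeps nothing between requests:

```python
from typing import Callable

# Illustrative message shape (an assumption): a (role, text) pair.
Message = tuple[str, str]


def fake_model(history: list[Message]) -> str:
    """Hypothetical stateless endpoint: it sees only what this call passes in.

    It reports how many messages it received, which makes the resent
    history visible.
    """
    return f"reply #{len(history) // 2 + 1} (saw {len(history)} messages)"


class StatelessChat:
    """Client-side state: the full transcript lives here, not on the server."""

    def __init__(self, model: Callable[[list[Message]], str]) -> None:
        self.model = model
        self.history: list[Message] = []

    def send(self, text: str) -> str:
        self.history.append(("user", text))
        # Every request carries the entire transcript so far.
        reply = self.model(list(self.history))
        self.history.append(("assistant", reply))
        return reply


chat = StatelessChat(fake_model)
print(chat.send("hi"))     # reply #1 (saw 1 messages)
print(chat.send("again"))  # reply #2 (saw 3 messages)
```

The payload grows with every turn. A stateful design moves that transcript to the server side so the client does not have to resend it each time.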

Ai2’s Molmo 2 shows open-source models can rival proprietary giants in video understanding

Fresh off releasing the latest version of its Olmo foundation model, the Allen Institute for AI (Ai2) launched its open-source video model, Molmo 2, on Tuesday, aiming to show that smaller, open models can be viable options for enterprises focused on video understanding and analysis. In a press release, the company said Molmo 2 “takes …

OpenAI’s GPT Image 1.5 challenges Google at enterprise-grade visuals

OpenAI made its image generation offerings more precise and consistent in its latest update to ChatGPT Images, as more enterprises and brands use AI image generation to help with design visualization. The updates will roll out to all ChatGPT users and the API as GPT Image 1.5. The company said it’s powered by GPT-5.2, …

Black box AI isn’t enough: Why enterprise consulting is moving to grounded models

Presented by SAP. In an era where anyone can spin up an LLM, the real differentiator isn’t the AI technology itself, but the institutional knowledge it’s grounded in. Internal and partner consultants leading operational transformation can’t risk hallucinated guidance when their recommendations impact integrated processes across supply chain, manufacturing, finance, and other core functions. “Grounded …

Zoom says it aced AI’s hardest exam. Critics say it copied off its neighbors.

Zoom Video Communications, the company best known for keeping remote workers connected during the pandemic, announced last week that it had achieved the highest score ever recorded on one of artificial intelligence’s most demanding tests — a claim that sent ripples of surprise, skepticism, and genuine curiosity through the technology industry. The San Jose-based company …

With 91% accuracy, open source Hindsight agentic memory provides 20/20 vision for AI agents stuck on failing RAG

It has become increasingly clear in 2025 that retrieval augmented generation (RAG) isn’t enough to meet the growing data requirements for agentic AI. RAG emerged in the last couple of years to become the default approach for connecting LLMs to external knowledge. The pattern is straightforward: chunk documents, embed them into vectors, store them in …
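
The pattern described above (chunk documents, embed the chunks as vectors, store them, then retrieve by similarity) fits in a few lines of Python. Every component here is a toy assumption. `embed` is a bag-of-words counter standing in for a neural embedding model. `VectorStore` is a hypothetical in-memory list standing in for whatever index a real deployment uses. None of this is Hindsight's code:

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split a document into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store of (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, doc: str) -> None:
        # Chunk, embed, and store each piece of the document.
        for piece in chunk(doc):
            self.items.append((embed(piece), piece))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [piece for _, piece in ranked[:k]]


store = VectorStore()
store.add("Invoices are due within thirty days of receipt. Late invoices incur a two percent fee.")
print(store.search("what fee applies to late invoices"))
```

Each query is answered from whichever chunks score highest against it. The store keeps nothing about earlier queries between calls.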

Zencoder drops Zenflow, a free AI orchestration tool that pits Claude against OpenAI’s models to catch coding errors

Zencoder, the Silicon Valley startup that builds AI-powered coding agents, released a free desktop application on Monday that it says will fundamentally change how software engineers interact with artificial intelligence — moving the industry beyond the freewheeling era of “vibe coding” toward a more disciplined, verifiable approach to AI-assisted development. The product, called Zenflow, introduces …

Echo raises $35M to secure the enterprise cloud’s base layer — container images — with autonomous AI agents

As enterprises accelerate the deployment of LLMs and agentic workflows, they are hitting a critical infrastructure bottleneck: the container base images powering these applications are riddled with inherited security debt. Echo, an Israeli startup, is announcing $35 million in Series A funding today (bringing its total funding to date to $50 million) to fix …

Bolmo’s architecture unlocks efficient byte‑level LM training without sacrificing quality

Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche, and make it practical at scale, the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models that leverage its Olmo 3 models by “bytefiying” …
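
"Tokenizer-free" means the model reads raw bytes, so there is no learned vocabulary. Any string maps to IDs drawn from a fixed set of 256 byte values, including misspellings and scripts a subword tokenizer rarely saw. The Python sketch below shows only that input mapping, and it assumes plain UTF-8 bytes. It is not Bolmo's architecture or Ai2's code:

```python
def byte_tokens(text: str) -> list[int]:
    """Map text to token IDs by taking its UTF-8 bytes (IDs are always 0..255)."""
    return list(text.encode("utf-8"))


def detokenize(ids: list[int]) -> str:
    """Invert the mapping: reassemble the bytes and decode them as UTF-8."""
    return bytes(ids).decode("utf-8")


print(byte_tokens("héllo"))  # [104, 195, 169, 108, 108, 111]
print(detokenize(byte_tokens("naïve 文字")))
```

The trade-off is sequence length. "héllo" has 5 characters but produces 6 byte tokens, because "é" takes two bytes in UTF-8. Keeping training efficient at that finer granularity is the problem the headline says Bolmo's architecture addresses.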