ACM

Uncategorized

You thought the generalist was dead — in the ‘vibe work’ era, they’re more important than ever

Not long ago, the idea of being a “generalist” in the workplace had a mixed reputation. The stereotype was the “jack of all trades” who could dabble in many disciplines but was a “master of none.” And for years, that was more or less true. Most people simply didn’t have access to the expertise required …


Testing autonomous agents (Or: how I learned to stop worrying and embrace chaos)

Look, we’ve spent the last 18 months building production AI systems, and we’ll tell you what keeps us up at night — and it’s not whether the model can answer questions. That’s table stakes now. What haunts us is the mental image of an agent autonomously approving a six-figure vendor contract at 2 a.m. because …


Three ways AI is learning to understand the physical world

Large language models are running into limits in domains that require an understanding of the physical world — from robotics to autonomous driving to manufacturing. That constraint is pushing investors toward world models, with AMI Labs raising a $1.03 billion seed round shortly after World Labs secured $1 billion. Large language models (LLMs) excel at …


Mistral’s Small 4 consolidates reasoning, vision and coding into one model — at a fraction of the inference cost

Enterprises that have been juggling separate models for reasoning, multimodal tasks, and agentic coding may be able to simplify their stack: Mistral’s new Small 4 brings all three into a single open-source model, with adjustable reasoning levels under the hood. Small 4 enters a crowded field of small models — including Qwen and Claude Haiku …


Scale AI launches Voice Showdown, the first real-world benchmark for voice AI — and the results are humbling for some top models

Voice AI is moving faster than the tools we use to measure it. Every major AI lab — OpenAI, Google DeepMind, Anthropic, xAI — is racing to ship voice models capable of natural, real-time conversation. But the benchmarks used to evaluate those models are largely still running on synthetic speech, English-only prompts, and scripted test …


Anthropic just shipped an OpenClaw killer called Claude Code Channels, letting you message it over Telegram and Discord

The hit open source autonomous AI agent OpenClaw may have just gotten mogged by Anthropic. Today, Anthropic announced Claude Code Channels, a way to hook up its own powerful Claude Code AI agentic harness to a human user’s Discord or Telegram messaging applications, letting them message Claude Code directly whenever they want while on the …


Cursor’s new coding model Composer 2 is here: It beats Claude Opus 4.6 but still trails GPT-5.4

Cursor, a San Francisco AI coding platform from startup Anysphere valued at $29.3 billion, has launched Composer 2, a new in-house coding model now available inside its agentic AI coding environment, and it posts drastically improved benchmark results over its prior in-house model. It’s also launching and making Composer 2 Fast, a higher-priced but faster variant, …


Why enterprises are replacing generic AI with tools that know their users

The future of AI isn’t just agentic; it’s deep personalization.  Rather than simple recommender systems that correlate user behavior to identify patterns and apply those to individual workflows, large language models (LLMs) and AI agents can analyze users directly to create deeply personalized experiences.  It’s this kind of aggressive customization users are increasingly demanding — …


Meta’s rogue AI agent passed every identity check — four gaps in enterprise IAM explain why

A rogue AI agent at Meta took action without approval and exposed sensitive company and user data to employees who were not authorized to access it. Meta confirmed the incident to The Information on March 18 but said no user data was ultimately mishandled. The exposure still triggered a major security alert internally. The available …


Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost

Chinese electronics and car manufacturer Xiaomi surprised the global AI community today with the release of MiMo-V2-Pro, a new 1-trillion parameter foundation model with benchmarks approaching those of U.S. AI giants OpenAI and Anthropic, but at around a seventh or sixth the cost when accessed over proprietary API — and importantly, sending less than 256,000 …
