ACM

Uncategorized

Cohere’s open-weight ASR model hits 5.4% word error rate — low enough to replace speech APIs in production pipelines

Enterprises building voice-enabled workflows have had limited options for production-grade transcription: closed APIs with data residency risks, or open models that trade accuracy for deployability. Cohere’s new open-weight ASR model, Transcribe, is built to compete on all four key differentiators — contextual accuracy, latency, control and cost. Cohere says that Transcribe outperforms current leaders on …
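The headline figure is a word error rate (WER). WER is the standard ASR metric: the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. A 5.4% WER therefore means roughly 5.4 word errors per 100 reference words. The sketch below is a generic WER calculation using word-level edit distance. It is not Cohere's evaluation code, and it assumes plain whitespace tokenization with no text normalization (published benchmarks usually lowercase and strip punctuation before scoring).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of six reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Because the denominator is the reference length, insertions can push WER above 100% on very poor transcripts.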

When AI turns software development inside-out: 170% throughput at 80% headcount

Many people have tried AI tools and walked away unimpressed. I get it — many demos promise magic, but in practice, the results can feel underwhelming. That’s why I want to write this not as a futurist prediction, but from lived experience. Over the past six months, I turned my engineering organization AI-first. I’ve shared …
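The headline's two ratios combine into a per-engineer figure. Assuming "170% throughput" means 1.7 times the team's pre-AI output (not a 170% increase) and "80% headcount" means 0.8 times the original team size, output per engineer rose by a factor of 1.7 / 0.8 ≈ 2.1. The teaser does not say how throughput was measured, so this is only the arithmetic implied by the title; the function name below is illustrative.

```python
def per_engineer_throughput(throughput_ratio: float, headcount_ratio: float) -> float:
    """Output per engineer relative to baseline, given both ratios to baseline."""
    if headcount_ratio <= 0:
        raise ValueError("headcount_ratio must be positive")
    return throughput_ratio / headcount_ratio


# 170% of baseline throughput delivered by 80% of baseline headcount.
print(round(per_engineer_throughput(1.70, 0.80), 3))  # 2.125
```

Under the other reading (a 170% increase, i.e. 2.7 times baseline output), the same function gives 2.7 / 0.8 ≈ 3.4.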

IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models

Processing 200,000 tokens through a large language model is expensive and slow: the longer the context, the faster the costs spiral. Researchers at Tsinghua University and Z.ai have built a technique called IndexCache that cuts up to 75% of the redundant computation in sparse attention models, delivering up to 1.82x faster time-to-first-token and 1.48x faster …

The consequential AI work that actually moves the needle for enterprises

Presented by OutSystems

After two years of flashy AI demos, rushed agent prototypes, and breathless predictions, enterprise technology leaders are striking a more pragmatic tone in 2026. In a recent webinar hosted by OutSystems, a panel of software executives and enterprise practitioners made the case that the most consequential AI work happening now is focused …

Intercom’s new post-trained Fin Apex 1.0 beats GPT-5.4 and Claude Sonnet 4.6 at customer service resolutions

Intercom is taking an unusual gamble for a legacy software company: building its own AI model. The massive, 15-year-old customer service platform, based in Dublin, Ireland, announced Fin Apex 1.0 on Thursday: a small, purpose-built AI model that the company claims outperforms leading frontier models from OpenAI and Anthropic on the metrics that matter most for customer …

Mistral AI just released a text-to-speech model it says beats ElevenLabs — and it’s giving away the weights for free

The enterprise voice AI market is in the middle of a land grab. ElevenLabs and IBM announced a collaboration just this week to bring premium voice capabilities into IBM’s watsonx Orchestrate platform. Google Cloud has been expanding its Chirp 3 HD voices. OpenAI continues to iterate on its own speech synthesis. And the market underpinning …

Oracle converges the AI data stack to give enterprise agents a single version of truth

Enterprise data teams moving agentic AI into production are hitting a consistent failure point at the data tier. Agents built across a vector store, a relational database, a graph store and a lakehouse require sync pipelines to keep context current. Under production load, that context goes stale. Oracle, whose database infrastructure runs the transaction systems …

Google’s new TurboQuant algorithm speeds up AI memory 8x, cutting costs by 50% or more

As large language models (LLMs) expand their context windows to process massive documents and intricate conversations, they encounter a brutal hardware reality known as the “Key-Value (KV) cache bottleneck.” Every token a model processes leaves behind high-dimensional key and value vectors that must be kept in high-speed memory. For long-form tasks, this “digital cheat sheet” swells rapidly, devouring the …
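The size of that cache follows directly from the model's shape: one key vector and one value vector per token, per layer, per KV head, each stored at some bit width. The sketch below uses the standard sizing formula with hypothetical model dimensions (32 layers, 8 KV heads of dimension 128, a single 131,072-token sequence). The 3-bit figure is illustrative only; the teaser does not state TurboQuant's bit width, and this memory arithmetic is not meant to reproduce the headline's 8x speedup or 50% cost figures.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: float) -> float:
    """Bytes needed to hold keys and values for every cached token at every layer."""
    # Factor of 2 accounts for storing both a key and a value vector.
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value / 8


# Hypothetical model shape; only the bit width changes between the two cases.
fp16 = kv_cache_bytes(32, 8, 128, 131_072, 1, 16)
q3 = kv_cache_bytes(32, 8, 128, 131_072, 1, 3)
print(f"16-bit: {fp16 / 2**30:.1f} GiB, 3-bit: {q3 / 2**30:.1f} GiB, ratio {fp16 / q3:.2f}x")
# 16-bit: 16.0 GiB, 3-bit: 3.0 GiB, ratio 5.33x
```

Because the cache grows linearly with sequence length and batch size, quantizing the stored values is one of the few levers that shrinks it without shortening the context.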

How xMemory cuts token costs and context bloat in AI agents

Standard RAG pipelines break when enterprises try to use them for long-term, multi-session LLM agent deployments. This is a critical limitation as demand for persistent AI assistants grows. xMemory, a new technique developed by researchers at King’s College London and The Alan Turing Institute, solves this by organizing conversations into a searchable hierarchy of semantic …