ACM

Uncategorized

AI denial is becoming an enterprise risk: Why dismissing “slop” obscures real capability gains

Three years ago, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged …

GAM takes aim at “context rot”: A dual-agent memory architecture that outperforms long-context LLMs

For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning days, and it will eventually lose the thread. Engineers refer to this phenomenon as “context rot,” and it has quietly become one of the …

The ‘truth serum’ for AI: OpenAI’s new method for training models to confess their mistakes

OpenAI researchers have introduced a novel method that acts as a “truth serum” for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, “confessions,” addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive …

Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to parse through the results, which vary widely and can be misleading. Anthropic’s 153-page system card for Claude Opus 4.5 versus OpenAI’s 60-page GPT-5 system …

Inside NetSuite’s next act: Evan Goldberg on the future of AI-powered business systems

Presented by Oracle NetSuite

When Evan Goldberg started NetSuite in 1998, his vision was radically simple: give entrepreneurs access to their business data anytime, anywhere. At the time, most enterprise software lived on local servers. As an entrepreneur himself, Goldberg understood the frustration intimately. “I had fragmented systems. They all said something different,” he recalls …

Gong study: Sales teams using AI generate 77% more revenue per rep

The debate over whether artificial intelligence belongs in the corporate boardroom appears to be over — at least for the people responsible for generating revenue. Seven in ten enterprise revenue leaders now trust AI to regularly inform their business decisions, according to a sweeping new study released Thursday by Gong, the revenue intelligence company. The …

Nvidia’s new AI framework trains an 8B model to manage tools like a pro

Researchers at Nvidia and the University of Hong Kong have released Orchestrator, an 8-billion-parameter model that coordinates different tools and large language models (LLMs) to solve complex problems. In their experiments, Orchestrator achieved higher accuracy at a lower cost than much larger models in tool-use benchmarks, while also aligning with user preferences on which tools …

AWS launches Kiro powers with Stripe, Figma, and Datadog integrations for AI-assisted coding

Amazon Web Services on Wednesday introduced Kiro powers, a system that allows software developers to give their AI coding assistants instant, specialized expertise in specific tools and workflows — addressing what the company calls a fundamental bottleneck in how artificial intelligence agents operate today. AWS made the announcement at its annual re:Invent conference in Las …

Gemini 3 Pro scores 69% trust in blinded testing, up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks

Just a few short weeks ago, Google debuted its Gemini 3 model, claiming it scored a leadership position in multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are just that — vendor-provided. A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn’t on …

Workspace Studio aims to solve the real agent problem: Getting employees to use them

One problem enterprises face is getting employees to actually use the AI agents their dev teams have built. Google, which has already shipped many AI tools through its Workspace apps, has made Google Workspace Studio generally available so that more employees can design, manage and share AI agents, further democratizing agentic workflows. This puts …
