ACM

Uncategorized

Context decay, orchestration drift, and the rise of silent failures in AI systems

The most expensive AI failure I have seen in enterprise deployments did not produce an error. No alert fired. No dashboard turned red. The system was fully operational; it was just consistently, confidently wrong. That is the reliability gap. And it is the problem most enterprise AI programs are not built to catch. We have …

Monitoring LLM behavior: Drift, retries, and refusal patterns

The stochastic challenge

Traditional software is predictable: Input A plus function B always equals output C. This determinism allows engineers to develop robust tests. On the other hand, generative AI is stochastic and unpredictable. The exact same prompt often yields different results on Monday versus Tuesday, breaking the traditional unit testing that engineers know and …

CVSS scored these two Palo Alto CVEs as manageable. Chained, they gave attackers root access to 13,000 devices.

During Operation Lunar Peek in November 2024, attackers gained unauthenticated remote admin access — and eventual root — across more than 13,000 exposed Palo Alto Networks management interfaces. Palo Alto Networks scored CVE-2024-0012 at 9.3 and CVE-2024-9474 at 6.9 under CVSS v4.0. NVD scored the same pair 9.8 and 7.2 under CVSS v3.1. Two scoring …

DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5

The whale has resurfaced. DeepSeek, the Chinese AI startup and offshoot of quantitative analysis firm High-Flyer Capital Management, became a near-overnight sensation globally in January 2025 with the release of its open source R1 model, which matched the models of proprietary U.S. giants. It’s been an epoch in AI since then, and while DeepSeek has released several updates to …

85% of enterprises are running AI agents. Only 5% trust them enough to ship.

Eighty-five percent of enterprises are running AI agent pilots, but only 5% have moved those agents into production. In an exclusive interview at RSA Conference 2026, Cisco President and Chief Product Officer Jeetu Patel said that the gap comes down to one thing: trust — and that closing it separates market dominance from bankruptcy. He …

Mystery solved: Anthropic reveals changes to Claude’s harnesses and operating instructions likely caused degradation

For several weeks, a growing chorus of developers and AI power users claimed that Anthropic’s flagship models were losing their edge. Users across GitHub, X, and Reddit reported a phenomenon they described as “AI shrinkflation”—a perceived degradation where Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful with tokens. Critics …

OpenAI’s GPT-5.5 is here, and it’s no potato: narrowly beats Anthropic’s Claude Mythos Preview on Terminal-Bench 2.0

After months of rumors and reports that OpenAI was developing a new, more powerful AI large language model for use in ChatGPT and through its application programming interface (API), allegedly codenamed “Spud” internally, the company has today unveiled its latest offering under the more formal name GPT-5.5. And to likely no one’s surprise, it’s hardly …

Talking to AI agents is one thing — what about when they talk to each other? New startup BAND debuts ‘universal orchestrator’

For the past eighteen months, the corporate world has been obsessed with the “builder” phase of the generative AI revolution. Enterprises have raced to deploy autonomous agents to handle everything from customer support to complex codebase refactoring. However, as these digital workers proliferate, a new, more structural problem has emerged: fragmentation. Agents built on LangChain …

OpenAI unveils Workspace Agents, a successor to custom GPTs for enterprises that can plug directly into Slack, Salesforce and more

OpenAI introduced a new paradigm and product today that is likely to have huge implications for enterprises seeking to adopt and control fleets of AI agent workers. Called “Workspace Agents,” OpenAI’s new offering essentially allows users on its ChatGPT Business ($20 per user per month) and variably priced Enterprise, Edu and Teachers subscription plans to …

Are you paying an AI ‘swarm tax’? Why single agents often beat complex systems

Enterprise teams building multi-agent AI systems may be paying a compute premium for gains that don’t hold up under equal-budget conditions. New Stanford University research finds that single-agent systems match or outperform multi-agent architectures on complex reasoning tasks when both are given the same thinking token budget. However, multi-agent systems come with the added baggage …
