Automation · 6 min read

What Agentic Commerce Actually Means

The word 'agentic' is now applied to almost everything with a language model in it. Here's a working definition based on how the technology actually functions, and a clearer view of what's in production versus what's still mostly demos.

Sarah Chen

Senior Editor

24 February 2025

I promised last month I'd write more about what "agentic" actually means in practice. Consider this me making good on that.

The word has reached the point in the hype cycle where it's being applied to almost everything with a language model in it. I saw a vendor last week describe their form-autofill tool as "agentic." It isn't. Words losing meaning like this is irritating, but it's also a useful signal: if everyone's calling their product agentic, something real must be driving the terminology.

So here's my working definition, based on how the technology actually functions rather than how it gets marketed.

An AI agent is a system that can pursue a goal across multiple steps, using tools to gather information and take actions, and making decisions at each step based on what it finds. The key distinction from a standard LLM prompt is the loop: an agent can observe, decide, act, observe again, and revise its approach. It doesn't need a human to specify each step. Anthropic put it plainly in its "Building effective agents" guide: agents are "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." The useful contrast is with workflows: systems where LLMs and tools are orchestrated through predefined code paths. Both are useful. They're architecturally different.
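To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two tools, the SKUs, the reorder threshold, and above all `decide`, which is a hard-coded stand-in for the model call. A real agent would put an LLM there; the point is only the shape of observe, decide, act, observe again.

```python
from typing import Callable, Optional

# Hypothetical tools; a real deployment would call an inventory or supplier API here.
def check_stock(sku: str) -> int:
    return {"tea-001": 3, "mug-002": 40}.get(sku, 0)

def reorder(sku: str) -> str:
    return f"reorder placed for {sku}"

TOOLS: dict[str, Callable] = {"check_stock": check_stock, "reorder": reorder}
REORDER_THRESHOLD = 5  # assumed business rule, for illustration only

def decide(sku: str, history: list) -> Optional[str]:
    """Stand-in for the model call: choose the next tool from what has been observed."""
    if not history:
        return "check_stock"            # nothing observed yet, so look first
    last_tool, last_result = history[-1]
    if last_tool == "check_stock" and last_result < REORDER_THRESHOLD:
        return "reorder"                # the observation changed the plan
    return None                         # goal met, stop

def run_agent(sku: str, max_steps: int = 5) -> list:
    history = []
    for _ in range(max_steps):          # hard cap so the loop always terminates
        tool = decide(sku, history)
        if tool is None:
            break
        history.append((tool, TOOLS[tool](sku)))   # act, then record the observation
    return history
```

Calling `run_agent("tea-001")` checks stock and then reorders, while `run_agent("mug-002")` stops after the stock check. Nobody specified those paths in advance; they fall out of what each step observed, which is the difference from a workflow with a fixed sequence.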

That definition is deceptively simple. The implementation complexity is considerable.

The Stack Underneath the Word

For an agent to do anything useful in a commerce context, it needs at minimum: a language model that can reason, a set of tools it can call (APIs, databases, external systems), a goal or set of goals, and some form of memory so it can maintain context across steps.

The tooling layer is where most of the actual work is. An agent that can reason about whether to reorder stock but can't actually call your inventory system or your supplier API is an agent that can produce a very convincing recommendation document. That's not worthless, but it's not agentic in any commercially meaningful sense.

The memory problem is underappreciated. A language model only sees what fits in its context window for a given request and retains nothing between requests, which means a long-running agent needs explicit memory infrastructure to maintain state over hours or days. The frameworks for this are improving quickly. LangChain, LlamaIndex, and Microsoft's Semantic Kernel are the names you'll see most often, but "the infrastructure exists" and "the infrastructure is production-ready for your specific use case" are not the same sentence.
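As a rough illustration of what "explicit memory infrastructure" means at its simplest, the sketch below persists an agent's steps to a JSON file so that a later run can pick up where an earlier one stopped. The class name, file layout, and `summary` format are all hypothetical, and this is not how any of the frameworks named above implement memory; it only shows the idea of state living outside the model.

```python
import json
from pathlib import Path

class AgentMemory:
    """Hypothetical file-backed store for state that must outlive one model request."""

    def __init__(self, path: str):
        self.path = Path(path)
        if self.path.exists():
            # Resume from whatever an earlier run wrote to disk.
            self.state = json.loads(self.path.read_text())
        else:
            self.state = {"steps": []}

    def record(self, tool: str, result) -> None:
        """Append one step and write the whole state back to disk."""
        self.state["steps"].append({"tool": tool, "result": result})
        self.path.write_text(json.dumps(self.state))

    def summary(self, last_n: int = 3) -> str:
        """Compact recap of recent steps, suitable for prepending to the next request."""
        recent = self.state["steps"][-last_n:]
        return "; ".join(f"{s['tool']} -> {s['result']}" for s in recent)
```

A production version would also need to handle concurrent writers, results that aren't JSON-serialisable, and histories too long to summarise this naively, which is where the gap between "exists" and "production-ready" tends to show up.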

One thing Anthropic's team noted from working with dozens of production teams: the most successful implementations weren't using complex frameworks or specialised libraries. They were building with simple, composable patterns. The flashiest tooling stack is rarely the one that ships.

What Works Now

I think there are roughly three tiers of agentic use cases based on current maturity:

Works, is in production, generates real value. These are tightly scoped, data-rich, well-defined tasks with clear success criteria. Category-level inventory decisions. Price monitoring and alert escalation. Customer service routing and first-response generation. Promotional performance analysis with recommended adjustments. UK retailers including Boots and Tesco have piloted AI in exactly this tier (customer service routing and inventory monitoring), where the agent's decision space is constrained and the consequences of errors are manageable. The common thread is that these look boring from the outside. They work precisely because they're boring.

Promising but fragile. More open-ended tasks where the agent has to reason across multiple data sources with less structured inputs. Demand forecasting that incorporates external signals. Personalised offer generation at individual customer level. The demos are impressive; the production reliability is variable. These are the "run it in supervised mode and review outputs" cases. See also the consumer trust piece for why the supervision question matters beyond pure technical reliability.

Mostly still demos. Fully autonomous agents that can execute end-to-end multi-system workflows without supervision. The vision is compelling: an agent that identifies a supply chain problem, finds alternative suppliers, negotiates prices, raises purchase orders, and updates your ERP. The engineering required to do that reliably, with appropriate error handling and rollback logic across a real enterprise stack, is not something most organisations have the capacity to build. The agentic payments infrastructure piece covers how the payment layer is getting there, and it paints an honest picture of how much groundwork is still being laid.

The Trust and Control Problem

There's a philosophical dimension here that I find more interesting than the technical one, which is the question of how much autonomy you're actually comfortable giving to a system that makes decisions with real consequences.

In Thinking, Fast and Slow, Daniel Kahneman draws a distinction between fast, intuitive judgements and slow, deliberate reasoning. AI agents are good at the slow reasoning part: they can follow complex logic chains and consider many factors simultaneously. What they're not good at is the fast, contextual, "something is slightly off about this" gut-check that an experienced person applies before taking action.

The practical implication: agents work best when they operate within policy guardrails set by humans who understand the business, with clear escalation paths when something falls outside those guardrails. "Agents that can act autonomously" is the exciting framing. "Agents that require careful policy design to act safely" is the accurate one. Agentic systems, as Anthropic notes, often trade latency and cost for better task performance. That tradeoff needs to be weighed deliberately for each use case, not waved through because the build is technically feasible.
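Here is one way a guardrail with an escalation path could look in code, as a sketch rather than a recommendation. The `Policy` fields, the supplier names, and the return strings are assumptions I've made up for the example; the real limits would come from the people who understand the business. The check sits between the agent's proposed action and its execution.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Limits set by humans, not by the agent (hypothetical fields)."""
    max_order_value: float
    approved_suppliers: frozenset

@dataclass(frozen=True)
class ProposedOrder:
    """What the agent wants to do, before anything is executed."""
    supplier: str
    value: float

def review(order: ProposedOrder, policy: Policy) -> str:
    """Return 'execute' only when the order sits inside every guardrail."""
    if order.supplier not in policy.approved_suppliers:
        return "escalate: supplier not on approved list"
    if order.value > policy.max_order_value:
        return "escalate: order value above limit"
    return "execute"

policy = Policy(max_order_value=5000.0,
                approved_suppliers=frozenset({"acme-teas", "northern-mugs"}))
```

With this policy, `review(ProposedOrder("acme-teas", 1200.0), policy)` returns `"execute"`, while an order from an unlisted supplier or one above 5000 comes back as an escalation for a human to look at. The agent never gets to decide whether its own action was in bounds.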

What I'd Actually Suggest

If you're trying to figure out where to start with agentic tooling, I'd resist the pull toward the most impressive-sounding use case and ask instead: where do we have a well-defined decision process that currently requires a human to gather information from multiple places, apply a consistent rule, and then take a straightforward action?

That's your pilot. It probably sounds boring. That's fine. The boring pilots are the ones that actually get deployed.
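For a sense of how small such a pilot can be, here is a sketch of the gather, apply a rule, act pattern using price monitoring as the example. The data sources are plain dictionaries standing in for whatever feeds you actually have, and the 10% tolerance, the use of the median as a benchmark, and the action names are all assumptions for illustration.

```python
from statistics import median

def gather(sku: str, sources: list) -> list:
    """Collect the price for one SKU from every source that lists it."""
    return [prices[sku] for prices in sources if sku in prices]

def apply_rule(our_price: float, competitor_prices: list, tolerance: float = 0.10) -> str:
    """Consistent rule: flag when we sit more than `tolerance` above the median competitor."""
    if not competitor_prices:
        return "no_data"
    benchmark = median(competitor_prices)
    if our_price > benchmark * (1 + tolerance):
        return "alert_pricing_team"
    return "no_action"

def run_pilot(sku: str, our_price: float, sources: list) -> str:
    """Fixed sequence: gather first, then apply the rule, then return the action."""
    return apply_rule(our_price, gather(sku, sources))

# Hypothetical competitor feeds, one dictionary per source.
sources = [{"tea-001": 4.00}, {"tea-001": 4.50}, {"mug-002": 9.00}]
```

Strictly speaking this is a workflow rather than an agent, since the steps are fixed in code. That is part of the argument: once the rule and the data access are reliable, letting a model choose among a few such steps is a much smaller leap than starting from an open-ended agent.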

The AI commerce 2025 review showed that agentic infrastructure was built out significantly last year, with payment rails, checkout integrations, and memory frameworks all advancing at pace. What hasn't kept up is the combination of organisational readiness and consumer trust needed to put it in front of actual shoppers without human oversight. The organisations that close that gap will mostly be the ones that started with the boring version and learned from it.

Agentic AI is genuinely interesting technology. It is also, to borrow from Douglas Adams, perhaps slightly less real in current deployments than the marketing makes it sound. Both things are true.