A new MIT study found that 95% of enterprise generative AI pilots produced no measurable P&L impact, even as $30–40 billion was spent on them in 2025. The GenAI Divide report from MIT's Project NANDA identified three traits shared by the 5% that worked. They targeted operational, repetitive work. They were built around a specific workflow rather than a general capability. They were deployed with a vendor partner, an approach that succeeded about 67% of the time, versus about 33% for internal builds. The technology is real, the production data is in, and the failure pattern is now well-documented. Most deployments simply aren't designed to avoid it.
Agentic AI is the class of system that takes a goal, plans the steps, calls the tools, and executes, with the human moved into an oversight role rather than a clicking-through-screens role. This piece is for the mid-market operator (500–5,000 employees, technology budgets in the $250K–$5M range) deciding which use cases to ship in 2026 and how to sequence them. Below are six production use cases that ship inside a year, the framework to sequence them, and the governance layer that turns isolated wins into compounding advantage.
Human oversight is non-negotiable
Before the use cases, the principle that governs all of them. Every agentic AI deployment described below requires meaningful human oversight, and "meaningful" carries weight in that sentence. It is not a checkbox. It is not a quarterly audit. It is a person who is capable of evaluating what the agent did, empowered to correct or reverse it, and accountable for the outcome.
The temptation in 2026 is to treat agents as autonomous because they can be. The capability exists. The wisdom of using it does not follow automatically. The published failure modes (drift, hallucination, prompt injection, silent error accumulation, brittle escalation) are all bounded by the quality of the human review layer sitting on top of the agent. Deloitte's 2026 report found that only one in five companies has a mature governance model for agentic AI, even as usage is poised to rise sharply over the next two years. That gap is the gap.
What "meaningful oversight" looks like differs by use case. The shape it takes for customer support is different from the shape it takes for code generation, and that's different from the shape it takes for compliance work. Each section below names the specific form. The general rule across all six (a human who can evaluate and correct the agent's work, and who is accountable for the outcome) does not vary.
1. Customer support triage and resolution
Agentic AI is now production-grade for first-line customer support. It triages incoming tickets, drafts context-aware replies, and resolves common issues end-to-end without human escalation.
Customer support has the structural ingredients agentic systems handle best. High volume. Well-documented procedures. Structured data across tickets, customer history, and the knowledge base. Clear success metrics (resolution time, CSAT, escalation rate). The agent reads the incoming ticket, pulls the customer's account context, drafts a response, optionally executes the action (refund, password reset, order status lookup), and either resolves or hands to a human with everything pre-loaded.
Production evidence is accumulating. Deloitte's 2026 State of AI in the Enterprise report cites an air carrier using AI agents to handle its most common customer transactions, such as flight rebooking and baggage rerouting, freeing human agents for more complex matters. The capability gap that made earlier chatbots brittle has closed. Modern agentic systems maintain context across multi-turn conversations, recover from errors, and escalate cleanly when uncertain.
- Start narrow. One channel (e.g., email or chat), one customer segment, one product tier. Don't deploy across email, chat, and voice on day one.
- Define the escalation contract explicitly. Specify what the agent hands off, with what context, to which human queue, and with what SLA.
- Human oversight that fits this use case. A support manager who reviews a sample of agent-resolved tickets weekly, reads escalation handoffs daily, and owns the CSAT and resolution-quality metrics. The agent doesn't get to grade its own homework.
- Measure deflection AND quality. Volume reduction without CSAT discipline is a false win that surfaces two quarters later as churn.
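The escalation contract above can be written down as a small data structure plus a routing rule. The following is a minimal Python sketch, not a reference implementation. The field names, the queue name, the 0.8 confidence threshold, and the set of auto-resolvable intents are all hypothetical placeholders that a support team would replace with its own.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical values for illustration; tune per deployment.
CONFIDENCE_THRESHOLD = 0.8
AUTO_RESOLVABLE_INTENTS = {"password_reset", "order_status"}


@dataclass
class Handoff:
    """What the agent passes to a human: the escalation contract."""
    ticket_id: str
    queue: str          # which human queue receives the ticket
    sla_minutes: int    # response time the queue commits to
    reason: str         # why the agent did not resolve the ticket itself
    context: dict = field(default_factory=dict)  # pre-loaded account context


def route_ticket(ticket_id: str, intent: str, confidence: float,
                 context: dict) -> Optional[Handoff]:
    """Return None if the agent may resolve the ticket, else a Handoff."""
    if intent in AUTO_RESOLVABLE_INTENTS and confidence >= CONFIDENCE_THRESHOLD:
        return None
    if intent not in AUTO_RESOLVABLE_INTENTS:
        reason = f"intent '{intent}' is outside the agent's scope"
    else:
        reason = f"confidence {confidence:.2f} is below threshold"
    return Handoff(ticket_id, queue="tier1_support", sla_minutes=60,
                   reason=reason, context=context)
```

Starting with a short allowlist of intents mirrors the "start narrow" advice: the agent's scope grows by adding entries, and every ticket outside that scope reaches a human with its context attached.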
2. Back-office finance, including invoices, expenses, and reconciliation
The highest-ROI agentic AI deployments are in back-office finance (invoice processing, AP/AR reconciliation, expense compliance, month-end close), not the sales and marketing tools where most AI budgets currently go. This is exactly the pattern we've written about in The Silent Budget Leak, where operational drag accumulates in places leadership rarely audits until it compounds.
This is the most-misallocated area of enterprise AI spending. MIT found that more than half of generative AI budgets are devoted to sales and marketing tools, yet the biggest ROI sits in back-office automation, including eliminating business process outsourcing, cutting external agency costs, and streamlining operations. The work itself is document-heavy and rules-driven, which is exactly the profile agentic systems run cleanly.
A concrete example. An invoice arrives by email. The agent extracts vendor, amount, PO reference, and line items, matches against the purchase order and goods receipt, flags discrepancies, routes for approval, codes the GL account, and updates the ERP. What was a multi-handoff, multi-day workflow becomes minutes. The same pattern applies to expense report compliance, vendor reconciliation, and the dozens of small loops that consume controllers' time during month-end close. Walmart's AI-driven supply chain, although not a finance deployment, shows the ceiling for this kind of operational automation. Its self-healing inventory system, which automatically redirects stock before shortages reach stores, has saved more than $55 million, and AI-powered route optimization has eliminated roughly 30 million driving miles. Mid-market analogs are smaller in absolute terms and similar in proportion.
- Audit existing finance workflow time. Where do AP, AR, expense, and close hours actually go? The leak is rarely where leadership assumes.
- Pick one workflow with high volume and clear rules. Invoice processing is a strong starter. Travel expense compliance is another.
- Human oversight that fits this use case. A controller who reviews exception flags daily, samples agent-coded GL entries weekly, and signs off on month-end before close is committed. Materiality thresholds are set in advance and enforced. Anything above threshold goes to human approval before it touches the books.
- Insist on a full audit trail from day one. Finance has zero tolerance for black-box decisions on the GL.
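The invoice workflow and the materiality rule described above reduce to a three-way match followed by a threshold check. The sketch below is illustrative only: the record shapes, the 10,000 threshold, the one-cent tolerance, and the decision labels are assumptions, and a real deployment would read these records from the ERP rather than construct them by hand.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple

# Hypothetical values; the controller sets these in advance.
MATERIALITY_THRESHOLD = Decimal("10000.00")
AMOUNT_TOLERANCE = Decimal("0.01")


@dataclass
class Invoice:
    vendor: str
    po_number: str
    amount: Decimal
    quantity: int


@dataclass
class PurchaseOrder:
    po_number: str
    vendor: str
    amount: Decimal


@dataclass
class GoodsReceipt:
    po_number: str
    quantity_received: int


def review_invoice(inv: Invoice, po: PurchaseOrder,
                   receipt: GoodsReceipt) -> Tuple[str, List[str]]:
    """Three-way match, then materiality check. Returns (decision, audit_log)."""
    issues: List[str] = []
    if inv.po_number != po.po_number or receipt.po_number != po.po_number:
        issues.append("PO reference mismatch")
    if inv.vendor != po.vendor:
        issues.append("vendor mismatch")
    if abs(inv.amount - po.amount) > AMOUNT_TOLERANCE:
        issues.append(f"amount {inv.amount} differs from PO amount {po.amount}")
    if inv.quantity != receipt.quantity_received:
        issues.append(f"billed quantity {inv.quantity} differs from "
                      f"received quantity {receipt.quantity_received}")

    if issues:
        decision = "exception_queue"   # a human resolves the discrepancy
    elif inv.amount > MATERIALITY_THRESHOLD:
        decision = "human_approval"    # clean match, but above materiality
    else:
        decision = "auto_post"
    audit_log = [f"check failed: {i}" for i in issues] + [f"decision: {decision}"]
    return decision, audit_log
```

Returning the audit log alongside the decision keeps every posting explainable: each outcome carries the checks that produced it.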
3. Software engineering and legacy code migration
Agentic coding tools are now production-grade for skilled engineers doing legacy migration, automated test generation, and routine engineering work. They are not a substitute for engineering judgment, and the operators getting durable value treat them accordingly.
This is the use case where the gap between what the tools can do and what untrained users believe they can do is widest. Modern agentic coding systems read repos, understand context, generate changes, run the test suite, and iterate until the change passes. In the hands of an experienced engineer, this compresses weeks of work into days. In the hands of a non-technical operator who sees a working demo and assumes the demo generalizes, the same tools produce code that compiles, looks reasonable, and quietly carries security flaws, brittle architectural choices, or maintenance debt that surfaces six months later when the original prompt-author has moved on. The trade press calls the second pattern "vibe coding." We see it in the field often enough to flag it as the dominant failure mode of 2026.
The production examples that work are unambiguous about who's at the keyboard. Stanford's Enterprise AI Playbook, drawing on 51 successful deployments, documented a large fintech using an AI coding agent to migrate millions of lines of legacy ETL code to a modern architecture in weeks, with engineering identifying additional acceleration opportunities within days of launch. JPMorgan Chase powers over 450 AI use cases, achieving up to $2 billion in annual business value. Stanford's research also found that agentic implementations showed 71% median productivity gains, versus 40% for high-automation but non-agentic deployments. In each case the agents accelerated work that experienced engineers were already capable of doing. They did not replace the judgment that decided what to build, how to architect it, and what acceptable risk looked like.
The capability shift behind these tools is genuine, and we covered it in Claude Opus 4.6: The First Enterprise AI Model Built for Autonomous Teams. Capability is not the bottleneck. Operator judgment is.
- Keep an engineer at the keyboard. The productivity gains in the published research came from experienced engineers using agents as accelerators. The same tools used by non-technical staff to ship production systems are the failure mode, not the success pattern.
- Pilot on bounded, low-risk codebases first. Internal tools, data pipelines, test code. Not customer-facing systems, not anything touching financial transactions, PII, or core infrastructure.
- Human oversight that fits this use case. Treat agent output as PRs, not commits. Every change reviewed by an engineer capable of understanding what changed and why, with CI/CD gating non-negotiable. The reviewer is accountable for the merge.
- Track code throughput AND defect rates AND security findings. Speed without quality and security discipline creates technical debt at machine scale, which is a different and harder problem than fast technical debt.
- Resist the prototype-to-production drift. A working prototype is not a production system. The most common failure pattern we see is non-technical operators shipping prototypes to production because the prototype "works." It works until it doesn't, and the cost of fixing a system nobody on staff understands is multiples higher than building it correctly the first time.
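The "PRs, not commits" rule can be enforced mechanically with a merge gate. The following is a rough sketch under stated assumptions: the `PullRequest` shape and the restricted path prefixes are hypothetical, and in practice the same checks would usually be configured in the code host's branch protection settings rather than hand-written.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical path prefixes kept off-limits to agent changes during the pilot.
RESTRICTED_PREFIXES = ("payments/", "auth/", "infra/")


@dataclass
class PullRequest:
    author: str
    agent_generated: bool
    changed_files: List[str]
    ci_passed: bool
    human_approvers: List[str] = field(default_factory=list)


def merge_blockers(pr: PullRequest) -> List[str]:
    """Return the reasons a PR cannot merge; an empty list means it may."""
    blockers: List[str] = []
    if not pr.ci_passed:
        blockers.append("CI has not passed")
    # An approval only counts if it comes from someone other than the author.
    reviewers = [r for r in pr.human_approvers if r != pr.author]
    if not reviewers:
        blockers.append("no approval from a human reviewer other than the author")
    if pr.agent_generated:
        touched = [f for f in pr.changed_files
                   if f.startswith(RESTRICTED_PREFIXES)]
        if touched:
            blockers.append(f"agent change touches restricted paths: {touched}")
    return blockers
```

Listing every blocker, rather than stopping at the first, gives the accountable reviewer the full picture in one pass.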
4. Sales operations and CRM hygiene
Agentic AI runs the meeting-notes-to-CRM pipeline that sales teams chronically neglect. It captures actions from calls, drafts follow-up communications, updates opportunity records, and surfaces deals at risk based on signal patterns.
Most sales organizations have a structural data gap. Reps don't update the CRM, and what does get logged is shallow. The downstream effect is forecasting noise, missed renewals, and pipeline reviews that operate on fiction. Agents close the loop. They listen to the call (or read the transcript), extract decisions and commitments, draft follow-up emails, update the opportunity record, and flag deals where signals are deteriorating.
Deloitte cites a financial services company using agentic workflows to automatically capture meeting actions from video conferences, draft communications to remind participants of their commitments, and track follow-through. The integrations that make this work (calendar, email, conferencing platform, CRM) are all API-accessible, and most have official agentic connectors as of 2025–2026.
- Pilot with high-performing reps first, not the laggards. Better signal, better feedback, better internal champions.
- Define hard rules for what auto-fills versus what stays manual. Sales leaders care about specific fields; agents need explicit boundaries.
- Human oversight that fits this use case. The rep reviews and approves agent-drafted follow-up emails before they are sent to customers, full stop. CRM updates can run more autonomously, but a sales operations lead audits field-level accuracy weekly during the first quarter and monthly thereafter. Deals flagged at risk get human-routed before any customer-facing action.
- Measure forecast quality, not just CRM completeness. A perfectly logged CRM that doesn't improve forecast accuracy is a productivity theater win, not a real one.
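The boundary between auto-filled and manual fields can be expressed as an explicit allowlist. This is a minimal sketch: the field names are hypothetical and do not correspond to any particular CRM's schema.

```python
from typing import Dict, Tuple

# Hypothetical allowlist of fields the agent may write without review.
# Everything else, including fields nobody has classified yet, waits for the rep.
AUTO_FILL_FIELDS = {"last_contact_date", "next_step", "meeting_notes"}


def split_updates(proposed: Dict[str, object]) -> Tuple[dict, dict]:
    """Split agent-proposed CRM updates into (applied, pending_review)."""
    applied: Dict[str, object] = {}
    pending: Dict[str, object] = {}
    for field_name, value in proposed.items():
        if field_name in AUTO_FILL_FIELDS:
            applied[field_name] = value
        else:
            pending[field_name] = value  # default deny
    return applied, pending
```

A default-deny rule means a newly added CRM field stays manual until a sales leader decides otherwise, which is safer than having to remember to exclude it.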
5. Knowledge management and internal Q&A
Agentic AI gives every employee a colleague who has read every internal document, every policy, every runbook, every past project post-mortem, and answers in seconds with citations. The architectural pattern that makes this trustworthy is something we've explored in Progressive Disclosure for AI Agents.
This is the most universally deployable use case because every company has the same problem. Institutional knowledge is buried across Slack, Notion, Google Drive, SharePoint, Confluence, and old email threads. Agents that index and retrieve from that corpus replace the most common time-sink in knowledge work, which is finding the answer.
A concrete example. A new engineer asks, "What's our process for handling a security incident in the customer database?" The agent retrieves the runbook, the most recent post-mortem on a similar incident, the current escalation contacts, and a summary, with citations to each source. What used to take a Slack message, a meeting, and forty minutes becomes thirty seconds. The pattern generalizes to HR policy questions, sales enablement, engineering on-call, and customer-facing FAQ generation. Databricks' 2026 State of AI Agents report, drawing on data from over 20,000 global customers, found that 40% of the top 15 enterprise use cases focus on customer experience and engagement, with internal knowledge retrieval as a near-universal foundation.
- Start with one document corpus, not the whole company. Engineering runbooks. HR policies. Sales enablement. Pick one, prove the pattern, expand.
- Govern access controls before launch, not after. The agent must respect existing document permissions, and retrofitting this later is expensive.
- Human oversight that fits this use case. Citations are mandatory in every answer, and a content owner is assigned to each indexed corpus to monitor accuracy and flag drift. Users are trained to verify high-stakes answers (legal, compliance, customer-facing) against source documents rather than trusting the agent summary.
- Measure usage AND grounded accuracy. A heavily-used inaccurate agent is worse than a lightly-used accurate one.
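Two of the rules above, respecting document permissions and requiring citations, can be shown in a few lines. The sketch below uses naive keyword overlap as a stand-in for a real search index, and the `Document` shape and group names are assumptions made for illustration. The part that carries over to a real system is the ordering: permission filtering happens before ranking.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    text: str
    allowed_groups: FrozenSet[str]  # groups permitted to read this document


def retrieve(query: str, corpus: List[Document],
             user_groups: FrozenSet[str], top_k: int = 3) -> List[Document]:
    """Rank only the documents the user is already allowed to read."""
    # Filter by permission BEFORE ranking, so restricted text never
    # reaches the answer-generation step.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def answer_with_citations(summary: str, sources: List[Document]) -> str:
    """Refuse to answer without sources; otherwise append citations."""
    if not sources:
        return "No accessible source found; escalate to the content owner."
    citations = "; ".join(f"[{d.doc_id}] {d.title}" for d in sources)
    return f"{summary}\nSources: {citations}"
```

Refusing to answer when no source is found is one simple way to keep the "citations are mandatory" rule from eroding under pressure to always return something.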
6. Compliance, audit, and regulatory monitoring
For regulated organizations, the highest-leverage agentic deployment reads regulations, monitors operations, prepares audit evidence, and flags exceptions, continuously, rather than in quarterly catch-up exercises.
Compliance and audit work matches the agentic profile precisely. Large volumes of structured documents (regulations, policies, control frameworks). Repetitive evidence collection. Deterministic checks against rules. Done by humans, it's expensive, error-prone, and slow. Done by agents under proper governance, it's fast, consistent, and auditable.
A concrete example. A regional bank deploys agents to monitor regulatory filings (FFIEC, CFPB, state-level updates), map each change against the bank's internal control frameworks, draft impact assessments, and alert compliance teams to required actions. The quarterly catch-up becomes continuous monitoring with a human-reviewed change log. Deloitte's 2026 report identifies customer support as the highest-impact agentic AI domain, with supply chain, R&D, knowledge management, and cybersecurity close behind across surveyed enterprises; the compliance variant of this is now a documented industry pattern in financial services and regulated utilities.
- Govern the agent's permissions like any regulated system. Audit trail, access controls, version control, held to the same standards as the systems it monitors.
- Human oversight that fits this use case. A compliance officer reviews and approves every externally-facing output before it leaves the organization. Regulator letters, audit reports, and compliance attestations stay human-signed. The agent prepares; the human commits.
- Establish a tested fallback when the agent is uncertain. Compliance is the wrong domain for confident hallucinations. Uncertainty thresholds are defined in advance and route to humans by design.
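The mapping step in the bank example, together with the fallback rule, can be sketched as follows. The topic tags, control IDs, and status labels are hypothetical. The design assumption worth noting is that neither status lets the agent close an item on its own: a mapped change waits for human review, and an unmapped one is routed for manual mapping instead of being guessed at.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RegulatoryChange:
    change_id: str
    source: str        # issuing body, e.g. "CFPB"
    topics: Set[str]   # hypothetical topic tags extracted from the filing


@dataclass
class ChangeLogEntry:
    change_id: str
    impacted_controls: List[str]
    status: str        # "pending_human_review" or "needs_manual_mapping"


def assess_change(change: RegulatoryChange,
                  controls: Dict[str, Set[str]]) -> ChangeLogEntry:
    """Map a change to internal controls that share at least one topic."""
    impacted = sorted(cid for cid, topics in controls.items()
                      if topics & change.topics)
    # No match means the agent is uncertain, so the item goes to a human
    # for mapping rather than being recorded as "no impact".
    status = "pending_human_review" if impacted else "needs_manual_mapping"
    return ChangeLogEntry(change.change_id, impacted, status)
```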
A Practical Framework: How to Sequence the Six
Most mid-market operators try to do all six at once. Don't. The pattern that ships is sequenced and incremental: Stanford's analysis of 51 successful deployments found that iterative, layered approaches consistently outperform top-down transformations.
Phase 1 (months 0–3): Knowledge management and internal Q&A. Lowest risk, broadest deployment, fastest user adoption. Builds organizational fluency with agent interfaces. Forces the document-hygiene and access-control discipline that downstream use cases require.
Phase 2 (months 3–6): Customer support OR back-office finance. Pick one based on which is the bigger pain in your P&L. Customer support if you're SaaS, e-commerce, or service-heavy. Finance ops if you're product-heavy, manufacturing, or distribution.
Phase 3 (months 6–12): Software engineering augmentation and sales operations. These run in parallel and benefit from the integration foundation established earlier.
Phase 4 (months 12+): Compliance and audit. Highest stakes. Deserves the most mature operational foundation. By this point your team has the muscle memory, the governance layer is tested, and the failure modes are understood.
Three non-negotiables apply across all phases.
- Build the governance layer before the second use case. That layer includes permissions, an audit trail, version control, and an evaluation framework. Only 20–21% of enterprises possess mature governance frameworks, and 88% of agentic AI pilots never progress to sustained production; that gap is what governance closes.
- Pick a partner. We sit on the buyer's side of the table and we don't sell any of the platforms in this space, so we'll say plainly what most published analyses won't. MIT's research found vendor-led implementations succeed at 67% versus 33% for internal builds. The differential is not capability. It's pattern recognition across many deployments.
- Maintain meaningful human oversight at every stage, designed for the specific use case rather than copied from a template. The point of human oversight is not compliance theater. It's the mechanism by which agents stay aligned to what the business actually wants.
Looking Ahead
The 95% failure rate that defines current enterprise AI reporting will not hold past 2027. The use cases that ship are now well-understood. The implementation patterns are documented across thousands of deployments. The cost of inaction is quantifiable in the categories where competitors have already deployed. Mid-market operators have a window, measured in two to four budget cycles, to establish operational fluency with agentic systems before that fluency becomes table stakes. The companies that compound advantage through 2027 will be the ones that picked two or three use cases, sequenced them carefully, built the governance layer once, kept humans meaningfully in the loop, and resisted the temptation to chase every demo that lands in their inbox.
And if you are weighing how to sequence this on your own roadmap, we are here to help.
Honra is an independent technology advisory firm based in San Juan, Puerto Rico. We provide fractional CTO and CIO services, strategy, owner's representation, and implementation across software, data, and AI.



