Abstract

AI extracts the "tricks of the trade" (tacit knowledge) that exist only in the minds of experienced employees—drawing from screen operations, recorded videos, and continuously collected work logs—and automatically transcribes them into standard operating procedures (SOPs). Furthermore, AI agents themselves read these procedures and directly operate internal systems to complete the work. This new stack, composed of four layers—Screen-to-SOP, Video-to-SOP, Passive Capture, and Agentic SOP—is rapidly emerging in the first half of 2026. This article organizes the representative products in these four areas for beginners, and then provides an integrated explanation of how Silicon Valley's top VCs interpret this trend and how they are directing their funding toward it.


The Big Picture of the SOP Generation Pipeline Drawn by Four Domains

As of May 2026, the AI market surrounding standard operating procedures (SOPs) has come to be understood as a "pipeline" linking four layers. Closest to the finished document is Screen-to-SOP, which observes each on-screen action that an expert performs in their daily work and produces a step-by-step procedure document on the spot. Next is Video-to-SOP, which retrospectively analyzes previously accumulated recordings and Zoom meetings and transcribes them into text SOPs. Further upstream lies Passive Capture, which continuously absorbs surrounding data even while the user works without consciously thinking about it, building a knowledge graph from raw work records "not written in any official manual," such as email text, support tickets, Slack conversations, and file editing histories. And finally, at the consuming end of the pipeline, Agentic SOP is the layer where AI agents directly "read and execute" the procedure documents extracted in this way.

These four layers are not in a competitive relationship but rather a complementary one. Both Screen-to-SOP and Video-to-SOP build procedure documents starting from an explicit "record" action, but Passive Capture is a mechanism in which information accumulates even without employees pressing a record button, so the base volume of knowledge is on an entirely different order of magnitude. And Agentic SOP is the side that consumes the "machine-readable work descriptions" generated by these three layers. As Sonya Huang and Pat Grady, partners at Sequoia Capital, repeatedly emphasized in their keynote at AI Ascent 2026 in April 2026, this sequence of flows symbolizes a transition from an era when "software merely described things" to "an era when software delivers completed work as a deliverable." They call this market "Services-as-a-Software," and argue that the global services market (estimated at $10 trillion, roughly equivalent to ¥1,550 trillion) will be opened up to AI agent companies as a TAM more than ten times that of the conventional software market (around $600 billion annually, roughly equivalent to ¥93 trillion).
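
The "more than ten times" comparison and the yen figures can be checked with a few lines of arithmetic. The sketch below assumes an exchange rate of ¥155 per dollar, which is not stated in the text but is the rate implied by the article's own conversions ($10 trillion to ¥1,550 trillion); the variable and function names are illustrative only.

```python
# Assumed rate: implied by the article's conversions, not stated explicitly.
JPY_PER_USD = 155

def usd_to_jpy(usd: int) -> int:
    """Convert a dollar amount to yen at the assumed fixed rate."""
    return usd * JPY_PER_USD

services_tam_usd = 10 * 10**12    # global services market, $10 trillion
software_tam_usd = 600 * 10**9    # conventional software market, $600 billion

# How many times larger the services market is than the software market.
tam_ratio = services_tam_usd / software_tam_usd

print(f"Services TAM in yen: {usd_to_jpy(services_tam_usd):,}")
print(f"Software TAM in yen: {usd_to_jpy(software_tam_usd):,}")
print(f"Ratio: {tam_ratio:.1f}x")
```

On these inputs the ratio comes out at roughly 16.7, so "more than ten times" is, if anything, a conservative reading of the two estimates.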

Below, we dig into the substance of each of the four fields through concrete products and use cases.


Screen-to-SOP — From Real-Time Capture of Screen Operations to Automated Manual Generation

Screen-to-SOP is the category with the longest history and one that has already achieved significant commercial success. Users simply operate their business systems as they normally would, while a browser extension or resident desktop app records every click, scroll, and text entry. The moment they press the stop button, it spits out a step-by-step manual complete with screenshots. It becomes easy to picture when you put it this way: an accounting staff member just clicks through the 30 steps of issuing an invoice, and an "Invoice Issuance Manual ver. 2026-05" is automatically generated.
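
The core idea of turning recorded actions into a numbered manual can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `UiEvent` schema, the event kinds, and the wording of the generated steps are all assumptions made for the example, and real products also attach screenshots and use AI to phrase the steps.

```python
from dataclasses import dataclass

@dataclass
class UiEvent:
    """One recorded user action (hypothetical schema, not a vendor format)."""
    kind: str         # "click", "type", or "scroll"
    target: str       # human-readable label of the UI element
    value: str = ""   # typed text, if any

def events_to_steps(events: list[UiEvent]) -> list[str]:
    """Turn raw events into numbered instructions, dropping scroll noise."""
    steps = []
    for event in events:
        if event.kind == "click":
            steps.append(f'Click "{event.target}".')
        elif event.kind == "type":
            steps.append(f'Enter "{event.value}" in the "{event.target}" field.')
        # Scroll events carry no instruction for the reader, so they are skipped.
    return [f"{number}. {text}" for number, text in enumerate(steps, start=1)]

# A shortened stand-in for the invoice example in the text.
recording = [
    UiEvent("click", "New Invoice"),
    UiEvent("scroll", "page"),
    UiEvent("type", "Customer", "Acme Corp"),
    UiEvent("click", "Issue"),
]
manual = events_to_steps(recording)
print("\n".join(manual))
```

The four recorded events yield a three-step manual, because the scroll is treated as noise rather than as an instruction.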

The de facto industry leader in this space is Scribe, based in San Francisco. On November 10, 2025, the company raised $75 million (approximately ¥11.6 billion) in a Series C round led by StepStone Group, reaching a post-money valuation of $1.3 billion (approximately ¥201.5 billion). Existing investors Amplify Partners, Redpoint Ventures, Tiger Global, Morado Ventures, and New York Life Ventures also contributed additional funding. According to TechCrunch's report on November 10, 2025, Scribe unveiled its new product "Scribe Optimize" at the point when its cumulative funding reached roughly $150 million (approximately ¥23.3 billion). Optimize goes beyond mere manual generation: it continuously mines employees' workflows in the cloud, visualizing redundancies in work, rework, and hotspots that are candidates for automation. If the company's traditional product (now rebranded as "Scribe Capture") is a product that "creates individual manuals," then Optimize is a product that "draws a map of the entire organization's operations and proposes improvements" — a clear move by Scribe to evolve from an SOP tool valued at over $1 billion into a larger process intelligence platform. Scribe currently supports only browser-based SaaS apps, but it has explicitly stated in its roadmap that it will expand to legacy apps such as mainframe and function-key-based systems.

Widely known as the runner-up to Scribe is Tango, also headquartered in San Francisco. Founded in 2020, it raised $5.7 million (approximately ¥880 million) in a seed round led by Wing VC in August 2021, and $14 million (approximately ¥2.2 billion) in a Series A led by Tiger Global in June 2022, for a cumulative total of about $19.7 million (approximately ¥3.05 billion). A distinctive feature is its lineup of strategically oriented VCs, including General Catalyst, Slack Fund, Atlassian Ventures, and GSV Ventures. According to PitchBook and Crunchbase data, it is said to be used by 25,000 teams as of May 2026. Tango's differentiating point is that it re-overlays the generated manual onto the live business screen in "Guide Me" mode, so that as a new employee follows along, a hands-on guide advances in real time. Whereas Scribe's strength lies in "building a static knowledge base," Tango leans toward "immediate on-site guidance (Digital Adoption)," and the two companies effectively form a duopoly.

Incumbent forces that have long existed in the enterprise space are also mounting a comeback in this segment. Whatfix, based in India and San Francisco, raised $125 million (approximately ¥19.4 billion) in a Series E in September 2024, bringing its cumulative funding to over $139 million (approximately ¥21.5 billion). In 2026, they introduced a suite of agents dubbed the "Authoring Agent," "Guidance Agent," and "Insights Agent," shifting toward an agentic DAP (Digital Adoption Platform) in which AI continuously rewrites guides automatically on a UI superimposed directly onto the operation screens of business systems. Meanwhile, the US-based WalkMe, long a rival of Whatfix, has steered toward optimizing the SAP ecosystem ever since SAP acquired it for approximately $1.5 billion (approximately ¥232.5 billion) in 2024, and its presence as an independent "Screen-to-SOP" player has faded.

Characteristic use cases include: (1) full rewrites of manuals accompanying ERP or CRM version upgrades, where using Scribe allows work that previously took a week to be completed in half a day; (2) new-hire training at call centers, where Tango's guides superimposed on the screen turn "training that used to take two weeks into three days"; and (3) workplaces such as medical administration or government service counters, where work procedures must be reset with every personnel transfer, and Whatfix instantly reconfigures "custom manuals for each site" via an agent.


Video-to-SOP — Transcribing existing recorded videos into work procedure manuals

Video-to-SOP appears similar to Screen-to-SOP at first glance, but there is a decisively different point. It does not require new recordings to be made for the purpose of creating procedure manuals — in other words, it is a mechanism for rescuing knowledge from video assets that have "already been filmed." Training videos, internal meetings recorded over Zoom, tutorials for SMEs (small and medium-sized enterprises) left sitting on YouTube — while such video content has increased explosively over the past decade, its searchability is extremely low, and as a form of knowledge it has been in an almost "dormant" state. With the practical deployment of multimodal LLMs, it has become possible to read the text and UI elements shown on screen, cross-reference them with the narration, and produce structured procedure manuals, and as a result this rescue of dormant knowledge is advancing rapidly.

The most closely watched funding round in this category in 2026 was that of Guidde, which has offices in Tel Aviv and San Francisco. On February 25, 2026, Guidde announced that it had completed a $50 million (approximately ¥7.7 billion) Series B round led by U.S.-based PSG Equity. According to reporting by CTech, participants in the round included task management SaaS company monday.com and existing investors Norwest, Entrée Capital, Qualcomm Ventures, and Inkberry Ventures. The defining feature of Guidde's platform is that, simply as employees go about their regular work, the AI simultaneously analyzes screen recordings and narration, and automatically generates both subtitled explanatory videos and text SOPs. As of early 2026, the company has disclosed that it has more than 4,500 customers, including Anheuser-Busch, Bayer, Nasdaq, Yahoo, and SentinelOne, that its annual revenue has tripled for three consecutive years, and that its customer retention rate exceeds 90%. The new funding is reportedly earmarked for expanding implementation partnerships with global accounting firms such as KPMG and Deloitte, hinting at a strategy of digging deep into IT transformation projects at major corporations across Japan, the U.S., and Europe.

Beyond Guidde, dedicated Video-to-SOP players are emerging one after another. India-based Trupeer specializes in simultaneously generating "polished explanatory videos" and "text SOPs" from recorded video, and is praised for its visually well-organized output, complete with brand-compliant templates. Clueso has strengths in AI-driven post-production processing such as automatic zoom, noise removal, and script cleanup, while Vidocu is a one-source, multi-output type that, simply by uploading a single video, produces subtitles, dubbing (in more than 65 languages), screenshot-equipped articles, and edited video all at once. Furthermore, Docsie's Video-to-SOP emphasizes that it performs image recognition on the text and UI elements within a video, cross-references them with the narration, and semantically interprets "what was clicked for what purpose."

Loom originally started as a "personal screen recording tool," but ever since Atlassian acquired it for $975 million (approximately ¥151 billion) in 2023, it has been deeply integrated into Atlassian's knowledge ecosystem. As of 2026, Loom AI not only automatically generates titles, summaries, chapters, and action items from recordings, but also offers five types of templates (SOPs, step-by-step guides, QA steps, PR descriptions, and code documentation), and the generated outputs can be sent into Jira tickets or Confluence pages with a single click. Its enterprise pricing tier also includes a global administrator view and native Confluence/Jira integration, and organizations that had previously regarded Loom as a simple messaging-video tool have begun to adopt it anew as a primary base for internal SOPs.

Characteristic use cases include projects such as: (1) collectively transcribing a pharmaceutical company's body of GxP validation videos into structured SOPs, compressing the labor hours required to respond to regulatory authority audits; (2) a manufacturer's headquarters quality control department using Video-to-SOP to standardize "on-site setup videos" filmed by skilled workers wearing smart glasses; and (3) a consulting firm converting recordings of past client workshops into a systematized methodology library. The reason Guidde is teaming up with major accounting firms is precisely that this third use case is extremely attractive as an intellectual property business for KPMG and Deloitte themselves.


Passive Capture — Absorbing tacit knowledge in the background

Compared with Screen-to-SOP and Video-to-SOP, Passive Capture carries a more philosophical ambition. The concept is to continuously ingest every kind of business event—email, Slack, Teams, support tickets, CRM comments, file edits, meeting recordings—without employees ever performing any conscious action such as "pressing a record button" or "uploading a video," thereby building an organization-wide "mothership of tacit knowledge." As "Tacit Knowledge Is Your Next Competitive Moat," published by California Management Review in March 2026, points out, in the age of agents competitive advantage is no longer about data or models, but is shifting toward the tacit knowledge embedded in employees' judgment.

The most talked-about funding round in this category in 2026 was that of Munich-based Interloom. As Fortune exclusively reported on March 23, 2026, Interloom secured $16.5 million (approximately ¥2.6 billion) in a seed round. The lead investor was DN Capital, with Bek Ventures and Air Street Capital participating. The company had already raised a $3 million (approximately ¥460 million) pre-seed round in March 2024, bringing its cumulative total to roughly $20 million (approximately ¥3.1 billion). Interloom's product ingests millions of "records that arise naturally in the course of work"—support emails, service tickets, call-center transcriptions, work orders—and, just as Google Maps learns the fastest route from traffic volume, continuously updates a context graph of "how problems have been solved on the ground." The company has been adopted in production at Commerzbank, one of Germany's largest banks, where it narrowed the gap between documented manuals and the practical operational knowledge of the field from about 50% to 5%; at Volkswagen for automating first-line responses to support tickets; and at Zurich Insurance for automating underwriting work.
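
The Google Maps analogy can be made concrete with a toy model: treat each resolved case as a sequence of actions, count how often one action follows another, and suggest the most frequently taken next step. The sketch below is an assumption-laden illustration of that idea only; the function names and sample cases are hypothetical and say nothing about how Interloom's context graph is actually built.

```python
from collections import Counter, defaultdict

def build_context_graph(resolved_cases):
    """Count how often each action was followed by another across resolved cases."""
    graph = defaultdict(Counter)
    for actions in resolved_cases:
        for current, following in zip(actions, actions[1:]):
            graph[current][following] += 1
    return graph

def most_common_next(graph, action):
    """Return the most frequently observed follow-up action, or None if unseen."""
    followers = graph.get(action)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical resolved support cases, each an ordered list of actions.
cases = [
    ["card blocked", "verify identity", "unblock card"],
    ["card blocked", "verify identity", "issue new card"],
    ["card blocked", "verify identity", "unblock card"],
    ["card blocked", "check fraud flag", "issue new card"],
]
graph = build_context_graph(cases)
print(most_common_next(graph, "card blocked"))
print(most_common_next(graph, "verify identity"))
```

Every newly resolved case adds to the counts, which is the sense in which such a graph is "continuously updated" by traffic rather than written by hand.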

Aiming for a similar position on the U.S. side is Workhelix, launched by Stanford's Erik Brynjolfsson and others. The company raised $15 million (approximately ¥2.3 billion) in a Series A in February 2025, led by AIX Ventures. Andrew Ng's AI Fund, Accenture Ventures, and Bloomberg Beta participated, along with angels including LinkedIn co-founder Reid Hoffman, former OpenAI CTO and current Thinking Machines Lab CEO Mira Murati, and Google DeepMind's Jeff Dean. Workhelix's approach is to break a company's operations down into more than 250,000 task-level units and score each on "whether AI can take it over" and "how much of a productivity gain can be expected if it does." In effect, this is a service that converts the "business objects made observable by Passive Capture" into an AI-adoption roadmap. Customers such as Accenture, Wayfair, and Coursera were among the early names on its roster.
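
The two-axis scoring described above lends itself to a simple ranking. The sketch below is a hedged illustration, not Workhelix's method: the `Task` fields, the 0 to 1 feasibility scale, the hours-saved estimate, and the choice to multiply the two scores are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    feasibility: float   # 0..1, assumed likelihood that AI can take the task over
    gain_hours: float    # assumed hours saved per month if it does

def rank_tasks(tasks):
    """Order tasks by expected monthly hours saved (feasibility x gain)."""
    return sorted(tasks, key=lambda t: t.feasibility * t.gain_hours, reverse=True)

# Hypothetical task-level units with made-up scores.
tasks = [
    Task("Draft invoice reminder emails", 0.9, 40),
    Task("Negotiate supplier contracts", 0.2, 120),
    Task("Categorize support tickets", 0.8, 60),
]
roadmap = rank_tasks(tasks)
for task in roadmap:
    print(f"{task.name}: {task.feasibility * task.gain_hours:.0f} expected hours/month")
```

Multiplying the two scores means a task with a large potential gain but low feasibility (contract negotiation here) ranks below a modest but highly automatable one, which is the kind of prioritization an adoption roadmap needs.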

Symbolizing Passive Capture by way of hardware is Limitless, the company formerly known as Rewind AI. CEO Dan Siroker is a serial entrepreneur known as the founder of Optimizely, and the company raised a cumulative total of more than $33 million (approximately ¥5.1 billion) from a16z. It had operated along two lines: the "Rewind" app, which constantly records the screen and audio of a Mac desktop and recalls them as searchable memory, and the $99 (approximately ¥15,000) pendant-type hardware "Limitless Pendant" worn around the neck. According to reporting by CNBC, TechCrunch, and the SF Standard dated December 5, 2025, however, Meta announced its acquisition of the company that same day (the acquisition amount was undisclosed). Meta has announced that it will discontinue the pendant business and that the desktop app "Rewind" will also fully halt its screen- and audio-recording features after December 19, 2025. It was received as a symbolic move in which Big Tech, rather than an independent startup, absorbed the most cutting-edge form of Passive Capture, always-on recording, into its own house.

In the category that has offered Passive Capture as an extension of knowledge search, Glean made an overwhelming leap in valuation. In September 2024, the company announced a $260 million (approximately ¥40.3 billion) Series E led by Altimeter and DST Global, at a valuation of $4.6 billion (approximately ¥713 billion). Then, roughly nine months later, in June 2025, it raised an additional $150 million (approximately ¥23.3 billion) in a Series F led by Wellington Management, sending its valuation soaring to $7.2 billion (approximately ¥1.1 trillion). Glean's origin is enterprise search that automatically indexes business information scattered across SaaS and databases, instantly pulling up the relevant passage when an employee merely asks for "that document on that deal," but in 2026 TechCrunch reported that the company is aiming to become "a unified layer that runs behind applications." Otter.ai, too, started from the transcription of meeting records, and in March 2025 its annual recurring revenue (ARR) surpassed $100 million (approximately ¥15.5 billion). On April 28, 2026, it announced a new platform called the Conversational Knowledge Engine, taking a step toward weaving the very utterances of meetings into a company's knowledge base in real time. Sweden's Sana, after reaching a valuation of $500 million (approximately ¥77.5 billion) with an NEA-led $55 million (approximately ¥8.5 billion) round in October 2024, was acquired by Workday on November 4, 2025, and has been relaunched as knowledge AI within the HR cloud giant.

Typical use cases include: (1) customer support organizations that tend to forget "how our team handled a similar case in the past" using Interloom to make tacit resolution procedures visible; (2) manufacturers and financial firms using the knowledge graphs of Workhelix and Synaply to offset the risk of a "mass outflow of know-how" as veterans retire; and (3) sales organizations converting "the procedures by which that top salesperson landed a big deal" into a reproducible playbook via Glean or Otter. That said, Passive Capture is always accompanied by "employee psychological resistance to constant monitoring" and "the risk of running afoul of recording regulations." Meta's decision to halt general sales of the Limitless pendant immediately after acquiring the company is also read as a corporate giant choosing to absorb these costs of social friction.

Agentic SOP — From Procedure Manuals to Autonomous Execution Agents

Agentic SOP is the layer where the operational descriptions generated by the three layers above are not "read by humans" but "read and executed directly by AI agents." As Sequoia Capital indicated in its January 2026 paper "2026: This is AGI," they define this market as the evolution of generative AI from "AI that describes" to "AI that achieves," with the Long-Horizon Agent—which sees long-form tasks through to completion—as the protagonist. Indeed, the companies that raised the largest sums of capital in 2026 were precisely the cohort of agent companies that autonomously execute operational procedures.

A symbolic case is Sierra, led by Bret Taylor. As TechCrunch and CNBC simultaneously reported on May 4, 2026, Sierra completed a $950 million (approximately ¥147 billion) round co-led by Tiger Global and Google's GV, reaching a post-money valuation of $15.8 billion (approximately ¥2.4 trillion; this is Tech Startups' figure, and some outlets round it to "over $15 billion"). That is more than triple its valuation of $4.5 billion (approximately ¥700 billion) a year and a half earlier. Sierra also disclosed alongside this that its cumulative funding had surpassed $1 billion (approximately ¥155 billion). Sierra's "customer experience agents" have gone beyond the order inquiries and password resets originally envisioned, now handling heavy operations such as mortgage origination, insurance claims, subscription management, and healthcare revenue cycle management, with over 40% of the Fortune 50 as customers. Its ARR is said to have reached $150 million (approximately ¥23.3 billion).

Standing alongside Sierra as a rival is San Francisco's Decagon. According to TechCrunch, the company completed a $250 million (approximately ¥38.8 billion) Series D on January 28, 2026, co-led by Coatue Management and Index Ventures, at a valuation of $4.5 billion (approximately ¥700 billion). Cumulative funding before this round was more than $231 million (approximately ¥35.8 billion), with Andreessen Horowitz, Accel, and others continuing to back it. Decagon's technical keyword is "Agent Operating Procedures (AOP)." According to the company's explanation, AOP is a "compilable SOP" that allows natural-language operational rule descriptions and code-level guardrails to be written simultaneously. It is structured so that non-engineers can instantly change operational logic, while engineers can prevent mistakes with a verifiable test framework. Decagon's customers include B2C/B2B SaaS companies such as Notion, Duolingo, Substack, Bilt, Rippling, and ClassPass. The way the company declares that "SOPs are no longer documents read by people but bundles of logic interpreted by machines" shows that the concept of Agentic SOP is not merely a buzzword but a core design philosophy in implementation.
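
To make the pairing of natural-language rules with code-level guardrails more tangible, here is a minimal sketch of the general pattern. It is not Decagon's AOP syntax or API, which the text does not describe in detail; the `Step` structure, the refund example, the 100 USD threshold, and the `run_procedure` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    instruction: str                    # natural-language rule, editable by non-engineers
    guardrail: Callable[[dict], bool]   # code-level check, owned and tested by engineers

# Hypothetical refund procedure: each rule is paired with an enforceable check.
REFUND_PROCEDURE = [
    Step("Confirm the order exists before discussing a refund.",
         lambda ctx: ctx.get("order_found", False)),
    Step("Refund automatically only when the amount is 100 USD or less.",
         lambda ctx: ctx.get("amount", 0) <= 100),
]

def run_procedure(steps, ctx):
    """Return the instructions cleared for execution; stop at the first failed guardrail."""
    cleared = []
    for step in steps:
        if not step.guardrail(ctx):
            return cleared, f"escalate to human: {step.instruction}"
        cleared.append(step.instruction)
    return cleared, "completed"

# Guardrails can be exercised like ordinary unit tests.
ok_steps, ok_status = run_procedure(REFUND_PROCEDURE, {"order_found": True, "amount": 50})
big_steps, big_status = run_procedure(REFUND_PROCEDURE, {"order_found": True, "amount": 500})
print(ok_status)
print(big_status)
```

Keeping the instruction text and the guardrail side by side is what lets the wording change freely while the hard limits stay verifiable, which is the division of labor the "compilable SOP" description points at.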

Cresta, a leader in real-time AI for enterprise call centers, completed a $125 million (approximately ¥19.4 billion) Series D in November 2024, led by World Innovation Lab (WiL) and the Qatar Investment Authority, having raised a cumulative total of over $270 million (approximately ¥41.9 billion). Cresta uses real-time conversational AI to whisper advice into human operators' ears, and takes a hybrid form that can also switch to fully automated handling. In the compliance domain, New York's Norm Ai is a distinctive presence. It announced a $48 million (approximately ¥7.4 billion) Series B in March 2025 (cumulative $147 million, approximately ¥22.8 billion), with Coatue and Blackstone among its backers, and on February 19, 2026 it was announced that the company's foundational agents would be integrated into Microsoft Foundry. Norm Ai's concept of ingesting regulatory text and converting it into "executable AI agents for compliance SOPs" is, within Agentic SOP, a particularly typical example aimed at regulated industries.

The company that scaled up under the banner of "autonomous software engineer" is Cognition AI; as Bloomberg and SiliconANGLE reported on April 23, 2026, it is said to be in discussions to raise a round on the order of several hundred million dollars at a valuation of $25 billion (approximately ¥3.9 trillion). Because its valuation was $10.2 billion (approximately ¥1.6 trillion) as of the previous round in September 2025, that works out to the valuation swelling 2.5-fold in a little over half a year. The company has disclosed that its Devin generates ARR on the scale of $73 million (approximately ¥11.3 billion) at Goldman Sachs, Citi, Dell, Cisco, Ramp, Palantir, Nubank, Mercado Libre, and others (as of before the Windsurf acquisition), and it has entered the proof-of-concept stage as an agent that autonomously executes engineers' "SOP-like tasks."

And over the role of "supporting these vertical agent cohorts as an OS," OpenAI and Anthropic have opened a new front. On October 21, 2025, OpenAI launched a standalone browser product called "ChatGPT Atlas," offering an agent mode for Plus, Pro, and Business users. Atlas is designed on the idea that "ChatGPT sits on top of the URL bar, not behind it," and is built so that the agent completes operations taking into account the on-screen context and tab states. On March 23, 2026, Anthropic opened "Claude Cowork"—a productized version of the "computer use" feature that had previously been a research preview—to paid subscriptions, and according to a CNBC article dated March 24, 2026, it became generally available (GA) with Mac/Windows support on April 9. The enterprise edition includes role-based access control, group spending caps, a Zoom MCP connector, and more. With France's Mistral, the U.S.'s OpenAI, and the U.S.'s Anthropic all entering the precision competition for "computer use agents," the likelihood is rising that the "execution layer" of Agentic SOP will rapidly become commoditized.

As for use cases: (1) a SaaS customer success team entrusts agents from Sierra or Decagon with "automatic follow-up of customers at risk of churn" to raise retention rates; (2) a major bank automatically executes the entire KYC procedure through Norm Ai's regulatory agent, minimizing human review; and (3) the IT department of a global manufacturer automatically answers field service inquiries via Cresta. In this way, operations where "people once operated the screen with an SOP in hand" are clearly trending toward being completed entirely within the agent.


How Major Silicon Valley VCs See It — The Tectonic Shift "From Software to Services"

When the four layers from Screen-to-SOP through Agentic SOP are viewed as a single package, what Silicon Valley's top VCs consistently articulate is a macro thesis: the center of gravity of the AI market is shifting from "software itself" to "the work that software accomplishes." At AI Ascent 2026, which Sequoia Capital held in San Francisco on April 20, 2026, Pat Grady, Sonya Huang, and Konstantine Buhler raised the banner of "2026 is AGI." They flatly declared that AI will "execute 100 years of progress in 100 days," and they estimate the market that AI services can take on at $10 trillion (roughly ¥1,550 trillion). That is more than ten times the conventional "software market of roughly $600 billion." The significance of Sequoia positioning agents—as "workers that directly replace human services"—as a long-term theme is profound, and the fact that they led a $75 million (roughly ¥11.6 billion) Series C in RogoAI (a financial agent) and have continued to invest in Sierra's large funding rounds plainly demonstrates their capacity to act on it.

In its Big Ideas 2026 series at the start of 2026, Andreessen Horowitz (a16z) positioned the "Enterprise Orchestration Layer" and the "Agentic Interface" as the most important themes of the year. Their series of memos is more concrete, analyzing that "AI is moving away from chat UIs to become an entity that acts proactively," that "interfaces are being redesigned from human-facing to agent-facing," and that "business traffic is shifting from human speed to 'agent speed,' with thousands of simultaneous API calls spawning from a single goal." a16z's moves—investing in Decagon's Series C, participating in Sierra from the early stages, and continuing to back Cresta—are a textbook investment strategy for vertically locking down the Agentic SOP stack. In the "State of AI: An Empirical 100 Trillion Token Study" that a16z released in April 2026, the fastest-growing form of consumption of API traffic via OpenRouter was shown to be "agentic reasoning" (workloads that run continuously for long stretches from a single instruction), which also corroborates the real-world compute demand of Agentic SOP.

Bessemer Venture Partners, drawing on a decade of running Cloud 100, has published that "the average time for an AI startup to reach $100M ARR has shortened from 7.5 years for conventional cloud companies to 5.7 years." The combined valuation of the entire Cloud 100 surpassed $1 trillion (roughly ¥155 trillion) for the first time as of August 2025. Bessemer has set "Securing AI Agents" as a priority theme for 2026, emphasizing the view that the spread of Agentic SOP simultaneously creates a vast, untapped market for security and governance.

Lightspeed Venture Partners announced new funds totaling more than $9 billion (roughly ¥1.4 trillion) in December 2025, explicitly signaling its intent to direct the bulk of that capital into the AI agent space. They participated in a $45 million (roughly ¥7 billion) Series B for Zocks, an AI assistant for financial advisors, taking a strategy of attacking Agentic SOP as "vertical (industry-specialized) agents."

It is also notable that crossover-style funds such as Tiger Global, Coatue, Index Ventures, and Insight Partners have put money into both the Screen-to-SOP veterans (Tango, Scribe, Bardeen) and the Agentic SOP up-and-comers (Sierra, Decagon). This is evidence that they see a structural tailwind in which both "the volume of SOPs generated in the field" and "the volume of SOPs consumed by agents" expand simultaneously.

A theme frequently cited as one where VCs differ in their enthusiasm is the privacy concerns and labor-relations risks that accompany Passive Capture. While Sequoia's commentary and a16z's podcasts paint an affirmative picture of "a society where always-on recording is the norm," a recurring tone advises the enterprise side that "the Capture Layer should be attached to existing workflows and must not force employees into a separate, new workflow." Indeed, the story of how Limitless shut down its consumer hardware and was absorbed into Meta has been cited repeatedly in VC circles as a cautionary tale that independent startups cannot fully absorb social friction.

Reporting Tone of Major Media Outlets and Variations in Figures

The reporting tone of major media outlets is also clearly divided across the four layers. Screen-to-SOP and Video-to-SOP are treated as "an unglamorous but high-margin field that has entered its practical phase," handled in measured language by TechCrunch, VentureBeat, the tech columns of CTech, and Bloomberg's enterprise section. For example, a Scribe-related article in TechCrunch dated November 10, 2025, characterized it as "Scribe is finally starting to show where AI makes money on the ground," and discussed the reasonableness of its $1.3 billion valuation (approximately ¥201.5 billion) in the context of revenue multiples. Calcalist's coverage of Guidde works in regional context effectively, emphasizing the point that an Israeli-born startup has grown to a $50M scale (approximately ¥7.7 billion) with the key phrase "bridging the gap between AI and employees."

In contrast, Agentic SOP tends to be reported by CNBC, Bloomberg, the WSJ, and TechCrunch as "the cutting edge of the San Francisco AI boom," accompanied by lurid valuation language. Sierra's $950 million raise is depicted within the frame of a battle for supremacy in enterprise AI, as CNBC wrote "Bret Taylor's Sierra raises nearly $1B," TechCrunch wrote "the race to own enterprise AI gets serious," and Bloomberg wrote "triple valuation in 18 months." Regarding Sierra's final valuation, TechCrunch lists $15B, Tech Startups lists $15.8B, and CMSWire lists $15B, so there is slight variation across outlets (likely $15.8B is the post-money figure and $15B is the rounded number). Cognition's $25 billion valuation is at the "talks" stage, and it must be understood as an unconfirmed element, including the fact that Bloomberg added the caveat that "discussions are ongoing and terms may change."

In Japanese-language media, the Nikkei and Toyo Keizai have covered the topic in summary articles such as "U.S. business AI evolves toward automated SOP generation," but there are still few articles dealing with deep context such as Workhelix's Stanford connections, Interloom's European regulatory-compliance context, or Sequoia's "Services-as-a-Software" thesis. Forbes Japan continuously publishes translated articles on the Cloud 100 and Sequoia AI Ascent, providing relatively deep coverage.

Apart from reporting tone, the market-size figures also vary widely by outlet and research firm. Fortune Business Insights forecasts that the agent market will grow from $914 million in 2026 to $13.919 billion (approximately ¥2.16 trillion) by 2034, a CAGR of 40.5%. Meanwhile, other forecasts such as Joget's calculate the 2026 market size at $1.09–1.206 billion, with rapid growth to around $93 billion (approximately ¥14.4 trillion) by 2030, at an annual rate of 44–46%. Gartner forecasts, as its baseline projection, that "40% of enterprise applications will embed task-specific AI agents by the end of 2026, a sharp expansion from less than 5% in 2025," and in its best case sees the market transforming into one worth $1.45 trillion (approximately ¥225 trillion) in 2035. Separately, McKinsey has put out figures stating that 44% of U.S. labor can be performed with current AI agent capabilities, and that $2.9 trillion (approximately ¥450 trillion) in economic value will be created in the United States by 2030. While the order of magnitude differs vastly by outlet, all agree on the directional sense of "enormous."
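
Because these forecasts mix start values, end values, and growth rates, it helps to check that they are mutually consistent. The sketch below applies the standard compound annual growth rate formula to the Fortune Business Insights figures, under the assumption that $914 million is the 2026 value and $13.919 billion is the 2034 value (eight years of growth); the function names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end / start) ** (1 / years) - 1

def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years

# Figures in billions of USD; the pairing of values with years is an assumption.
start_2026 = 0.914
end_2034 = 13.919
span_years = 2034 - 2026

implied_rate = cagr(start_2026, end_2034, span_years)
projected_2034 = project(start_2026, 0.405, span_years)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"$0.914B compounded at 40.5% for 8 years: ${projected_2034:.2f}B")
```

Under that reading the implied rate lands very close to the quoted 40.5%, and compounding at 40.5% lands close to $13.9 billion, so the three numbers hang together. The same two functions can be pointed at any other forecast to test whether its start value, end value, and stated growth rate agree.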

Winners and Losers from an Investor's Perspective

Surveying the portfolios of Silicon Valley VCs reveals a structure of winners and categories at risk of being left behind. The biggest winners are the "service-completion" players in Agentic SOP (Sierra, Decagon, Cognition, Cresta, and Norm Ai), which, as of May 2026, are climbing the valuation ladder one rung at a time. Their strength lies in contracts that commit them not merely to producing an SOP as a document but to actually "getting the work done" all the way through; as a result, they can carve into labor-cost budgets (estimated by Sequoia at roughly six times the size of software budgets).

The next winners are the process intelligence companies that span Screen-to-SOP and Passive Capture. Celonis was positioned as a leader in the Process Intelligence space in Gartner's Magic Quadrant of February 2026, and declared its entry into Agentic SOP infrastructure with the combination of its AgentC suite and Process Copilots. Glean, as a "unified layer behind applications" that implements agent coordination on top of a ClickHouse-style real-time search foundation, has seen its valuation surge.

Drawing attention as challengers are Passive Capture startups such as the Europe-born Interloom, as well as Workhelix and Synaply, which compete behind a consumer-facing front. By selling "company-specific knowledge graphs" rather than "industry-wide SOP standards," they answer head-on the classic enterprise objection that "we're special." The Q1 2026 news that a large U.S. corporation would adopt Interloom (Yahoo Finance, TheNextWeb) is being read as a sign that this kind of context-graph approach is beginning to gain acceptance among major U.S. players as well.

Conversely, some areas are expected to struggle in relative terms. First, conventional digital adoption platforms (DAPs) such as Whatfix, which merely "display on-screen guidance," are highly likely to be swallowed up before long by "computer-use agents" such as Scribe, Guidde, and even Anthropic Cowork. WalkMe, now under SAP, has also lost its appeal as an independent acquisition candidate. Second, the "always-on recording consumer hardware" epitomized by Limitless faces a growing risk that regulation and social friction will outpace technological progress, and independent startups in this category are seen as unlikely to survive on their own. Third, legacy SaaS-style knowledge management that continues to store SOPs as simple PDFs or HTML is likely to be excluded from corporate RFPs (requests for proposals) more and more often unless it migrates to machine-readable formats optimized for AI ingestion.

The most interesting structural shift concerns the veterans of BPM (business process management) software and services. The winners of the RPA era (UiPath, Automation Anywhere, and Blue Prism, now under SS&C) risk being pushed into the role of "mere execution layers" in the world of Agentic SOP, so each is undertaking a major overhaul toward an LLM-native design. Since early 2026, UiPath has been pushing an operational framework it calls "AgentOps," and Automation Anywhere is foregrounding the "Autonomous Enterprise," a push prominent enough to have become the subject of a Stanford Graduate School of Business case study. The debate over the death of RPA has continued since 2025, but it appears to have been largely settled in the first half of 2026.

Anticipated Developments from the Second Half of 2026 Onward, and What to Watch When

Concrete milestones to watch over the roughly 12 months from May 2026 onward can be organized along several axes.

First, the "execution layer" of Agentic SOP is likely to move toward IPO readiness. As of May 2026, Sierra has ARR of $150 million and a valuation of $15.8 billion, an extremely high ARR multiple of roughly 100x; but if its ARR surpasses $500 million by the same time next year, an IPO in the second or third quarter of 2027 becomes a realistic option. Similarly, if Decagon can project ARR exceeding $100 million in the fourth quarter of 2026, a Series E within 2027, or a liquidity event centered on secondaries, comes into view. Cresta, Glean, and Cognition are also repeatedly named as IPO candidates across multiple media outlets.
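The multiple quoted above is simply valuation divided by ARR. As a minimal sketch, the snippet below recomputes it from the figures in the text and then shows how it would compress if ARR reached $500 million. Holding the $15.8 billion valuation fixed is an assumption made purely for illustration, and the names are hypothetical.

```python
def arr_multiple(valuation_usd: float, arr_usd: float) -> float:
    """Valuation expressed as a multiple of annual recurring revenue."""
    if arr_usd <= 0:
        raise ValueError("ARR must be positive")
    return valuation_usd / arr_usd


SIERRA_VALUATION_USD = 15.8e9   # valuation cited in the text
SIERRA_ARR_USD = 150e6          # ARR as of May 2026, per the text
TARGET_ARR_USD = 500e6          # ARR threshold discussed for an IPO

current = arr_multiple(SIERRA_VALUATION_USD, SIERRA_ARR_USD)
# Assumption: valuation stays flat while ARR grows.
at_target = arr_multiple(SIERRA_VALUATION_USD, TARGET_ARR_USD)

print(f"Current multiple:        {current:.1f}x")
print(f"Multiple at $500M ARR:   {at_target:.1f}x")
```

Under that flat-valuation assumption the multiple falls from about 105x to about 32x, which is why the $500 million ARR threshold matters for IPO readiness.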

Second, the focal point of the second half of 2026 will be whether the full-scale enterprise adoption of Anthropic Cowork and the growth of OpenAI ChatGPT Atlas squeeze the viability of independent Agentic SOP companies. Players with a thick layer that "places enterprise-grade guardrails and operational knowledge on top of foundation models"—like Decagon's AOP—will hold up, but mid-tier firms delivering value through thin wrappers are highly likely to be weeded out. Bessemer predicts that "from Q4 2026 through Q2 2027, a wave of concentrated acquisitions will occur in the agent security and governance space."

Third, the regulatory environment for Passive Capture will tighten further through the combination of the EU AI Act and EU GDPR, and there is a possibility that European-based firms such as Interloom will begin to penetrate large U.S. players as champions of "regulation-friendly Passive Capture." Conversely, approaches like the "task-level decomposition and adoption roadmap" being advanced by Workhelix are increasingly likely to win support from CIOs and CHROs from a corporate governance standpoint, and a trend toward their incorporation into the standard toolkits of consulting firms such as Accenture, Deloitte, and KPMG is beginning to emerge. Guidde's explicit declaration of implementation partnerships with KPMG and Deloitte in this funding round can be seen as a leading example of this.

Fourth, how expansion plays out in the Japanese market is also a point to observe. As of May 2026, neither Sierra nor Decagon has officially announced the establishment of a Japanese subsidiary, but given that World Innovation Lab (WiL) has made a large investment in Cresta and that Qualcomm Ventures has invested in Guidde, reseller networks reaching large Japanese corporations are gradually being built out. As a challenge unique to Japan, paper-based SOPs (operational procedure manuals) are extremely thick and frequently include "hanko and seal-stamping" processes, so Screen-to-SOP or Passive Capture may fit poorly if imported as-is. Instead, a niche of converting paper SOPs into AOP (Agent Operating Procedure) format via OCR may emerge in the Japanese market. Industry sources at multiple conferences have reported that NRI, Fujitsu, and NEC are already advancing internal R&D in this direction, and whether a homegrown Japanese player emerges in the second half of 2026 is an intriguing question.

Fifth is the trajectory of the market size itself. According to multiple research reports, the Agentic AI market is seen as starting at roughly $10 billion (about ¥1.5 trillion) in 2026 and expanding to $90 billion–$140 billion (about ¥14 trillion–¥22 trillion) by the early 2030s. Sequoia's "Services-as-a-Software" estimate ($10 trillion = about ¥1,550 trillion) is far larger than these forecasts because it takes the upper bound of "the entire labor expenditure to be captured," and what agent companies can actually charge for is only a fraction of it. Even so, the addressable opportunity is all but certain to be several times the size of the software market.
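The gap between these figures can be checked with a few lines of arithmetic. The sketch below assumes an exchange rate of ¥155 per dollar, which is only inferred from the article's own conversion ($10 trillion = about ¥1,550 trillion); the constant and function names are illustrative.

```python
# Assumption: 155 JPY per USD, inferred from "$10 trillion = about 1,550 trillion yen".
JPY_PER_USD = 155.0

SERVICES_TAM_USD = 10e12    # Sequoia's services-market figure
FORECAST_LOW_USD = 90e9     # low end of the early-2030s forecasts
FORECAST_HIGH_USD = 140e9   # high end of the early-2030s forecasts


def usd_to_trillion_yen(usd: float, rate: float = JPY_PER_USD) -> float:
    """Convert a dollar amount to trillions of yen at the given rate."""
    return usd * rate / 1e12


def share_of_tam(market_usd: float, tam_usd: float = SERVICES_TAM_USD) -> float:
    """Fraction of the services TAM that a forecast market size represents."""
    return market_usd / tam_usd


for label, usd in [("low", FORECAST_LOW_USD), ("high", FORECAST_HIGH_USD)]:
    print(
        f"{label}: {usd_to_trillion_yen(usd):.1f} trillion yen, "
        f"{share_of_tam(usd):.1%} of the services TAM"
    )
```

On these inputs, the early-2030s forecasts correspond to roughly 1% of the $10 trillion services figure, which illustrates why that figure reads as an upper bound rather than a revenue forecast.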

Ultimately, the four layers (Screen-to-SOP, Video-to-SOP, Passive Capture, and Agentic SOP) are heading toward integration not as separate markets but as a single pipeline that "converts the tacit knowledge in the heads of experts into a machine-readable format and has agents execute it." Silicon Valley VCs are investing heavily in each layer of this pipeline, and on the enterprise side, the needs to "minimize knowledge drain from veteran retirements," "fill labor shortages," and "cut compliance man-hours" are converging, driving rapid integration of the stack. The 12 months starting in the second half of 2026 look set to be decisive, as the leading companies of each layer advance to their next step (IPO, acquisition, or new service launch).


Sources