A thorough explanation of "Agentic RAG," the mainstream approach to RAG

From 2025 to 2026, the meaning of "RAG (Retrieval-Augmented Generation)" was quietly yet decisively rewritten within Silicon Valley's AI infrastructure landscape. The classic RAG approach proposed in 2020 by Patrick Lewis, Douwe Kiela, and colleagues at Facebook AI Research (now part of Meta), a single-pass pipeline that vectorizes a query, retrieves the top-k chunks, and feeds them directly into an LLM, had hit a serious wall in enterprise deployments by 2024. [McKinsey](https://newsify.tv/services/mckinsey)'s *State of AI Trust in 2026*, published in January 2026, reported that 58% of enterprise AI teams that had deployed RAG in production in 2024 cited "multi-step reasoning across heterogeneous data sources" as their primary constraint within the first six months. The approach that went mainstream in 2025 to fill this gap was "Agentic RAG," which incorporates autonomous agents into the retrieval loop. Leading VCs, including [Andreessen Horowitz](https://newsify.tv/investors/andreessen-horowitz), [Sequoia Capital](https://newsify.tv/investors/sequoia-capital), [Bessemer Venture Partners](https://newsify.tv/investors/bessemer-venture-partners), [Benchmark](https://newsify.tv/investors/benchmark), [Accel](https://newsify.tv/investors/accel), [Greylock](https://newsify.tv/investors/greylock-partners), [Kleiner Perkins](https://newsify.tv/investors/kleiner-perkins), [Coatue](https://newsify.tv/investors/coatue), and [Index Ventures](https://newsify.tv/investors/index-ventures), collectively poured tens of billions of dollars over the past year into companies such as [LangChain](https://newsify.tv/services/langchain), [LlamaIndex](https://newsify.tv/services/llamaindex), [Pinecone](https://newsify.tv/services/pinecone), Contextual AI, [Cohere](https://newsify.tv/services/cohere), Hebbia, Harvey, Sierra, Decagon, and [Glean](https://newsify.tv/services/glean).
This article offers an integrated, end-to-end account from a Silicon Valley engineer's perspective, covering the Agentic RAG ecosystem, the internal architectures of individual services and products, the VC reception, and the trends expected to emerge in the second half of 2026 through 2027.

Chapter 1: Why RAG Had No Choice but to Become "Agentic" Now

The 2020 paper *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*, posted to arXiv by Patrick Lewis, Douwe Kiela, and colleagues at Facebook AI Research, is the origin of the term RAG as it is used today. The procedure introduced in that paper later came to be called "Naive RAG": a simple pipeline in which a query is converted into an embedding, the k nearest-neighbor chunks are retrieved from a vector space, and those chunks are inserted into the LLM as context. Following the arrival of ChatGPT at the end of 2022, Naive RAG spread extremely quickly, but between 2023 and 2024 three limitations became apparent in enterprise deployments.

First, a design that attempts to represent all queries within a single embedding space is structurally fragile against long queries with mixed intent and against questions requiring multi-step reasoning. Second, while vector search excels at returning documents that are "semantically similar," it cannot guarantee documents that are "factually correct with respect to the query." Third, enterprise data is distributed not only across unstructured documents but also across SQL tables, knowledge graphs, APIs, and real-time events, making the approach of cramming everything into a single vector DB inherently impractical.

Between 2023 and 2024, academia saw a succession of papers aimed at breaking through these limitations. Self-RAG by Asai et al. (arXiv:2310.11511, October 2023) introduced a mechanism called reflection tokens, by which the LLM itself judges "should I retrieve?" and "is what I retrieved valid?", opening the door to having the model dynamically decide whether to search. CRAG (Corrective RAG, arXiv:2401.15884) published by Yan et al. in January 2024 presented a design in which a lightweight retrieval evaluator returns a confidence score and, when confidence is low, triggers web search as an external extension. Adaptive-RAG (arXiv:2403.14403, accepted at NAACL 2024) published by Jeong et al. in March 2024 proposed a three-tier approach that places a small LM at the front to classify query complexity, routing simple queries to no retrieval, moderate queries to single retrieval, and complex queries to iterative retrieval. These papers are built on top of the foundational design of the ReAct pattern (Yao et al., arXiv:2210.03629, ICLR 2023) — "interleaving reasoning and action" — and by early 2025 had formed a de facto industry standard.
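The three-tier routing attributed to Adaptive-RAG above can be illustrated with a minimal sketch. It is not the paper's implementation: the paper trains a small LM as the classifier, whereas `classify_complexity` here is a hypothetical keyword heuristic, and `llm` and `retrieve` are placeholder callables supplied by the caller.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def classify_complexity(query: str) -> str:
    """Hypothetical stand-in for the small classifier LM.

    Returns 'simple', 'moderate', or 'complex' from crude surface cues.
    """
    padded = f" {query.lower()} "
    multi_hop_markers = (" and ", " compare ", " versus ", " then ")
    if any(marker in padded for marker in multi_hop_markers):
        return "complex"
    if len(query.split()) <= 4:
        return "simple"
    return "moderate"


def answer(query: str, llm: Generator, retrieve: Retriever, max_steps: int = 3) -> str:
    """Route the query to no retrieval, single retrieval, or iterative retrieval."""
    level = classify_complexity(query)
    if level == "simple":
        return llm(query, [])               # tier 1: answer from the model alone
    if level == "moderate":
        return llm(query, retrieve(query))  # tier 2: one retrieval pass

    # tier 3: iterative retrieval, feeding new evidence into the next query
    context: List[str] = []
    current = query
    for _ in range(max_steps):
        new_docs = [d for d in retrieve(current) if d not in context]
        if not new_docs:
            break  # nothing new was found, so stop early
        context.extend(new_docs)
        current = f"{query} {new_docs[-1]}"  # naive follow-up query (assumption)
    return llm(query, context)
```

The design point is that the cheap classifier runs once per query, so simple questions never pay for retrieval and only the complex tier pays for the loop.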

This trajectory was systematized in January 2025 by Singh, Ehtesham, Kumar, Khoei, Vasilakos et al. in *Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG* (arXiv:2501.09136), which organized the evolution of RAG into four generations: "Naive RAG → Advanced RAG → Modular RAG → Agentic RAG." The paper defines Agentic RAG as follows: "Agentic RAG dynamically manages retrieval strategies and iteratively refines contextual understanding by embedding design patterns such as reflection, planning, tool use, and multi-agent coordination."

To summarize this shift from a Silicon Valley engineer's perspective: Naive RAG is a "function call," while Agentic RAG is a "process execution." The former is a pipeline that flows unidirectionally from query to answer, while the latter has state and loops, performing retries, routing, and tool calls as needed — much closer in nature to an OS process. Anthropic's engineering blog *How we built our multi-agent research system*, published in June 2025, reported that an orchestrator-worker configuration with Claude Opus 4 as the leader and Claude Sonnet 4 as subagents outperformed a single-agent Claude Opus 4 by 90.2% on internal evaluations, quantitatively demonstrating the superiority of multi-agent configurations. At the same time, the blog candidly disclosed that "multi-agent systems consume approximately 15 times more tokens than standard chat" and that "token usage explains 80% of the performance variance in browsing evaluations" — making clear that this performance comes only at the cost of higher expenditure.

Market data corroborates this shift. Gartner projected in an August 26, 2025 press release that the share of applications equipped with task-specific AI agents — less than 5% in 2025 — would jump to 40% by the end of 2026. It also forecast that by 2028, 33% of enterprise software would incorporate agentic AI and 70% of AI applications would use a multi-agent approach. According to a MarketsandMarkets report, the RAG market alone is set to grow from $1.94 billion in 2025 to $9.86 billion in 2030 at a CAGR of 38.4%, while Grand View Research estimates it will reach $11.0 billion in 2030 at a CAGR of 49.1%. MarketsandMarkets projects the vector DB market will grow from $2.65 billion in 2025 to $8.95 billion in 2030 at a CAGR of 27.5%, and Fortune Business Insights forecasts the overall agentic AI market will explode from $9.14 billion in 2026 to $139.19 billion in 2034 at a CAGR of 40.5%. These figures have elevated Agentic RAG from "an experiment at a handful of AI labs" to "a multi-trillion-yen infrastructure market."
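The growth rates quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The short sketch below applies it to the three forecasts for which the text gives both a start and an end value; the function name `cagr` is an illustrative choice.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.384 means 38.4%)."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years) - 1.0


# (label, start in $B, end in $B, number of years), taken from the paragraph above
forecasts = [
    ("RAG market, 2025 to 2030", 1.94, 9.86, 5),
    ("Vector DB market, 2025 to 2030", 2.65, 8.95, 5),
    ("Agentic AI market, 2026 to 2034", 9.14, 139.19, 8),
]

for label, start, end, years in forecasts:
    print(f"{label}: {cagr(start, end, years):.1%}")
```

Each computed rate lands within roughly a tenth of a percentage point of the figure quoted for that forecast, so the start value, end value, and CAGR in each case are mutually consistent.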

Chapter 2: Core Technologies and Design Patterns of Agentic RAG

If an engineer were to define Agentic RAG in a single sentence, it would be: "a system that has a loop calling retrieval as a tool, updating state and decision-making at each iteration." Embedded within this definition are at least six independent technical components.

The first component comprises query routing, query rewriting, query decomposition, and HyDE (Hypothetical Document Embeddings). Routing refers to the process by which an agent receiving a user's query determines whether to send it to a vector DB, convert it into a SQL query, forward it to a web search, or translate it into a Cypher query for a knowledge graph. Rewriting is the process of converting a user's colloquial expression into a form optimized for retrieval, and small-scale LLMs are often used for this purpose. Decomposition is the process of splitting compound questions (such as "How does B differ from A, and how does C relate?") into multiple sub-queries. HyDE is a method proposed by Gao et al. (2022) that has an LLM generate a hypothetical answer to a user query, then embeds that hypothetical answer text for retrieval, with the aim of bridging the style gap between queries and documents.
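To make the HyDE step concrete, here is a minimal sketch under loud assumptions: `embed` is a toy bag-of-words vector standing in for a real embedding model, and `generate_hypothetical` is a caller-supplied placeholder for the LLM call. Only the control flow (embed the hypothetical answer rather than the raw query) reflects the technique.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: replaces a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(
    query: str,
    generate_hypothetical: Callable[[str], str],
    corpus: List[str],
    k: int = 1,
) -> List[str]:
    """Retrieve with the embedding of an LLM-written hypothetical answer."""
    hypothetical = generate_hypothetical(query)  # placeholder for the LLM call
    query_vec = embed(hypothetical)              # embed the answer, not the question
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]
```

In this toy setting a question phrased in user vocabulary may share no terms with the relevant document, while a hypothetical answer written in document vocabulary does, which is the style gap HyDE is meant to bridge.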

The second component is hybrid retrieval. Between 2025 and 2026, industry consensus converged on a three-layer architecture of "dense (dense vectors) + sparse (BM25/SPLADE) + reranking." Dense captures semantic similarity while sparse captures lexical matching, each compensating for the other's weaknesses. NVIDIA's RAG Blueprint reference implementation, published on GitHub, provides dense+sparse hybrid retrieval, multi-collection retrieval, and GPU-accelerated indexing/querying with pluggable switching between ElasticSearch and Milvus, standardizing fusion via Reciprocal Rank Fusion (RRF). Anthropic's *Contextual Retrieval*, published in September 2024, demonstrated striking results—by adding a preprocessing step that prepends a contextual explanation generated by Claude before embedding each chunk, it reduced the top-20 retrieval failure rate by 35% compared to simple embedding alone, 49% when combined with BM25, and 67% when combined with reranking—rewriting the implementation standard for RAG indexing preprocessing in one stroke.
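Reciprocal Rank Fusion, mentioned above as the fusion step for dense and sparse results, is simple enough to sketch in a few lines. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 follows the original RRF paper; the function name and the tie-break by document id are illustrative choices, not taken from any particular product.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    rankings: each inner sequence is ordered best-first (rank 1 first).
    k: smoothing constant that dampens the influence of top ranks.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # e.g. from a vector index
sparse_hits = ["b", "d", "a"]  # e.g. from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is one reason it is a common default for hybrid search.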

The third component is reranking. This is the process of applying a more costly cross-encoder to the candidate set obtained from dense and sparse retrieval to reorder results. Cohere's Rerank 3, released in April 2024, features a 4K context length, and the company states that "Rerank passes only the most relevant documents to RAG pipelines and agent workflows, reducing token usage, minimizing latency, and improving accuracy." The BGE Reranker v2 family (v2-m3, v2-gemma, v2-minicpm-layerwise) published by BAAI in March 2024 has established itself as the OSS camp's standard. ColBERT/ColBERTv2, proposed by Omar Khattab et al., takes a different approach called "late interaction": the original ColBERT paper reports effectiveness competitive with BERT-based rankers while running two orders of magnitude faster and requiring four orders of magnitude fewer FLOPs per query, and ColBERTv2 reduces the storage footprint by 6–10x.
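The "late interaction" idea can be shown with a small sketch of the MaxSim scoring rule: each query token vector is matched against its most similar document token vector, and those maxima are summed. The vectors below are toy 2-D values chosen for illustration; a real ColBERT model would produce normalized per-token embeddings from a BERT encoder, and the function names are assumptions.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs: Sequence[Vector], candidates: Dict[str, Sequence[Vector]]) -> List[str]:
    """Order candidate document ids by descending MaxSim score."""
    return sorted(
        candidates,
        key=lambda doc_id: maxsim_score(query_vecs, candidates[doc_id]),
        reverse=True,
    )


query = [[1.0, 0.0], [0.0, 1.0]]          # two query tokens
candidates = {
    "B": [[1.0, 0.0], [1.0, 0.0]],        # covers only the first query token
    "A": [[1.0, 0.0], [0.0, 1.0]],        # covers both query tokens
}
order = rerank(query, candidates)
```

Because document token vectors do not depend on the query, they can be computed and stored at indexing time, leaving only the cheap max-and-sum step for query time.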

The fourth component is self-reflection and self-correction. Self-RAG's reflection tokens and CRAG's confidence scoring fall within this lineage. On the implementation side, the conditional edge pattern provided by LangGraph is a canonical example, where nodes are inserted to determine "whether retrieval results are sufficient" and "whether generated results are factually consistent," dynamically driving the retrieval loop. Multiple implementation benchmarks report that LangGraph-based Adaptive RAG, by combining selective routing and validation, can reduce hallucinations by up to 78% compared to static RAG.
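The retrieve, grade, and retry loop described above can be sketched in plain Python. This is not LangGraph's API: it only mirrors the idea of a conditional edge that either proceeds to generation or loops back through a query rewrite. The callables `retrieve`, `grade`, `rewrite`, and `generate` are hypothetical placeholders for the retriever and LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RagState:
    question: str                 # the user's original question
    query: str                    # the current (possibly rewritten) search query
    documents: List[str] = field(default_factory=list)
    attempts: int = 0


def run_corrective_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], bool],
    rewrite: Callable[[str, str], str],
    generate: Callable[[str, List[str]], str],
    max_attempts: int = 3,
) -> str:
    """Loop retrieve -> grade, rewriting the query until the evidence is judged sufficient."""
    state = RagState(question=question, query=question)
    while state.attempts < max_attempts:
        state.attempts += 1
        state.documents = retrieve(state.query)
        # Conditional edge: sufficient evidence goes to generation.
        if grade(state.question, state.documents):
            return generate(state.question, state.documents)
        # Otherwise loop back with a rewritten query.
        state.query = rewrite(state.question, state.query)
    # Budget exhausted: generate with no context so the generator can decline or caveat.
    return generate(state.question, [])
```

The `max_attempts` bound matters in practice, since each extra pass adds both latency and token cost.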

The fifth component is GraphRAG. GraphRAG, published by Edge et al. from Microsoft Research (arXiv:2404.16130, with v2 released in February 2025), performs a two-stage indexing process: using an LLM to build an entity knowledge graph from a document corpus, then pre-computing community detection and community summaries. This yields results that significantly outperform pure vector RAG in both comprehensiveness and diversity for "global sensemaking" queries (overall trends and exhaustive summarization) over corpora on the order of one million tokens. Furthermore, LazyGraphRAG, announced in November 2024, brings indexing costs down to the same level as vector RAG (0.1% of full GraphRAG). It matches the answer quality of GraphRAG Global Search at roughly 1/700th of the query cost, and at 4% of that query cost it outperforms GraphRAG on both global and local query types, combining cost efficiency with quality. LazyGraphRAG was integrated into Microsoft Discovery and Azure Local during 2025 and is already deployed at the product level.

The sixth component is long-term memory and state management. Harrison Chase's essay *Your harness, your memory*, published in April 2026, explicitly articulated the separation of an agent's short-term memory (in-conversation messages and large tool call results) from long-term memory (cross-session recollection), and the idea of placing memory management on the agent harness side. In LangChain Deep Agents, he implemented "filesystem-based working memory," "progressive skill disclosure," and "layered security," and strongly advocates for open-sourcing the memory layer, stating that "choosing closed-source means losing control of your own data." This stands in sharp ideological contrast with OpenAI's Responses API (announced March 2025), which pursues a strategy of internalizing hosted tool primitives (file_search, web_search, function calling)—a philosophical divergence that has formed a central point of debate in the agentic AI discourse of 2026.
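The split between short-term and long-term memory on the harness side can be illustrated with a small file-backed sketch. It is not the LangChain Deep Agents implementation: the class name `HarnessMemory`, its methods, and the JSON file layout are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class HarnessMemory:
    """Short-term memory lives in the process; long-term memory lives in a file."""

    def __init__(self, store_path: str) -> None:
        self.store_path = Path(store_path)
        # In-conversation messages: discarded when the session object goes away.
        self.short_term: List[Dict[str, str]] = []

    def add_message(self, role: str, content: str) -> None:
        self.short_term.append({"role": role, "content": content})

    def remember(self, key: str, value: Any) -> None:
        """Persist a fact so that later sessions can recall it."""
        data = self._load()
        data[key] = value
        self.store_path.write_text(json.dumps(data, indent=2))

    def recall(self, key: str, default: Any = None) -> Any:
        return self._load().get(key, default)

    def _load(self) -> Dict[str, Any]:
        if self.store_path.exists():
            return json.loads(self.store_path.read_text())
        return {}
```

Keeping the long-term store in a plain file the operator owns is one way to read the argument above: the data stays inspectable and portable regardless of which model or vendor sits behind the agent.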

The implementation cost of integrating these six components is by no means trivial in 2026 production environments. According to a production guide published in early 2026 by independent consultant Jahanzaib, the standard per-query cost for retrieval is $0.06–$0.09, but for complex multi-hop Agentic RAG it can jump to $0.18–$0.31. For mid-scale deployments, the going rate is $50–$200/month for vector storage and $2,200–$3,400/month overall, with a target of keeping P95 latency within 2.5 seconds. Adding retrieval validation increases per-query latency by 2–3 seconds but improves factual consistency. Using semantic caching together yields 20–35% cost reduction for high-repetition workloads; combining it with intelligent routing reduces costs for mixed workloads by 30–45% and latency by 25–40%—such is the prevailing field estimate.
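A rough worked example shows how these per-query figures combine. The sketch below blends the standard and multi-hop cost ranges by workload mix and then applies a fractional saving. The 30% share of complex queries and the 20,000 queries per month are hypothetical inputs, and the midpoints ($0.075 and $0.245) are simply the centers of the ranges quoted above.

```python
def blended_cost_per_query(
    complex_share: float,
    simple_cost: float,
    complex_cost: float,
    savings: float = 0.0,
) -> float:
    """Average cost per query for a mixed workload, after a fractional saving."""
    if not 0.0 <= complex_share <= 1.0:
        raise ValueError("complex_share must be between 0 and 1")
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    base = (1.0 - complex_share) * simple_cost + complex_share * complex_cost
    return base * (1.0 - savings)


SIMPLE_MID = 0.075    # midpoint of $0.06 to $0.09
COMPLEX_MID = 0.245   # midpoint of $0.18 to $0.31
QUERIES_PER_MONTH = 20_000  # hypothetical volume

baseline = blended_cost_per_query(0.30, SIMPLE_MID, COMPLEX_MID)
with_routing = blended_cost_per_query(0.30, SIMPLE_MID, COMPLEX_MID, savings=0.30)

print(f"baseline:     ${baseline:.4f}/query, ${baseline * QUERIES_PER_MONTH:,.0f}/month")
print(f"with routing: ${with_routing:.4f}/query, ${with_routing * QUERIES_PER_MONTH:,.0f}/month")
```

Under these assumptions the blended cost is about $0.126 per query before savings and about $0.088 after a 30% reduction, which illustrates how strongly the share of multi-hop queries drives the monthly bill.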

Chapter 3: Framework/Orchestration Layer — LangChain, LlamaIndex, CrewAI, and Microsoft

The framework competition for implementing the application layer of Agentic RAG reached one of its decisive points in autumn 2025. LangChain raised $125 million in a Series B led by IVP on October 20–21, 2025, achieving unicorn status with a post-money valuation of $1.25 billion. Participating investors included CapitalG, Sapphire Ventures, and strategic investors ServiceNow Ventures, Workday Ventures, Cisco Investments, Datadog Ventures, and Databricks Ventures, with existing investors Sequoia, Benchmark, and Amplify making follow-on investments. As Harrison Chase noted on Sequoia Capital's podcast *Context Engineering Our Way to Long-Horizon Agents*, "The optimal implementation has changed dramatically over three years: from simple RAG chains, through complex flows with LangGraph, and now to agent harnesses." In response to this shift, LangChain organized its product portfolio into three layers, namely LangChain (foundational framework), LangGraph (graph-based orchestration), and LangSmith (observability and evaluation), and shipped LangChain + LangGraph 1.0 simultaneously with the Series B. In comments to Fortune, LangChain's PR team described the reported ARR range of $12–16 million as "a figure that doesn't reflect where we are right now," suggesting that actual ARR has since grown to a level more consistent with the $1.25 billion valuation IVP priced. The roster of existing customers (Cisco, Replit, Clay, Cloudflare, Workday, and ServiceNow) is evidence that paid tiers for observability and evaluation are functioning as hooks for enterprise procurement.

LlamaIndex raised $19 million in a Series A led by Norwest Venture Partners on March 4, 2025 (with Greylock participating, bringing total funding to $27.5 million), simultaneously GA-releasing LlamaCloud. Founder Jerry Liu declared in a CEO blog post that "LlamaIndex is no longer just a RAG framework — it has become the foundation for Agentic Document Processing," narrowing the business scope from a catch-all abstraction library to "the best long-lasting document infrastructure." Indeed, LlamaCloud incorporates features such as LlamaParse (document parsing) and AgentWorkflow (multi-agent knowledge work over unstructured data), with over 10,000 organizations — including more than 90 Fortune 500 companies such as Salesforce, KPMG, Carlyle, and Rakuten — on the waitlist. Its 47K GitHub stars and 5.2 million monthly downloads (as of March 2026) demonstrate that a document-specialized positioning has enabled a viable coexistence with LangChain, the dominant framework. In an early 2026 newsletter, Jerry Liu observed that "coding agents (Claude Code, Cursor) are converging around the file system, with agents operating on just 5–10 core tools plus file system access" — suggesting a direction toward lightweight toolset designs like Agent Composer.

CrewAI, which went all-in on multi-agent role division, raised $18 million in a round led by Insight Partners and boldstart ventures in October 2024. As of July 2025, Latka confirmed an ARR of $3.2 million, though the platform has surpassed 2 billion agent executions over the past 12 months, with over 60% of the Fortune 500 reported as customers. The product architecture centers on role-based agents with goals and tasks — a framework that assigns roles like "marketing researcher," "analyst," and "writer" in a manner analogous to a human organization, enabling collaborative work. This approach is more abstract than LangGraph and requires less implementation code. CrewAI has GA'd its enterprise cloud offering, emphasizing cross-cloud deployment across AWS, GCP, and Azure, and LLM vendor independence.

deepset, the German company behind the OSS project Haystack, last raised a major round in August 2023 (a $30 million Series B led by Balderton Capital) and rebranded its product lineup as the Haystack Enterprise Platform in December 2025. Equipped with a visual pipeline editor, templates, and on-premises deployment, the platform maintains an enterprise customer base with a distinctly European flavor, including Airbus, The Economist, NVIDIA, and Comcast. It has established itself as the go-to option for satisfying on-premises requirements in the manufacturing and defense sectors.

Microsoft merged AutoGen and Semantic Kernel into Microsoft Agent Framework, which entered public preview in October 2025 and reached general availability as version 1.0 in April 2026. The framework grafts AutoGen's agent abstractions onto Semantic Kernel's enterprise capabilities (session state, type safety, middleware, telemetry), and includes graph-based workflow orchestration, multi-provider model support, and cross-runtime interoperability for A2A and MCP. AutoGen has since been placed into maintenance-only mode, with new users directed toward Agent Framework. Microsoft's consolidation has effectively made this the default harness for Azure-reliant enterprises, putting it in direct competition with LangChain's business model.

The OSS workflow builder space saw a wave of major consolidation through 2025. China-based Dify raised a $30 million pre-Series A led by HSG (HongShan, formerly Sequoia China), with over 131,000 GitHub stars and 280 enterprise customers including Maersk and Novartis, standing as the last major independent player. Meanwhile, Flowise was acquired by Workday in August 2025, and Langflow had already been acquired by DataStax in 2024 (DataStax has since come under IBM). The dynamic that a16z and Sequoia have repeatedly flagged, in which category winners become exposed to acquisition risk, has played out in textbook fashion at the workflow builder layer.

Chapter 4: Vector DB / Search Infrastructure Layer — Pinecone, Weaviate, Qdrant, Chroma, Turbopuffer

For Agentic RAG to run in earnest, significant evolution was also required behind the scenes of vector search. The movement in this space from 2025 to 2026 is one of the largest infrastructure investment themes in Silicon Valley.

Pinecone, the de facto category leader, was at its peak when it raised $100 million (approximately ¥15 billion) at a valuation of $750 million (approximately ¥112.5 billion) in an a16z-led round in April 2023, but in 2025 it faced significant pressure from the rise of open-source alternatives and hyperscalers. Founder Edo Liberty moved from CEO to Chief Scientist in September 2025, announcing he would focus on "research toward next-generation agentic AI systems." The new CEO is Ash Ashutosh, a former Google global sales director and executive at AppDynamics and Actifio. Liberty's favored phrase is "over the next five years, vector DBs will transition from a technical tool to long-term enterprise memory," signaling a new positioning as the long-term memory layer for agents. In October 2025, The Information reported that Pinecone was working with bankers to explore a sale, with an expected price exceeding $2 billion (approximately ¥300 billion), and Oracle, IBM, MongoDB, and Snowflake were named as potential acquirers. According to Latka data, Pinecone's revenue as of December 2025 was $14 million (approximately ¥2.1 billion) with roughly 4,000 customers, which puts heavy strain on its revenue multiple: even the $750 million valuation from 2023 works out to roughly 54x revenue. On the product side, the company has rolled out Pinecone Assistant, an agent-building API, while advancing a shift to serverless architecture and launching an MCP server.

Weaviate has announced no additional fundraising as of April 2026 since its $50 million (approximately ¥7.5 billion) Series B in April 2023, which was led by Index Ventures with participation from Battery Ventures, NEA, Cortical, Zetta, and ING Ventures, but it has accelerated on the product side in the interim. In March 2025, it conducted a full-stack launch of Weaviate Agents, and in September 2025 brought Query Agent to GA. The internal architecture of Query Agent is notable: upon receiving a user query, it performs multi-collection routing, query expansion and decomposition, filter generation, and re-ranking all within a single tool call. This is a strategic abstraction of "delivering Agentic RAG via a single API," and a design that eliminates the need for external orchestrators like LangGraph. In February 2026, the company published an OSS repository called Weaviate Agent Skills, enabling coding agents such as Claude Code, Cursor, GitHub Copilot, VS Code, and Gemini CLI to generate Weaviate-optimized code. CEO Bob van Luijt has stated, "The rise of vector DBs, vector embedding services, and agentic architectures is an inflection point in the evolution of data management and data transformation," and in January 2026 the company was recognized as "Leader + Outperformer" by GigaOm and "Emerging Leader" by Gartner.

Berlin-based Qdrant raised $50 million (approximately ¥7.5 billion) in a Series B led by AVP (Advance Venture Partners) on March 12, 2026, with participation from Bosch Ventures, Unusual Ventures, Spark Capital, and 42CAP, bringing total funding to approximately $87.8 million (approximately ¥13.17 billion). The company counts Tripadvisor, HubSpot, OpenTable, Bazaarvoice, and Bosch among its implementation customers, and boasts over 250 million downloads and 29,000 GitHub stars. Its positioning as "composable vector search" — fast due to its Rust implementation and able to run in cloud, on-premises, or hybrid environments — makes it a good fit for conservative European enterprises.

Chroma raised $18 million (approximately ¥2.7 billion) at a valuation of approximately $75 million (approximately ¥11.25 billion) in a Series B on October 14, 2025, led by Astasia Myers of Quiet Capital. Its design — providing vector, full-text, regex, and metadata search through a single developer UX — and its lightweight, local-first embeddability have earned it support among indie RAG developers and agentic app prototypers.

Milvus/Zilliz has not had a major fundraising round since its Series B-II of $60 million (approximately ¥9 billion) in August 2022, yet on the product side it continues to maintain the largest OSS mindshare, having surpassed 40,000 GitHub stars and with over 10,000 production deployments at companies including NVIDIA, Salesforce, eBay, Airbnb, and DoorDash. Its influence is particularly strong in Asian markets.

Among emerging players, Turbopuffer is drawing attention. The company is an Ottawa-based startup co-founded by ex-Shopify engineers Simon Hørup Eskildsen and Justin Lee, and on December 19, 2025 raised an undisclosed seed round from Lachy Groom, Thrive Capital, and multiple angels. The company counts Anthropic, Atlassian, Cursor, and Notion as customers, and offers a serverless vector DB that treats object storage such as S3, GCS, and Azure Blob as first-class citizens. It uses a centroid-optimized SPFresh index, delivering unmatched cost per GB at massive scale. In the Silicon Valley context, the very fact that "leading-edge AI infrastructure consumers like Anthropic, Cursor, and Notion have adopted it" functions as a benchmark in itself. LanceDB also raised $30 million (approximately ¥4.5 billion) in a Series A led by Theory Ventures on June 24, 2025, establishing its position as an "AI-native multimodal lakehouse" on top of the Lance columnar format.

Surveying this space from an engineering perspective, vector DBs are splitting into three subcategories. The first is managed serverless (Pinecone), which is exploring a sale. The second is OSS-first plus managed cloud (Weaviate, Qdrant, Chroma, Milvus). The third is object-storage-driven (Turbopuffer, LanceDB). Pinecone's exploration of a sale is being read as a signal that the entire category has entered a phase of being "squeezed between hyperscalers and open source."

Chapter 5: End-to-End / Contextual Platforms — Contextual AI, Cohere, Vectara

The approach of delivering Agentic RAG not as a "collection of individual components" but as a "single, integrally designed system" is most sharply embodied by Contextual AI. Founder Douwe Kiela is a co-author of the 2020 RAG paper from Facebook AI Research and also serves as a Stanford adjunct lecturer. Following a $20 million seed round in June 2023, Contextual AI closed an $80 million Series A in August 2024 led by Greycroft, Bain Capital Ventures, Lightspeed, and Lip-Bu Tan, bringing total funding to $100 million. The Contextual AI Platform, which reached GA in January 2025, is built on a design philosophy called RAG 2.0: rather than training a retriever, re-ranker, and generation model (Grounded Language Model) separately and then connecting them, it applies end-to-end joint learning that optimizes all three together. VentureBeat reported that the Contextual GLM achieved 88% factual consistency on the FACTS benchmark, surpassing Claude 3.5 Sonnet (79.4%) and GPT-4o (78.8%). On the latest state-of-the-art RAG benchmark, it recorded 71.2%, outperforming the strongest baseline (Cohere + Claude 3.5) at 66.8%. The company has landed Fortune 500 customers including HSBC and Qualcomm, and in January 2026 launched Agent Composer, which converts enterprise RAG into production agents. However, according to CB Insights, annual revenue remains around £17.6 million, and the valuation is still estimated at approximately $150 million. This gap ("technically far ahead, but ARR hasn't caught up") defines Contextual AI's current position, and the level of its next funding round is drawing attention.

Cohere is a Toronto-based Canadian company whose CEO is Aidan Gomez, a co-author of "Attention Is All You Need," and it has grown rapidly as a standard-bearer for enterprise LLMs. In August 2025, a $500 million raise led by Radical Ventures and Inovia Capital brought its valuation to $6.8 billion, and a $100 million second close in September pushed it to $7 billion, with participation from AMD, NVIDIA, Salesforce Ventures, and HOOPP. Total funding now exceeds $1.6 billion. Its 2025 ARR of $240 million exceeded the initial target of $200 million, the company has brought on IPO-seasoned François Chadwick as CFO, and it is widely considered a top IPO candidate for 2026. Its flagship products include Command A (a 111B-parameter, 256K-context model launched in March 2025 with 150% throughput improvement over Command R+ 08-2024, running on two A100/H100 GPUs); Command A Reasoning (August 2025) for enterprise reasoning workloads; Rerank 3.5, capable of re-ranking in over 100 languages with a 4K context; and "North," an enterprise-grade agentic foundation. Cohere acquired Ottogrid (formerly Cognosys) in May 2025 and integrated research-agent capabilities into North. Its three primary differentiators are "sovereign AI," "multilingual support," and "VPC/on-premises deployment," a configuration that resonates with heavily regulated financial and public-sector clients, as well as European and Japanese enterprises requiring multilingual support.

Vectara raised $28.5 million in seed funding and $25 million in a Series A in July 2024 led by FPV Ventures and Race Capital, with participation from Samsung Next and Fusion Fund, bringing total funding to approximately $53.5–73.5 million. It is a vertically integrated RAG API featuring Mockingbird, an LLM purpose-built for hallucination reduction, with a focus on adoption in regulated industries. No new funding has been announced since 2025, and it maintains its position as a compact, specialized implementation platform.

Chapter 6: Vertically Specialized Agentic RAG — Harvey, Hebbia, Sierra, Decagon, Glean

In parallel with the horizontal platform layer, "Vertical Agentic RAG" specialized in specific domains attracted the largest capital flows from 2025 to 2026. Tracing each company's business model and fundraising history reveals clearly where VCs are placing their emphasis.

Legal-focused Harvey, following its 2024 Series D, raised $300 million (approx. ¥45 billion) at a $5 billion (approx. ¥750 billion) valuation in June 2025 via a Series E co-led by Kleiner Perkins and Coatue, then $160 million (approx. ¥24 billion) at an $8 billion (approx. ¥1.2 trillion) valuation in an a16z-led round in December 2025, and $200 million (approx. ¥30 billion) at an $11 billion (approx. ¥1.65 trillion) valuation on March 25, 2026 in a round co-led by GIC and Sequoia (Sequoia's third participation). Total funding has surpassed $1 billion (approx. ¥150 billion). Harvey's ARR as of January 2026 reached $190 million (approx. ¥28.5 billion), nearly doubling from $100 million (approx. ¥15 billion) in August 2025 in under six months. Its customers span more than half of the AmLaw 100, over 500 in-house legal teams, 50 asset management firms, and 60 countries. Sequoia's participation in the March 2026 round reaffirmed its conviction in the "defensibility of vertical use cases in Agentic AI," making Harvey the most important proxy for vertical Agentic RAG as of 2026.

Hebbia, which positions itself as focused on deep work in finance and law, reached a $700 million (approx. ¥105 billion) valuation in an April–July 2024 Series B of $130 million (approx. ¥19.5 billion) led by a16z with participation from Index Ventures, GV, Peter Thiel, Eric Schmidt, and Jerry Yang. Its ARR at the time was approximately $13 million (approx. ¥1.95 billion), implying a 54x multiple. Its customer base covers one-third of the world's leading asset managers, with cumulative AUM of approximately $14 trillion (approx. ¥2,100 trillion). Its flagship product Matrix uses a spreadsheet-grid UI natively designed to express multi-agent retrieve-ground-verify workflows; it was redesigned in June 2025, and in the same month Hebbia acquired FlashDocs to embed slide generation capabilities (currently generating over 10,000 slides per day). Hebbia is the company that most faithfully embodies a16z's thesis of "replacing deep work with AI agents."

Sierra, founded by Bret Taylor (OpenAI chairman, former Salesforce co-CEO) and Clay Bavor (formerly Alphabet), followed its October 2024 raise of $175 million (approx. ¥26.25 billion) at a $4.5 billion (approx. ¥675 billion) valuation with a $350 million (approx. ¥52.5 billion) raise at a $10 billion (approx. ¥1.5 trillion) valuation led by Greenoaks Capital on September 4, 2025, bringing total funding to $635 million (approx. ¥95.25 billion). ARR reached $100 million (approx. ¥15 billion) in November 2025, a record pace measured from founding, and the company reached a $150 million (approx. ¥22.5 billion) annual revenue run rate in its third year. Customers include hundreds of large enterprises, with more than 20% generating over $10 billion in revenue. While Sierra is anchored in customer service, its strength lies in a platform that lets brands delegate operational authority to agents at the level of actually "taking action" (such as processing refunds and issuing tickets), with Bret Taylor's brand trust forming an outstanding differentiating factor.

Decagon raised $131 million (approx. ¥19.65 billion) at a $1.5 billion (approx. ¥225 billion) valuation in June 2025 in a round co-led by Accel and a16z. Just seven months later, on January 28, 2026, it raised $250 million (approx. ¥37.5 billion) at a $4.5 billion (approx. ¥675 billion) valuation in a Series D co-led by Coatue and Index Ventures. The valuation tripled in about seven months, which suggests that repositioning itself as a high-touch customer service AI under the "AI Concierge" label has worked. Decagon secured over 100 new enterprise contracts during 2025, and the Series D round also included Bain Capital Ventures, BOND, Ribbit, Forerunner, Avra, A*, ChemistryVC, Definition Capital, and Starwood Capital.

Glean, which extended enterprise search into Agentic AI, followed its September 2024 Series E of $260 million (approx. ¥39 billion) at a $4.6 billion (approx. ¥690 billion) valuation with a Series F of $150 million (approx. ¥22.5 billion) at a $7.2 billion (approx. ¥1.08 trillion) valuation led by Wellington Management on June 10, 2025. Khosla Ventures, Bicycle Capital, Geodesic Capital, and Archerman joined as new investors, while existing investors Altimeter, Capital One Ventures, Citi, Coatue, DST, General Catalyst, ICONIQ, IVP, Kleiner Perkins, Latitude, Lightspeed, Sapphire, and Sequoia made additional investments. Glean bundles permission-aware enterprise SaaS connectors, Glean Agents, and a work assistant, and is frequently cited in Silicon Valley as a leading example of what Reid Hoffman calls "agent amplification."

Also on the vertical side, Writer.com raised $200 million (approx. ¥30 billion) at a $1.9 billion (approx. ¥285 billion) valuation on November 12, 2024, in a round co-led by Premji Invest, Radical Ventures, and ICONIQ Growth, with participation from Adobe Ventures, B Capital, Citi Ventures, IBM Ventures, Salesforce Ventures, and Workday Ventures. The company combines its proprietary Palmyra LLM family with graph-based RAG, guardrails, and a no-code agent builder to offer vertical packages for healthcare, retail, and financial services.

Sana AI, an enterprise LMS/knowledge management platform, was acquired by Workday for $1.1 billion (approx. ¥165 billion) on September 16, 2025, marking one of the largest agentic acquisitions in this space in 2025. Similarly, workflow builder Flowise was absorbed by Workday, and Langflow by DataStax (now IBM), as vertical SaaS vendors rapidly move to embed Agentic RAG within their enterprise offerings.

Chapter 7: MCP and Ecosystem Standardization — Why Anthropic Donated MCP to the Linux Foundation

The biggest architectural event of 2025 in the Agentic RAG discussion is the industry standardization of the Model Context Protocol (MCP). When Anthropic published MCP in November 2024, it was merely one company's open specification, but in March 2025 OpenAI announced its adoption in the Agents SDK, Responses API, and ChatGPT desktop, and Google DeepMind followed suit in April of the same year. Monthly downloads of the Python/TypeScript SDKs reached 97 million as of March 2026, and publicly available MCP servers exceeded 10,000. On November 25, 2025, a major spec update was released covering async operations, statelessness, server identification, and community-driven registries, and in December 2025, Anthropic donated MCP to the Agentic AI Foundation (AAIF) under the Linux Foundation. AAIF was co-founded by Anthropic, Block, and OpenAI, with support from Google, Microsoft, AWS, Cloudflare, and Bloomberg.

With MCP standardized, the competitive axis in Agentic RAG has shifted from "which vector DB to use" to "which harness/control layer to use." Pinecone provides an MCP server, Weaviate Agent Skills converse with coding agents over MCP, Anthropic itself has made thousands of data sources accessible through MCP, and both LangChain and Microsoft Agent Framework claim to be MCP-native. As a result, the connector value on the data-source side is rapidly commoditizing, and value has shifted toward orchestration and data governance. Forrester predicts that 30% of enterprise application vendors will publish MCP servers by the end of 2026.
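
As a rough illustration of why connectors commoditize once the wire format is shared, the sketch below handles an MCP-style `tools/call` request. MCP messages are JSON-RPC 2.0; everything else here is an assumption made for illustration. The `search_docs` tool, its arguments, and the in-memory `CORPUS` are hypothetical, and a real server would use an MCP SDK and transport rather than this hand-rolled dispatcher.

```python
import json

# Hypothetical in-memory corpus standing in for a real data source.
CORPUS = {
    "doc-1": "Anthropic donated MCP to the Agentic AI Foundation in December 2025.",
    "doc-2": "Pinecone provides an MCP server for vector search.",
}


def search_docs(query: str, limit: int = 3) -> list[dict]:
    """Naive keyword match; a real connector would query a vector DB."""
    terms = query.lower().split()
    hits = [
        {"id": doc_id, "text": text}
        for doc_id, text in CORPUS.items()
        if any(term in text.lower() for term in terms)
    ]
    return hits[:limit]


# Registry of tools this toy server exposes (names are illustrative).
TOOLS = {"search_docs": search_docs}


def _error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC request and return the serialized response."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(req_id, -32602, "Unknown tool")
    result = tool(**params.get("arguments", {}))
    # Tool results are returned as a list of content items.
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
        }
    )


if __name__ == "__main__":
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "search_docs", "arguments": {"query": "Pinecone"}},
    }
    print(handle_request(json.dumps(request)))
```

Because any harness that speaks this request shape can call any server that answers it, the differentiation moves to what sits above the dispatcher: routing, memory, and governance.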

Taking the opposite stance is OpenAI's Responses API, which keeps hosted tool primitives (file_search, web_search, function calling) within OpenAI's own ecosystem. The Responses API's file_search costs $2.50 per 1,000 queries, and storage costs $0.10 per GB per day; newly created vector stores support up to 100 million files (raised from 10,000 in the November 2025 update). The Assistants API will be retired on August 26, 2026, with its functionality merged into the Responses API. The choice engineers face is a fork in the road over infrastructure sovereignty: "ride a closed ecosystem like OpenAI/Microsoft" or "maintain your own control layer with MCP + LangChain/LangGraph/Agent Framework." Harrison Chase's statement in *Your harness, your memory*, "choose closed source and you lose control of your own data," was directed precisely at this fork.
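
To make the hosted-tool pricing concrete, here is a back-of-the-envelope estimator using only the two rates quoted above. It is a sketch under simplifying assumptions: it ignores any free storage tier, model token charges, and volume discounts, and the function and constant names are made up for this example.

```python
# Rates as quoted in this article; check current pricing before relying on them.
FILE_SEARCH_USD_PER_1K_QUERIES = 2.50
STORAGE_USD_PER_GB_DAY = 0.10


def estimate_file_search_cost(
    queries_per_day: int, stored_gb: float, days: int = 30
) -> dict[str, float]:
    """Estimate file_search spend over `days`, split into queries and storage."""
    query_cost = queries_per_day * days / 1000 * FILE_SEARCH_USD_PER_1K_QUERIES
    storage_cost = stored_gb * STORAGE_USD_PER_GB_DAY * days
    return {
        "queries": round(query_cost, 2),
        "storage": round(storage_cost, 2),
        "total": round(query_cost + storage_cost, 2),
    }


if __name__ == "__main__":
    # 10,000 queries per day against 50 GB of indexed files for 30 days.
    print(estimate_file_search_cost(queries_per_day=10_000, stored_gb=50))
```

For 10,000 queries per day against 50 GB over 30 days, this comes to $750 for queries plus $150 for storage. At that scale the per-query fee dominates, which is the figure to weigh against running your own retrieval layer.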

Chapter 8: Silicon Valley VC Capital Allocation and Media Narrative

In total VC investment for 2025, AI-related fundraising surged more than 75%, from $114 billion in 2024 to over $202 billion, accounting for approximately half of all VC investment. The twin leaders in unicorn-level deals were Sequoia (51 deals / 21 unicorns) and a16z (50 deals / 20), followed by Greylock, Benchmark, Kleiner Perkins, IVP, Index Ventures, Accel, Lightspeed, Coatue, Insight Partners, and General Catalyst, all deeply involved in the Agentic RAG stack. a16z has publicly stated that it assembled over $15 billion in cumulative funds across 2025–26 and single-handedly moved 18% of U.S. VC investment. Sequoia, alongside SoftBank and Google Ventures, announced a new $7 billion fund in April 2026, declaring its intention to expand AI investment.

Reading the thought-leadership documents each VC has published, their perspective on Agentic RAG comes into sharp focus. In its January 2026 *Big Ideas 2026*, a16z selected "AI Data Transformation Layer" as a foundational startup category, pointing to RAG systems' problems with "hallucinations from contradictory or outdated sources and the subtle, costly breakdown of agentic workflows," and arguing that data quality is the true bottleneck. Sequoia declared in *2026: This is AGI* that "long-horizon agents are functionally AGI, and 2026 will be that year," and in *AI in 2026: A Tale of Two AIs* emphasized the shift from copilots to autonomous agents. Bessemer Venture Partners, in its *AI Infrastructure Roadmap: Five Frontiers for 2026*, analyzed that "inference workloads have come to match and in many cases exceed training in computational demand and economic significance," explicitly stating that it is watching agent-specific infrastructure such as TensorMesh (LMCache), RadixArk (SGLang routing), and Inferact (vLLM).

Press coverage has also converged on a consistent tone. TechCrunch covered LangChain in detail in its October 21, 2025 piece *Open-source agentic startup LangChain hits $1.25B valuation*; highlighted Sequoia's conviction around vertical Agentic RAG in its March 25, 2026 article *Harvey confirms $11B valuation, Sequoia triples down*; and tracked Sequoia's capital commitment in its April 16, 2026 piece *New leaders, new fund: Sequoia has raised $7B to expand its AI bets*. The Information was early to report speculation about a Pinecone sale, Bloomberg reported the rapid growth of vertical agents in its January 28, 2026 article *AI customer support startup Decagon valued at $4.5 billion*, and CNBC tracked enterprise agent penetration in its June 10, 2025 piece *Glean raises $150M at $7.2 billion valuation*. Fortune gave an indication of LangChain's ARR in an exclusive interview on October 20, 2025, while VentureBeat has repeatedly featured Contextual AI's benchmark results.

Looking at VC capital allocation in 2025–2026, a clear divergence is visible. The framework layer (LangChain, LlamaIndex, CrewAI) attracted hundreds of millions in capital, but valuations stayed comparatively modest, topping out at LangChain's $1.25 billion. The vector DB layer (Pinecone, Qdrant, Chroma) continued raising $50–100 million on a rolling basis, but pressure from OSS and hyperscalers has begun to show; category king Pinecone is reportedly considering a sale. By contrast, vertical Agentic RAG players (Harvey, Sierra, Decagon, Hebbia, Glean, Cohere) have consistently raised hundreds of millions to over a billion dollars individually, with valuations reaching as high as $4.5–11 billion. Comparing ARR-to-valuation multiples also shows clearly higher capital efficiency on the vertical side. VCs read this as evidence that, even as OpenAI, Anthropic, and Google DeepMind dominate mindshare on the platform side, the greatest economic value accrues to "vertical agents that capture entire business workflows on top of foundation models."
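
The multiple referred to here is simply valuation divided by ARR. The sketch below computes it for the two companies whose valuation and ARR are both stated in this article (Hebbia and Sierra). It cannot compare layers on its own, since no ARR figures are given for the framework or vector DB players, and Sierra's valuation (September 2025) and ARR (November 2025) are a couple of months apart, so treat the output as rough. The `Company` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str
    valuation_musd: float  # valuation in millions of USD
    arr_musd: float        # annual recurring revenue in millions of USD

    @property
    def multiple(self) -> float:
        """Valuation-to-ARR multiple."""
        return self.valuation_musd / self.arr_musd


# Figures as quoted in this article.
COMPANIES = [
    Company("Hebbia", valuation_musd=700, arr_musd=13),
    Company("Sierra", valuation_musd=10_000, arr_musd=100),
]


def ranked_multiples(companies: list[Company]) -> list[tuple[str, float]]:
    """Return (name, multiple) pairs sorted from highest to lowest multiple."""
    pairs = [(c.name, round(c.multiple, 1)) for c in companies]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, multiple in ranked_multiples(COMPANIES):
        print(f"{name}: {multiple}x")
```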

Critical perspectives have also emerged in parallel. In a June 25, 2025 press release, Gartner warned that "more than 40% of agentic AI projects will be cancelled by the end of 2027," citing a lack of governance and observability as the primary causes. McKinsey's autumn 2025 *State of AI 2025* report noted that more than 80% of enterprises have yet to see a meaningful revenue contribution from generative AI, and that fewer than 10% of organizations have succeeded in scaling AI agents in individual functions. SaaStr's Jason Lemkin repeatedly stated in the 2025 *SaaS Vibe Check* that "winners are builders and storytellers alike," while warning that "production AI agents without observability and evaluation are time bombs." The figure cited in Anthropic's multi-agent evaluation — "token consumption 15 times that of standard chat" — is repeatedly invoked as the emblematic number for this cost problem.

Chapter 9: New Developments Anticipated from the Second Half of 2026 Through 2027

When synthesizing forecasts from multiple VCs, analysts, and think tanks, the contours of developments expected over the next 12–18 months become relatively clear.

First, the race to standardize long-term memory and agent harnesses will intensify in earnest. Five major harnesses—LangChain's Deep Agents, Microsoft Agent Framework 1.0, OpenAI Responses API, Anthropic Claude + MCP, and Databricks Genie Code—will compete for the title of "OS for agents" from late 2026 through 2027. It will take years before a winner emerges, but MCP's donation to the Linux Foundation has at least neutralized the connector layer. The LangChain conference scheduled for May 13–14, 2026 in San Francisco (featuring Harrison Chase, Jensen Huang, and Andrew Ng) will serve as a rallying point for the open harness camp.

Second, vertical Agentic RAG category leaders will consolidate by industry. The candidates include Harvey (legal), with Spellbook as a potential competitor; Hebbia (finance); Sierra (customer service); Decagon (customer service/AI Concierge); Glean (enterprise search); Writer (content); EvenUp (insurance/personal injury); and Norm Ai (compliance). The expectation is that a single dominant player will emerge in each vertical as its go-to Agentic RAG solution. Sequoia, a16z, Coatue, and Index Ventures are expected to continue follow-on investments at this stage, pushing these companies toward IPO scale. Cohere remains the top IPO candidate, with Q2–Q3 2026 in sight.

Third, consolidation in the vector DB layer will accelerate. If Pinecone's acquisition closes, one of Oracle, IBM, Snowflake, or MongoDB will absorb the category king, clarifying the division between them and OSS players like Weaviate, Qdrant, and Milvus. Meanwhile, if object-storage-driven players like Turbopuffer and LanceDB advance toward native integration with Alibaba, AWS, and Azure, the "vector DB as an independent category" may dissolve entirely. Whether the future Edo Liberty describes—"vector DBs becoming enterprise long-term memory"—materializes will become clear by 2027.

Fourth, evaluation and observability infrastructure will grow explosively. Platforms such as LangSmith, Arize, Galileo, TruEra, Ragas, Phoenix, and Braintrust have already been raising tens of millions of dollars from 2025 into 2026. As Gartner's warning that more than 40% of projects will be cancelled suggests, the difference between success and failure for agentic projects ultimately comes down to whether teams can observe production behavior and manage quality quantitatively. From late 2026 onward, AI Observability is expected to become an independent category in its own right.
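
What "managing quality quantitatively" can look like in its simplest form is sketched below: compute recall@k over logged retrieval traces and gate a release on a threshold. This is a minimal illustration, not the API of any platform named above; the `Trace` structure, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    query: str
    retrieved_ids: list[str]  # ranked, best first
    relevant_ids: set[str]    # labeled ground truth for this query


def recall_at_k(trace: Trace, k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not trace.relevant_ids:
        raise ValueError("trace has no labeled relevant documents")
    top_k = set(trace.retrieved_ids[:k])
    return len(top_k & trace.relevant_ids) / len(trace.relevant_ids)


def mean_recall_at_k(traces: list[Trace], k: int) -> float:
    """Average recall@k across all traces."""
    return sum(recall_at_k(t, k) for t in traces) / len(traces)


def passes_gate(traces: list[Trace], k: int, threshold: float) -> bool:
    """True when mean recall@k meets the release threshold."""
    return mean_recall_at_k(traces, k) >= threshold


if __name__ == "__main__":
    traces = [
        Trace("refund policy", ["a", "b", "c"], {"a", "c"}),
        Trace("ticket escalation", ["x", "y"], {"z"}),
    ]
    print(mean_recall_at_k(traces, k=3))
    print(passes_gate(traces, k=3, threshold=0.8))
```

Production systems track far more than this (latency, token spend, groundedness, tool-call failures), but the pattern is the same: log traces, score them against labels, and block regressions.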

Fifth, Japanese enterprises are anticipated to make a full-scale entry into enterprise Agentic RAG. Major IT and telecom players—including Rakuten, NTT Data, Fujitsu, NEC, KDDI, SoftBank, Mercari, and LY Corporation—are widely reported to be announcing corporate Agentic RAG products in succession. Hybrid configurations combining domestic LLMs such as NTT's tsuzumi, Fujitsu's Takane, and NEC's cotomi with foreign models like Cohere and Claude are expected to become mainstream. As government procurement frameworks for sovereign AI requirements take shape, vendors like Cohere that market themselves as "sovereign AI" providers are likely to expand their presence in the Japanese market.

Sixth, agent-to-agent interoperability protocols (A2A) will become the next standardization battleground. Microsoft Agent Framework has positioned itself around cross-runtime interoperability via A2A + MCP, with other vendors following suit, and the AAIF (Agentic AI Foundation) may develop a common specification for this space. If the mechanism by which agents delegate and hand off tasks to other agents becomes standardized, Agentic RAG will evolve from "search by a single agent" to "search by an ecosystem of agents." The "multi-agent coordination" phase outlined in the survey paper by Singh et al. is most likely to materialize at the implementation level around 2027.

Chapter 10: Conclusion — Agentic RAG: From "Search Tool" to "Intelligent Infrastructure"

What Silicon Valley VC capital tells us as of April 2026 is that RAG has already evolved beyond the category of "search tool" into something that functions as the intellectual infrastructure of the enterprise. Four ideas converge here: the prophecy of Pinecone founder Edo Liberty that "vector DBs will transform into the long-term memory of the enterprise"; the observation of Databricks co-founder and 2026 ACM Prize in Computing recipient Matei Zaharia that "AGI is already here; it just doesn't exist in the form we evaluate it"; the philosophy Harrison Chase advocates, that "harness and memory should be open"; and the engineering ambition embodied by Douwe Kiela, "RAG 2.0 as retrieval and generation unified into a single system." At the intersection of these ideas, Agentic RAG is emerging as a living, present-tense infrastructure layer.

Silicon Valley VCs view Agentic RAG not as a mere subtopic of AI, but as a tectonic shift that will define the next decade of enterprise software. IVP valuing LangChain at $1.25 billion, Sequoia making three rounds of investment in Harvey, a16z leading the growth of Hebbia and Decagon, Coatue continuously betting on vertical agents, Greycroft/Bain/Lightspeed backing Contextual AI's RAG 2.0, and Wellington Management — a traditional asset manager — entering Glean: this multilayered pattern of capital allocation demonstrates that Agentic RAG is an extraordinarily far-reaching theme that cuts across three domains: "infrastructure investment," "enterprise application investment," and "productivity investment."

At the same time, the "40% cancellation rate" Gartner predicts, the "15x token consumption" Anthropic reports, and the "scale success rates below 10%" McKinsey points to all suggest that a bubble correction and an adjustment of expectations are unavoidable. From the second half of 2026 through 2027, Agentic RAG should mature from the expectation of "magic that works anywhere" into "infrastructure deeply embedded in business processes, accompanied by clear ROI and observability." In that process, winners and losers will be sharply divided. What Silicon Valley engineers should be watching now is not the selection of a specific framework or vector DB itself, but the more fundamental architectural question: "On which control layer do we place our organization's long-term memory?" When we look back on Agentic RAG in 2027, we will remember it as "the inflection point at which the design philosophy of AI applications shifted from pipeline-type to process-type."

Sources

Singh, Ehtesham, Kumar, Khoei, Vasilakos, "Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG" (arXiv:2501.09136) — https://arxiv.org/abs/2501.09136
Asai et al., "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" (arXiv:2310.11511) — https://arxiv.org/abs/2310.11511
Yan et al., "Corrective Retrieval Augmented Generation" (arXiv:2401.15884) — https://arxiv.org/abs/2401.15884
Jeong, Baek, Cho, Hwang, Park, "Adaptive-RAG" (arXiv:2403.14403) — https://arxiv.org/abs/2403.14403
Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (arXiv:2210.03629) — https://arxiv.org/abs/2210.03629
Edge et al., Microsoft Research, "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (arXiv:2404.16130) — https://arxiv.org/abs/2404.16130
Microsoft Research, "LazyGraphRAG: Setting a new standard for quality and cost" — https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/
Microsoft Research, GraphRAG project — https://www.microsoft.com/en-us/research/project/graphrag/
Microsoft Research, "BenchmarkQED: Automated benchmarking of RAG systems" — https://www.microsoft.com/en-us/research/blog/benchmarkqed-automated-benchmarking-of-rag-systems/
Anthropic Engineering, "How we built our multi-agent research system" — https://www.anthropic.com/engineering/multi-agent-research-system
Anthropic, "Introducing Contextual Retrieval" — https://www.anthropic.com/news/contextual-retrieval
Anthropic, "Introducing the Model Context Protocol" — https://www.anthropic.com/news/model-context-protocol
Anthropic, "Donating the Model Context Protocol and establishing the Agentic AI Foundation" — https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
Model Context Protocol Specification 2025-11-25 — https://modelcontextprotocol.io/specification/2025-11-25
NVIDIA Developer Blog, "Traditional RAG vs Agentic RAG — Why AI Agents Need Dynamic Knowledge to Get Smarter" — https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/
NVIDIA AI Blueprints, RAG reference implementation — https://github.com/NVIDIA-AI-Blueprints/rag
Cohere, "Rerank" — https://cohere.com/rerank
Cohere Docs, "Command A" — https://docs.cohere.com/docs/command-a
BAAI, "BGE Reranker v2-m3" — https://huggingface.co/BAAI/bge-reranker-v2-m3
Khattab and Zaharia, "ColBERT" (arXiv:2004.12832) — https://arxiv.org/abs/2004.12832
OpenAI, "New Tools for Building Agents" — https://openai.com/index/new-tools-for-building-agents/
OpenAI, "New tools and features in the Responses API" — https://openai.com/index/new-tools-and-features-in-the-responses-api/
OpenAI Developers, "Responses API tool orchestration" — https://developers.openai.com/cookbook/examples/responses_api/responses_api_tool_orchestration
OpenAI Platform Docs, "Deep Research Guide" — https://platform.openai.com/docs/guides/deep-research
Andreessen Horowitz, "Big Ideas 2026 (Part 1)" — https://a16z.com/newsletter/big-ideas-2026-part-1/
Andreessen Horowitz, "The Rise of Computer Use and Agentic Coworkers" — https://a16z.com/the-rise-of-computer-use-and-agentic-coworkers/
Andreessen Horowitz, "Vector Databases and the Power of RAG" — https://a16z.com/podcast/vector-databases-and-the-power-of-rag/
Andreessen Horowitz, "Investing in Hebbia" — https://a16z.com/announcement/investing-in-hebbia/
Sequoia Capital, "2026: This is AGI" — https://sequoiacap.com/article/2026-this-is-agi/
Sequoia Capital, "AI in 2026: A Tale of Two AIs" — https://sequoiacap.com/article/ai-in-2026-the-tale-of-two-ais/
Sequoia Capital Podcast, "Context Engineering Our Way to Long-Horizon Agents" (Harrison Chase) — https://sequoiacap.com/podcast/context-engineering-our-way-to-long-horizon-agents-langchains-harrison-chase/
Sequoia Capital, "LangChain: From Agent 0 to 1 to Agentic Engineering" — https://sequoiacap.com/article/langchain-from-agent-0-to-1-to-agentic-engineering/
Bessemer Venture Partners, "AI Infrastructure Roadmap: Five frontiers for 2026" — https://www.bvp.com/atlas/ai-infrastructure-roadmap-five-frontiers-for-2026
Bessemer Venture Partners, "Securing AI Agents" — https://www.bvp.com/atlas/securing-ai-agents-the-defining-cybersecurity-challenge-of-2026
Gartner Press Release, "40% of enterprise apps will feature task-specific AI agents by 2026" — https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
Gartner Press Release, "Over 40% of Agentic AI projects will be canceled by end of 2027" — https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
Gartner, SCM Software with Agentic AI $53B forecast — https://www.gartner.com/en/newsroom/press-releases/2026-04-07-gartner-forecasts-supply-chain-management-software-with-agentic-ai-will-grow-to-53-billion-in-spend-by-2030
McKinsey, "State of AI trust in 2026: Shifting to the agentic era" — https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/tech-forward/state-of-ai-trust-in-2026-shifting-to-the-agentic-era
McKinsey, "Seizing the Agentic AI Advantage" — https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage
McKinsey, "The State of AI 2025: Agents, Innovation" — https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/the%20state%20of%20ai/november%202025/the-state-of-ai-2025-agents-innovation_cmyk-v1.pdf
MarketsandMarkets, RAG Market Report — https://www.marketsandmarkets.com/PressReleases/retrieval-augmented-generation-rag.asp
MarketsandMarkets, Vector Database Market — https://www.marketsandmarkets.com/Market-Reports/vector-database-market-112683895.html
Grand View Research, RAG Market — https://www.grandviewresearch.com/industry-analysis/retrieval-augmented-generation-rag-market-report
Fortune Business Insights, Agentic AI Market — https://www.fortunebusinessinsights.com/agentic-ai-market-114233
LangChain Blog, "Series B Announcement" — https://blog.langchain.com/series-b/
TechCrunch, "Open-source agentic startup LangChain hits $1.25B valuation" — https://techcrunch.com/2025/10/21/open-source-agentic-startup-langchain-hits-1-25b-valuation/
Fortune, "Early AI darling LangChain is now a unicorn" — https://fortune.com/2025/10/20/exclusive-early-ai-darling-langchain-is-now-a-unicorn-with-a-fresh-125-million-in-funding/
LangChain Blog, "Your harness, your memory" — https://www.langchain.com/blog/your-harness-your-memory
LangChain Blog, NVIDIA enterprise partnership — https://blog.langchain.com/nvidia-enterprise/
LangChain Newsletter, March 2026 — https://www.langchain.com/blog/march-2026-langchain-newsletter
TechCrunch, "LlamaIndex launches a cloud service for building unstructured data agents" — https://techcrunch.com/2025/03/04/llamaindex-launches-a-cloud-service-for-building-unstructed-data-agents/
LlamaIndex Blog, "Series A and LlamaCloud GA" — https://www.llamaindex.ai/blog/announcing-our-series-a-and-llamacloud-general-availability
LlamaIndex Blog, "LlamaIndex is more than a RAG framework" — https://www.llamaindex.ai/blog/llamaindex-is-more-than-a-rag-framework
PRNewswire, "LlamaIndex Secures $19 Million Series A" — https://www.prnewswire.com/news-releases/llamaindex-secures-19-million-series-a-to-power-enterprise-grade-knowledge-agents-302390936.html
Norwest Venture Partners, "LlamaIndex" — https://www.norwest.com/blog/llamaindex-harnesses-the-power-of-enterprise-data-for-ai-agent-workflows/
Pulse2, "CrewAI Multi-Agent Platform Raises $18M Series A" — https://pulse2.com/crewai-multi-agent-platform-raises-18-million-series-a/
Insight Partners, "CrewAI ScaleUp AI story" — https://www.insightpartners.com/ideas/crewai-scaleup-ai-story/
Enterprise AI World, CrewAI $18M — https://www.enterpriseaiworld.com/Articles/News/News/$18M-in-Funding-Catapults-CrewAIs-Multi-Agentic-Platform-to-the-Enterprise-Level-166495.aspx
deepset, "Funding Announcement Balderton Capital" — https://www.deepset.ai/news/funding-announcement-balderton-capital
deepset, "Introducing Haystack Enterprise Platform" — https://www.deepset.ai/blog/introducing-haystack-enterprise-platform
Visual Studio Magazine, "Microsoft ships production-ready Agent Framework 1.0" — https://visualstudiomagazine.com/articles/2026/04/06/microsoft-ships-production-ready-agent-framework-1-0-for-net-and-python.aspx
Microsoft Learn, "Agent Framework Overview" — https://learn.microsoft.com/en-us/agent-framework/overview/
Visual Studio Magazine, "Semantic Kernel + AutoGen open source Microsoft Agent Framework" — https://visualstudiomagazine.com/articles/2025/10/01/semantic-kernel-autogen--open-source-microsoft-agent-framework.aspx
Dify — https://dify.ai/
Pinecone Blog, "Series B" — https://www.pinecone.io/blog/series-b/
TechCrunch, "Pinecone drops $100M investment on $750M valuation" — https://techcrunch.com/2023/04/27/pinecone-drops-100m-investment-on-750m-valuation-as-vector-database-demand-grows/
VentureBeat, "Pinecone founder Edo Liberty appoints Googler Ash as CEO" — https://venturebeat.com/data-infrastructure/pinecone-founder-edo-liberty-appoints-googler-ash-as-ceo
PRNewswire, Pinecone CEO transition — https://www.prnewswire.com/news-releases/pinecone-founder-edo-liberty-to-spearhead-pinecones-growing-ai-ambitions-appoints-ash-ashutosh-as-ceo-to-expand-vector-database-market-leadership-302549334.html
Calcalist Tech, Pinecone sale speculation — https://www.calcalistech.com/ctechnews/article/rz31q82b5
TechTarget SearchDataManagement, Pinecone eyes future — https://www.techtarget.com/searchdatamanagement/news/366631366/Vector-database-vendor-Pinecone-eyes-future-under-new-CEO
Pinecone Blog, "Pinecone Assistant Generally Available" — https://www.pinecone.io/blog/pinecone-assistant-generally-available/
Weaviate, "Weaviate Series B" — https://www.prnewswire.com/news-releases/weaviate-raises-50-million-series-b-funding-to-meet-soaring-demand-for-ai-native-vector-database-technology-301803296.html
Weaviate, Weaviate Goes Full Stack — https://www.globenewswire.com/news-release/2025/03/04/3036570/0/en/Weaviate-Goes-Full-Stack-With-Launch-of-Weaviate-Agents-for-AI-Development.html
Weaviate Blog, Query Agent — https://weaviate.io/blog/query-agent
Weaviate Blog, 2025 year in review — https://weaviate.io/blog/weaviate-in-2025
Weaviate, Agent Skills launch — https://www.globenewswire.com/news-release/2026/02/21/3242244/0/en/Weaviate-Launches-Agent-Skills-to-Empower-AI-Coding-Agents.html
Weaviate, GigaOm Leader / Gartner Emerging Leader recognition — https://www.globenewswire.com/news-release/2026/01/14/3218396/0/en/Weaviate-named-a-Leader-and-Outperformer-by-GigaOm-and-Emerging-Leader-by-Gartner-Market-Momentum-Accelerates-as-Nonrelational-DBMS-Segment-Grows-22-7.html
TechCrunch, "Qdrant open-source vector database" — https://techcrunch.com/2024/01/23/qdrant-open-source-vector-database/
Qdrant Blog, "Series A funding round" — https://qdrant.tech/blog/series-a-funding-round/
BusinessWire, "Qdrant Raises $50M Series B" — https://www.businesswire.com/news/home/20260312313902/en/Qdrant-Raises-$50-Million-Series-B-to-Define-Composable-Vector-Search-as-Core-Infrastructure-for-Production-AI
Chroma Company, "Seed" — https://www.trychroma.com/company/seed
Medium, "Our investment in Chroma (Astasia Myers)" — https://medium.com/memory-leak/our-investment-in-chroma-the-developer-centric-embedding-database-34277ac327e8
Salestools, Chroma Series B report — https://salestools.io/en/report/chroma-raises-18m-series-b
TechCrunch, "Zilliz relocates to SF, raises $60M" — https://techcrunch.com/2022/08/24/zilliz-the-startup-behind-the-milvus-open-source-vector-database-for-ai-applications-raises-60m-and-relocates-to-sf/
Yahoo Finance, "Milvus surpasses 40,000 GitHub stars" — https://finance.yahoo.com/news/milvus-surpasses-40-000-github-010000562.html
TechCrunch, Vespa spin-out — https://techcrunch.com/2023/10/04/yahoo-spins-out-vespa-its-search-tech-into-an-independent-company/
Vespa.ai, Blossom Capital funding — https://vespa.ai/2023-11-01-blossom-funding/
Tracxn, LanceDB profile — https://tracxn.com/d/companies/lancedb/__ie1HuEEUoPOIc3tEX5yowY9yMJz9kdNTH01mwCePxLw
BetaKit, Turbopuffer raises financing — https://betakit.com/ex-shopify-engineers-raise-fresh-financing-to-scale-turbopuffers-ai-search/
Turbopuffer, About — https://turbopuffer.com/about
SiliconANGLE, Contextual AI nabs $80M for RAG 2.0 — https://siliconangle.com/2024/08/02/contextual-ai-nabs-80m-rag-2-0-platform/
Contextual AI Blog, Platform GA — https://contextual.ai/blog/platform-ga-press-release
Contextual AI Research, Introducing RAG 2.0 — https://contextual.ai/research/introducing-rag2
VentureBeat, Contextual AI outperforms GPT-4o — https://venturebeat.com/ai/contextual-ais-new-ai-model-crushes-gpt-4o-in-accuracy-heres-why-it-matters
VentureBeat, Contextual AI Agent Composer — https://venturebeat.com/technology/contextual-ai-launches-agent-composer-to-turn-enterprise-rag-into-production
Morningstar PRNewswire, Agent Composer launch — https://www.morningstar.com/news/pr-newswire/20260127sf71236/contextual-ai-launches-agent-composerai-for-when-it-actually-is-rocket-science
TechCrunch, "Cohere hits $7B valuation, partners with AMD" — https://techcrunch.com/2025/09/24/cohere-hits-7b-valuation-a-month-after-its-last-raise-partners-with-amd/
PSP Investments / Cohere funding announcement — https://www.investpsp.com/en/news/fresh-funding-enables-cohere-to-accelerate-its-global-expansion-and-build-the-next-generation-of-secure-enterprise-and-sovereign-ai-solutions/
Futurum Group, "Cohere multilingual sovereign AI moat" — https://futurumgroup.com/insights/coheres-multilingual-sovereign-ai-moat-ahead-of-a-2026-ipo/
InfoWorld, "Cohere goes North with agentic AI" — https://www.infoworld.com/article/3757962/cohere-goes-north-with-agentic-ai.html
BusinessWire, Vectara Series A — https://www.businesswire.com/news/home/20240716489550/en/Vectara-Secures-%2425-Million-Series-A-Funding-to-Advance-the-Trustworthiness-of-Retrieval-Augmented-Generation-with-New-Mockingbird-LLM
Glean, Series F $150M — https://www.glean.com/press/glean-raises-150m-series-f-at-7-2b-valuation-to-accelerate-enterprise-ai-agent-innovation-globally
TechCrunch, Glean $7.2B valuation — https://techcrunch.com/2025/06/10/enterprise-ai-startup-glean-lands-a-7-2b-valuation/
CNBC, Glean $150M Series F — https://www.cnbc.com/2025/06/10/glean-gen-ai-search-startup-raises-150-million-at-7-billion-value.html
TechCrunch, Perplexity $200M at $20B — https://techcrunch.com/2025/09/10/perplexity-reportedly-raised-200m-at-20b-valuation/
Writer.com, Series C announcement — https://writer.com/blog/series-c-funding-writer-press-release/
TechCrunch, Writer $200M at $1.9B — https://techcrunch.com/2024/11/12/generative-ai-startup-writer-raises-200m-at-a-1-9b-valuation/
TechCrunch, Hebbia $130M Series B — https://techcrunch.com/2024/07/09/ai-startup-hebbia-rased-130m-at-a-700m-valuation-on-13-million-of-profitable-revenue/
Hebbia Blog, Series B — https://www.hebbia.com/blog/hebbia-raises-usd130m-series-b
TechCrunch, Harvey $11B valuation — https://techcrunch.com/2026/03/25/harvey-confirms-11b-valuation-sequoia-triples-down/
CNBC, Harvey $200M at $11B — https://www.cnbc.com/2026/03/25/legal-ai-startup-harvey-raises-200-million-at-11-billion-valuation.html
Harvey, Series announcement — https://www.harvey.ai/blog/harvey-raises-at-dollar11-billion-valuation-to-scale-agents-across-law-firms-and-enterprises
TechCrunch, Harvey $8B December 2025 — https://techcrunch.com/2025/12/04/legal-ai-startup-harvey-confirms-8b-valuation/
TechCrunch, Bret Taylor Sierra $350M at $10B — https://techcrunch.com/2025/09/04/bret-taylors-sierra-raises-350m-at-a-10b-valuation/
TechCrunch, Sierra $100M ARR — https://techcrunch.com/2025/11/21/bret-taylors-sierra-reaches-100m-arr-in-under-two-years/
CNBC, Sierra $10B valuation — https://www.cnbc.com/2025/09/04/bret-taylor-sierra-ai-startup-salesforce-openai.html
Decagon Blog, Series D announcement — https://decagon.ai/blog/series-d-announcement
Bloomberg, Decagon $4.5B valuation — https://www.bloomberg.com/news/articles/2026-01-28/ai-customer-support-startup-decagon-valued-at-4-5-billion
Decagon Blog, Series C — https://decagon.ai/resources/series-c-announcement
BusinessWire, Decagon Series D — https://www.businesswire.com/news/home/20260128580542/en/Decagons-Valuation-Triples-to-$4.5-Billion-as-it-Ushers-in-the-Age-of-AI-Concierge
Sana Labs, Series B announcement — https://sanalabs.com/resources/announcing-sanas-series-b-round
Sana Labs, Series B extension — https://sanalabs.com/resources/sana-reaches-62m-dollars-in-series-b-funding
You.com, Series C announcement — https://you.com/resources/series-c
TechStartups, You.com $1.5B valuation — https://techstartups.com/2025/09/04/you-com-raises-100m-series-c-in-funding-at-1-5b-valuation-to-scale-ai-search-infrastructure/
TechCrunch, Databricks $4B at $134B — https://techcrunch.com/2025/12/16/databricks-raises-4b-at-134b-valuation-as-its-ai-business-heats-up/
Databricks, MosaicML acquisition — https://www.databricks.com/company/newsroom/press-releases/databricks-completes-acquisition-mosaicml
Databricks Blog, Genie Code — https://www.databricks.com/blog/introducing-genie-code
Databricks, Agent Bricks launch — https://www.databricks.com/company/newsroom/press-releases/databricks-launches-agent-bricks-new-approach-building-ai-agents
TechCrunch, Matei Zaharia wins ACM Computing Prize — https://techcrunch.com/2026/04/08/databricks-matei-zaharia-wins-acm-computing-prize-agi/
TechCrunch, Sequoia raises $7B for AI — https://techcrunch.com/2026/04/16/new-leaders-new-fund-sequoia-has-raised-7b-to-expand-its-ai-bets/
Snowflake Release Notes, Cortex Agents GA — https://docs.snowflake.com/en/release-notes/2025/other/2025-11-04-cortex-agents
Snowflake, Cortex overview — https://www.snowflake.com/en/product/features/cortex/
Snowflake, Cortex Code — https://www.snowflake.com/en/product/features/cortex-code/
SaaStr, "The 2025 SaaS Vibe Check with Jason Lemkin" — https://www.saastr.com/the-2025-saas-vibe-check-what-founders-need-to-know-right-now-with-saastr-ceo-and-founder-jason-lemkin/
Jahanzaib.ai, "Agentic RAG Production Guide" — https://www.jahanzaib.ai/blog/agentic-rag-production-guide
Suprmind, "AI Hallucination Rates and Benchmarks" — https://suprmind.ai/hub/ai-hallucination-rates-and-benchmarks/
