What Is Data Sovereignty — And Why This Debate Is Needed Now

Data sovereignty refers to the principle that data is subject to the laws of the jurisdiction where it is physically stored and processed, and to the right and ability to control its collection, storage, and use under those laws. Before the widespread adoption of cloud computing, data was stored on physical servers, and the laws of the country where those servers resided applied automatically. However, with AWS, Azure, and Google Cloud deploying regions across the globe and data crossing national borders instantaneously, the question of "who controls the data" has become extraordinarily complex.

There are three reasons why this debate carries unprecedented urgency in 2026. First, large-scale data breaches have become frequent. From 2024 through 2025, cyberattacks targeting healthcare institutions, financial institutions, and government agencies increased by 37% year-over-year, with the cumulative number of leaked records reaching into the billions. Second, geopolitical tensions are rising. As the US-China rivalry deepens, the Russia-Ukraine conflict drags on, and instability in the Middle East persists, nation-states have begun redefining data as a "strategic asset." Third, the explosive proliferation of generative AI has made both the value and the vulnerability of data more visible. The risk of confidential corporate information being unintentionally included in LLM training data, indirect information leakage through inference APIs, and questions of sovereignty over the training data of AI models themselves have become top priorities for CTOs and CISOs.

From an investment perspective, data sovereignty is not merely a compliance cost — it is a massive market opportunity giving rise to entirely new categories. The stricter the regulations, the greater the demand for the infrastructure, tools, and services needed to comply with them. Of the $89.4 billion in total AI-related VC investment in 2025, the allocation to the data governance and privacy tech sector surged 2.3x year-over-year, and this trend is expected to accelerate further beyond 2026.

EU — The World's Largest Regulatory Testing Ground

The European Union has been leading the world in legislating data sovereignty. The GDPR (General Data Protection Regulation), which has applied since May 2018, effectively set the global standard for data protection through three innovative mechanisms: extraterritorial application, substantial financial penalties, and the right to data portability.

A landmark example of GDPR's enforcement power is the €530 million (approximately ¥88 billion) fine imposed on TikTok by the Irish Data Protection Commission (DPC) in May 2025. The penalty targeted the transfer of European users' data to servers in China, and sent shockwaves around the world as one of the largest sanctions ever levied for a data sovereignty violation. While TikTok had been working toward storing all EU user data within the EU (Project Clover), violations that occurred during the transition period were what triggered the action.

Yet GDPR is only the beginning. The EU Data Act, which became applicable in September 2025, grants users the right to access data generated by IoT devices and mandates data portability between cloud services. Manufacturing, connected cars, smart homes, and everything else that "generates data" falls within the scope of regulation, forcing companies to fundamentally redesign the data handling policies for the products they make.

Furthermore, the EU AI Act, most of whose provisions are set to apply from August 2026, establishes requirements for transparency in AI model training data, certification of high-risk AI systems, and compliance obligations for general-purpose AI models. As a result, AI developers will be required to demonstrate the origin and legal basis of the data used to train their models, extending data sovereignty concerns across the entire AI supply chain.

The EU's ambitious vision extends to infrastructure as well. Gaia-X, the initiative to build a distinctly European cloud foundation, has not progressed as hoped since its announcement in 2019, hampered by conflicts of interest among participating companies and delays in reaching consensus on technical specifications. However, the "EuroStack" concept that emerged in late 2025 is an even more ambitious proposal: building a sovereign European cloud and AI infrastructure through €300 billion (approximately ¥50 trillion) in investment. Drawing on the lessons of Gaia-X, the initiative is exploring a bottom-up approach in which a consortium of private companies sets technical standards, rather than a government-led, top-down model.

United States — The Contradiction Between Extraterritorial Application and the Patchwork of State Laws

The situation surrounding data sovereignty in the United States is characterized above all by "institutional fragmentation," in stark contrast to the EU. A comprehensive federal privacy law still does not exist, and as of March 2026, 20 states have enacted their own privacy laws. California's CCPA/CPRA, Virginia's VCDPA, Colorado's CPA, and others each differ in their scope of protection, consumer rights, and enforcement mechanisms, and the practical burden on companies operating nationwide continues to grow.

Adding a structural contradiction to this fragmentation is the CLOUD Act (Clarifying Lawful Overseas Use of Data Act), enacted in 2018. The CLOUD Act grants U.S. law enforcement agencies the authority to access data controlled by U.S. companies even when that data is stored on servers overseas. This means that even if a European company uses AWS or Azure to store data in an EU region, the U.S. government can theoretically demand access to that data — a direct collision with GDPR's cross-border transfer regulations.

The most dramatic manifestation of this contradiction was the series of events surrounding TikTok. The U.S. government demanded that TikTok's parent company ByteDance divest its operations or face a ban, citing the potential for the Chinese government to access U.S. user data. After years of legal battles and political maneuvering, the issue went far beyond a single company's regulatory problem, cementing in the global consciousness the understanding that "data is directly tied to national security." Ironically, the logic the U.S. sought to apply to TikTok is essentially the same structure as the authority the U.S. itself exercises over other nations' data through the CLOUD Act.

What investors should note is that this regulatory uncertainty itself is generating startup opportunities. Tools that support privacy compliance automation, data mapping and classification, and legal review of cross-border data transfers have become one of the fastest-growing categories within the enterprise SaaS market.

Asia — The Frontline of Multipolarizing Data Regulation

In the Asia-Pacific region, countries are rapidly developing data protection legislation based on their own contexts and priorities.

China's PIPL (Personal Information Protection Law), since its enforcement in 2021, has been progressively strengthening its effectiveness. In 2025, large-scale administrative sanctions were imposed on multiple technology companies, and the security assessment system for cross-border data transfers in particular has functioned in practice as a de facto "data localization" requirement. China's distinctive approach is characterized by positioning data protection not only as an individual right, but also as part of national cybersecurity and economic security.

India's DPDP Act (Digital Personal Data Protection Act) was enacted in 2023 and is being implemented in phases. With a population of over 1.4 billion, India's data protection legislation carries global impact by virtue of its scale alone. Of particular interest is how the country will balance the government's "critical data" localization requirements against maintaining the international competitiveness of its IT industry.

Japan's APPI (Act on the Protection of Personal Information) was amended in 2022, strengthening cross-border data transfer regulations, expanding individual participation rights, and increasing penalties. Its pragmatic approach — maintaining adequacy recognition under the GDPR while securing its position as a data hub in the Asia-Pacific region — is regarded as a model for reconciling regulation with economic growth. In the next triennial review, the addition of provisions relating to generative AI is under discussion.

South Korea's PIPA (Personal Information Protection Act) has seen a significant strengthening of its enforcement framework since the Personal Information Protection Commission (PIPC) became an independent body. Its approach — being among the first in Asia to develop guidelines on AI training data and seeking a balance between the data economy and individual rights — is becoming a reference model for other Asian countries.

This multipolarization of legal frameworks means that for companies operating globally, a single compliance strategy is no longer sufficient. The surge in investment in "geofencing," "data mesh," and "multi-cloud" architectures designed to meet the varying regulatory requirements depending on where data resides is an inevitable consequence.
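
The residency-aware routing that such "geofencing" architectures rely on can be sketched in a few lines. The following is a minimal illustration only: the region names, the `RESIDENCY_POLICY` table, and the `choose_region` helper are all hypothetical and do not represent any provider's API or any jurisdiction's actual legal requirements.

```python
from dataclasses import dataclass

# Hypothetical policy table: data subject's jurisdiction -> storage regions
# assumed to be acceptable. Real rules are far more nuanced than this.
RESIDENCY_POLICY = {
    "EU": {"eu-central", "eu-west"},
    "CN": {"cn-north"},
    "IN": {"in-west"},
}


@dataclass(frozen=True)
class Record:
    subject_jurisdiction: str
    payload: str


def choose_region(record, preferred_regions):
    """Return the first preferred region that the policy allows for this record."""
    allowed = RESIDENCY_POLICY.get(record.subject_jurisdiction)
    if allowed is None:
        raise ValueError(f"no policy for {record.subject_jurisdiction!r}")
    for region in preferred_regions:
        if region in allowed:
            return region
    raise LookupError("no preferred region satisfies the residency policy")


# An EU record is never routed to the cheaper non-EU region listed first.
region = choose_region(Record("EU", "customer profile"), ["us-east", "eu-west"])
print(region)
```

Failing loudly when no compliant region exists, rather than falling back to a default, is the design choice that matters here: a silent fallback is exactly how data ends up in the wrong jurisdiction.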

The Local-First Philosophy: A Technical Argument Against Cloud Dependency

While data sovereignty is debated in the context of legal systems and geopolitics, the technology community is raising a more fundamental question: "Why does our data have to live on someone else's servers in the first place?"

A systematic answer to this question was proposed in a 2019 paper by Martin Kleppmann of the University of Cambridge and colleagues, titled "Local-First Software: You Own Your Data, in spite of the Cloud." Kleppmann put forward seven ideals of "local-first" software: (1) fast performance — no dependence on network latency; (2) multi-device support — seamless data synchronization across multiple devices; (3) offline functionality — fully operational without a network connection; (4) collaboration — real-time co-editing; (5) longevity — data is not lost when a service shuts down; (6) privacy and security — end-to-end encryption; and (7) user data ownership — users, not cloud providers, control their data.

The technology underpinning this philosophy is the CRDT (Conflict-free Replicated Data Type). A CRDT is a mathematical data structure that allows multiple devices to independently edit data offline and automatically resolve conflicts in a deterministic way when they reconnect to the network. This approach guarantees consistency in distributed environments without requiring arbitration by a central server, and has become the foundational technology of local-first architecture.
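
To make the idea concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins map. It is an illustration under simplifying assumptions, not how Automerge or Yjs work internally: the `LWWMap` class and its method names are invented for this example, and production libraries use richer structures that preserve concurrent edits instead of discarding one of them.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """State-based last-writer-wins map (illustrative CRDT, not a real library)."""
    replica_id: str
    clock: int = 0
    # key -> (logical_time, replica_id, value)
    entries: dict = field(default_factory=dict)

    def set(self, key, value):
        """Record a local edit, stamped with this replica's logical clock."""
        self.clock += 1
        self.entries[key] = (self.clock, self.replica_id, value)

    def merge(self, other):
        """Fold another replica's state into this one.

        The winner for each key is the entry with the larger
        (logical_time, replica_id) pair, so every replica picks the same one.
        """
        for key, theirs in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.entries[key] = theirs
        self.clock = max(self.clock, other.clock)

    def value(self):
        return {key: entry[2] for key, entry in self.entries.items()}


# Two devices edit offline, then exchange state in either order.
laptop, phone = LWWMap("laptop"), LWWMap("phone")
laptop.set("title", "Draft")
phone.set("title", "Notes")   # concurrent edit to the same key
phone.set("owner", "alice")
laptop.merge(phone)
phone.merge(laptop)
print(laptop.value() == phone.value())
```

Because the merge is commutative, associative, and idempotent, the replicas converge without a central server deciding the outcome. The tie-break on `replica_id` is what makes the resolution deterministic when two edits carry the same logical time.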

Automerge and Yjs stand as the two leading practical implementations of CRDTs. Automerge is a library co-created by Kleppmann whose core is now implemented in Rust, optimized for distributed editing of JSON-like documents. Yjs is a JavaScript-based implementation led by German developer Kevin Jahns, and its high performance has led to adoption by numerous projects including Tiptap, BlockNote, Liveblocks Yjs, and Hocuspocus. Both projects are open source, and the vitality of their communities and the maturity of their implementations are approaching the threshold for enterprise adoption.

Applications embodying the local-first philosophy are also spreading rapidly. Obsidian has attracted millions of users as a Markdown-based knowledge management tool, storing all data as local plain-text files. Anytype is an open-source project management and knowledge base app that adopts local-first and peer-to-peer sync as core design principles, drawing attention as an alternative to Notion. Logseq is an outliner-style knowledge graph tool whose architecture — using local files as the source of truth — has won support among developer communities.

At FOSDEM 2026, held in Brussels in February 2026, a dedicated "Local-First Software" devroom (developer session track) was established for the first time. The two-day sessions drew attendance exceeding the venue's capacity, with enthusiastic discussion of CRDT optimization, peer-to-peer sync protocols, integration with end-to-end encryption, and business models for local-first applications. The very establishment of this devroom marks a milestone demonstrating that local-first has matured from a niche academic concept into a practical software design paradigm.

The Rise of Sovereign Cloud — The Challenge of European Providers

The most direct infrastructure investment for ensuring data sovereignty is sovereign cloud. Sovereign cloud refers to cloud services where the location of data storage, access permissions, and operational entities all reside entirely within a specific legal jurisdiction.

Currently, the three major US hyperscalers (AWS, Azure, and Google Cloud) hold approximately 70% of Europe's cloud infrastructure market. This dependency, combined with the potential data access risks posed by the CLOUD Act, is raising serious concerns among European policymakers and enterprise CISOs.

Challenging this situation is a group of European cloud providers. France's OVHcloud, as Europe's largest independent cloud provider, is leveraging its GDPR-native infrastructure to expand its share in regulated industries. Also from France, Scaleway — operating under the Iliad Group — is focusing on GPU cloud and AI infrastructure, establishing a clear positioning specialized in AI sovereignty use cases. Germany's Hetzner, with its strong cost-performance ratio and data center network across Europe, has gained broad support from SMEs to large enterprises.

Also noteworthy is the trend of European cloud providers forming consortiums that go beyond individual competition. Virt8ra is an industry association with participation from multiple European cloud providers, aiming to build a "European cloud ecosystem" to counter US hyperscalers through the development of common API standards and the assurance of multi-cloud interoperability.

From an investment perspective, sovereign cloud is a sector with long-term structural growth. The likelihood of regulatory requirements easing is virtually zero; in fact, national legal frameworks are moving toward greater stringency. The market is projected to grow from approximately $80 billion (roughly ¥12 trillion) in 2026 to $1.13 trillion (roughly ¥170 trillion) by 2034, representing a compound annual growth rate (CAGR) of approximately 39%.
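
The growth rate quoted above can be checked from the two endpoint figures. The short sketch below uses the standard compound annual growth rate formula, (end / start)^(1 / years) - 1, with the 2026 and 2034 market sizes from the text; the `cagr` helper name is chosen only for this example.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# $80 billion in 2026 growing to $1.13 trillion in 2034 (eight years).
rate = cagr(80e9, 1.13e12, 2034 - 2026)
print(f"{rate:.1%}")
```

The result comes out at roughly 39% per year, consistent with the figure cited.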

Mistral AI — The Flagship of European AI Sovereignty

As a company symbolizing Europe's data and AI sovereignty, France's Mistral AI holds an extraordinarily significant position. In just three years since its founding in 2023, the company has established itself as the "champion" of the European AI industry.

In its 2025 Series C round, it raised $2.9 billion, reaching a valuation of $13.7 billion. This valuation ranks just below OpenAI and Anthropic, making it the highest-valued AI company in the world headquartered outside the United States.

Particularly noteworthy in Mistral AI's strategy is the "Mistral Compute" project. The company is advancing construction of a dedicated AI data center equipped with 18,000 Nvidia GPUs, powered by clean energy from nuclear power. By completing both training and inference entirely within Europe, it is building a framework in which the entire AI development process can be executed without data ever leaving European legal jurisdiction.

Furthermore, its strategic partnership with SAP is accelerating penetration into the enterprise market. Mistral's AI models integrated into SAP's ERP systems are attracting strong interest from the finance, manufacturing, and public sectors — all sensitive to GDPR compliance — as a solution that allows large European enterprises to benefit from generative AI without sending their data to US-based AI providers.

Enterprise Response — Cloud Architecture Redesign

The demand for data sovereignty is fundamentally transforming the cloud strategies of large enterprises. According to the latest report from a research firm, 94% of global companies are adjusting their cloud architecture to meet data sovereignty requirements, and 79% have positioned data sovereignty at the core of their IT strategy.

A prime example of this trend is Airbus's large-scale procurement of sovereign cloud services. The tender, reportedly valued at over €50 million (approximately ¥8.2 billion), aims to manage sensitive information, including aircraft design data, supply chain information, and customer data, on infrastructure outside the jurisdiction of the U.S. CLOUD Act. For Airbus, which holds defense-related contracts, data sovereignty is not merely a compliance matter but a prerequisite for business continuity.

Enterprise data sovereignty efforts are progressing in three stages. The first stage is understanding where data resides. Surprisingly, many large enterprises do not have complete visibility into which regions and services their data is stored in. Investment in automated data mapping and classification tools is the first necessity. The second stage is migrating to multi-cloud and hybrid cloud architectures. This involves moving away from dependence on a single hyperscaler and transitioning to a design that uses multiple cloud providers depending on the type of data and regulatory requirements. The third stage is building sovereign AI infrastructure. This means establishing a framework for running AI model training and inference on self-managed or regulation-compliant infrastructure.
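
The first stage, knowing where data resides, amounts to checking an inventory against a residency rule set. The sketch below shows the shape of such a check under stated assumptions: the `DataAsset` fields, the classification labels, the region names, and the `ALLOWED_REGIONS` table are all hypothetical, and a real data mapping tool would discover the inventory automatically instead of taking it as a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    name: str
    classification: str  # e.g. "personal", "design", "public"
    region: str          # where the asset is currently stored
    provider: str


# Hypothetical rules: classification -> regions where storage is acceptable.
# Classifications missing from the table are treated as unrestricted.
ALLOWED_REGIONS = {
    "personal": {"eu-central", "eu-west"},
    "design": {"eu-central"},
}


def find_violations(assets):
    """Return the assets stored in a region their classification does not allow."""
    violations = []
    for asset in assets:
        allowed = ALLOWED_REGIONS.get(asset.classification)
        if allowed is not None and asset.region not in allowed:
            violations.append(asset)
    return violations


inventory = [
    DataAsset("crm-db", "personal", "eu-west", "provider-a"),
    DataAsset("cad-archive", "design", "us-east", "provider-b"),
    DataAsset("marketing-site", "public", "us-east", "provider-b"),
]
for asset in find_violations(inventory):
    print(f"{asset.name}: {asset.classification} data stored in {asset.region}")
```

The list of violations is also the natural input to the second stage, since it identifies which workloads must move to a different provider or region.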

AI Sovereignty: From Chips to Inference

The debate over data sovereignty encompasses an even more multi-layered set of issues in the context of AI.

The most fundamental challenge lies in chip dependency. Nvidia holds approximately 80% of the market share for high-performance GPUs essential to AI training and inference. Moreover, many of Nvidia's chips are manufactured by TSMC (Taiwan Semiconductor Manufacturing Company), and the geopolitical risks surrounding the Taiwan Strait cast a shadow over the entire AI infrastructure supply chain. Behind Western nations' push to promote domestic semiconductor production lies a national security motivation: escaping this chip dependency.

The question of jurisdiction over training data also remains unresolved. LLM training relies on vast amounts of text data from the internet, much of which is copyright-protected and drawn from multiple legal jurisdictions. Under the EU AI Act, transparency regarding the provenance and legal basis of training data is required — yet demonstrating the legal basis for each data point in a training corpus spanning trillions of tokens is extraordinarily difficult both technically and practically.

A practical solution gaining attention is "inference at the edge." This approach deploys trained models in local environments (edge devices, on-premises servers, sovereign clouds) so that data processed during inference never leaves the organization's control. The evolution of on-device AI — exemplified by Apple Intelligence — along with model lightweighting through quantization and distillation techniques, is dramatically improving the viability of edge inference.
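
Quantization, one of the lightweighting techniques mentioned above, trades a small amount of precision for a large reduction in model size. The following is a minimal sketch of symmetric 8-bit quantization on a plain list of weights. It is a simplified illustration, not the scheme used by any particular product; the function names are invented for this example, and real toolchains typically quantize per channel or per block and operate on tensors.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, worst_error <= scale / 2 + 1e-9)
```

Each weight now fits in one byte instead of four, and the rounding error per weight is bounded by half the scale. That size reduction is a large part of what lets a model fit on an edge device or a modest on-premises server.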

Governments are also accelerating the establishment of institutions and the injection of funds to secure AI sovereignty. In 2025, the UK established a new "Sovereign AI Unit" to oversee government AI procurement and infrastructure strategy. Gartner predicts that by 2027, more than 50% of large enterprises will be strategically managing geographic constraints on their AI model training and inference environments.

Investment flows clearly reflect this trend. VC investment in AI totaled $89.4 billion in 2025, and the scale grows even larger once government and sovereign fund AI investments are included. While the United States has committed $52 billion to AI-related funds and China $62 billion, the EU has established an intra-regional AI investment fund of €7.4 billion (approximately ¥1.2 trillion). Although the EU's figure is substantially smaller than those of the US and China in absolute terms, the strategy is to combine it with regulatory advantages over data within the region, converting "regulatory walls" into competitive leverage.

The Convergence of Local-First and Sovereign Cloud

Up to this point, "local-first" and "sovereign cloud" have been described as separate trends, but from an investment perspective, the most important structural insight is that these two movements are converging.

At first glance, local-first (storing data on the user's device) and sovereign cloud (storing data in a specific country's cloud) appear to be different approaches. However, the underlying philosophy is identical: "returning control of data to the entity that generated it — whether individual, organization, or nation-state."

Technically, local-first and sovereign cloud are also complementary. CRDT-based local-first applications require server-side infrastructure for synchronization and backup. If sovereign cloud is adopted as that infrastructure, the result is the most robust data sovereignty architecture possible — one where the primary data resides on the user's device and the cloud used for synchronization is fully contained within the relevant legal jurisdiction.

In an enterprise context, this convergence is taking concrete shape as a "zero-trust data architecture." Local-first applications run on employees' devices, with end-to-end encrypted data synchronized and backed up on sovereign cloud. Cloud providers cannot access the encrypted data, and even in the event of a disclosure request under the CLOUD Act, a provider without the decryption key cannot furnish any meaningful data.

In the startup ecosystem, a new category of companies is emerging based on this convergence thesis. Collaboration platforms built on local-first CRDT technology at their core, edge AI inference services on sovereign cloud, next-generation SaaS with end-to-end encryption and data portability as standard features — a cohort of technology companies implementing data sovereignty at the architecture level is emerging as the next major investment theme.

Impact on the Industry

The data sovereignty and local-first movements will bring the following irreversible changes to the structure of the technology industry.

First, the multipolarization of the cloud infrastructure market. The structure in which U.S. hyperscalers monopolize the global market will gradually transform under regulatory pressure. Sovereign cloud providers in Europe, Asia, and the Middle East will expand their market share using regulated industries in their respective regions as a foothold, and the cloud market will shift to a multi-layered structure of "global hyperscalers + region-specific sovereign providers."

Second, the design principles of software architecture will change. A paradigm shift will occur from "cloud-first" to "data sovereignty-first." In new software projects, the storage location and portability of data will be considered in the early stages of design, and local-first CRDT technology and end-to-end encryption will be incorporated as standard components.

Third, the geographic decentralization of AI development will accelerate. Large-scale AI model training has thus far been concentrated in the United States and China, but with the enforcement of the EU AI Act and expanded investment in sovereign AI infrastructure, autonomous AI development capabilities will be built in Europe, the Middle East, and Southeast Asia. The success of Mistral AI serves as a pioneering example of this geographic decentralization.

Fourth, a new category will emerge in the M&A market. Strategic acquisitions by major technology companies will intensify across the sovereign cloud, privacy tech, and local-first tools sectors. In particular, startups whose core consists of foundational CRDT libraries (Automerge, Yjs) and region-specific providers with operational expertise in sovereign cloud will hold high value as acquisition targets.

Fifth, the demand structure for digital talent will change. Demand will surge for legal and compliance professionals well-versed in data sovereignty, CRDT and distributed systems engineers, multi-cloud architects, and privacy engineering specialists. In particular, "regulatory engineers" who can cross-functionally understand GDPR, the EU AI Act, and data protection legislation across various countries and translate that understanding into technical implementation will become the scarcest and most high-value talent over the coming years.

As the sovereign cloud market grows to exceed one trillion dollars by 2034, data sovereignty will transform from a cost center into a source of brand value called "trust." An era in which companies that genuinely respect user data are chosen by the market is steadily approaching.


References: European Commission "EU Data Act," European Parliament "EU AI Act," Irish Data Protection Commission "TikTok GDPR Decision 2025," Martin Kleppmann et al. "Local-First Software: You Own Your Data, in spite of the Cloud" (Ink & Switch, 2019), FOSDEM 2026 Local-First Devroom, Gartner "Sovereign Cloud Market Forecast 2026-2034," Mistral AI Series C Announcement (2025), Airbus Sovereign Cloud Tender (2026), EuroStack Proposal Paper (2025), UK Government "Sovereign AI Unit," Crunchbase "Global VC AI Investment Report 2025," CLOUD Act (U.S. Congress, 2018), Gaia-X European Association for Data and Cloud