What Are World Models? AI That Predicts the "Next Physical State"
World Models are systems in which AI learns internal representations for understanding how the physical world works and predicting future states.
Whereas LLMs (Large Language Models) predict the "next token (word)," World Models predict the "next physical state." LLMs learn language patterns from text data, but they cannot fundamentally understand the causal relationships of the physical world—why objects fall, the conditions under which liquid spills from a cup, the sequence of actions a robot needs to open a door. World Models learn a compressed representation of the environment (a latent space) and simulate future states within that representational space, giving AI the ability to "mentally try out outcomes before acting."
Humans do this unconsciously. We predict the trajectory of a ball before throwing it, anticipate the movements of other vehicles while driving, and intuitively adjust the angle at which we tilt a glass of water. This is the capability that cognitive science calls "mental models" or "intuitive physics," and World Models are an attempt to reproduce it computationally.
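The idea of "mentally trying out outcomes before acting" can be illustrated with a minimal planning sketch. Everything in it is an assumption made for illustration: the hand-written `imagine_step` function stands in for a learned world model, and the names `imagine_step`, `imagined_cost`, and `plan` are hypothetical. The planner rolls every short action sequence forward inside the model, scores the imagined trajectories, and only then commits to a plan.

```python
from itertools import product

DT = 0.5                      # length of one imagined time step
ACTIONS = (-1.0, 0.0, 1.0)    # candidate accelerations


def imagine_step(state, action):
    """Stand-in world model: predict the next (position, velocity).

    Hand-written for illustration; a real world model would be learned.
    """
    pos, vel = state
    vel = vel + action * DT
    pos = pos + vel * DT
    return (pos, vel)


def imagined_cost(state, actions, goal):
    """Roll an action sequence forward in the model, summing squared distance to the goal."""
    cost = 0.0
    for action in actions:
        state = imagine_step(state, action)
        cost += (state[0] - goal) ** 2
    return cost


def plan(state, goal, horizon=3):
    """Imagine every action sequence of length `horizon` and return the cheapest one."""
    return min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: imagined_cost(state, seq, goal),
    )


best = plan((0.0, 0.0), goal=1.0)
print(best)  # (1.0, 1.0, -1.0): accelerate twice, then decelerate at the goal
```

Exhaustive search is only feasible for toy horizons. Practical systems typically sample or optimize candidate sequences instead, but the loop of imagining, scoring, and then acting stays the same.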
History of Development — From Dyna to the "Year One of World Models"
The history of World Models traces back to the early days of reinforcement learning.
In 1991, Richard Sutton (Professor at the University of Alberta, known as the father of reinforcement learning) introduced the Dyna Architecture. He formalized the concept that "planning is trying things out in your head," proposing an integrated architecture that interleaves real-world action, learning, model updates, and planning. This became the foundation of model-based reinforcement learning.
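The interleaving of acting, learning, model updates, and planning can be sketched with tabular Dyna-Q, a common textbook instance of the Dyna idea. This is a minimal sketch, not a reproduction of the original paper: the five-state chain environment, the hyperparameters, and the function names (`env_step`, `dyna_q`, `greedy_policy`) are all illustrative assumptions.

```python
import random

N_STATES, GOAL = 5, 4      # chain of states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)           # 0 = move left, 1 = move right


def env_step(state, action):
    """Real environment: deterministic chain with reward 1.0 on reaching GOAL."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return (1.0 if next_state == GOAL else 0.0), next_state


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}             # learned model: (state, action) -> (reward, next_state)

    def update(s, a, r, s2):
        """One Q-learning backup, used for real and imagined transitions alike."""
        best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Acting: epsilon-greedy with random tie-breaking.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                top = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == top])
            r, s2 = env_step(s, a)           # real experience
            update(s, a, r, s2)              # direct learning from that experience
            model[(s, a)] = (r, s2)          # model update
            for _ in range(planning_steps):  # planning: replay imagined transitions
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


print(greedy_policy(dyna_q()))  # should prefer "right" (1) in every state
```

Each real step triggers `planning_steps` extra value updates drawn from the learned model, which is how planning lets the agent extract more from limited real-world experience.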
In 2018, David Ha (Google Brain) and Jürgen Schmidhuber (IDSIA) published the paper "World Models," giving this field its defining name. They combined a VAE (Variational Autoencoder) and an RNN (Recurrent Neural Network) to learn compressed spatial and temporal representations of the environment without supervision, demonstrating that an agent could be trained inside its own "hallucinated dream" and that the learned behavior could be transferred back to the actual environment.
In 2022, Yann LeCun (then VP and Chief AI Scientist at Meta FAIR) published "A Path Towards Autonomous Machine Intelligence," proposing the concept of JEPA (Joint Embedding Predictive Architecture). The core idea is to perform predictions in an abstract representation space rather than pixel space — ignoring unpredictable details and understanding the world at the level of abstract features — an approach said to resemble how biological brains model their environment. LeCun publicly declared that "LLMs will never lead to AGI," arguing that World Models are the only path to AGI.
In 2023, Google DeepMind released DreamerV3, which was later published in Nature (2025). A general-purpose algorithm that outperformed specialized methods across more than 150 diverse tasks using a single configuration, it demonstrated the ability to learn an environment model and improve its behavior through imagined scenarios.
2024 marked a turning point. Google DeepMind announced Genie (February 2024, generating interactive 2D environments from a single image) and Genie 2 (December 2024, generating action-controllable 3D worlds). Fei-Fei Li (Professor at Stanford) founded World Labs, raising $230 million (approximately ¥34.5 billion). Meta released V-JEPA (abstract feature prediction from video). A coalition of 20 AI research institutions released Genesis, an open-source robotics simulation platform.
From 2025 to 2026, World Models entered a period of explosive acceleration. NVIDIA unveiled Cosmos at CES 2025, Google DeepMind's Genie 3 achieved real-time 3D world generation at 24fps, Meta's V-JEPA 2 achieved zero-shot robot planning with only 62 hours of training data, and Runway announced GWM-1. Then in March 2026, LeCun left Meta after 12 years to found AMI Labs with a $1.03 billion (approximately ¥154.5 billion) seed round. It was the largest seed round in European startup history and was described as the "biggest contrarian bet" against LLMs. That same month, LeWorldModel (LeWM), with just 15 million parameters, demonstrated performance surpassing models ten times its size after only a few hours of training on a single GPU, pointing to the possibility of democratizing World Models.
Key Companies and Products — The Physical AI Ecosystem
The Physical AI ecosystem centered on World Models is rapidly taking shape.
NVIDIA provides the foundation for this field through its Cosmos platform. Three models are offered as open source: Cosmos-Predict2.5 (simulating future states of the world), Cosmos-Transfer2.5 (world simulation based on spatial control inputs), and Cosmos-Reason2 (understanding and reasoning about physical common sense). Omniverse (a digital twin platform) has been adopted by Foxconn, Delta Electronics, Siemens, and others for factory simulation, while Isaac Sim (robotics simulation) is being leveraged by Alphabet Intrinsic and others. The GR00T foundation model for humanoid robots employs a Vision-Language-Action (VLA) architecture and is being provided to robotics companies including 1X Technologies, Figure AI, and Agility Robotics.
Google DeepMind is at the cutting edge with Genie 3. It generates 3D worlds at 720p in real-time at 24fps from text prompts, enabling object interaction, physical law adherence, and prediction of other agents' behavior. SIMA 2 is an AI agent that operates within this world model, and a "boot camp" approach is being researched in which SIMA 2 solves millions of tasks in environments generated by Genie 3. CEO Demis Hassabis has stated that "two things are needed to achieve AGI: world models and automated experimentation," revealing that he spends the majority of his research time on world models.
AMI Labs (founded 2026 by Yann LeCun) specializes in developing World Models based on the JEPA architecture. Its $1.03 billion (approximately ¥154.5 billion) seed round represents an attempt to commercialize LeCun's 12 years of Meta FAIR research as an independent company. The company launched with Meta Europe VP Laurent Solly as COO, Saining Xie as CSO, and a pre-money valuation of $3.5 billion (approximately ¥525 billion). LeWorldModel (LeWM) is an ultra-lightweight model with just 15 million parameters, yet encodes each frame into a single 192-dimensional token (1/200th the token count of conventional approaches), achieving a 48x speedup in planning.
World Labs (founded by Fei-Fei Li) focuses on "Spatial Intelligence," building AI that understands and reasons about 3D worlds. Its first product, "Marble," generates and edits persistent 3D environments from text, images, video, and 3D layouts. The company has raised a cumulative $1.23 billion (approximately ¥184.5 billion) with a valuation of approximately $5 billion (approximately ¥750 billion). Its major investors include AMD, Autodesk (which invested $200 million), NVIDIA, and Fidelity.
Runway unveiled GWM-1 (announced December 2025), a world model that accounts for the laws of physics, with a vision of "a general-purpose world model capable of simulating every possible world and experience." In February 2026, the company raised $315 million (approximately ¥47.25 billion), reaching a valuation of $5.3 billion (approximately ¥795 billion).
Waymo has built the Waymo World Model on top of Google DeepMind's Genie 3, using it to generate rare, safety-critical "long-tail" scenarios. Wayve is advancing end-to-end autonomous driving simulation with GAIA-3 (15 billion parameters) and, together with Uber and Nissan, is planning a robotaxi trial operation in Tokyo in the second half of 2026.
The Giants of Robotics: The Greatest Beneficiaries of World Models
The area where the evolution of World Models is most directly transforming industry is robotics.
Skild AI raised $1.4 billion in a Series C in January 2026, reaching a valuation of over $14 billion and bringing total funding to more than $2 billion. Its "Skild Brain" is a foundation model intended to work across all robots, and the company went from zero to approximately $30 million in revenue within months in 2025. SoftBank and NVentures are the lead investors.
Physical Intelligence (Pi) raised $600 million in November 2025 at a valuation of $5.6 billion. As of March 2026, it is in discussions for a new round of approximately $1 billion, with its valuation expected to exceed $11 billion. CapitalG, Lux Capital, and Jeff Bezos are the lead investors.
Figure AI raised $1 billion in a Series C at a valuation of $39 billion. The company is developing its third-generation humanoid, Figure 03, with plans to ship 100,000 units over four years. Intel, NVIDIA, and Qualcomm are among its investors.
1X Technologies' NEO robot (weighing 66 lbs with a lifting capacity of over 150 lbs) is equipped with the "1X World Model AI" and is set to begin shipping in the United States in 2026 at $20,000. Agility Robotics' Digit is the only commercially deployed humanoid robot, with a track record of moving over 100,000 totes at GXO facilities.
Toyota Research Institute (TRI) has developed Diffusion Policy (mastering over 60 dexterous skills) and Unified World Models (UWM, an integrated framework for video and action data), accelerating its research through a partnership with Boston Dynamics (October 2024).
Component Technologies — From JEPA to 3D Gaussian Splatting
The underlying technologies supporting World Models are diverse.
The core of the JEPA architecture is performing predictions in representation space rather than pixel space. An encoder maps frame observations to low-dimensional latent representations, and a predictor models environment dynamics in latent space. LeWM consists of a ViT-Tiny encoder (~5 million parameters) and a Transformer predictor (~10 million parameters), totaling just 15 million parameters.
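The encoder and predictor roles described above can be sketched as a forward pass. This is a toy illustration under stated assumptions, not LeWM's or Meta's implementation: both networks are replaced by small random linear maps, the dimensions are arbitrary, and all names (`encode`, `predict`, `latent_prediction_loss`, `imagine`) are hypothetical. The point is only that the loss compares latents with latents and never reconstructs pixels.

```python
import random

OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 2   # toy sizes, chosen arbitrarily


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
W_ENC = random_matrix(LATENT_DIM, OBS_DIM, rng)                    # encoder weights
W_PRED = random_matrix(LATENT_DIM, LATENT_DIM + ACTION_DIM, rng)   # predictor weights


def encode(obs):
    """Encoder: observation -> low-dimensional latent representation."""
    return linear(W_ENC, obs)


def predict(latent, action):
    """Predictor: (latent, action) -> predicted next latent."""
    return linear(W_PRED, latent + action)


def latent_prediction_loss(obs, action, next_obs):
    """Mean squared error between predicted and encoded next latents."""
    predicted = predict(encode(obs), action)
    target = encode(next_obs)   # usually treated as a fixed target during training
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / LATENT_DIM


def imagine(obs, actions):
    """Roll the predictor forward in latent space without decoding any frames."""
    latent = encode(obs)
    trajectory = [latent]
    for action in actions:
        latent = predict(latent, action)
        trajectory.append(latent)
    return trajectory


obs = [rng.random() for _ in range(OBS_DIM)]
next_obs = [rng.random() for _ in range(OBS_DIM)]
loss = latent_prediction_loss(obs, [1.0, 0.0], next_obs)
steps = imagine(obs, [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(len(steps), len(steps[0]))  # 4 4: the start latent plus three imagined steps
```

A real training loop would also need a mechanism that keeps the encoder from collapsing to a constant output, since a constant latent trivially minimizes this loss. That part is omitted here.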
Video prediction models function as implicit world models. OpenAI explicitly positioned Sora as a "world simulator," framing video generation that learns physical laws from data as a form of world modeling. NVIDIA's Cosmos and Runway's GWM-1 follow the same approach.
3D representation techniques are also evolving rapidly. NeRF (Neural Radiance Fields) represents scenes as continuous 5D functions, while 3D Gaussian Splatting represents scenes as a collection of anisotropic Gaussians. The latter enables fast rendering and has become a key technology for AR/VR and robotics since 2025. GWM (Gaussian World Models) is a world model for robot manipulation based on 3D Gaussian Splatting representations, enabling action-conditioned 3D video prediction.
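What "anisotropic Gaussian" means in practice can be shown with a small sketch that evaluates one Gaussian's contribution at a 3D point. It assumes the common parameterization in which each Gaussian has a mean, per-axis scales, a rotation quaternion, and an opacity, so that the covariance is Sigma = R S S^T R^T. The function names are hypothetical, and the projection and blending steps of an actual renderer are left out.

```python
import math


def quat_to_rot(w, x, y, z):
    """Rotation matrix from a quaternion (w, x, y, z), normalized first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def gaussian_weight(point, mean, scales, quat, opacity=1.0):
    """Opacity-weighted falloff of one anisotropic Gaussian at `point`.

    With Sigma = R S S^T R^T, the Mahalanobis distance is |S^-1 R^T (p - mean)|^2,
    so no explicit matrix inverse is needed.
    """
    rot = quat_to_rot(*quat)
    d = [p - m for p, m in zip(point, mean)]
    local = [sum(rot[j][i] * d[j] for j in range(3)) for i in range(3)]  # R^T d
    maha = sum((v / s) ** 2 for v, s in zip(local, scales))
    return opacity * math.exp(-0.5 * maha)


origin = (0.0, 0.0, 0.0)
scales = (2.0, 1.0, 0.5)                       # stretched along its local x axis
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))

along_x = gaussian_weight((2.0, 0.0, 0.0), origin, scales, identity)
along_y = gaussian_weight((0.0, 2.0, 0.0), origin, scales, identity)
rotated = gaussian_weight((0.0, 2.0, 0.0), origin, scales, quarter_turn_z)
print(round(along_x, 4), round(along_y, 4), round(rotated, 4))
```

The same distance from the center gives a different weight depending on direction, and rotating the Gaussian by a quarter turn about z moves its long axis onto the y direction. That directional stretch is what lets a modest number of Gaussians cover thin or elongated surfaces.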
Innovations in physics engines are also noteworthy. Genesis is 10–80× faster than conventional GPU-accelerated simulators, enabling training at 10,000× real-world speed (at that rate, one hour of simulation covers a little over one year of real-world experience). It can generate scenes, tasks, rewards, and physically accurate videos from language prompts.
Application Fields — From Autonomous Driving to Digital Twins
The applications of World Models span a wide range of fields, with autonomous driving leading the way.
Autonomous driving is the most mature application domain. The Waymo World Model is used to generate rare "long-tail" scenarios, and Wayve's GAIA-3 is used for end-to-end driving evaluation. The robotaxi market is projected to grow from approximately $2 billion (around ¥300 billion) in 2024 to $40–104 billion (approximately ¥6 trillion–¥15.6 trillion) by 2030, which implies a CAGR of roughly 65–93%.
Industrial digital twins are being led by NVIDIA Omniverse. Foxconn, Siemens, and Delta Electronics have adopted it for full-factory simulation, leveraging it for production line optimization, failure prediction, and design verification of new lines. The digital twin market is expected to expand from $21–33 billion (approximately ¥3.15 trillion–¥4.95 trillion) in 2025 to $49–150 billion (approximately ¥7.35 trillion–¥22.5 trillion) by 2030.
In scientific simulation, NOAA has begun full-scale operation of an AI-driven global weather forecasting model, and the ICON model has achieved global simulation at 2.5 km resolution (2025 Gordon Bell Prize). The hybrid physics + AI approach significantly reduces computational costs.
Game and virtual world generation features Google DeepMind's Project Genie (publicly released in January 2026), which generates interactive worlds from text, and World Labs's Marble, which provides generation and editing of persistent 3D environments.
Silicon Valley VC Perspective — "Physical AI is the Next Mega-Trend"
Silicon Valley VCs are positioning World Models as the next investment theme after LLMs.
a16z (Andreessen Horowitz) raised $15 billion (approximately ¥2.25 trillion) in new funds in January 2026, bringing its assets under management to over $90 billion. The firm is focusing on the "deployment gap" in Physical AI, where cutting-edge research is advancing rapidly but the robots actually deployed are still "classical," and argues that the key is fine-tuning general-purpose capabilities for specific tasks.
Sequoia Capital assessed that "step-function changes are being seen in voice, video, and robotics," and invested in Skild AI and Physical Intelligence. They invited NVIDIA's Jim Fan (Head of GEAR Lab) to their podcast to discuss the theme "Robots Thinking Fast and Slow."
At Khosla Ventures, Vinod Khosla himself declared that "AI will transform not only the digital world, but the physical world as well." The firm co-led a $51 million Series A in BrightAI (Physical AI) and led a $750 million Series C in Waabi (autonomous trucking), signaling that it sees clear potential in AI models beyond LLMs.
Of the 189 new unicorns in 2025, 47 (25%) are AI-native companies, and World Model-related fundraising ranks in the top 3% of CB Insights market rankings.
Views of Notable Figures: "The Only Path to AGI"
Notable figures' views on World Models show an unusual degree of consensus.
Yann LeCun (CEO of AMI Labs) takes the strongest stance: "The industry's current obsession with LLMs is misguided. They will ultimately fail to solve many important problems." He argues that JEPA-based systems learn representations of the world by predicting abstract features of sensory input, an approach closer to the biological brain. His decision to leave Meta FAIR after 12 years and go independent with a $1.03 billion seed round speaks to the depth of this conviction.
Jensen Huang (CEO of NVIDIA) declared at CES 2026: "The ChatGPT moment for Physical AI has arrived — the moment when machines begin to understand, reason about, and act in the real world." He released Cosmos as open source, positioning it as "a game changer for robotics and industrial AI."
Demis Hassabis (CEO of Google DeepMind) stated: "Two things are needed to achieve AGI. A world model — for AI to truly understand physics and space. And automated experimentation — for AI to solve fundamental problems in materials, fusion, and more, hands-on." He predicts AGI is "5 to 10 years" away.
Fei-Fei Li (Stanford Professor, founder of World Labs) defines spatial intelligence as "the ability to reason about how the 3D world works, rather than relying on 2D data," and is driving applications in gaming, VFX, VR, and robotics with $1.23 billion in funding.
Jim Fan (Head of NVIDIA GEAR Lab) predicts that "2026 will be the first year Large World Models lay the foundation for robotics and chart a new course toward multimodal, embodied AGI."
World Models by the Numbers: A Rapidly Expanding Market
Market data related to World Models/Physical AI shows rapid expansion.
The Physical AI software platform market is projected to grow from $2.1 billion in 2025 to $17.2 billion in 2030 (CAGR 42%). The humanoid robotics market is expected to expand from $1.9–2.9 billion in 2025 to $4.0–15.3 billion in 2030. The digital twin market is forecast to reach $49–150 billion in scale by 2030.
Corporate valuations have also surged dramatically. Figure AI ($39B), Skild AI (over $14B), Physical Intelligence ($5.6B → $11B in negotiations), Runway ($5.3B), World Labs (~$5B), AMI Labs ($3.5B) — in just two years from 2024 to 2026, World Models-related unicorns have proliferated.
Japan's Physical AI market is projected to grow from $307 million in 2025 to $6.76 billion in 2035 (CAGR 36.2%). The Japanese government approved its first National AI Basic Plan in December 2025 and announced $6.34 billion (1 trillion yen) in AI support measures over five years starting in fiscal year 2026. Japan, with its tradition in manufacturing and robotics, could become a priority market for Physical AI in the transition from "precision to intelligence." With a projected shortage of 11 million workers by 2040, demand for robotics is structurally unavoidable.
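The growth rates quoted in this section can be checked against their endpoints with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is only an arithmetic check. The helper name `cagr` is illustrative, and the six-year horizon in the last call is an assumption that is not stated in the text.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Japan's Physical AI market: $307 million (2025) to $6.76 billion (2035).
japan = cagr(307e6, 6.76e9, 2035 - 2025)

# Physical AI software platforms: $2.1 billion to $17.2 billion.
software_5y = cagr(2.1e9, 17.2e9, 2030 - 2025)   # horizon as written in the text
software_6y = cagr(2.1e9, 17.2e9, 6)             # assumed 2024 base year

print(f"Japan Physical AI: {japan:.1%}")        # 36.2%
print(f"Software, 5 years: {software_5y:.1%}")  # 52.3%
print(f"Software, 6 years: {software_6y:.1%}")  # 42.0%
```

The Japan figures reproduce the quoted 36.2%. The software platform endpoints imply about 52% over the five years from 2025 to 2030, while the quoted 42% matches a six-year horizon, which suggests the underlying forecast may use 2024 as its base year.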
Challenges: The Walls to Overcome
The future of World Models is bright, but challenges remain to be overcome.
Computational cost is the biggest bottleneck. Transformers and Diffusion Networks are powerful but inference-heavy, which conflicts with the real-time control requirements of robotics. LeWM's 15-million-parameter model outperforming models ten times its size is a promising answer to this challenge.
The Sim-to-Real gap, in which policies trained in simulation degrade in performance in the real world, remains a fundamental challenge. A policy can learn to "exploit" inaccurate dynamics within the simulation. Countermeasures such as domain randomization and Real-to-Sim-to-Real pipelines are being researched.
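Domain randomization can be sketched in a few lines: instead of tuning a controller against one fixed simulator, it is evaluated across many simulators whose physical parameters are drawn at random, so it cannot rely on any single, possibly wrong, set of dynamics. The toy point-mass simulator, the parameter ranges, and the names `Physics`, `sample_physics`, `final_error`, and `pick_robust_gain` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Physics:
    mass: float       # kg
    friction: float   # viscous damping coefficient


def sample_physics(rng):
    """Draw a fresh set of simulator parameters (ranges are illustrative)."""
    return Physics(mass=rng.uniform(0.5, 2.0), friction=rng.uniform(0.1, 1.0))


def final_error(gain, phys, target=1.0, dt=0.05, steps=200):
    """Simulate a proportional controller pushing a mass toward `target`."""
    pos = vel = 0.0
    for _ in range(steps):
        force = gain * (target - pos) - phys.friction * vel
        vel += force / phys.mass * dt
        pos += vel * dt
    return abs(target - pos)


def pick_robust_gain(candidates, n_envs=50, seed=0):
    """Choose the gain with the lowest mean error across randomized simulators."""
    rng = random.Random(seed)
    envs = [sample_physics(rng) for _ in range(n_envs)]

    def mean_error(gain):
        return sum(final_error(gain, env) for env in envs) / n_envs

    return min(candidates, key=mean_error)


candidates = [0.5, 1.0, 2.0, 4.0, 8.0]
print(pick_robust_gain(candidates))
```

In a learning setting the same pattern applies per episode: parameters are resampled before each rollout, and the policy is optimized on the average outcome rather than on one simulator instance.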
Evaluation metric issues are also serious. Existing metrics such as FID and FVD prioritize pixel fidelity but do not measure physical consistency, dynamics, or causal relationships. A standard evaluation framework for Physical AI has yet to be established.
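The limitation is visible in the standard definition of FID, which is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated samples. The symbols below follow the usual convention and are not taken from the text: \mu_r, \Sigma_r are the mean and covariance of features from real data, and \mu_g, \Sigma_g are those from generated data.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

FVD applies the same distance to features from a video network. In both cases the score depends only on aggregate feature statistics, so two sets of clips can score similarly even if one of them violates conservation of momentum or object permanence.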
Data requirements are also a limiting factor. There is a lack of unified large-scale datasets spanning the diverse domains of robotics (navigation, manipulation, autonomous driving, etc.). However, synthetic data generation platforms like Genesis are beginning to alleviate this challenge.
Future Outlook: A Shift in the Center of Gravity from LLMs to World Models
Industry leaders are optimistic about the future of World Models.
2026 is positioned as the "Year One of World Models." AMI Labs and World Labs launch in earnest, real-time 3D world generation (Genie 3) becomes a reality, and Hassabis predicts that "agentic systems will reach a truly impressive and reliable level." A Tokyo robotaxi pilot operated by Wayve/Uber/Nissan is planned for the second half of 2026.
2027–2028 will see the start of mass production of humanoid robots, with Figure AI planning to ship 100,000 units and Agility Robotics scaling to thousands of units per year.
By 2030, the Physical AI software market is projected to reach $17.2 billion, the robotaxi market $40–104 billion, and robotaxi services are expected to operate in more than 200 cities.
The most important trend is the convergence of LLMs and World Models. The fusion of next-token prediction for text and next-state prediction for physical states is advancing, with multimodal models (vision + language + action) accelerating this convergence. If Jensen Huang's "ChatGPT moment for Physical AI" proves correct, 2026 will be remembered as the starting point.
Impact on the Industry
First, the rise of World Models is shifting the center of gravity in AI research from text/language models toward understanding the physical world. LeCun's claim that "the obsession with LLMs is misguided" may sound extreme, but the massive investments in AMI Labs ($1.03B), World Labs ($1.23B), and Skild AI (over $2B) indicate that the VC market has reached a degree of consensus around this view.
Second, the robotics industry is emerging as the primary beneficiary of World Models. The valuations of Figure AI ($39B), Skild AI (over $14B), and Physical Intelligence ($5.6B → $11B in ongoing negotiations) have reached levels comparable to LLM startups. If the commercialization of humanoid robots accelerates in earnest around 2027–2028, the labor structures of manufacturing, logistics, and service industries will change fundamentally.
Third, platforms such as NVIDIA Cosmos, Google DeepMind Genie 3, and Genesis (open source) are democratizing the development infrastructure for World Models, lowering the barriers to entry for startups. The fact that ultra-lightweight models like LeWM, with just 15 million parameters, have demonstrated performance exceeding models ten times larger suggests the possibility of an approach fundamentally different from LLMs' "scaling-above-all-else" paradigm.
Fourth, Japan is positioned to become a priority market for Physical AI, given its tradition in manufacturing and robotics, a structural labor shortage of 11 million workers, and a government AI support package of 1 trillion yen. Signs of this include SoftBank's acquisition of ABB's robotics division, the Tokyo robotaxi plans involving Wayve, Uber, and Nissan, and the growing number of Japanese companies adopting NVIDIA Omniverse.
References: Yann LeCun, "A Path Towards Autonomous Machine Intelligence" (2022); Ha & Schmidhuber, "World Models" (arXiv: 1803.10122, 2018); Sutton Dyna Architecture (ACM, 1991); DreamerV3 (Nature, 2025); LeWorldModel (arXiv: 2603.19312, 2026); AMI Labs $1.03B Seed Round (TechCrunch, 2026/3); AMI Labs LeCun New Venture (MIT Technology Review, 2026/1); NVIDIA Cosmos Launch (NVIDIA Newsroom, CES 2025); NVIDIA Cosmos Major Release (NVIDIA Newsroom, 2026); World Labs $1B Funding (AI Insider, 2026/2); World Labs Marble Launch (TechBuzz); Google DeepMind Genie 2 Blog (2024/12); Google DeepMind Genie 3 Blog (2025/8); Project Genie Public Launch (Google Blog, 2026/1); Waymo World Model Blog (2026/2); Wayve GAIA-3 Launch; Runway $315M Raise (TechCrunch, 2026/2); Runway GWM-1 Release (TechCrunch, 2025/12); Skild AI $1.4B Series C (BusinessWire, 2026/1); Physical Intelligence $600M (Robot Report, 2025/11); Physical Intelligence $11B Talks (Bloomberg, 2026/3); Figure AI $1B Series C (Robot Report); 1X NEO Robot; Agility Robotics 100K Totes; TRI Diffusion Policy & Unified World Models; GR00T N1 Paper (arXiv: 2503.14734); Genesis Open Source (SiliconANGLE, 2024/12); Jensen Huang CES 2026 (Axios); Hassabis World Models & AGI (Humanoids Daily, JA Lookout); Jim Fan Sequoia Podcast; Fei-Fei Li Spatial Intelligence; a16z Physical AI Deployment Gap; a16z Big Ideas 2026; Sequoia AI in 2026; Khosla BrightAI Investment; Physical AI Software Market (MarketIntelo); Digital Twin Market (MarketsandMarkets); Humanoid Robot Market (MarketsandMarkets); Robotaxi Market (Grand View Research); AV Market (Goldman Sachs, Morgan Stanley); Japan AI Plan (Asia Tech Daily); Japan Physical AI Market (Acumen Research); Japan Robotics Intelligence Shift (Nichiboku); NOAA AI Weather Models; Scientific American World Models Revolution; V-JEPA (Meta AI Blog); Sora 2 (OpenAI); OpenAI Video Generation as World Simulators