Artificial intelligence is moving beyond text and images—it is gradually learning to create virtual environments and act within them. This approach is known as world models: systems that recreate space, objects, and rules of interaction, where every action has a consequence. This paradigm could become the key to robotics, autonomous transport, and complex AI agents—but there is a catch that is currently slowing progress.
Most modern models are excellent at analyzing data and generating responses, but they lack an “intuition” for space and cause-and-effect relationships. They can describe what should be done but often do not understand what will happen after an action: where exactly an object will end up, what will collide, or how the environment will change.
World models close this gap. They give AI a training ground where decisions can be tested safely, routes can be planned, mistakes can be avoided, and outcomes can be predicted. For robotics, autonomous vehicles, and AI agents, this is not a bonus—it is a foundation, the basis on which reliable behavior in the real world is built.
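To make the idea of testing decisions safely more concrete, here is a minimal sketch in Python. It is a toy illustration, not how any production system works: the grid, the `GOAL` and `OBSTACLES` values, and the `simulate` and `choose_plan` functions are all hypothetical names invented for this example. A hand-written rule stands in for the world model, and the agent tries candidate plans inside it before committing to one.

```python
from itertools import product

# Hypothetical toy world: an agent on a grid must reach GOAL without
# stepping onto an obstacle. All values here are illustrative assumptions.
GOAL = (3, 0)
OBSTACLES = {(1, 0), (2, 0)}
MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def simulate(start, plan):
    """Predict the outcome of a plan without acting in the real world.

    Returns (position, collided). The rollout stops at the first collision.
    """
    x, y = start
    for move in plan:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        if (x, y) in OBSTACLES:
            return (x, y), True
    return (x, y), False


def choose_plan(start, horizon=5):
    """Try every plan up to `horizon` moves, shortest first, and return the
    first one whose simulated outcome reaches GOAL without a collision."""
    for length in range(1, horizon + 1):
        for plan in product(MOVES, repeat=length):
            position, collided = simulate(start, plan)
            if not collided and position == GOAL:
                return "".join(plan)
    return None  # no safe plan found within the horizon


if __name__ == "__main__":
    print(simulate((0, 0), "RRR"))  # the direct route hits an obstacle
    print(choose_plan((0, 0)))      # a detour found purely by simulation
```

The direct route is rejected inside the simulation, so the collision never has to happen in the real environment. A real system would replace the hand-written `simulate` rule with a learned model and the exhaustive search with a far more efficient planner.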
In practice, two main approaches are used today. The first is real-time dynamic simulation. In this case, the environment is not stored in advance. It is generated frame by frame as a user or agent moves through space, changes viewpoint, or interacts with objects. The model continuously predicts how the state of the environment should change, taking physics and object behavior into account.
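The frame-by-frame idea can be sketched in a few lines of Python. This is an assumption-laden toy, not a description of Genie 3 or any other real system: the `State` class, the `predict_next` function, and the wall at `x = 1.0` are hypothetical, and a simple hand-written physics rule stands in for a learned predictor. The point is only that no environment is stored in advance; each new state is computed from the previous state and the latest action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float  # agent position along a 1-D corridor (illustrative units)
    v: float  # agent velocity


def predict_next(state: State, action: float, dt: float = 0.1) -> State:
    """Stand-in for a learned model: advance the world by one frame.

    `action` is treated as an acceleration. A wall at x = 1.0 stops the agent.
    """
    v = state.v + action * dt
    x = state.x + v * dt
    if x >= 1.0:  # simple collision rule: clamp to the wall and stop
        x, v = 1.0, 0.0
    return State(x, v)


def rollout(state: State, actions):
    """Generate states one frame at a time as actions arrive."""
    for action in actions:
        state = predict_next(state, action)
        yield state


if __name__ == "__main__":
    for frame in rollout(State(0.0, 0.0), [1.0, 1.0, 1.0]):
        print(frame)
```

Because every frame is built on the previous prediction, small errors can accumulate over a long rollout. That is one plausible reason why generated environments of this kind are hard to keep stable for long periods.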
This approach offers high flexibility and allows environments to be created without rigid, predefined scenarios. At the same time, it requires significant computational resources, and such simulations currently remain stable for only a few minutes.
This is the path Google is taking with Genie 3, a research model that creates short-lived but logically consistent 3D environments. Meta uses a similar approach in its Habitat 3.0 platform, designed for training physical AI agents and robots.
The second approach focuses on persistent, saved environments. Here, the model converts text, images, or video into a full-fledged three-dimensional scene with geometry, digital objects, and metadata describing physical processes. Such a world can be saved, imported into other software environments, and reused.
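A saved environment of this kind can be pictured as a plain data structure that is written to disk and read back unchanged. The sketch below is a hypothetical illustration in Python: the `Scene` and `SceneObject` classes, their fields, and the JSON layout are assumptions made for this example and do not reflect the format of Marble or any other product.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SceneObject:
    name: str
    position: list  # [x, y, z] in metres (assumed convention)
    size: list      # bounding-box extents [width, height, depth]
    physics: dict = field(default_factory=dict)  # e.g. mass, whether it is static


@dataclass
class Scene:
    source_prompt: str  # the text the scene was generated from
    objects: list       # list of SceneObject


def dumps(scene: Scene) -> str:
    """Serialize the whole scene, including nested objects, to JSON text."""
    return json.dumps(asdict(scene), indent=2)


def loads(text: str) -> Scene:
    """Rebuild a Scene from JSON text produced by dumps()."""
    raw = json.loads(text)
    return Scene(raw["source_prompt"], [SceneObject(**o) for o in raw["objects"]])


if __name__ == "__main__":
    scene = Scene(
        source_prompt="a small workshop with a table",
        objects=[
            SceneObject("table", [0.0, 0.0, 0.0], [1.2, 0.8, 0.6],
                        {"mass_kg": 12.0, "static": False}),
            SceneObject("floor", [0.0, -0.4, 0.0], [5.0, 0.1, 5.0],
                        {"static": True}),
        ],
    )
    text = dumps(scene)
    print(loads(text) == scene)  # the reloaded scene matches the original
```

Because the scene is ordinary data rather than a live prediction, the same file can be reopened later or handed to another tool, which is what makes results reproducible.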
This direction is being developed by World Labs under the leadership of Fei-Fei Li. Their Marble model is aimed at creating portable 3D environments suitable for engineering, scientific, and design tasks, where stability and reproducibility of results are critical.
Developing all of these models requires substantial capital expenditure, and this is already reflected in the strategies of major technology companies.
Meta Platforms plans to increase capital expenditures to as much as $135 billion, betting on AI as the core infrastructure of its future products. After restructuring its AI division, the company is preparing new models and platforms, while strong financial performance in its advertising business allows it to fund these investments. The market has responded positively to this strategy.
Tesla and Elon Musk's xAI have chosen a different approach. Tesla plans to spend around $20 billion on AI, autonomous driving, and robotics, with additional investments in xAI. Musk has publicly emphasized the need for proprietary semiconductor infrastructure, underscoring his bet on full control over the entire stack, from models to computation.
For both strategies, world models are not an end product but a training environment without which further progress in autonomous systems slows down or becomes too risky.
For the market, world models are neither a standalone product nor a new consumer AI segment. Investors view them as an infrastructure layer that will determine companies’ competitiveness in the next development cycle of the industry.
This is a long-term bet. Companies that are first to teach AI to work with space, motion, and cause-and-effect relationships will gain an advantage across all autonomy-related fields—from robotics to industrial applications and transportation. That is why the market today is willing to tolerate sharp increases in capital expenditures and the absence of quick returns.
Investor reaction to Meta’s plans is telling. Despite massive AI investments, the company’s shares rose after earnings—markets believed that the core business could finance these costs without losing stability. In this case, world models are considered an extension of an existing platform rather than a risky experiment.
Musk’s bet carries a different risk profile. Tesla investors are effectively financing not only AI development but also an attempt at vertical integration, from models to chips. This strategy is more expensive and more complex, but if successful, it gives the company full control over the key components of autonomous systems.
In the end, the market is not betting on a specific technology but on an approach. Investors are assessing whether a company can endure a long investment cycle and whether it has a business capable of funding the development of world models without putting pressure on short-term profitability.