COSIMO’s Geometric Video Aims to Fix AI’s Perception Problem
- 12.4 percentage points higher accuracy in AI models using Geometric Video vs. legacy video
- 78.5% fewer model parameters required with Geometric Video
- 27 times less GPU memory needed during inference with Geometric Video
If the claims hold up, COSIMO's Geometric Video could be a transformative approach to AI perception, offering significant efficiency gains and cost savings over traditional methods; independent, real-world validation will be the crucial test.
IRVINE, CA – May 13, 2026 – In a move that challenges the foundational principles of how machines see the world, Irvine-based startup COSIMO has launched a new technology it calls "Geometric Video." The company claims its new video format, purpose-built for artificial intelligence, can slash costs, boost accuracy, and dramatically accelerate the deployment of real-world AI systems like robotaxis and humanoid robots.
The announcement targets a core frustration within the multi-billion-dollar "Physical AI" industry. Despite immense investment in bigger models, more powerful GPUs, and vast data centers, the timelines for truly autonomous systems continue to slip. As COSIMO's launch materials pointedly note, "The Physical AI industry has been reduced to an anecdote," with delayed product launches and underwhelming demonstrations becoming commonplace. The industry's prevailing strategy has been one of brute-force scaling, but COSIMO argues the problem isn't the scale, but the source: the video itself.
A New Primitive for AI Perception
At the heart of COSIMO's argument is a simple yet profound observation: legacy video was never designed for AI. Digital video formats, from MPEG to H.264, were engineered to compress visual information for the human eye, prioritizing perceptual quality for people, not informational clarity for algorithms. For an AI, this pixel-based data is inherently noisy and inefficient. It forces models to expend massive computational resources just to infer the fundamental properties of the physical world—like shape, motion, and object permanence—from a stream of colored dots.
This inefficiency has led to an arms race. Companies developing autonomous systems have been forced to rely on increasingly complex and power-hungry solutions. These include sensor fusion—combining camera data with other sensors like LiDAR and radar—and deploying ever-larger neural networks on specialized, expensive hardware. While these methods have yielded progress, they contribute to the ballooning costs and development timelines that have plagued the industry. The brute-force approach requires more data, more powerful chips, and more energy, creating a cycle of escalating costs and complexity that has yet to deliver on the promise of widespread, reliable physical AI.
COSIMO proposes a paradigm shift by tackling the problem at its root. Instead of trying to build a better brain to interpret flawed data, the company has created what it claims is better data for the AI brain.
From Pixels to Pure Geometry
COSIMO's solution is a proprietary "Physics Engine" that transforms raw sensor data into Geometric Video. Unlike traditional video, which captures a scene as a grid of pixels, Geometric Video encodes the underlying geometry of shapes and their motion directly into the data stream. The process, according to the company, "strips out the noise" to represent objects and their movement in a "deterministic, mathematically pure form."
The technical core of this transformation is the COSIMO Deterministic Structural Transform (DST) kernel, which produces a Sparse Geometric Matrix (SGM). The company notes that this kernel is stateless, uses efficient fixed-point integer math, and does not require learned weights, suggesting it can operate with high speed and low computational overhead. By pre-processing the visual world into a language of geometry that is native to mathematics and physics, the technology provides AI models with a much cleaner, more direct representation of reality.
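COSIMO has not published the internals of the DST kernel, so the following is purely an illustrative sketch of the properties the company describes: a stateless transform using only fixed-point integer math, with no learned weights, that maps a dense frame to a sparse matrix of structurally salient points. The function name, threshold, and gradient heuristic are all assumptions, not COSIMO's actual algorithm.

```python
# Hypothetical illustration only: COSIMO has not disclosed the DST kernel.
# This sketch shows the general shape of a stateless, integer-only,
# weight-free transform producing a sparse output.

def sparse_geometric_sketch(frame, threshold=32):
    """Map a dense grayscale frame (list of rows of 0-255 ints) to a
    sparse list of (row, col, gradient) entries using integer math only."""
    h, w = len(frame), len(frame[0])
    entries = []
    for r in range(h - 1):
        for c in range(w - 1):
            # Integer horizontal + vertical gradient: no floats, no weights.
            g = abs(frame[r][c + 1] - frame[r][c]) + abs(frame[r + 1][c] - frame[r][c])
            if g >= threshold:  # keep only structurally salient points
                entries.append((r, c, g))
    return entries  # output size scales with scene structure, not resolution

flat = [[10] * 4 for _ in range(4)]            # featureless frame
edge = [[0, 0, 255, 255] for _ in range(4)]    # single vertical edge
print(len(sparse_geometric_sketch(flat)))      # 0: nothing to encode
print(len(sparse_geometric_sketch(edge)))      # 3: one entry per edge row
```

Because every call depends only on its input frame, a transform of this shape is trivially parallelizable and reproducible, which is consistent with COSIMO's claim that its output is deterministic.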
This fundamentally alters the task for the AI. Instead of needing to learn the physics of the world from scratch by analyzing millions of hours of pixelated video, the AI is fed a stream that already describes the world in terms of its essential geometric properties. This could dramatically reduce the size and complexity of the AI models required for perception tasks.
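The potential model-size reduction follows directly from input dimensionality. A back-of-the-envelope sketch makes the point; the 500-entry geometric stream below is an assumed figure for illustration, not a COSIMO specification.

```python
# Illustrative arithmetic only: entry counts are assumptions, not COSIMO data.
pixel_inputs = 224 * 224 * 3          # dense RGB frame: 150,528 input values
geom_inputs = 500 * 3                 # e.g. 500 sparse (x, y, attr) entries

hidden = 256                          # same first-hidden-layer width for both
pixel_params = pixel_inputs * hidden  # first-layer weights on pixel input
geom_params = geom_inputs * hidden    # first-layer weights on geometric input

print(pixel_params // geom_params)    # roughly 100x fewer first-layer weights
```

Real perception networks are far more complex than a single dense layer, but the underlying intuition carries over: a sparser, more structured input shrinks every stage that scales with input size.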
The Economic and Performance Equation
The true test of any new technology lies in its performance, and COSIMO has released a set of striking benchmark figures to back its claims. The company tested Geometric Video against a legacy video baseline using the UCF-101 dataset, a widely recognized academic benchmark for action recognition in video.
Across five separate training runs on NVIDIA L4 hardware, the results were significant. AI models using Geometric Video reportedly achieved +12.4 percentage points higher accuracy than those using legacy video. More impressively, they did so with 78.5% fewer model parameters and required 27 times less GPU memory during inference. The company also highlights a run on a five-year-old MacBook Pro, where the system processed frames at 1.17 milliseconds each while consuming under one watt of power, demonstrating its potential for low-power edge devices.
Perhaps most critically for developers, the results showed 3 times tighter accuracy clustering across the test runs. This suggests a level of stability and predictability that is often absent in the complex world of deep learning, where slight changes can lead to unpredictable performance. COSIMO asserts this stability makes the system "stable enough to debug like source code," a claim that will resonate with any engineer who has wrestled with the stochastic nature of AI training. To bolster these claims, the company states that all test runs are publicly available and cryptographically verified via its website.
The economic implications, if these numbers hold up in real-world applications, are staggering. COSIMO projects that a Tier-1 Physical AI company could save $8 to $10 billion annually in compute, storage, and power costs. On a micro level, it estimates savings of $2,700 per edge device. Beyond direct financial savings, the company claims its technology can accelerate time-to-market by 6 to 12 months—an invaluable advantage in a competitive landscape.
A Potential Paradigm Shift for Physical AI
COSIMO's technology enters a field dominated by tech giants. NVIDIA's Metropolis platform, Google's Gemini-powered robotics initiatives, and Tesla's vision-only Full Self-Driving (FSD) system are all tackling the challenge of AI perception with immense resources. These established players have focused on building more powerful hardware, like custom AI chips, and more sophisticated software models to process traditional video and sensor data.
Tesla, for example, has famously bet its future on using a fleet of cameras and powerful neural networks to solve autonomous driving, eschewing other sensors like LiDAR. This vision-only approach hinges on the ability of its AI to extract all necessary information from conventional video streams. Google and others often employ sensor fusion, combining data from multiple sources to build a more robust model of the world.
COSIMO's Geometric Video does not necessarily replace these efforts but rather reframes the problem. It could be seen as a foundational layer that makes all subsequent processing more efficient. An autonomous vehicle using Geometric Video might still employ sensor fusion, but the information from its cameras would be far richer and less computationally demanding to interpret. It could enable Tesla's vision-only approach to run on cheaper hardware or achieve a higher level of reliability.
By creating a video primitive that is inherently machine-readable, COSIMO is betting that the industry will choose a more elegant and efficient solution over the current brute-force approach. If Geometric Video delivers on its promise of higher accuracy with dramatically fewer resources, it could lower the barrier to entry for new players and allow existing ones to reallocate massive capital from data centers to deployment. The industry has long been waiting for a breakthrough to move past incremental gains, and redefining the very data that AI sees may be the fundamental shift it needs.