Moreh and Tenstorrent Challenge NVIDIA with Cost-Efficient AI Inference

📊 Key Data
  • Moreh's MoAI Inference Framework on Tenstorrent's Galaxy systems matches or surpasses NVIDIA's DGX A100 performance in LLM inference.
  • Disaggregated serving architecture reduces reliance on expensive high-bandwidth memory (HBM), lowering infrastructure costs.
  • Tenstorrent's open-source Metalium software stack enables high-performance, production-grade LLM inference.
🎯 Expert Consensus

Experts view this collaboration as a significant step toward breaking NVIDIA's vendor lock-in, offering enterprises cost-efficient, flexible AI infrastructure alternatives with validated performance.

SANTA CLARA, CA – May 01, 2026 – By Sam Lidman

In a move poised to disrupt the AI hardware landscape, AI infrastructure software company Moreh has announced a significant breakthrough, demonstrating production-ready large language model (LLM) inference on Tenstorrent's Galaxy systems that rivals the performance of NVIDIA's widely used DGX A100. The achievement, powered by Moreh's proprietary 'MoAI Inference Framework,' signals the arrival of a viable, cost-effective alternative in a market long dominated by a single player.

The results, unveiled at Tenstorrent's TT-Deploy event in San Francisco, showed that across leading Mixture-of-Experts (MoE) models like GPT-OSS and DeepSeek, the combination of Moreh's software and Tenstorrent's hardware matched or surpassed the established NVIDIA benchmark. This validation is more than a technical milestone; it represents a strategic challenge to the status quo, promising enterprises a path away from vendor lock-in and toward more flexible, economically sound AI infrastructure.

A Strategy Against Vendor Lock-In

The core of Moreh's value proposition lies in its MoAI Inference Framework, a sophisticated solution designed to orchestrate a mix of processing hardware from different manufacturers. For years, enterprises have invested heavily in AI, often finding themselves locked into a single vendor's ecosystem, most notably NVIDIA's CUDA platform. This dependency limits hardware choice, stifles negotiation power, and can lead to spiraling costs. Moreh's framework directly confronts this issue by enabling the unified operation of heterogeneous hardware—including GPUs from NVIDIA and AMD alongside Tenstorrent's AI-specific processors—all within a single, cohesive cluster.

This heterogeneous approach allows businesses to adopt a more pragmatic and resilient infrastructure strategy. Instead of a complete hardware overhaul, companies can integrate newer, specialized chips like Tenstorrent's alongside their existing GPU investments. The MoAI framework abstracts the underlying hardware complexity, allowing data scientists and ML engineers to deploy models without needing to re-engineer their software for each specific chip. This flexibility is critical for long-term strategic planning, giving IT decision-makers the freedom to choose the best processor for a given task and budget, rather than being constrained by a proprietary software stack.
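Moreh has not published the MoAI Inference Framework's API, but the pattern the article describes (one device-agnostic interface with vendor-specific backends chosen at deployment time) can be sketched in a few lines. Everything below, including the class names, the `deploy` helper, and the device labels, is a hypothetical illustration of the general technique, not Moreh's actual code.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Device-agnostic interface; one concrete backend per vendor."""
    @abstractmethod
    def load(self, checkpoint: str) -> None: ...
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class CudaBackend(Backend):          # NVIDIA GPUs
    def load(self, checkpoint: str) -> None:
        print(f"[cuda] loading {checkpoint}")
    def generate(self, prompt: str) -> str:
        return f"[cuda] completion for: {prompt!r}"

class RocmBackend(Backend):          # AMD GPUs
    def load(self, checkpoint: str) -> None:
        print(f"[rocm] loading {checkpoint}")
    def generate(self, prompt: str) -> str:
        return f"[rocm] completion for: {prompt!r}"

class TenstorrentBackend(Backend):   # Tenstorrent NPUs
    def load(self, checkpoint: str) -> None:
        print(f"[tt] loading {checkpoint}")
    def generate(self, prompt: str) -> str:
        return f"[tt] completion for: {prompt!r}"

BACKENDS = {"cuda": CudaBackend, "rocm": RocmBackend, "tt": TenstorrentBackend}

def deploy(checkpoint: str, device: str) -> Backend:
    """The caller's code is identical regardless of which vendor serves it."""
    backend = BACKENDS[device]()
    backend.load(checkpoint)
    return backend

# The same deployment call targets any vendor in the cluster.
for device in ("cuda", "rocm", "tt"):
    print(deploy("example-llm-70b", device).generate("hello"))
```

The design choice this illustrates is that model code depends only on the abstract interface, so adding a new accelerator means writing one backend rather than re-engineering every deployment.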

Redefining the Economics of AI Inference

Beyond performance parity, Moreh's announcement places a strong emphasis on improved cost efficiency. The company has implemented a 'disaggregated serving architecture,' an innovative approach that tackles one of the biggest cost drivers in modern AI: high-bandwidth memory (HBM). LLM inference involves two main stages: a compute-intensive 'prefill' phase to process the initial prompt and a memory-intensive 'decode' phase to generate subsequent tokens. Moreh's architecture cleverly splits these tasks, using Tenstorrent's Wormhole chips as dedicated prefill accelerators.
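A toy sketch can make the split concrete. The function names and data structures below are illustrative assumptions, not Moreh's implementation, and the model passes are stand-ins; the point is only that the two stages communicate through a single KV-cache hand-off and can therefore run on different hardware pools.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value state: produced by prefill, consumed by decode."""
    tokens: list = field(default_factory=list)

def prefill(prompt: str) -> KVCache:
    # Compute-bound: the whole prompt is processed in one parallel pass.
    # In the disaggregated design this runs on the dedicated prefill pool
    # (per the article, Tenstorrent Wormhole chips).
    return KVCache(tokens=prompt.split())

def decode(cache: KVCache, max_new_tokens: int) -> list:
    # Memory-bound: tokens are generated one at a time against the growing
    # cache, so this stage stays on the HBM-equipped GPUs.
    out = []
    for i in range(max_new_tokens):
        token = f"<tok{i}/ctx{len(cache.tokens)}>"  # stand-in for a model forward pass
        cache.tokens.append(token)
        out.append(token)
    return out

# The stages can live on different machines; only the cache crosses the wire.
cache = prefill("Explain disaggregated serving in one sentence")
print(decode(cache, max_new_tokens=4))
```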

By offloading the prefill stage to Tenstorrent's processors, the system reduces the burden on the primary GPUs and, crucially, lessens the demand for expensive HBM attached to them. This disaggregation not only lowers the overall infrastructure cost but also allows for more efficient use of all available hardware. According to Moreh, this strategy, combined with cluster-level optimizations like prefix-cache-aware routing and smart auto-scaling, can deliver significantly higher throughput with fewer servers. The ability to integrate and maximize the utility of cost-effective AMD GPUs or even older-generation hardware further enhances the total cost of ownership (TCO) advantage, ensuring that every processor in the cluster contributes effectively to performance.
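Prefix-cache-aware routing can likewise be illustrated with a minimal sketch: send each request to the replica whose cached prompt prefixes overlap most with it, so shared prefill work is reused rather than recomputed. The replica names and the `route` function are hypothetical, and a production router would also weigh load and cache eviction, which this sketch ignores.

```python
def longest_cached_prefix(prompt: str, cached_prefixes: set) -> int:
    """Length of the longest cached prompt prefix this replica already holds."""
    return max((len(p) for p in cached_prefixes if prompt.startswith(p)), default=0)

def route(prompt: str, replicas: dict) -> str:
    # Prefer the replica with the most reusable KV cache, so the prefill
    # work for the shared prefix is skipped rather than recomputed.
    return max(replicas, key=lambda name: longest_cached_prefix(prompt, replicas[name]))

replicas = {
    "replica-a": {"You are a helpful assistant."},
    "replica-b": {"Translate the following text:"},
}
print(route("Translate the following text: bonjour", replicas))  # -> replica-b
```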

Tenstorrent's Moment: A Major Ecosystem Validation

This collaboration marks a pivotal moment for Tenstorrent. Led by legendary chip architect Jim Keller, the company has been developing its RISC-V-based AI processors as a fundamental alternative to traditional GPUs. However, new hardware is only as powerful as the software that supports it. Moreh's success in running production-grade LLMs on Tenstorrent's Galaxy system provides the critical third-party validation needed to build confidence in its ecosystem.

Tenstorrent's 'Networked AI' architecture, which unifies compute, memory, and networking into a single system, is designed for massive scale-out performance. Its commitment to an open-source software stack, Metalium, stands in stark contrast to the closed, proprietary stacks of its main competitors. Moreh, a major external contributor to Metalium, has demonstrated that this open approach can yield tangible, high-performance results. This is particularly appealing for organizations focused on sovereign AI, defense, or financial services, where transparency and control over the full technology stack are paramount.

As a strategic partner, Moreh is helping to build out the software layer that makes Tenstorrent's hardware accessible and powerful for real-world applications. Moreh CEO Gangwon Jo stated, "Achieving production-grade LLM inference performance and stability on Tenstorrent-based systems marks a significant milestone." He added, "We will continue to enhance performance through deeper optimization across heterogeneous architectures and closer integration with Tenstorrent NPUs."

Navigating a Shifting Competitive Landscape

While the Moreh-Tenstorrent partnership presents a compelling new option, it enters a fiercely competitive market. NVIDIA continues to set the pace with its H100 and H200 GPUs, which offer formidable performance, especially in memory-bandwidth-intensive scenarios. The power of NVIDIA's incumbency lies not just in its hardware but in its deeply entrenched CUDA software ecosystem, which represents a significant barrier to entry for any rival.

At the same time, the industry is clearly trending toward heterogeneous computing as a solution to the economic and technical challenges of scaling AI. Industry observers note that while the performance figures from the TT-Deploy event are promising, the market will be watching closely for independent, third-party benchmarks and evidence of successful, at-scale deployments in real-world data centers. The ability to seamlessly integrate with existing orchestration and model-serving stacks will be another critical factor for adoption.

Moreh's strategy, which also leverages its experience with AMD GPU environments and its own LLM subsidiary, Motif Technologies, positions it as a key enabler in this new multi-vendor era. By providing the software glue that binds diverse hardware together, the company is not just validating a single new chip but is championing a more open, flexible, and sustainable future for AI infrastructure.

Sector: AI & Machine Learning Software & SaaS Fintech
Theme: Generative AI Large Language Models Industry 4.0 Geopolitics & Trade
Event: Partnership Joint Venture
Product: ChatGPT ETFs Mutual Funds
Metric: Revenue
