Untether AI – Where Performance Intersects with Efficiency – by Moor Insights & Strategy

The market hype surrounding AI has ushered in yet another gold rush within the semiconductor industry. With new AI semiconductor startups announced regularly, and existing startups receiving additional rounds of funding, the industry is filled with companies all promising to deliver a better mousetrap. Meanwhile, mature semiconductor companies are actively repositioning their portfolios so that they too can cash in on the AI wave. Startups and mature companies alike offer the bold promise that their solution will uniquely and forever disrupt the AI landscape.

A recent report published by Moor Insights & Strategy looks closely at the AI landscape from the perspectives of both ML training and AI inference and draws several key conclusions about each. In inference, there is a need for a solution optimized to meet the essential requirements of high compute performance, low latency, and low power. The report concludes that the combination of a unique at-memory compute architecture and the strength of the underlying engineering team makes Untether AI a clear standout in AI inference solutions.

Moor Insights & Strategy expects inference to represent over 70% of the AI market. Inference imposes a very different set of requirements than training, which is primarily focused on performance with little regard for power, latency, or cost. Training typically employs accelerators, most often GPUs, to address complex computations through parallelism. These GPUs are housed in large server racks in centralized data centers where an expansive power and cooling infrastructure already exists. As a result, product cost and latency are generally lower-order considerations.

The high latency and low power efficiency of GPUs and CPUs based on a traditional von Neumann architecture are impractical for AI inference applications, which demand real-time performance within modest power budgets. This gives rise to the need for a customized architecture that overcomes the limitations of these traditional processors. To be clear, the most significant consumer of power in inference is data movement: 90% of power consumption can be attributed to transferring data to external memory. Inference solutions are expected to deliver results that exceed those of most GPUs, but within a significantly lower power budget, reinforcing the need for an alternative architecture optimized for inference.

Moor Insights concluded: “The key to Untether AI’s performance and efficiency is its implementation of at-memory architecture. Breaking from the legacy of von Neumann removes the bottleneck associated with latency and power. By moving compute to where data resides (at-memory), the company claims incredible reduction (up to 6x) in latency and power consumption.”

Furthermore, Moor Insights stated: “The speedAI family has an affinity for generative AI applications as well as convolutional neural networks (CNNs) and its emerging alternative for image recognition and classification, vision transformers (ViT). This is important to call out as it demonstrates the Untether engineering team’s awareness in designing an open architecture that has wide applicability across the variety of use cases. This combination of vision, experience, and proven delivery is why MI&S is so bullish about the prospects of Untether AI.”

To read the complete report, download it here.

Contributed by Robert Bielby, Untether AI