Elon Musk's expensive new project, the xAI Colossus AI supercomputer, has been shown in detail for the first time, Tom's Hardware reports.
The YouTube channel ServeTheHome was given access to the facility and a look inside several of the servers, offering a first chance to see the machine in action. The GPU servers are Nvidia HGX H100 systems, each containing eight H100 GPUs, packaged in Supermicro's liquid-cooled 4U Universal GPU platform. The servers are loaded into racks at 64 GPUs per rack, with 1U manifolds between each HGX H100 supplying the liquid cooling the servers require. At the bottom of each rack sits another Supermicro 4U unit, this one housing a redundant pump system and a rack monitoring system.
The racks are combined into groups of eight, for 512 GPUs per array. Each server has four redundant power supplies, and the rear of the GPU racks reveals three-phase power supplies, Ethernet switches, and a rack-sized manifold that delivers all of the liquid cooling.
There are more than 1,500 racks of GPUs in the Colossus cluster, or about 200 rack arrays. According to Nvidia CEO Jensen Huang, the GPUs for these 200 arrays were fully installed in just three weeks.
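For context, those per-rack and per-array figures imply the overall scale of the cluster. Here is a quick back-of-the-envelope check in Python; the constants come from the numbers reported above, while the derived totals are my own arithmetic rather than figures from the article:

```python
# Rough sanity check of the Colossus scale figures reported above.
# Reported: 8 GPUs per HGX H100 server, 64 GPUs per rack,
# 8 racks per array, and "more than 1,500" GPU racks in total.
# The derived totals below are arithmetic, not reported numbers.

GPUS_PER_SERVER = 8       # Nvidia HGX H100: eight H100 GPUs per server
GPUS_PER_RACK = 64        # reported: 64 GPUs per rack
RACKS_PER_ARRAY = 8       # racks grouped eight to an array
TOTAL_RACKS = 1500        # "more than 1,500 racks"

servers_per_rack = GPUS_PER_RACK // GPUS_PER_SERVER   # 8 servers per rack
gpus_per_array = GPUS_PER_RACK * RACKS_PER_ARRAY      # 512 GPUs per array
arrays = TOTAL_RACKS / RACKS_PER_ARRAY                # ~188 arrays (article says about 200)
total_gpus = TOTAL_RACKS * GPUS_PER_RACK              # ~96,000 GPUs

print(f"{servers_per_rack} servers/rack, {gpus_per_array} GPUs/array, "
      f"~{arrays:.0f} arrays, ~{total_gpus:,} GPUs")
```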
Each HGX H100 server has 3.6 terabits per second of network bandwidth. The entire cluster runs on Ethernet rather than InfiniBand or the other interconnects that are standard in the supercomputing industry.
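To put the per-server figure in perspective, a minimal conversion sketch follows; 3.6 Tb/s is the reported number, while the byte conversion, the assumed server count, and the cluster-wide estimate are derived, not reported:

```python
# Per-server network bandwidth reported above: 3.6 terabits per second.
# The conversion to gigabytes per second and the cluster-wide estimate
# (assuming ~12,000 servers, i.e. 1,500 racks x 8 servers) are derived
# for illustration, not figures from the article.

TBPS_PER_SERVER = 3.6
gbytes_per_sec = TBPS_PER_SERVER * 1000 / 8       # 450 GB/s per server
servers = 1500 * 8                                # ~12,000 servers (assumed)
cluster_tbps = TBPS_PER_SERVER * servers          # ~43,200 Tb/s aggregate

print(f"{gbytes_per_sec:.0f} GB/s per server, "
      f"~{cluster_tbps:,.0f} Tb/s across the cluster")
```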
Nvidia claims the xAI Colossus supercomputer is the largest AI supercomputer in the world. While many supercomputers are shared by various companies and research institutes, Colossus is dedicated to training AI models for X (formerly Twitter). It primarily serves the Grok 3 chatbot, available only to X Premium subscribers. ServeTheHome also reported that Colossus is training the AI “of the future”: models whose uses and capabilities are likely beyond those of today's flagship AI.
The first phase of Colossus construction is complete and the cluster is fully operational, but the build-out is not finished. The Memphis supercomputer will soon be upgraded to double its GPU capacity, adding 50,000 more H100 GPUs and 50,000 next-generation H200 GPUs.
That will also more than double its power consumption, which already exceeds what the 14 diesel generators Musk added to the site in July can supply. The upgrade still falls short of Musk's promise of 300,000 H200 GPUs inside Colossus, although that could come in a third phase.
Musk's companies also operate the Cortex supercomputer, a 50,000-GPU cluster at Tesla's “Giga Texas” factory. Cortex trains Tesla's self-driving AI, which relies solely on cameras and image detection, as well as the AI behind Tesla's autonomous robots and other projects.
Tesla will also soon begin construction of the Dojo supercomputer in Buffalo, New York, a project worth 500 million dollars. With industry figures such as Baidu CEO Robin Li predicting that 99% of AI companies will collapse when the bubble bursts, it remains to be seen whether Musk's record spending on AI will pay off or backfire.