crashbandicoothogwild| Musk and his "super computing power factory"

editor2024-05-26 11:31:2015News

Transferred fromCrashbandicoothogwildThere is a new Newin

It will be four times the size of today's largest GPU cluster and will be operational by next autumn.

According to the latest report from The Information, Musk said in a presentation to investors in May that he hopes the supercomputer will be up and running by the fall of 2025 and will be personally responsible for delivery on time; upon completion, it is expected that the chipsets connected will be at least four times the size of the largest GPU cluster today, such as the GPU cluster built by Meta Platforms to train its AI model.

Musk has said publicly that xAI will need as many as 100000 GPU   to train and run its next version of Grok. To make chatbots smarter, Musk recently told investors that xAI plans to connect all these chips into a supercomputer, or computing superfactory (Gigafactory of Compute).

XAI may work with Oracle to develop this supercomputer. XAI has been discussing with Oracle executives the possibility of spending 10 billion dollars renting cloud servers over the next few years. Currently, xAI has rented about 1 from Oracle.CrashbandicoothogwildWith .60,000 H100 chip servers, it is also the largest customer of such chips in Oracle.

The supercomputer is expected to cost billions of dollars and generate enough power to catch up with better-funded competitors who plan to launch a similarly sized AI cluster next year and a larger cluster in the future.

Clustering refers to a large number of server chips connected by cables in a single data center so that they can perform complex calculations at the same time in a more efficient way. Leading AI companies and cloud providers believe that having larger clusters with more computing power will lead to stronger AI.

XAI's office is located in the San Francisco Bay area, but the most important factor determining the location of the AI data center is the power supply. It is reported that a data center with 100000 GPU may require 100MW of dedicated power.

This will require much more power than traditional cloud computing centers, comparable to the energy needs of AI centers that cloud providers currently operate and build to accommodate multiple clusters, which are increasingly built in remote or non-traditional locations, where electricity is cheaper and more plentiful.

Earlier, it was reported that Microsoft and OpenAI were building a large data center in Wisconsin independent of $100 billion supercomputers at a competitive cost of about $10 billion, while Amazon Web Services was building some AI data centers in Arizona.

According to Musk's schedule, xAI still lags behind its rivals. By the end of this year or early next year, OpenAI and its main backer Microsoft may have a cluster of the size Musk envisioned. OpenAI and Microsoft also discussed developing a $100 billion supercomputer that would be several times the size of Musk's vision and contain millions of Nvidia GPU.

Nvidia CFO Colette Kress has added xAI to its list of six customers who, along with OpenAI, Amazon, Google and other companies, will be the first to use Nvidia's next-generation flagship chip Blackwell.

Currently, xAI is training Grok 2.0 on 20,000 GPU. The latest version can handle documents, charts, and objects in the real world. In the future, the model will also be extended to audio and video. In addition, Mr Musk said on a conference call with investors in April that Tesla also had 35000 Nvidia H100s to train him on autopilot and planned to more than double the number by the end of the year.

crashbandicoothogwild| Musk and his "super computing power factory"

HOT
yy777 ph
Ph365 com
    Ads