Japanese CPU “Fujitsu A64FX” threatens Nvidia, Intel and AMD in cloud computing

Sandia National Laboratories has announced that it will be the first U.S. Department of Energy laboratory to use the Fujitsu A64FX, a unique ARM-based processor designed for HPC workloads and supercomputers.

According to TechRadar, Fujitsu is known primarily for its laptops, tablets, and desktops, but it is also a “giant” among processor manufacturers, having been in that industry for more than half a century.

Launched in 2019, this CPU has 48 cores, delivers a peak performance of 3.38 TFLOPS, runs at 2.2 GHz, and has 32GB of HBM2 memory.
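
The 3.38 TFLOPS figure can be reproduced from the core count and clock speed with simple arithmetic. The sketch below assumes that each core completes 32 double-precision operations per cycle (two 512-bit SIMD pipelines, 8 doubles each, with a fused multiply-add counted as two operations). That per-cycle figure is an assumption not stated in this article, and the function name is purely illustrative.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in TFLOPS: cores x clock (GHz) x FLOPs per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# Assumption (not from the article): 2 SIMD pipelines x 8 doubles x 2 ops per FMA.
FLOPS_PER_CYCLE = 2 * (512 // 64) * 2

a64fx_peak = peak_tflops(cores=48, clock_ghz=2.2, flops_per_cycle=FLOPS_PER_CYCLE)
print(f"{a64fx_peak:.2f} TFLOPS")
```

Under that assumption the result rounds to the 3.38 TFLOPS quoted above. It is a theoretical ceiling, not a measured benchmark score.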

The Fujitsu A64FX is well suited to the HPC market because it offers much higher bandwidth between memory and CPU, up to 1TB/s. Until now, moving data to and from the CPU has been the biggest obstacle to what researchers call exascale computing.
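
One common way to express how well memory keeps up with compute is the number of bytes the memory system can deliver per floating-point operation. The sketch below applies that ratio to the two figures quoted in this article (1TB/s and 3.38 TFLOPS). The function name is illustrative, and both inputs are peak values rather than measurements.

```python
def bytes_per_flop(bandwidth_tb_s: float, peak_tflops: float) -> float:
    """Memory bandwidth available per floating-point operation."""
    # TB/s divided by TFLOPS: the tera prefixes cancel, leaving bytes per FLOP.
    return bandwidth_tb_s / peak_tflops


a64fx_balance = bytes_per_flop(bandwidth_tb_s=1.0, peak_tflops=3.38)
print(f"{a64fx_balance:.2f} bytes per FLOP")
```

On these numbers the chip can fetch roughly 0.3 bytes for every operation it can perform at peak. The higher this ratio, the less time the cores spend waiting on data.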

What makes the A64FX even more interesting is that Fujitsu wants the technology to be adopted gradually by large cloud computing companies, which could benefit the public.

The A64FX, which is based on the ARM architecture, can run (and already runs) Linux distributions and even Microsoft Windows.

The A64FX outperforms both Nvidia and AMD GPUs on the all-important measure of performance per watt. In fact, a prototype with 768 CPUs tops the Green500 list, a ranking of the supercomputers that deliver the most performance per watt.

If you know a little about chip layouts, it’s easy to count 52 cores in total: a row of 10 at each end, two rows of 12 in the middle, and 8 more scattered between them, with caches in several locations.

The Fujitsu A64FX is manufactured using TSMC’s 7nm FinFET process and integrates 8.786 billion transistors but only 596 signal pins. Its 52 internal cores comprise 48 computing cores and four auxiliary cores, all of identical design. The chip is based on the ARMv8.2-A instruction set and supports SVE with 512-bit wide SIMD. The peak performance is given here as 2.7 TFLOPS, a figure that appears to refer to a lower clock speed than the 2.2 GHz quoted above.

The cores are divided into four groups of 13, with each group sharing 8MB of L2 (secondary) cache.
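
The grouping can be checked against the core counts given earlier. The sketch below assumes the 48 computing cores and four auxiliary cores are spread evenly, so that each group holds 12 computing cores and one auxiliary core. That even split is an assumption consistent with the article's numbers, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreGroup:
    """One of the four core groups that share an L2 cache."""
    compute_cores: int
    auxiliary_cores: int
    l2_cache_mb: int

    @property
    def total_cores(self) -> int:
        return self.compute_cores + self.auxiliary_cores


# Assumption: an even split of 12 computing cores + 1 auxiliary core per group.
groups = [CoreGroup(compute_cores=12, auxiliary_cores=1, l2_cache_mb=8) for _ in range(4)]

total_cores = sum(g.total_cores for g in groups)
total_compute = sum(g.compute_cores for g in groups)
total_l2_mb = sum(g.l2_cache_mb for g in groups)

print(total_cores, total_compute, total_l2_mb)
```

The totals come to 52 cores, 48 of them computing cores, and 32MB of L2 cache across the chip.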

The interconnect is Fujitsu’s 6D mesh/torus Tofu, with two lanes on each of 10 ports and 28Gbps per lane. The chip also has a 16-lane PCIe 3.0 interface and four HBM2 stacks totaling 32GB, with 1TB/s of peak read/write bandwidth.
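
To put these figures in per-unit terms, the sketch below reads them as two 28Gbps lanes on each of 10 Tofu ports, and as four identical HBM2 stacks that together provide 32GB and 1TB/s. Both readings are interpretations of the article's numbers. The Tofu results are raw signaling rates that ignore encoding overhead, and 1TB/s is taken as 1000 GB/s for simplicity.

```python
# Figures as read from the article; see the assumptions in the text above.
HBM2_STACKS = 4
TOTAL_HBM2_GB = 32
TOTAL_HBM2_BANDWIDTH_GB_S = 1000  # 1 TB/s taken as 1000 GB/s (assumption)

TOFU_LANES_PER_PORT = 2  # "dual links", read here as two lanes per port
TOFU_PORTS = 10
TOFU_LANE_GBPS = 28

# Memory: split the totals evenly across the four stacks.
gb_per_stack = TOTAL_HBM2_GB / HBM2_STACKS
gb_s_per_stack = TOTAL_HBM2_BANDWIDTH_GB_S / HBM2_STACKS

# Interconnect: raw signaling rate per port and across all ports.
raw_gbps_per_port = TOFU_LANES_PER_PORT * TOFU_LANE_GBPS
raw_gbps_all_ports = raw_gbps_per_port * TOFU_PORTS

print(f"HBM2 per stack: {gb_per_stack:.0f} GB, about {gb_s_per_stack:.0f} GB/s")
print(f"Tofu raw rate: {raw_gbps_per_port} Gbps per port, {raw_gbps_all_ports} Gbps in total")
```

On this reading each memory stack holds 8GB, which matches the 32GB total quoted for the chip.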

The A64FX is set to power the successor to K, Japan’s flagship supercomputer, which has been switched off since August 2019.

Fugaku, the supercomputer replacing K, is expected to be 100 times faster than its predecessor when it launches later this year, to run McKernel, a lightweight kernel that works alongside Linux, and to reach 400 petaflops. The goal is for it to become the first supercomputer to achieve an exaflop when fully deployed with half a million processors.

