Another New Supercomputer, Blah Blah Blah!

The Nervana NNP-T, with its 27 billion transistors, is one of the latest announced projects in the race to build the greatest supercomputer. It seems lately that a competition has been taking place between countries and manufacturers to outdo each other.

Intel revealed more details about its much-anticipated Spring Crest Deep Learning Accelerators here at Hot Chips 31. The Nervana Neural Network Processor for Training (NNP-T) comes with 24 processing cores and a new take on data movement that’s powered by 32GB of HBM2 memory. The spacious 27 billion transistors are spread across a 688mm2 die. Oddly enough, the NNP-T also incorporates leading-edge technology from Intel rival TSMC.

Artificial intelligence and machine learning have taken the data center by storm, redefining how compute is used and deployed at scale in an exceedingly short period of time. As such, the rise of GPUs, long the go-to solution for AI training workloads, in the supercomputing space has been explosive. In 2008, not one supercomputer used GPUs for computation, instead relying on the tried-and-true CPU, but now 80 percent of compute power in the top 500 supercomputers comes from GPUs. As we’ve seen time and again, the trends in HPC and supercomputing filter down to the broader data center, so the proliferation of AI/ML workloads presents a threat to Intel’s data center dominance, as each GPU replaces several Xeon processors.

In response, Intel has developed a multi-pronged approach to keep its hands on the steering wheel. Compute-heavy training workloads create complex neural networks that run object recognition, speech translation, and voice synthesis workloads, to name a few, which are then deployed as lightweight inference code. Due to their ubiquity, Xeon processors continue to be the platform of choice for the less computationally-intense inference workloads, but Intel is developing several solutions to tackle the training workloads that firmly remain the stomping grounds of Nvidia’s GPUs.

Nvidia claims that GPUs are the end-all-be-all solution for all forms of AI and machine learning, but Intel maintains that there are different solutions for each class of workload. Part of Intel’s answer to training will come in the form of its forthcoming Xe Graphics Architecture and Altera-derived FPGAs, but the company also has a new line of custom-built Nervana silicon in the works for training workloads.

Enter the Spring Crest Deep Learning Accelerator, otherwise known as the Intel Nervana Neural Network Processor for training (NNP-T), which is a mouthful no matter how you slice it. We’ll stick to NNP-T.

This new accelerator comes as the fruits of Intel’s acquisition of Nervana and represents a fundamental rethinking of the basic chip architecture to tailor it specifically for training workloads. More importantly, the Nervana architecture is tailored to scale workloads out to multiple cards, and even across multiple chassis, to the point that even rack-scale architectures based on the design could be a future direction. This design philosophy is important as the ever-expanding size and complexity of neural networks now have data center architects thinking of the chassis as the first unit of compute measurement, as opposed to the traditional paradigm of a single accelerator being the first unit of measure.

Accommodating the exploding size and complexity of models, which Intel says are doubling roughly every five months, isn’t just a function of boosting memory capacity/throughput and compute power: Both of those axes have to be paired with an efficient architecture that focuses on power efficiency, which is the ultimate measure of affordability in the data center. The design also requires an optimized communication system to reduce the power overhead associated with data traversal.
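To put that doubling rate in perspective, here is a quick back-of-the-envelope sketch of how fast a model grows. The five-month doubling period comes from Intel's claim above; the 1-billion-parameter starting size is purely an illustrative assumption.

```python
# Sketch: growth of a model that doubles in size every five months.
# Starting size of 1 billion parameters is an illustrative assumption.

def model_size(start_params, months, doubling_period=5):
    """Return the projected parameter count after `months` months."""
    return start_params * 2 ** (months / doubling_period)

start = 1e9  # assumed 1-billion-parameter model
for months in (0, 5, 10, 20):
    print(f"after {months:2d} months: {model_size(start, months):,.0f} parameters")
```

At that pace a model grows sixteen-fold in under two years, which is why Intel frames the problem as one of scale-out rather than single-chip performance.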

The NNP-T SoC Architecture

Here we can see Intel’s take on the best approach to these challenges. The 688mm2 NNP-T die is fabbed on TSMC’s 16nm CLN16FF+ process. It’s a bit counter-intuitive to see a TSMC process on an Intel processor, but Nervana had already taped out its first-gen Lake Crest design on TSMC’s 28nm process before its acquisition by Intel, and continuing to use those design rules and TSMC’s IP made sense to speed the transition to the current-gen product. Intel will also stick with TSMC for the next-gen product, but incorporate more of its own IP into the architecture, like power control and skewing technologies, creating what the company terms the “best of Intel and the best of Nervana.”

And the design uses plenty of TSMC’s latest tech. The NNP-T die is flanked by four 8GB stacks of HBM2-2400 memory (2.4 Gb/s per pin) that all ride on top of a massive 1200mm2 silicon interposer. The die and HBM stacks are connected via TSMC’s CoWoS (Chip-on-Wafer-on-Substrate) interconnect, a multi-chip packaging technique that uses micro-bumps to connect dies to a passive silicon interposer with through-silicon vias (TSVs), which is then bonded to a package substrate. The result is a 60x60mm package with a 3,325-pin BGA interface (meaning it is not a socketed processor).

This is classified as a 2.5D packaging technology because the interposer is passive, while a similar design with an active interposer (active logic on the base die) would fall under the definition of 3D packaging. Meanwhile, the individual HBM2 stacks are true 3D implementations (4Hi). TSMC’s CoWoS competes with Intel’s own EMIB (Embedded Multi-die Interconnect Bridge) packaging that uses silicon bridges embedded into the package substrate.

Fully utilizing the four HBM2-2400 stacks required 64 SerDes lanes that support 28 Gb/s apiece in each direction (3.58 Tbps aggregate). Those lanes feed the HBM PHY/memory controller on the die, which then routes data to the 24 Tensor Processing Cores (TPCs) located throughout the 27 billion-transistor die. The TPCs also house the 60MB of SRAM that is distributed throughout the die. Some die area is also dedicated to a management CPU and interfaces, like IPMI, I2C, and the like, along with 16 lanes of PCIe Gen 4.0.
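The aggregate figure follows from the per-lane rate. A quick sanity check, assuming (as the numbers suggest) 28 Gb/s per lane counted in each direction:

```python
# Sanity check on the SerDes aggregate bandwidth quoted above.
# Assumes 28 Gb/s per lane, counted in both directions, which is
# what makes the quoted ~3.58 Tbps figure work out.

lanes = 64
per_lane_gbps = 28      # Gb/s in each direction (assumed)
directions = 2          # full duplex

aggregate_tbps = lanes * per_lane_gbps * directions / 1000
print(f"{aggregate_tbps:.2f} Tbps")
```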

The chip operates at 1.1 GHz and draws between 150 and 250W in air-cooled configurations, with more performance possibly unlocked with watercooling in the future. The NNP-T comes in OCP Accelerator Module (OAM) mezzanine card form factors (hybrid cube mesh) due to their enhanced cooling and connectivity capabilities (seen here as the QSFP networking ports on the rear of the card). The OCP cards are experiencing sharp uptake at hyperscale data centers, but the NNP-T also supports traditional PCIe card form factors.

Data In, Data Out

Having access to such prodigious memory throughput doesn’t necessarily mean that you should use it all the time, largely because data movement is generally more expensive than compute in terms of power consumed and the time it takes for traversal. As such, minimizing data movement is a key ethos of the Nervana architecture.

Diving into the Tensor Processing Cores finds several dual-ported memory banks that can read and write at the same time, and a Convolution Engine that can read data out of memory and convert it with convolutional filters to do matrix multiplies. The math happens in the red blocks, with the compound pipeline supporting pre-operations before the multiplies, and then multiple operations on the final product. The engine also outputs two operations at the same time, providing both the pre-operation and post-operation simultaneously. That minimizes the need for successive data movements through the compute pipeline. Intel also infused a small microcontroller (uController) directly into the control path that allows a custom instruction to trigger a subroutine in the microcontroller to perform specific operations.
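The compound-pipeline idea described above, a pre-operation on the inputs, the matrix multiply, and a post-operation on the product, all in a single pass, can be sketched in a few lines of NumPy. The specific operation choices below (input scaling, ReLU) are illustrative assumptions, not Intel's actual instruction set:

```python
import numpy as np

# Sketch of a compound compute pipeline: a pre-operation is applied to
# the inputs, the matrix multiply runs, and a post-operation is applied
# to the product, all as one logical pass so intermediate results never
# make an extra round trip through memory.

def compound_matmul(a, b, pre_op=None, post_op=None):
    if pre_op is not None:
        a, b = pre_op(a), pre_op(b)
    out = a @ b
    if post_op is not None:
        out = post_op(out)
    return out

a = np.random.rand(4, 8).astype(np.float32)
b = np.random.rand(8, 4).astype(np.float32)

# Illustrative ops: scale the inputs on the way in, apply ReLU on the way out.
result = compound_matmul(a, b, pre_op=lambda x: x * 0.5,
                         post_op=lambda x: np.maximum(x, 0))
print(result.shape)  # (4, 4)
```

On real hardware the win is that the pre- and post-operations ride along with the multiply rather than each requiring their own trip through the memory hierarchy.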

Each TPC has four high-speed busses, with two dedicated to HBM2 memory while the other two handle communication with other TPCs.

There is 60MB of SRAM spread across the TPCs. The TPCs are connected to the on-die network, which consists of a bi-directional 2D mesh architecture that has a separate bus that allows for data movement among the TPCs and can even move data off the die without accessing the HBM2 memory subsystem. This alleviates a common congestion point with read-heavy neural networks that require multiple accesses to HBM per operation, which creates a memory bottleneck that prevents the cores from being utilized fully.
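In a bi-directional 2D mesh, data travels between neighboring nodes in hops. A minimal sketch of the hop count under simple XY (dimension-ordered) routing, a common scheme for such meshes; the grid dimensions and routing policy here are assumptions for illustration, not Intel's disclosed design:

```python
# Hop count between two TPCs on a 2D mesh under XY routing: traverse
# the X dimension first, then Y. Grid size and routing policy are
# illustrative assumptions.

def mesh_hops(src, dst):
    """Manhattan distance between (x, y) grid coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# e.g. a hypothetical 6x4 grid holding 24 TPCs
print(mesh_hops((0, 0), (5, 3)))  # 8 hops corner to corner
```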

Intel dedicates much of the die to a networking scheme that provides tremendous bandwidth both to and from the die (2.6Tbps total cross-sectional bandwidth). The mesh architecture has different networks for control, memory, die-to-die, and cluster-to-cluster communication (denoted by the colored arrows). This type of complex networking requires sophisticated routing and QoS (quality of service) controls to maximize throughput and avoid congestion. Unsurprisingly, many of Nervana’s employees have deep backgrounds in networking technology, which helped in crafting the directly software-controlled send and receive architecture.

At the end of the day, maximizing the performance of the memory and network subsystems helps keep the cores fully utilized during data-heavy tensor workloads. Here we zoom in on the NNP-T’s compute cores, two of which reside inside each TPC. The compute cores support bfloat16 matrix multiplies and FP32, among other major operations. Intel shared core-utilization performance data with small message sizes, largely because competing architectures struggle at this metric, as well as single-chip performance in deep learning workloads with various GEMM sizes. The utilization claims are far better than competing products’, but as with all vendor-provided benchmarks, we’ll have to wait for third-party analysis for the final verdict.
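bfloat16 keeps FP32's 8-bit exponent but shrinks the mantissa to 7 bits, which is why it is popular for training: the dynamic range survives while storage and bandwidth are halved. A rough sketch of the format, using simple truncation (real hardware typically rounds to nearest even instead):

```python
import struct

# Sketch: convert an FP32 value to bfloat16 by keeping only the top
# 16 bits of its bit pattern (sign, 8-bit exponent, 7-bit mantissa).
# Truncation is used here for simplicity; hardware usually rounds.

def float_to_bf16_bits(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16  # upper 16 bits survive

def bf16_bits_to_float(b):
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

x = 3.14159
approx = bf16_bits_to_float(float_to_bf16_bits(x))
print(approx)  # close to pi, but with only ~2-3 decimal digits of precision
```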

Performance at Scale

Spreading large models out among multiple chassis is a must, and the NNP-T is designed to scale gluelessly from chassis to chassis, and even rack to rack, without a switch. The network is designed with very high bandwidth and low latency in mind, which allows the architecture to handle massive models that scale to 5 or 8 billion parameters, or beyond.

Intel also shared communication bandwidth performance data for the typical send/receive, but also measurements with Allreduce and Broadcast, which require computation between data transfers, to highlight the linear scaling from within the chassis to other chassis.
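Allreduce, one of the collectives Intel benchmarked, combines a value from every node and hands the combined result back to all of them, so each transfer step includes computation. A naive sketch of the semantics (real multi-card implementations pipeline this as a ring or tree so each link carries only a fraction of the data per step):

```python
# Naive allreduce sketch: every node contributes a gradient vector,
# the vectors are summed element-wise, and every node receives the
# full sum. Production implementations use ring or tree algorithms
# to keep per-link bandwidth constant as node count grows.

def allreduce_sum(node_buffers):
    total = [sum(vals) for vals in zip(*node_buffers)]
    return [list(total) for _ in node_buffers]  # each node gets the sum

nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 nodes, 2-element gradients
print(allreduce_sum(nodes))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```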

The company also provided latency metrics for different message sizes, with the small 2KB message sizes delivering exceptional latency characteristics and solid scaling up to 8MB message sizes. Again, this is latency measured in a real-world workload where there is computation involved between the steps, as opposed to standard performance measurements that only account for time on the link. Intel says it conducted these tests on its A-stepping silicon, while its B-stepping that will ship in final products should offer even better performance.

The architecture supports scaling to 1,024 nodes with eight NNP-Ts apiece, but scaling and scaling efficiently are two different matters entirely. Intel hasn’t published more expansive scaling-efficiency numbers yet, but we’re told the architecture scales well up to 256 cards, and perhaps beyond.

Nervana NNP-T Ship Date

Intel says it will sample the NNP-T to leading-edge customers by the end of the year, with a specific focus on Tier 1 cloud service providers at first, before opening the cards up to the broader market through 2020. Intel says it already has the B-stepping silicon, which will ship in final products, running in its labs, and that we should expect more updates over the next four months.

CES 2020 review

TV makers like Samsung, LG, and Sony showcased their flashiest 4K and 8K TVs shipping in 2020. Self-driving vehicles and new streaming video services were also on hand, even before the Covid-19 pandemic hit. Over 180,000 people attended CES 2020, with 4,500 exhibitors across 2.9 million square feet of space. The 53rd edition of the Consumer Electronics Show was definitely a big hit.

Dell and Alienware were on hand, showing the Concept UFO. The portable Windows PC, which mirrors the form factor of the Nintendo Switch, features an 8-inch display, kickstand, detachable controllers, and support for external devices like displays or a keyboard and mouse.

The Segway S-Pod was on display: basically a two-wheeled self-balancing stroller that can hit speeds of up to 24 miles per hour. Unlike other Segway products, you control the S-Pod with a joystick instead of your body, making it a little easier for some. There was also the Segway Ninebot T60, a type of scooter that could potentially be shared for transportation.

Several gaming monitors were on display. All of them had high resolutions and quick response times, and many had high refresh rates such as 240Hz.

LG’s new OLED ZX Real 8K TV, which comes in 77- and 88-inch sizes, had some amazing specs, as does Sony’s Z8H 8K LED TV. The Sony, at either 75 or 85 inches, has full-array LED backlighting, can upscale 4K content, and supports Sony’s “Frame Tweeter” technology, which vibrates the frame itself for improved sound quality.

OLED TVs have usually been expensive, but 2020 has seen a huge price drop. The new 4K Vizio OLED model looks as astonishing as the higher-end TVs we’ve seen from LG and Sony.

See below for some other interesting products.

L’Oreal Perso

This first-of-its-kind makeup and skincare mixer allows you to create custom formulas of lipstick and skincare products — something you had to do by hand until now. Load Perso with cartridges that either contain lipstick colors or various skincare ingredients (think moisturizer, vitamin C serums or SPF) and it gives you seemingly endless combinations.

BrainCo prosthetic hand

BrainCo’s AI-powered prosthetic hand works with an amputee’s brain waves and muscle signals to intuit the movement they want to make. It allows amputees to have a fuller range of motion customized to their own body, compared to others on the market that offer a limited number of preprogrammed movements. It will also retail for between $10,000 and $15,000 — significantly less than other robotic options.

Lenovo Yoga 5G

A handful of 5G laptops are coming in 2020, but the Lenovo Yoga 5G is the first that will combine 5G with Qualcomm’s Snapdragon 8cx 5G Compute Platform. That means great battery life, low power consumption and, assuming 5G lives up to the hype, amazingly fast always-on connectivity.

LG ThinQ Washer with AI

LG looks like it will be one of the first brands to deliver on the promise of bringing meaningful AI to large appliances. Throw a load of clothes into the LG ThinQ washer and LG claims its technology will be able to detect the combination of different fabrics in your laundry and then recommend the most appropriate washing and drying cycles. This kind of AI-based recommendation engine is also in the works from various smart refrigerator manufacturers, but LG appears to have the edge among laundry appliance makers in terms of bringing this tech to market. LG says the ThinQ with AI line will go on sale in the first half of 2020.

GE Kitchen Hub microwave

A 27-inch Android tablet above your oven sounds awkward, but stand in front of a GE Kitchen Hub display and you get the appeal pretty quickly. From watching Netflix to following along with a recipe app, a full-blown Android tablet has a lot of utility in the kitchen. Last year GE introduced the Kitchen Hub via a vent hood model. Here at CES 2020, GE takes the concept to its natural next step, packing a 1.9 cubic foot microwave behind the screen. The Kitchen Hub has a built-in camera for monitoring or Instagramming your stovetop food remotely, and another one for video chatting, but the real promise lies in the flexibility of having a large, Android-powered touchscreen in an accessible spot in your kitchen. GE won’t comment on pricing yet (the vent hood version sold for $1,200), but the new Kitchen Hub goes on sale later this year.

Dabby

Dabby is a home entertainment device that consolidates every TV streaming service, free video site and social media site into one tablet-like box — saving you from toggling between all of the options when you’re looking for a certain show or video. It also includes a subscription manager to help you keep track of all of your different services, how much they cost and how often you use them, to fight subscription overload. It costs $400 and ships in April.

It remains to be seen if any other shows will actually take place this year.

Consult the usual suspects for more details.

AMD Ryzen 4000 both in Laptops and now Desktops

AMD notebook PC sales have done quite well. Some very major companies have been offering Ryzen 4000 chips in their lineups. Pairing AMD’s eight-core Ryzen 9 4900HS processor with strong graphics, for example, has been quite a hit with many users. The price points of these laptops have made users eager to buy.

Companies such as Dell, HP, Asus, Acer, and Lenovo have models available, covering everything from a quad-core Ryzen 3 4300U, a six-core Ryzen 5 4500U, and an eight-core Ryzen 7 4700U, to a Ryzen 7 4800H paired with a Radeon RX 5600M, and an eight-core Ryzen 9 4900HS paired with Nvidia’s GeForce RTX 2060.

When the information was originally released, AMD spoke of 135 different models, though some may be available only in foreign markets. With prices from $620 to over $1,500, they are priced to satisfy almost anyone and everyone.

The Ryzen 4000 G-Series desktop processors should now be available, or soon will be. AMD chips still represent the best bang for your buck, even if Intel CPUs will give you better gaming performance once you get beyond the entry-level chips. For absolute pure gaming performance, Intel CPUs are still the best bet if gaming is your main priority; however, for most business tasks and even some video editing, AMD is priced right.

AMD’s next Zen 3 desktop chips (not yet available) may hit 4.9 GHz or better in boost mode. The CPUs, built on TSMC’s 7nm FinFET manufacturing process, are now considered a known quantity after so many previous chips from AMD. AMD has publicly confirmed that Zen 3-based processors will work seamlessly on B450, X470, B550, and X570 motherboards, although certain compromises are made on the older 400-series motherboards.


AMD’s Ryzen 4000 desktop processors have finally arrived, but before you get your hopes up, be warned, these aren’t the next-gen powerhouse chips we were expecting. 

Six G-Series chips, including the AMD Ryzen 7 4700G, make up the first batch of the next-gen Ryzen 4000 desktop processor family, with AMD claiming they’re powerful enough to run games (in Full HD at low settings) without a graphics card. 

However, these processors will initially only be available in pre-built systems from third-party manufacturers, meaning you won’t be able to buy them to upgrade your gaming PC. 

AMD’s new Ryzen 4000 chips are built upon the existing 7nm Zen 2 architecture instead of the widely anticipated Zen 3 architecture. This means the new G-Series chips feel like scaled up versions of AMD’s current mobile processors rather than fully fledged next-gen desktop CPUs.

Those craving news on the Zen 3 Ryzen 4000 desktop chips shouldn’t be too disappointed though, as AMD has promised they are still set to release this year. The X670 chipset is expected to arrive in the fourth quarter of 2020 and will feature enhancements such as improved PCIe Gen 4.0 support. It will also feature increased I/O from additional M.2, SATA, and USB 3.2 ports, though Wccftech reported that native Thunderbolt 3 support may still not happen on this chipset. AMD also recently announced a budget-friendly B550 chipset that supports the AM4 socket and brings PCIe 4.0 support to a lower price point.

Please visit the usual sources for more details or specs.


I must once again apologize to my readers and subscribers. It has been a very difficult time. After leaving the 9-to-5 job for good last December, I took some personal time, after which I had intended to write more. However, fate had a different plan: I lost a friend and mentor. It was devastating for me. Little did I know more was to follow. Due to the rapid spread of Covid-19, I also lost three other friends and two family members. It has been a very trying and hard time for me personally. My hope is that gradually things will get better. You will find two new articles here, with more to follow. Thank you all for your patience.