Let’s discuss AI Datacenter Design Part 1: why the shift in datacenter architecture?

Before we discuss what an ultra-low-latency AI datacenter network design looks like, let's ask ourselves: why does AI need a more sophisticated DC design in the first place?

Why is the modern spine-and-leaf DC design, which has served us well for many years, no longer efficient for AI?

We can deploy ultra-high-performance network devices such as the Cisco Nexus 9800s with future-ready 800Gbps line cards, yet still find ourselves with an inefficient DC design for the AI age.

So why? The answer to that question is simple.

Traditional DCs were designed around north-south and east-west traffic: users accessing services, VMs talking to VMs, storage I/O, and so on.

Ultimately, traditional DCs are CPU-based!
AI datacenters are completely different: they are GPU-based and, consequently, latency-bound.

So the next question most will have in mind: why do AI DCs use GPUs and not CPUs?

It comes down to the underlying architectural differences between CPUs and GPUs.

(Image: Supercomputing for AI, by Jordi Torres)

GPUs use many cores, but so do modern CPUs, so what's different?

GPUs use SIMD, which stands for Single Instruction, Multiple Data. It's a compute model where one instruction is executed on many pieces of data at the same time.

Here's a simple example using Python lists.
Let's say you want to add two lists element by element:

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]

A CPU would, conceptually, add those indices one at a time, sequentially:

C = [0, 0, 0, 0]
C[0] = A[0] + B[0]
C[1] = A[1] + B[1]
C[2] = A[2] + B[2]
C[3] = A[3] + B[3]

A SIMD-based compute model performs all four additions at once.
GPUs have thousands of lightweight cores and use SIMD to achieve this parallel execution, as the sketch below illustrates.
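To make the contrast concrete, here's a minimal Python sketch using NumPy (my choice of library for illustration; it isn't from the original sources). A single vectorised addition replaces the four sequential ones:

import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([5, 6, 7, 8])

# One vectorised operation: the addition is applied to every element
# pair in a single step. NumPy dispatches this to the CPU's SIMD
# units where available, the same idea a GPU scales up massively.
C = A + B
print(C)  # [ 6  8 10 12]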

To conclude: a CPU might process a single operation at a time, sequentially.
A GPU using SIMD processes thousands of operations in parallel, and that is exactly what deep AI training needs.

It doesn't need a processor that's great at thinking deeply about one task at a time.

Deep AI training needs the SIMD model, which is great at performing millions of simple operations at once.

NVIDIA has its own modern hybrid version of SIMD called SIMT (Single Instruction, Multiple Threads). Think of it as the bigger, badder version of SIMD: every thread executes the same instruction, but each thread has its own index and so operates on its own slice of the data. A conceptual sketch follows.
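To give a feel for that thread-indexing model, here is a toy Python sketch (gpu_kernel and launch_kernel are hypothetical names of my own, not a real GPU API):

def gpu_kernel(thread_id, A, B, C):
    # Every "thread" executes this same instruction; the thread_id
    # is what points each one at its own element, which is the core
    # idea of SIMT.
    C[thread_id] = A[thread_id] + B[thread_id]

def launch_kernel(num_threads, A, B):
    # On a real GPU all of these would run in parallel across
    # thousands of cores; a Python loop only mimics the model.
    C = [0] * num_threads
    for thread_id in range(num_threads):
        gpu_kernel(thread_id, A, B, C)
    return C

print(launch_kernel(4, [1, 2, 3, 4], [5, 6, 7, 8]))  # [6, 8, 10, 12]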

(Image: GPU Architecture, by Cobbs Walker)

Now it's important to note that while both modern CPUs and GPUs use parallel instruction models (modern CPUs ship their own SIMD extensions), GPUs scale that concept dramatically further.

I won't delve deeper into these topics, as they're beyond the scope of AI datacenter design, but it's good to know we've now captured the reasons behind the shift in datacenter architecture.

If you'd like to read further on these topics (the sources for this post), I highly recommend:

  • Supercomputing for Artificial Intelligence, by Jordi Torres

Which led me to:

  • GPU Architecture by Cobbs Walker

