
From Bottleneck to Beast: How I Supercharged My Deep Learning Rig

  • Writer: Istvan Benedek
  • 4 days ago
  • 2 min read

Updated: 2 days ago


[Image: a dancing horse]

Like many working in AI and deep learning, I’m constantly testing the limits of my hardware. Speed is everything—especially when you're experimenting with large datasets and iterating over models with fine-grained control. For a while, I was using an external GPU (eGPU) connected via Thunderbolt 4, thinking it would be a sleek and flexible solution for model training.


Reality Check: Thunderbolt Isn’t Magic


Although Thunderbolt 4 offers a theoretical maximum bandwidth of 40 Gbps, it soon became evident that this ceiling was a bottleneck for my workload.


The realization came unexpectedly while reviewing the technical specifications of the new MacBook Pro, which features Thunderbolt 5 support, capable of delivering up to 120 Gbps with Bandwidth Boost. This significant jump in throughput underscored the limitations of my current system, which is constrained to Thunderbolt 4.


Every second of latency adds up when you're training across 250,000 images ((384×384×3 + 8) × 250,000 = 110.594 GB), over multiple epochs, while trying to fine-tune interpolative regressors under monotonic constraints. The conclusion was clear: I needed to ditch the eGPU setup and go internal.
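As a sanity check, the dataset-size arithmetic above can be reproduced in a few lines (the 8 extra bytes per image are assumed here to be per-image label/metadata overhead, as implied by the formula in the text):

```python
# Back-of-the-envelope dataset size, matching the formula in the text:
# (384*384*3 + 8) bytes per image, times 250,000 images.
bytes_per_image = 384 * 384 * 3 + 8   # raw RGB pixels + 8 bytes overhead
n_images = 250_000

total_bytes = bytes_per_image * n_images
print(f"{total_bytes / 1e9:.3f} GB")  # 110.594 GB
```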



The Upgrade: PCIe Power


So, I took the GPU out of its Gigabyte Gaming Box enclosure and installed it directly into my desktop via PCIe. That move immediately unleashed its full potential: a PCIe 4.0 x16 slot on a modern desktop offers roughly 256 Gbps (and PCIe 5.0 doubles that to over 500 Gbps), several times the bandwidth Thunderbolt can deliver.
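To put those link speeds in perspective, here is a rough estimate of how long it would take to move the ~110.6 GB dataset over each interface, using nominal link rates and ignoring protocol overhead (real sustained throughput is lower on both):

```python
# Rough transfer time for the ~110.594 GB dataset over each link.
# Nominal link rates only; protocol overhead is ignored.
dataset_gbit = 110.594 * 8  # GB -> Gbit

for name, gbps in [("Thunderbolt 4", 40), ("PCIe 4.0 x16 (approx.)", 256)]:
    seconds = dataset_gbit / gbps
    print(f"{name}: {seconds:.1f} s per full pass over the data")
```

Even this crude estimate shows a full pass over the images shrinking from tens of seconds to a few seconds of pure transfer time, and that difference compounds across every epoch.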

The performance boost was undeniable:

  • Training time dropped significantly (to be honest, I started using a RAM drive as well...)

  • Inference latency shrank

  • GPU utilization finally hit peak levels; I had never seen my NVIDIA RTX 4090 draw 380 W before...



Memory Matters


I also upgraded my system’s RAM to a massive 192 GB (though I can already see the limitation).


Now, I can:

  • load my training and validation images onto a RAM drive

  • start caching tensors for smaller datasets; at ~1.6 MB per image stored as a tensor, that limits me to roughly 50,000 training and 15,000 validation images in memory

  • maybe consider running several trainings simultaneously in the future to maximize GPU utilization, since with small models I still have spare GPU memory and compute capacity...
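The in-memory budget from the last two bullets can be sketched as follows. The 1.6 MB-per-tensor figure comes from the text; the headroom reserved for the OS, the RAM drive, and the training process is an assumption for illustration:

```python
# How many per-image tensors fit in RAM, using the figures from the text.
ram_gb = 192
reserve_gb = 60        # assumed headroom: OS + RAM drive + training process
mb_per_tensor = 1.6    # per-image tensor size stated in the text

budget_mb = (ram_gb - reserve_gb) * 1000
capacity = int(budget_mb / mb_per_tensor)
print(capacity)  # 82500 tensors -> enough for 50k train + 15k val images
```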




© 2023 by Istvan Benedek
