Guest Blog By Alan Russo – Independent Data Centre Consultant and Advisor to IES
I've spent a lot of time at industry events over the years. But walking into NVIDIA's GTC 2026 in San Jose felt different. More than 30,000 developers, engineers, and infrastructure leaders packed the SAP Center, and the energy in the room reflected something much bigger than a product launch. It felt like an industry collectively reckoning with the scale of what comes next.
I attended on behalf of my consulting practice and in my advisory capacity with IES, specifically to understand the infrastructure implications of NVIDIA's roadmap announcements. What I came away with has significantly shaped how I think about the near-term challenges facing data centre owners, operators, and the engineers who design for them. Here's my read on what matters most.
The Year of Inference Is Here
Jensen Huang has a gift for framing a moment, and his GTC 2026 keynote was no exception. He declared 2026 the "year of inference", a tipping point where AI workloads shift decisively from training to real-time token generation. His words were direct: "AI needs to think now... Thinking requires inference, and inference requires generating a large number of tokens."
The numbers behind that statement are staggering. NVIDIA now projects at least $1 trillion in cumulative demand for Blackwell and Vera Rubin systems through 2027 - double last year's forecast. Computing demand, Huang said, has jumped 10,000 times over the last two years alone. Whether you apply a healthy analyst discount to those figures or not, the direction of travel is unmistakable.
The core productivity metric for this new era is tokens per second per megawatt. This is not a chip metric. It's a system-level metric – and that distinction matters enormously for anyone designing or operating the facilities that house these systems. GPUs, CPUs, LPUs, networking, storage, memory, liquid cooling, air cooling, and the power delivery architecture from the substation to the chip all have to work in coordination to maximise against that metric.
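To make that concrete, here is a rough back-of-envelope sketch in Python of how the metric rolls up IT load and facility overhead rather than chip performance alone. Every number in it is a placeholder of my own choosing, not a vendor figure.

```python
# Illustrative only: tokens per second per megawatt as a *system* metric.
# Every number here is a hypothetical placeholder, not a vendor figure.

def tokens_per_second_per_megawatt(tokens_per_second: float,
                                    it_load_kw: float,
                                    facility_pue: float) -> float:
    """Normalise token throughput against total facility power, not just chip power."""
    total_power_mw = (it_load_kw * facility_pue) / 1000.0
    return tokens_per_second / total_power_mw

# Hypothetical rack-scale system: 1.5M tokens/s from a 150 kW rack at PUE 1.15
print(f"{tokens_per_second_per_megawatt(1_500_000, 150, 1.15):,.0f} tokens/s per MW")
# -> roughly 8,700,000 tokens/s per MW of facility power
```

The point of the exercise is that the denominator belongs to the facility, not the chip: the same silicon delivers a very different result depending on how efficiently the building around it delivers power and rejects heat.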
The AI Factory: A Fundamental Architectural Shift
The "AI factory." is the most important concept to emerge from GTC, and one that will reshape how we think about data centre design. Jensen Huang used this term deliberately. These are not server rooms. They are not even traditional high-performance computing facilities. They are tightly coupled - co-designed industrial systems whose sole purpose is the production of tokens at scale.
The Vera Rubin platform sits at the heart of this vision. It is a full-stack architecture built around seven chips and five rack-scale systems designed to operate together as a single AI supercomputer. The NVL72 configuration pairs 72 Rubin GPUs with 36 Vera CPUs via next-generation NVLink interconnects, achieving what NVIDIA claims is an up to 35x improvement in token throughput compared to Hopper-generation systems at equivalent power.
Paired with this is the Groq 3 LPX rack, a product of NVIDIA's multi-billion-dollar licensing agreement with Groq, whose Language Processing Unit (LPU) was purpose-built for ultra-low-latency inference. The Groq LPX rack holds 256 LPUs and is designed to sit alongside the Vera Rubin system, handling the "serving" side of token generation while Rubin handles the heavy processing. Together, the combined architecture can theoretically scale token output for a 1-trillion-parameter model by an extraordinary margin compared to previous-generation systems.
Also unveiled was Kyber, NVIDIA's next major rack architecture, featuring 144 GPUs in vertically oriented compute trays for greater density and lower latency. Kyber forms the basis for Vera Rubin Ultra, expected in 2027. The roadmap beyond that points to the Feynman generation in 2028. The cycle of step-change hardware evolution taking place every 12 to 18 months is not slowing down. If anything, it's accelerating.
For those of us working on the infrastructure side, these are not abstract announcements. Each generational leap resets the design assumptions for power density, cooling strategy, and physical architecture.
The Infrastructure Consequences No One Has Fully Solved
This is where the conversation gets harder - and more interesting for practitioners.
Liquid cooling is no longer optional. The Vera Rubin architecture and the Kyber rack are designed for predominantly liquid cooling. The highest-density compute platforms of today and tomorrow cannot be adequately served by air. That said, liquid cooling does not eliminate air-side loads entirely. Depending on system design and margin assumptions, a 200–300kW cabinet can still generate meaningful residual air heat load, a design reality that is often underappreciated. And while OEMs define acceptable coolant temperature ranges, there is as yet no accepted facility-level standard for how that cooling infrastructure should be specified and operated.
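To put rough numbers on that residual load, here is a simple illustration. The liquid-capture fractions are my own assumptions for the sake of the example, not OEM figures.

```python
# Rough illustration of residual air-side load in a liquid-cooled cabinet.
# The liquid-capture fractions below are assumptions for illustration, not OEM data.

def residual_air_load_kw(cabinet_kw: float, liquid_capture_fraction: float) -> float:
    """Heat that still has to be rejected by air handling, in kW."""
    return cabinet_kw * (1.0 - liquid_capture_fraction)

for capture in (0.85, 0.90, 0.95):
    print(f"250 kW cabinet, {capture:.0%} captured to liquid -> "
          f"{residual_air_load_kw(250, capture):.0f} kW still on air")
# 85% -> 38 kW, 90% -> 25 kW, 95% -> 12 kW per cabinet - far from negligible at scale
```

Even at the optimistic end of those assumptions, a row of such cabinets still produces an air-side load comparable to an entire legacy data hall.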
800VDC is becoming a reality in the white space. Legacy data centres largely run on 208VAC at the rack level, which is sufficient for CPU-era loads but fundamentally impractical for 150–300kW racks. 415VAC has emerged strongly over the last 12–18 months as a response to rising densities, but it too reaches its practical limits at the densities the AI factory era demands. 800VDC, along with OCP's ±400VDC, is emerging as the architecture for megawatt-scale compute blocks: higher voltage means lower current, fewer conversion steps, and lower distribution losses. The challenge is that no established standard yet exists for connector specifications, protection schemes, rack-level interfaces, or safety codes. The vendor-driven noise around "the right answer" is significant, and operators are being asked to make major capital commitments in the absence of that clarity.
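A deliberately simplified illustration of why the voltage matters: for the same rack power, higher distribution voltage means proportionally lower current, and conductor loss falls with the square of that current. The figures below use a single assumed conductor resistance and ignore phase and power-factor detail, so treat them as directional only.

```python
# Back-of-envelope look at distribution current and conductor loss for the same
# rack power at different distribution voltages. Deliberately simplified: a single
# assumed conductor resistance, no phase or power-factor detail. Illustrative only.

def current_and_loss(rack_kw: float, voltage: float, resistance_ohm: float):
    current_a = (rack_kw * 1000.0) / voltage
    loss_kw = (current_a ** 2) * resistance_ohm / 1000.0
    return current_a, loss_kw

for label, volts in (("208 VAC", 208), ("415 VAC", 415), ("800 VDC", 800)):
    amps, loss = current_and_loss(250, volts, 0.001)  # 250 kW rack, assumed 1 milliohm run
    print(f"{label}: {amps:,.0f} A, ~{loss:.2f} kW lost in the same conductor")
# Roughly quadrupling the voltage cuts the current to about a quarter
# and the I²R loss to roughly a sixteenth.
```

The busbars, breakers, and copper cross-sections all scale with that current, which is why the economics of megawatt-scale compute blocks push so hard towards higher distribution voltages.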
No stable hardware target exists beyond 18 months. This is perhaps the most consequential challenge. Owners of AI compute infrastructure want to deploy capacity in enormous blocks, at speeds the industry has never attempted before. But the equipment they want to deploy is novel - to them, and to the data centre owner hosting it. The OEMs will each have their own specific implementations of the NVIDIA reference architecture, with their own thermal signatures and power infrastructure requirements. There is no single standard for how heat is rejected or power is distributed at the facility level. And by the time a facility is designed, built, and commissioned, the hardware target may have shifted again.
Simulating Reality Before Committing Capital
This is the part of the conversation that, as someone who advises on both infrastructure strategy and the tools that support it, I find most compelling.
The question facing data centre operators is no longer simply "will this design meet spec?" The question is "will this system perform to the expected level, as a system - not as an aggregation of components?" That is a fundamentally different question. It requires scenario testing before capital is committed. It requires validating vendor performance projections in a modelled environment before they are trusted. It requires understanding the behaviour of power, cooling, compute, and networking as an integrated whole.
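As a trivial illustration of what I mean by scenario testing, here is a sketch that treats a handful of uncertain inputs as ranges rather than single point values and looks at the spread of the resulting system-level metric. All of the ranges are invented for illustration; they are not vendor or IES figures.

```python
# A minimal sketch of scenario testing: treat uncertain inputs as ranges rather than
# single point values and look at the spread of the system-level outcome.
# All ranges below are invented for illustration - not vendor or IES figures.
import random

random.seed(42)

def system_tokens_per_mw(rack_tokens_s: float, rack_kw: float, pue: float) -> float:
    return rack_tokens_s / ((rack_kw * pue) / 1000.0)

samples = sorted(
    system_tokens_per_mw(
        random.uniform(1.2e6, 1.6e6),   # vendor throughput claim, with margin either side
        random.uniform(140, 170),       # delivered rack power under real conditions
        random.uniform(1.10, 1.30),     # facility overhead (PUE) across operating scenarios
    )
    for _ in range(10_000)
)

p10, p50, p90 = samples[1000], samples[5000], samples[9000]
print(f"P10 {p10:,.0f}  median {p50:,.0f}  P90 {p90:,.0f} tokens/s per MW")
```

A real modelling exercise obviously goes far deeper than this, coupling thermal, electrical, and IT behaviour, but the principle is the same: look at the distribution of outcomes before the capital is committed, not a single vendor-supplied point value after it.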
NVIDIA itself signalled the importance of this approach at GTC with the announcement of the Vera Rubin DSX AI Factory Reference Design and the Omniverse DSX Blueprint - a physically accurate digital twin environment designed to support the planning, buildout, and operation of large-scale AI factories. The DSX Air platform takes this further, enabling full logical simulation of NVIDIA hardware infrastructure before a single server is installed - with customers reportedly reducing time to first token from weeks to hours.
Industry partners including Switch, Trane Technologies, Vertiv, and Eaton (which recently acquired Boyd Cooling and Flexnode) are all contributing to this reference design ecosystem, reflecting a broad industry recognition that pre-deployment simulation is no longer a luxury. It is a competitive necessity.
This is precisely the domain in which IES brings distinctive capability. Where platforms like Omniverse DSX focus on network topology and ROI metrics, IES's modelling tools operate at the thermal, HVAC, and mechanical level - providing the facility-level physics that sit beneath the IT layer. For operators trying to understand whether an existing building can handle next-generation AI loads, whether a cooling strategy will perform under real operating conditions, or how a mixed OEM environment will behave across a multi-tenant space, that level of simulation fidelity is critical.
The feedback I heard at GTC from hyperscalers - including teams managing some of the world's largest deployments - was that they are frequently making infrastructure decisions under supply chain pressure, accepting whatever transformers and cooling equipment they can source, and analysing performance after the fact rather than during design. The trend, as one contact put it, is "build first, analyse later." That is an understandable response to the current moment. But as capital becomes more constrained and the costs of being wrong continue to rise, the window for that approach is closing.
What This Means for IES and Its Clients
GTC 2026 confirmed what many of us working in this space have suspected: infrastructure is no longer simply designed. It is modelled. The organisations that can simulate reality before committing capital will have a decisive advantage in an era where the margin for error has never been smaller.
For IES clients - whether existing data centre operators navigating the shift to AI factory architectures, real estate owners evaluating conversion potential, or co-location providers planning for a new generation of AI-native tenants - the value proposition is clear. The complexity of next-generation AI infrastructure, combined with the absence of established standards and the relentless pace of hardware evolution, makes pre-deployment modelling not just useful, but essential.
The age of AI factories is not coming. After GTC 2026, it is clear it is already here.
Alan Russo is an independent data centre consultant and advisor to IES. He works with data centre owners, operators, and technology vendors on infrastructure strategy, capacity planning, and the integration of advanced modelling tools into the design and operations lifecycle.