Nemotron 3 Nano vs Gemma 3 27B: 2x Faster, Half the Power Consumption

NVIDIA just released its new open-source LLM, the Nemotron 3 Nano.

NVIDIA claims it's the most efficient and accurate model it has released. The Nemotron 3 Nano is a 30-billion-parameter model that delivers 4x the throughput of the Nemotron 2 Nano, enabling more tokens per second for multi-agent systems at scale. You can read more about the Nemotron 3 Nano and the rest of the Nemotron 3 family of models in the full press release here: NVIDIA Press Release.

My tests of the Nemotron 3 Nano are run on an Ollama server. The machine specifications are below.

  • Dual AMD 6700 XT GPUs (12GB each, 24GB total VRAM)

  • AMD 5800x CPU

  • 32GB DDR4 RAM at 2400 MHz

  • Ubuntu Server 24.04

My first look at the Nemotron 3 Nano made me concerned that I wouldn't have enough VRAM on my GPUs to run it. Ollama.com reports a size of 24GB. While that's not the most accurate way to determine whether a model will fit entirely on a GPU, it's usually close enough to give you an idea. Once you add the context window, I guessed it would force at least some of the LLM layers onto the CPU.

What did I do? I tried it anyway.

I logged in to Open WebUI, which is connected to my Ollama server, and downloaded the Nemotron 3 Nano model. The anticipation of being among the first on release day to test was building. As I watched the download percentage tick up, seconds felt like minutes. I forced myself to make some coffee to kill some time. After only a few minutes and a fresh cup of coffee, the download was complete.

The first prompt was going to tell me if 24GB of VRAM was enough.

I started with a simple prompt, "Write me a story." Nothing too detailed; I only wanted to confirm whether it would run on my GPUs without offloading some to the CPU. After sending the prompt, I watched it spin for a second, then the "Thinking" bubble appeared, and soon after, I watched it generate text into a story.

To my surprise, that first run came in just over 34 tokens per second.

Success! I have now confirmed that the Nemotron 3 Nano can run on 24GB of VRAM. It does get pretty close to the limit, using about 94% of the VRAM on each GPU, including the default context window. This tells me 24GB is the minimum you'll need to run this model.
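If you'd rather skip the web UI, Ollama's own tooling gives you the same two answers from the command line: `ollama run --verbose` appends an "eval rate" in tokens per second, and `ollama ps` shows how much of the loaded model sits on GPU versus CPU. A rough sketch follows; the model tag is my guess, so check ollama.com for the exact name.

```shell
# Sketch of a CLI throughput check; the model tag is an assumption --
# substitute whatever name ollama.com lists for Nemotron 3 Nano.
MODEL="nemotron-3-nano"

if command -v ollama >/dev/null 2>&1; then
  # --verbose appends timing stats, including "eval rate" in tokens/s
  ollama run "$MODEL" "Write me a story." --verbose
  # Shows loaded models and the GPU/CPU split (e.g. "100% GPU")
  ollama ps
else
  echo "ollama not found on this machine"
fi
```

If `ollama ps` reports anything other than 100% GPU, some layers spilled onto the CPU and your tokens-per-second number will drop accordingly.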

Given NVIDIA's claims about efficiency improvements, I wanted to verify power usage. According to the ROCm System Management Interface, it averages about 54 watts per GPU. That's impressively efficient.
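For anyone wanting to repeat the measurement, this is roughly how I poll it: `rocm-smi` ships with ROCm and its `--showpower` flag reports average package power per card. Run it in a second terminal while a prompt is generating.

```shell
# Poll per-GPU power draw while a prompt is generating.
# rocm-smi ships with ROCm; --showpower reports average package power per card.
if command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi --showpower
else
  echo "rocm-smi not found (ROCm not installed)"
fi
```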

Now let's compare its performance to a similarly sized model.

The Gemma 3 27-billion-parameter model is the closest commonly used model in size. The Nemotron 3 Nano is a 30-billion-parameter model, so it makes sense to use the Gemma 3 27B to compare.

I loaded the Gemma 3 model and used the same prompt. On the dual 6700 XT graphics cards, it comes in at 15.5 tokens per second. That puts it at approximately half the speed.

What about the power usage of Gemma 3?

According to the ROCm System Management Interface, it consumes approximately 120 watts per GPU while running. That's over double the power!
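For the record, the two ratios work out like this from the numbers measured above:

```shell
# Ratios from the measurements above:
#   Nemotron 3 Nano: ~34 tok/s at ~54 W per GPU
#   Gemma 3 27B:     15.5 tok/s at ~120 W per GPU
awk 'BEGIN { printf "throughput ratio: %.1fx\n", 34.0 / 15.5 }'   # throughput ratio: 2.2x
awk 'BEGIN { printf "power ratio:      %.1fx\n", 120.0 / 54.0 }'  # power ratio:      2.2x
```

Both ratios land at roughly 2.2x, which is where the "twice the speed, half the power" summary comes from.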

NVIDIA's claims are all based on comparisons between the Nemotron 2 and Nemotron 3 models, but on my dual AMD 6700 XT Ollama server, the results suggest a broader case: it runs at twice the speed and half the power of the Gemma 3 27B model.

There is so much more that can go into testing any LLM. Each one has its own uses.

I'll need to do more testing to assess accuracy and how well it integrates into my current workflows. Maybe even compare it to other models, such as GPT-oss.

In conclusion, NVIDIA delivered a model that I’ll use and further experiment with. Its speed and efficiency are essential to someone like me who enjoys running LLMs at home but wants to keep the power bill down. Who knows? It may end up replacing one I use in my current workflow when I need a quick first-draft editor or an idea sparring partner.

Have you tried the Nemotron 3 Nano? Tell me about your experiences and the hardware you use in the comments.
