QWEN 3 Coder Next - First Look From a Non-Dev

What if you could build a basic web game from a local AI model?

No token limits. No subscriptions. The trade-off? You need a machine with enough memory and power to run it.

Alibaba’s Qwen team just released a new coding LLM, “QWEN 3 Coder Next,” and they built it to run on less powerful hardware than you’d expect. It’s an 80-billion-parameter model, and they provide a 4-bit-quantized version. That means it should run more efficiently on a wider range of hardware.

Let's dive into it.

I'd recently played with Devstral Small 2 from Mistral, and it was cool to see it build something from just a few prompts. I was excited to see what a larger local model could do with coding.

The QWEN 3 Coder Next has more than twice as many parameters as the Devstral Small 2 model. In theory, it should be able to do more.

It’s quite a large model, even for my Radeon AI Pro R9700 GPU with 32GB of VRAM, but I have enough system RAM (64GB) to offload the remainder there and still run it. I downloaded the model and gave it a prompt similar to the one I used with Devstral Small 2, but with a little more context and structure to the idea.

The Prompt:

"I need to build a web-based game that is a clone of the original Mario Brothers. It needs to be 8-bit graphics, and the main character is a white lion. It needs a start button for when you’re ready to play, a score that increases as you capture coins that populate as the screen side scrolls, and enemies that come in that you need to jump over. When the game ends from an enemy touching you, a pop-up appears with your score and a button under it to let you play again. Are you able to help me with this? If you can, ask me any questions you need to help me build the best possible version."

I sent the prompt, and the model loaded into the GPU and system RAM, then started thinking. After a moment, text and code started to generate in my chat. To my surprise, it ran at around 10 tokens/s. I attribute the speed to using the 4-bit-quantized model. Normally, when models are offloaded to system memory, performance drops to around 4 tokens/s or less, because part of the model has to run on the CPU.

The output was exciting to see.

It delivered what I asked for: a working 8-bit game with characters and an environment to match. The controls worked, along with the start and play-again buttons.

Unfortunately, it still had issues.

Everything moved too quickly. Enemies moved like The Flash, scrolling across the playfield so fast that you were overtaken within a second or two. If you did avoid an enemy, you still couldn't jump high enough to grab any coins.
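I never dug into exactly why the generated game ran so fast, but as I understand it, a common cause in browser games is moving objects a fixed distance per frame instead of per second of elapsed time, so faster machines produce faster enemies. A hypothetical sketch of the usual frame-rate-independent fix (the function and parameter names are mine, not from the generated game):

```javascript
// Hypothetical sketch: scale movement by elapsed time (delta time) so
// speed means pixels per second, not pixels per frame.
function updateEnemy(enemy, speedPxPerSec, dtMs) {
  return { ...enemy, x: enemy.x - speedPxPerSec * (dtMs / 1000) };
}

// In a real game loop, requestAnimationFrame supplies a timestamp,
// so the loop might look like:
// let last = performance.now();
// function frame(now) {
//   const dt = now - last; last = now;
//   enemy = updateEnemy(enemy, 120, dt);
//   requestAnimationFrame(frame);
// }
```

With this approach, an enemy moving at 120 px/s covers the same distance whether the browser renders 30 or 144 frames per second.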

Even with the issues, we were off to a great start.

The next question: how long can I ask for changes before it starts to struggle?

Since it started off so well, I followed up with another prompt. I requested that the game be slowed down to make it more playable and to allow the main character to jump high enough to grab coins on the playfield.

It gave me a fresh iteration of the game, and it was legitimately playable. The controls worked; I could jump higher and grab the coins. With each coin, I gained points, and the enemies were avoidable without making the game too easy. The model also provided all the code in a fresh set, so I didn't need to dig for specific pieces to swap out.

The game worked so well that I could have considered it a success and stopped there.

Instead, I kept prompting for more. I wanted to see how far we could take the model before something broke.

In my next prompt, I requested that the enemies spawn at randomized intervals to make the game feel less predictable. Several minutes later, it gave me a new version of the game and even said that it made the enemies look "meaner". I copied the code into a new HTML file and opened it. I started the game, and it froze two seconds later. I refreshed the page, hit start, and it froze again in the same spot.
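I can only guess at what caused the freeze, but one common culprit when randomizing spawn timers is a delay that can reach zero or go negative, which can spawn enemies every frame and stall the loop. A defensive sketch (entirely hypothetical; I didn't inspect the generated code this closely) clamps the delay to a floor:

```javascript
// Hypothetical sketch: pick the next enemy spawn delay with random
// jitter, clamped so it can never be zero or negative.
function nextSpawnDelay(baseMs, jitterMs, minMs = 250) {
  // jitter falls in the range [-jitterMs, +jitterMs]
  const jitter = (Math.random() * 2 - 1) * jitterMs;
  return Math.max(minMs, baseMs + jitter);
}
```

If the jitter range is wider than the base delay (say, base 1000 ms with ±2000 ms of jitter), the clamp is what keeps the game from trying to spawn enemies continuously.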

I figured this was where everything was going to go south, but I kept trying to push it.

The layout of the world it created was a little more complicated and distinctive than in previous versions. I followed up with a prompt to fix the game freezing, and it provided new lines of code for me to try.

After pasting it into a fresh HTML file, I opened it in my browser, and it didn't freeze this time. It added several layers of land, appearing at different intervals, that you could jump onto. The new "meaner"-looking enemies lived up to the description, and they spawned at varied intervals, as I requested.

It fixed it!

That was my first thought, until I finally got hit by an enemy. The "Game Over" screen popped up, showing my score, but when I hit "Play Again," nothing happened.

Yep, it fixed one thing, but broke something else.
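I can only speculate about what broke, but in browser games a restart button that does nothing usually means the game state isn't actually being reset (or the click handler was attached to an element that got replaced). A minimal, hypothetical sketch of a clean reset, with names of my own invention:

```javascript
// Hypothetical sketch: keep all mutable game state in one object and
// rebuild it from a factory function on restart, instead of patching
// individual fields and hoping none are missed.
function freshState() {
  return {
    score: 0,
    gameOver: false,
    player: { x: 50, y: 0, vy: 0 },
    enemies: [],
    coins: [],
  };
}

let state = freshState();

// The "Play Again" button then just swaps in a fresh state object:
// playAgainBtn.addEventListener("click", () => { state = freshState(); });
```

Because every restart gets a brand-new object, there's no stale `gameOver` flag or leftover enemy list to keep the game stuck on the end screen.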

At this point, the issues started to build. After a couple more iterations of me trying to get it to work, I gave up. I'm pretty sure I hit the context window limit, so the model was struggling to give me anything reliable in return.

There are positives, though...

The model's consistency was its saving grace. After all the changes and iterations, the character designs stayed the same unless I asked for a change. Other models I've tested often changed the characters' appearances with each iteration. Even as the QWEN 3 Coder Next struggled, I felt I could take it further with a fresh chat window and more context upfront, giving it a better chance of success on the first try or two.

In short, the QWEN 3 Coder Next is the best local LLM for coding I've tested to date.

Those of you who use tools like this regularly may not be surprised, but as a non-developer who also tries to avoid cloud tools for security and privacy, I find it to be impressive.

Have you had a chance to try any local models like the QWEN 3 Coder Next? If you haven't, give it a go; you may just have fun.

And let me know your thoughts in the comments.
