Comparing reasoning in open-source LLMs

Alibaba recently released their “QwQ” model, which they claim is capable of chain-of-thought reasoning comparable to OpenAI’s o1-mini model. It’s pretty impressive, even more so because you can run the model on your own device (provided you have enough RAM).

While testing QwQ’s chain-of-thought reasoning abilities, I decided to try my test prompt on Llama3.2 for comparison and was kind of shocked at how good it was. I had to come up with ever more ridiculous scenarios to try to break it.

That is pretty good, especially for a non-chain-of-thought model. Okay, come on. How do we break it? Can we?

Alright, magical unicorns for the win.

Project: Super Simple ChatUI

I’ve been playing around a lot with Ollama, an open-source project that lets you run LLMs locally on your own machine. It’s been fun to mess around with. Some benefits: no rate limits, privacy (handy when, say, trying to create a pseudo therapy bot, simulate a foul-mouthed, smarmy sailor, or generate ridiculous fake news articles about a Florida Man losing a fight to a wheel of cheese), and access to all sorts of models as they get released.
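For reference, here’s a minimal sketch of what poking at a local model through Ollama’s HTTP API looks like. Ollama listens on localhost:11434 by default; the model name and prompt below are just placeholders, not my actual test prompt.

```python
import requests

# Send a single prompt to a locally running Ollama model via /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",   # any model you've pulled, e.g. "qwq"
        "prompt": "A farmer needs to cross a river with a magical unicorn...",
        "stream": False,       # return one JSON object instead of a token stream
    },
    timeout=300,
)

# With stream disabled, the full answer comes back in the "response" field.
print(resp.json()["response"])
```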

I decided to try my hand at creating a simplified interface for interacting with it. The result: Super Simple ChatUI.
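The actual project lives in its own repo, but the basic idea behind any chat front end for Ollama is small: keep a running message history and send the whole thing to the /api/chat endpoint on every turn. Something like this rough terminal sketch (not the real Super Simple ChatUI code):

```python
import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
history = []  # the full conversation so far, as role/content messages

while True:
    user_input = input("you> ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break

    history.append({"role": "user", "content": user_input})

    # Send the whole history each turn so the model keeps conversational context.
    resp = requests.post(
        OLLAMA_CHAT_URL,
        json={"model": "llama3.2", "messages": history, "stream": False},
        timeout=300,
    ).json()

    reply = resp["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print(f"bot> {reply}")
```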

As if I need more side projects. So it goes!