Alibaba recently released their “QwQ” model, which they claim is capable of chain-of-thought reasoning comparable to OpenAI’s o1-mini model. It’s pretty impressive, all the more so because you can run it on your own device, provided you have enough RAM.
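If you want to try this locally, one straightforward route is Ollama, which hosts QwQ as a downloadable model. This is a minimal sketch assuming you already have Ollama installed; the model is large, so expect the pull to take a while and the run to need a machine with plenty of RAM.

```shell
# Download the QwQ model weights (tens of GB; requires Ollama to be installed)
ollama pull qwq

# Ask it a question directly from the command line
ollama run qwq "How many times does the letter 'r' appear in 'strawberry'?"
```

You can also run `ollama run qwq` with no prompt to get an interactive session, which is handy for the kind of back-and-forth testing described below.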
While testing its chain-of-thought reasoning abilities, I decided to run my test prompt against Llama3.2 as a comparison, and I was kind of shocked at how good it was. I had to come up with ever more ridiculous scenarios to try to break it.

That is pretty good, especially for a non-chain-of-thought model. Okay, come on. How do we break it? Can we?

Alright, magical unicorns for the win.