
Have you ever wondered how the speed of your favorite LLM really compares to other SoTA models? I recently saw a Reddit post where someone got a distilled version of Deepseek R1 running on a Raspberry Pi! It could generate output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don’t know!
Meanwhile, Mistral announced that their Le Chat platform can output 1,100 tokens per second! That sounds pretty fast. But how fast, really? I don’t know!

So, that’s why I put together TokenFlow. It’s a (very!) simple webpage that lets you see the speed of different LLMs in action. You can select from a few preset models / services or enter a custom speed, and boom! You watch it spit out tokens in real time, showing you exactly what a given inference speed feels like as a user.
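
The effect itself is easy to approximate. Here’s a minimal sketch in TypeScript of the core idea (not the actual TokenFlow code): split some sample text into rough tokens and emit them on a timer at the chosen tokens-per-second rate. The sample text, function name, and whitespace "tokenizer" are all just illustrative.

```ts
// Sketch: emit placeholder "tokens" at a fixed rate so you can eyeball
// what a given tokens-per-second speed feels like.

const sampleText =
  "The quick brown fox jumps over the lazy dog, one token at a time.";

function streamTokens(text: string, tokensPerSecond: number): void {
  // Crude tokenizer: split on whitespace. Real models use subword tokens.
  const tokens = text.split(/\s+/);
  const intervalMs = 1000 / tokensPerSecond;
  let i = 0;

  const timer = setInterval(() => {
    if (i >= tokens.length) {
      clearInterval(timer);
      return;
    }
    // Append the next token to the output (stdout here; on the page it
    // would be a DOM node instead).
    process.stdout.write(tokens[i] + " ");
    i++;
  }, intervalMs);
}

// 1.97 tok/s (Raspberry Pi) vs. 1,100 tok/s (Le Chat) feel very different:
streamTokens(sampleText, 1.97);
```

One caveat: browser timers bottom out around a millisecond, so for very fast speeds like 1,100 tokens per second you’d want to batch several tokens per tick rather than firing one timer per token.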
Check it out: https://dave.ly/tokenflow/
The code is also available on GitHub.