Have you ever wondered how the speed of your favorite LLM really compares to other SoTA models? I recently saw a Reddit post where someone got a distilled version of DeepSeek R1 running on a Raspberry Pi! It generated output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don’t know!
Meanwhile, Mistral announced that their Le Chat platform can output 1,100 tokens per second! That sounds pretty fast. But how fast? I don’t know!
So, that’s why I put together TokenFlow. It’s a (very!) simple webpage that lets you see the speed of different LLMs in action. You can select from a few preset models and services or enter a custom speed, and boom! You watch it spit out tokens in real time, so you can see exactly what a given inference speed feels like as a user.
Check it out: https://dave.ly/tokenflow/
The code is also available on GitHub.
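If you just want a rough feel for the idea in a terminal, here is a minimal Python sketch. It is not TokenFlow's actual code: the function names `seconds_to_generate` and `stream_tokens` are made up for illustration, and it assumes one whitespace-separated word counts as one token, which is only an approximation of how real tokenizers work.

```python
import sys
import time


def seconds_to_generate(num_tokens, tokens_per_second):
    """Return how long a steady stream of num_tokens takes at the given rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def stream_tokens(text, tokens_per_second, out=None, sleep=time.sleep):
    """Write text one word at a time, pausing 1/rate seconds after each word.

    Assumption: one word is treated as one token.
    Returns the number of tokens written.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if out is None:
        out = sys.stdout
    delay = 1.0 / tokens_per_second
    tokens = text.split()
    for token in tokens:
        out.write(token + " ")
        out.flush()  # show each token immediately instead of buffering
        sleep(delay)
    out.write("\n")
    return len(tokens)
```

Calling `stream_tokens("some sample text to print", 1.97)` prints at the Raspberry Pi speed mentioned above, and swapping in `1100` gives the Le Chat figure. For scale, `seconds_to_generate(500, 1.97)` comes out to roughly 254 seconds (over four minutes) for a 500-token answer, while `seconds_to_generate(500, 1100)` is under half a second. At very high rates `time.sleep` is not precise enough to hit the target exactly, so treat the fast end as an approximation.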