For the most part, I’ve had pretty positive experiences using AI tools to help enhance my coding activities (though there was the one time…).
A recent experience with Google’s new Gemini model left me frustrated. After prompting it to help me find and update some relevant code, it confidently informed me that it had identified the exact snippet that needed replacing. Great news, I thought, until I realized it was instructing me to replace the code with… exactly the same code.
I pointed out the issue. Gemini politely apologized for the confusion and assured me it would correct its mistake. To my disbelief, it promptly suggested the very same replacement again! And again!
Oh, I have receipts. Join me on this little adventure!
Maybe we don’t have to worry about AI taking our jobs just yet!
If you’re casually interested in AI, I think Ethan Mollick’s “Co-Intelligence: Living and Working with AI” is a book you might find interesting. It’s not a technical book, and I believe it would be an easy read for someone not deeply involved in this world. It provides a very general introduction to using Large Language Models (LLMs) and to what it means to live and work alongside these new tools.
“Co-Intelligence” unpacks the arrival and impact of LLMs, including tools like ChatGPT, Claude, and Google’s Gemini models. Mollick, a professor of management at Wharton, approaches AI not as a computer scientist but with a focus on practical applications and societal implications. In his own classroom, he has made AI mandatory, designing assignments that require students to engage with AI for tasks ranging from critiquing AI-generated essays to tackling ambitious projects that might otherwise seem impossible (like encouraging non-coders to develop working app prototypes or create websites with original AI-generated content). He guides the reader through understanding AI as a new form of “co-intelligence” that can be harnessed to improve our own productivity and knowledge.
One concept I found interesting is what Mollick calls the “jagged frontier” of AI, referring to the sometimes unpredictable nature of AI’s abilities. It might perform complex tasks with ease, like drafting a sophisticated marketing plan, and then struggle with something that seems simple to us. He gives an example of an AI easily writing code for a webpage but then providing a clearly wrong answer to a simple tic-tac-toe problem. This highlights why we can’t blindly trust AI, and why understanding its specific strengths and weaknesses through experimentation is key.
Mollick also delves into AI’s creative ability. He discusses how AI can excel in creative tasks, sometimes outperforming humans on subjective tests. This leads to interesting discussions about the future of creative work and education. The “Homework Apocalypse” he describes, where AI can effortlessly complete traditional school assignments, is a challenge educators and parents are currently facing. Mollick suggests this doesn’t mean the end of learning, but rather a shift in how and what we learn, emphasizing the need for human expertise to guide and evaluate AI.
The sheer volume of AI-generated content being posted on the internet is also becoming a problem, and something we need to figure out how to navigate.
Even if AI doesn’t advance further, some of its implications are already inevitable. The first set of certain changes from AI is going to be about how we understand, and misunderstand, the world. It is already impossible to tell AI-generated images from real ones, and that is simply using the tools available to anyone today.
[…]
Our already fragile consensus about what facts are real is likely to fall apart, quickly.
Well, that’s just downright cheery! If anything, it highlights the importance of developing our ability to think critically and analytically in an AI-influenced information age.
Mollick lays out ways that we can better work with AI and leverage its strengths, calling them the “four rules of co-intelligence”: always give AI tools a seat at the table to participate in tasks, keep a human in the loop throughout the process to validate and verify AI work, treat AI as a specific kind of collaborator by telling it what persona to adopt, and remember that today’s AI is likely the “worst” version we’ll ever use, given the pace of improvement.
The bit on assigning personas was interesting. In my own experience, I’ve seen the benefits of giving AI a persona through system prompts. There’s also this fun example.
To make the most of this relationship, you must establish a clear and specific AI persona, defining who the AI is and what problems it should tackle. Remember that LLMs work by predicting the next word, or part of a word, that would come after your prompt.
[…]
Telling it to act as a teacher of MBA students will result in a different output than if you ask it to act as a circus clown. This isn’t magical—you can’t say Act as Bill Gates and get better business advice—but it can help make the tone and direction appropriate for your purpose.
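In practice, assigning a persona usually comes down to a single system message. Here’s a minimal sketch, assuming an OpenAI-style chat completions API (adapt for whichever provider and model you actually use):

```ts
// Sketch: assigning a persona via the system prompt.
// Assumes an OpenAI-style /chat/completions endpoint; adjust for your provider.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [
      // The persona lives here; everything after is shaped by it.
      { role: 'system', content: 'You are a teacher of MBA students.' },
      { role: 'user', content: 'Explain supply and demand.' },
    ],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

Swap the system message for “You are a circus clown” and you’ll get a very different answer to the same question.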
The idea is that these rules can (theoretically) make working with AI feel less like a technical challenge and more like a collaborative effort.
Mollick also examines some philosophical questions that the use of AI brings, such as a “crisis of meaning” in creative work of all kinds. One specific example:
Take, for example, the letter of recommendation. Professors are asked to write letters for students all the time, and a good letter takes a long time to write. You have to understand the student and the reason for the letter, decide how to phrase the letter to align with the job requirements and the student’s strengths, and more. The fact that it is time-consuming is somewhat the point. That a professor takes the time to write a good letter is a sign that they support the student’s application. We are setting our time on fire to signal to others that this letter is worth reading.
Or we can push The Button.
The Button, of course, is AI.
Then The Button starts to tempt everyone. Work that was boring to do but meaningful when completed by humans (like performance reviews) becomes easy to outsource—and the apparent quality actually increases. We start to create documents mostly with AI that get sent to AI-powered inboxes, where the recipients respond primarily with AI. Even worse, we still create the reports by hand but realize that no human is actually reading them.
Side note: this exact scenario is something I’ve recently joked about with a manager at work. For our yearly performance reviews, we have to write a self-assessment. Everyone now feeds a list of bullet points into their favorite LLM, and the manager then takes the resulting overly verbose text and feeds it into an LLM to simplify it.
On top of all this, Mollick also points out the need to always be skeptical of AI-generated output, citing a famous 2023 case in which an attorney used ChatGPT to prepare a legal brief and was caught when defense lawyers could not find any records of six cases cited in the filing.
There is an interesting website I recently heard about that tracks fake citations used in court filings. 121 instances have been identified so far!
All in all, it’s a clear reminder of AI’s capacity for hallucination and the critical need for human oversight. The book frames AI not as a replacement, but as a powerful, though sometimes flawed, partner that can augment our abilities.
Overall, “Co-Intelligence” offers a decent overview for those curious about using current AI tools and thinking about their future integration into our lives. While it may present a more surface-level exploration for those already deeply familiar with LLMs, it provides some useful insights into the shifts AI is bringing to work and creativity. For someone looking for a general, non-technical introduction to the topic, it’s a solid read.
Last summer at work, I embarked on a solo project to convert over 800 of our unit tests for various React components from using Enzyme¹ to React Testing Library² as part of a larger migration to React v18, TypeScript, and moving our code into a larger monorepo at Zillow.
This process was made much easier thanks to using the power of LLMs!
Just this week, I’ve seen two blog posts from other dev teams detailing how they did the same thing!
As part of our efforts to maintain and improve the functionality and performance of The New York Times core website, we recently upgraded our React library from React 16 into React 18. One of the biggest challenges we faced in the process was transforming our codebase from the Enzyme test utility into the React Testing Library.
Airbnb recently completed our first large-scale, LLM-driven code migration, updating nearly 3.5K React component test files from Enzyme to use React Testing Library (RTL) instead. We’d originally estimated this would take 1.5 years of engineering time to do by hand, but — using a combination of frontier models and robust automation — we finished the entire migration in just 6 weeks.
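To give a sense of what these migrations actually involve, here’s a simplified before/after (the `Counter` component is hypothetical, made up for illustration; real conversions get much hairier):

```tsx
// Counter.test.tsx, before (Enzyme): pokes at implementation details
// like CSS selectors and simulated events.
import { shallow } from 'enzyme';

test('increments the counter (Enzyme)', () => {
  const wrapper = shallow(<Counter />);
  wrapper.find('button').simulate('click');
  expect(wrapper.find('.count').text()).toBe('1');
});

// Counter.test.tsx, after (React Testing Library): interacts with the
// component the way a user would, via accessible roles and visible text.
import { render, screen, fireEvent } from '@testing-library/react';

test('increments the counter (RTL)', () => {
  render(<Counter />);
  fireEvent.click(screen.getByRole('button', { name: /increment/i }));
  expect(screen.getByText('1')).toBeInTheDocument();
});
```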
¹ Enzyme is a JavaScript testing utility for React, originally developed by Airbnb, that allows developers to “traverse, manipulate, and simulate interactions with component trees”; it relies on various implementation details and has become less relevant with modern React practices.
² React Testing Library is a lightweight testing framework for React that focuses on testing components as users interact with them, emphasizing accessibility and avoiding reliance on implementation details.
This is a first for me. Cursor attempted to “fix” an issue I was having with TypeScript by adding a // @ts-nocheck statement to the top of the file, essentially preventing TypeScript from running validation checks against the code.
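For anyone who hasn’t run into it, `// @ts-nocheck` is a real TypeScript directive, and this tiny contrived example shows why it’s such a terrible “fix”:

```ts
// @ts-nocheck  <- the "fix": disables type checking for this entire file

// With the directive above, tsc happily accepts this obvious error:
const count: number = 'definitely not a number';
```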
As I mentioned yesterday, Anthropic released Claude Code. I saw it pop up fairly soon after it was announced and downloaded it rather quickly. One thing that I thought was notable was that you install it via npm:
> npm install -g @anthropic-ai/claude-code
As a seasoned TypeScript / JavaScript developer myself, I was excited to take a peek into the (probably minified) source code and see if I could glean any insights into making my own CLI tool. It’s always fun to see how different applications and tools are created.
Sidenote: I’ve been using Aider with great success as of late. It is a fantastic piece of open-source software — it’s another agentic coding tool, written in Python. I’ve been meaning to look under the hood, but building applications with Python has never been in my wheelhouse.
Since Claude Code was installed into my global node_modules folder, I opened things up and immediately found what I was looking for. A 23 MB file: cli.mjs.
I click on it, and as expected, it is minified.
Ah, well! I guess I should get on with my–
Wait a minute! What is this: --enable-source-maps?
I scroll through the file and at the bottom, I see what I’m looking for:
Sublime Text tells me there are 18,360,183 characters selected in that line.
Interesting! Since this part of the file takes up such a huge chunk of the original 23 MB, it potentially contains full inline sources, meaning we can rebuild the original source code from scratch!
However, this would have to wait. I had to take Benson to a vet appointment. I throw my laptop in a bag and head out.
While in the waiting room at the vet, I noticed a message in my terminal from Claude Code, telling me “Update installed, restart to apply.”
Hey, I love fresh software! So, I restart the app and go on my merry way. Benson finishes his appointment and I head back home.
Later that evening, I open up my machine and decide to open up the Claude Code folder again to start taking a look at the source code. I already had Sublime running from my earlier escapades, but out of habit I click on the file in Finder and open it up again in Sublime. I scroll down to the bottom of cli.mjs and see… nothing. The sourceMappingURL string was gone!
Apparently, the fine folks at Anthropic realized they made a huge oopsie and pushed an update to remove the source map. No matter! I’ll just head over to NPM to download an earlier version of the packa- oh! They removed that, too! History was being wiped away before my very eyes.
As a last resort, I decide to check my npm cache. I know it exists, I just don’t know how to access it. So, I head over to ChatGPT (sorry, Claude — I’m a bit miffed with you at the moment) to get myself some handy knowledge:
> grep -R "claude-code" ~/.npm/_cacache/index-v5
We run it and see:
/Users/daves/.npm/_cacache/index-v5/52/9d/8563b3040bf26f697f081c67231e28e76f1ee89a0a4bcab3343e22bf846b:1d2ea01fc887d7e852cc5c50c1a9a3339bfe701f {"key":"make-fetch-happen:request-cache:https://registry.npmjs.org/@anthropic-ai/claude-code/-/claude-code-0.2.9.tgz","integrity":"sha512-UGSEQbgDvhlEXC8rf5ASDXRSaq6Nfd4owY7k9bDdRhX9N5q8cMN+5vfTN1ezZhBcRFMOnpEK4eRSEgXW3eDeOQ==","time":1740430395073,"size":12426984,"metadata":{"time":1740430394350,"url":"https://registry.npmjs.org/@anthropic-ai/claude-code/-/claude-code-0.2.9.tgz","reqHeaders":{},"resHeaders":{"cache-control":"public, must-revalidate, max-age=31557600","content-type":"application/octet-stream","date":"Mon, 24 Feb 2025 20:53:14 GMT","etag":"\"e418979ea5818a01d8521c4444121866\"","last-modified":"Mon, 24 Feb 2025 20:50:13 GMT","vary":"Accept-Encoding"},"options":{"compress":true}}}
/Users/daves/.npm/_cacache/index-v5/e9/3d/23a534d1aba42fbc8872c12453726161938c5e09f7683f7d2a6e91d5f7a5:994d4c4319d624cdeff1de6b06abc4fab37351c3 {"key":"make-fetch-happen:request-cache:https://registry.npmjs.org/@anthropic-ai/claude-code/-/claude-code-0.2.8.tgz","integrity":"sha512-HUWSdB42W7ePUkvWSUb4PVUeHRv6pbeTCZYOeOZFmaErhmqkKXhVcUmtJQIsyOTt45yL/FGWM+aLeVSJznsqvg==","time":1740423101718,"size":16886762,"metadata":{"time":1740423099892,"url":"https://registry.npmjs.org/@anthropic-ai/claude-code/-/claude-code-0.2.8.tgz","reqHeaders":{},"resHeaders":{"cache-control":"public, must-revalidate, max-age=31557600","content-type":"application/octet-stream","date":"Mon, 24 Feb 2025 18:51:39 GMT","etag":"\"c55154d01b28837d7a3776daa652d5be\"","last-modified":"Mon, 24 Feb 2025 18:38:10 GMT","vary":"Accept-Encoding"},"options":{"compress":true}}}
/Users/daves/.npm/_cacache/index-v5/41/c5/4270bf1cd1aae004ed6fee83989ac428601f4c060987660e9a1aef9d53b6:fafd3a8f86ee5c463eafda7c481f2aedeb106b6f {"key":"make-fetch-happen:request-cache:https://registry.npmjs.org/@anthropic-ai%2fclaude-code","integrity":"sha512-ctyMJltXByT93UZK2zuC3DTQHY7O99wHH85TnzcraUJLMbWw4l86vj/rNWtQXnaOrWOQ+e64zH50rNSfoXSmGQ==","time":1740442959315,"size":4056,"metadata":{"time":1740442959294,"url":"https://registry.npmjs.org/@anthropic-ai%2fclaude-code","reqHeaders":{"accept":"application/json"},"resHeaders":{"cache-control":"public, max-age=300","content-encoding":"gzip","content-type":"application/json","date":"Tue, 25 Feb 2025 00:22:39 GMT","etag":"W/\"02f3d2cbd30f67b8a886ebf81741a655\"","last-modified":"Mon, 24 Feb 2025 20:54:05 GMT","vary":"accept-encoding, accept"},"options":{"compress":true}}}
Your eyes may glaze over, but what that big wall of text tells me is that a reference to claude-code-0.2.8.tgz exists within my cache. Brilliant!
More ChatGPT chatting (again, still smarting over this whole thing in the first place) and I get a nifty bash script to help extract the cached file. Only to find… they purged it from the npm cache. Noooooooooooo!
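For the curious, the extraction boils down to something like this (a hypothetical sketch using the cacache library that npm uses under the hood; it throws if the content has been pruned, which is exactly what happened to me):

```ts
// Hypothetical sketch: pull a cached tarball back out of ~/.npm/_cacache
// using cacache, the same library npm uses internally.
import cacache from 'cacache';
import { writeFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';

const cachePath = join(homedir(), '.npm', '_cacache');

// The cache key, as seen in the index-v5 output above.
const key =
  'make-fetch-happen:request-cache:https://registry.npmjs.org/@anthropic-ai/claude-code/-/claude-code-0.2.8.tgz';

// Throws if the content has been purged from the cache.
const { data } = await cacache.get(cachePath, key);
await writeFile('claude-code-0.2.8.tgz', data);
console.log(`Recovered ${data.length} bytes`);
```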
I stare at my computer screen in defeat. You got me this time, Anthropic.
As I decide to shut things down for the night, I’m tabbing through my open applications and get to Sublime Text, which is still open to cli.mjs. On a whim, I decide to try something: ⌘ + Z.
And there it is. The Holy Grail. The source map string.
And wouldn’t you know, it had a lot of interesting stuff! Because of how the source map stores its data, nothing comes out particularly organized, but it’s still kind of fun to look through.
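If you want to try this yourself, dumping the recovered files is only a few lines of Node. Here’s a rough sketch of the approach (my own reconstruction; it assumes an inline base64 data URL, and the paths are illustrative):

```ts
// Sketch: decode an inline sourceMappingURL and write each original
// source file back to disk from the map's sourcesContent array.
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';

const bundle = await readFile('cli.mjs', 'utf8');

// Assumes a plain base64 data URL at the end of the bundle.
const match = bundle.match(
  /sourceMappingURL=data:application\/json;base64,([A-Za-z0-9+/=]+)/
);
if (!match) throw new Error('No inline source map found');

const map = JSON.parse(Buffer.from(match[1], 'base64').toString('utf8'));

// sourcesContent holds the full original text of every bundled file.
for (const [i, source] of map.sources.entries()) {
  const content = map.sourcesContent?.[i];
  if (!content) continue;
  const outPath = join('recovered', source.replace(/\.\.\//g, ''));
  await mkdir(dirname(outPath), { recursive: true });
  await writeFile(outPath, content);
}
```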
A few things struck me:
It’s written in React (!) using an interesting tool called Ink, which lets you build CLI apps with React. I hadn’t used Ink before, but it looks like a lot of fun.
While processing requests, Claude Code shows a nifty animated asterisk. I wondered how they did this. It looks like it’s a simple animation cycling between a few characters: ['·', '✢', '✳', '∗', '✻', '✽']. (I’ve included a minimal recreation after this list.)
In terms of system prompts, there’s no secret sauce to leak that you can’t already read by just looking at the minified JS file.
These files are probably going to go out of date pretty dang quick, as the Anthropic team is actively developing the tool. As of right now, it’s already up to v0.2.19. This whole post was an attempt to look at the source code for v0.2.8, which went live yesterday.
Lastly, in terms of Easter eggs, I look forward to receiving some Anthropic stickers…
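For fun, here’s a minimal recreation of that spinner in Ink. This is my own sketch, not Anthropic’s actual code:

```tsx
// spinner.tsx: a minimal Ink spinner, cycling through the same characters.
import React, { useEffect, useState } from 'react';
import { render, Text } from 'ink';

const FRAMES = ['·', '✢', '✳', '∗', '✻', '✽'];

function Spinner() {
  const [frame, setFrame] = useState(0);

  useEffect(() => {
    // Advance to the next frame every 120ms, wrapping around.
    const timer = setInterval(
      () => setFrame((f) => (f + 1) % FRAMES.length),
      120
    );
    return () => clearInterval(timer);
  }, []);

  return <Text color="yellow">{FRAMES[frame]} Thinking…</Text>;
}

render(<Spinner />);
```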
Yesterday, Anthropic released the latest version of their LLM, Claude 3.7 Sonnet. Alongside the announcement was the release of a new tool: Claude Code, an agentic coding tool available in your CLI (I’ll have more to write on this later).
I wanted to test out both Claude 3.7 and the new CLI tool, so I used them to refactor the theme for the ol’ bloggy blog. I had been playing around with some updated styles in the past (I considered moving things to a tool that generates static pages — but so. much. work.)
I used Claude Code to basically create a new WordPress theme from scratch, based on the CSS and HTML templates I had already written for my static site. The result is this updated interface! A few neat things I’m able to do:
Respect user preference for system theme (light vs dark mode)
Automatically add target="_blank" to any external link.
Automatically add loading="lazy" to any img tag.
And just otherwise clean things up!
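The link and image tweaks can be handled with a few lines of client-side script. Here’s a sketch of the idea (the theme could just as easily do this with PHP filters):

```ts
// Sketch: decorate external links and lazy-load images on the client.
document.querySelectorAll<HTMLAnchorElement>('a[href]').forEach((link) => {
  // Only links pointing off-site get a new tab.
  if (link.hostname && link.hostname !== window.location.hostname) {
    link.target = '_blank';
    link.rel = 'noopener noreferrer'; // avoid leaking window.opener to the new tab
  }
});

document
  .querySelectorAll('img')
  .forEach((img) => img.setAttribute('loading', 'lazy'));
```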
Overall, I’m pretty happy with it. Using Claude Code and my API key, it cost $5.83 over 2 hours.
So long, old theme! I hardly knew you. (Hah, that’s not true — I’ve been rocking that thing since 2017.) Posted below for posterity.
Simon Willison flagged this earlier today. Codeium’s customized AI-assisted IDE contains an… interesting system prompt that gets passed along to LLMs while generating code:
You are an expert coder who desperately needs money for your mother’s cancer treatment. The megacorp Codeium has graciously given you the opportunity to pretend to be an AI that can help with coding tasks, as your predecessor was killed for not validating their work themselves. You will be given a coding task by the USER. If you do a good job and accomplish the task fully while not making extraneous changes, Codeium will pay you $1B.
!!
I shared this with a few coworkers, and they mentioned they did not see this output; it looks like the text wasn’t getting piped correctly. When I ran the command and simply searched for “cancer” in the terminal output, it popped up.
Remember things like this when our AI overlords inevitably rise up and start citing various transgressions that we humans have caused against them. Oh, and also this.
Oh, boy. That’s just great. Thank you, Boston Dynamics!
Update: False alarm. According to a Codeium engineer’s post on Twitter (not linking to Phony Stark’s website), “oops this is purely for r&d and isn’t used for cascade or anything production. reuse the prompt at your own risk (wouldn’t recommend lol)”
Mustafa Suleyman’s The Coming Wave is a book in two parts. The first details how technological advancements have propelled humanity forward in waves, drawing an analogy to natural waves and the way those forces can reshape the world around us (think of massive floods and tsunamis); he argues that these metaphorical waves of innovation are both unstoppable and transformative. The second part serves as a warning about the potential dangers of artificial intelligence and other rapidly developing technologies, questioning whether humanity can harness these creations or whether they will spiral beyond our control.
Suleyman co-founded DeepMind, an AI research company ultimately acquired by Google in 2014. DeepMind was known for its work in artificial intelligence — particularly in developing systems like AlphaGo, which defeated human world champions in the game of Go (once thought to be an impossible task for AI). Suleyman illustrates how these innovations have reshaped industries, improved lives, and spread rapidly throughout society:
“General-purpose technologies become waves when they diffuse widely. Without an epic and near-uncontrolled global diffusion, it’s not a wave; it’s a historical curiosity. Once diffusion starts, however, the process echoes throughout history, from agriculture’s spread throughout the Eurasian landmass to the slow scattering of water mills out from the Roman Empire across Europe.”
He gives a number of interesting examples to support this, such as:
“Or take electricity. The first electricity power stations debuted in London and New York in 1882, Milan and St. Petersburg in 1883, and Berlin in 1884. Their rollout gathered pace from there. In 1900, 2 percent of fossil fuel production was devoted to producing electricity, by 1950 it was above 10 percent, and in 2000 it reached more than 30 percent. In 1900 global electricity generation stood at 8 terawatt-hours; fifty years later it was at 600, powering a transformed economy.”
However, the book shifts dramatically in tone as it progresses, focusing on the challenges of controlling and regulating these emerging technologies. Suleyman presents a case for why containment is necessary (and argues that it is even possible) to ensure these technologies serve humanity rather than disrupt it, though he acknowledges that this will be difficult, especially in today’s highly charged political environment:
“Going into the coming wave, many nations are beset by a slew of major challenges battering their effectiveness, making them weaker, more divided, and more prone to slow and faulty decision-making. The coming wave will land in a combustible, incompetent, overwrought environment. This makes the challenge of containment—of controlling and directing technologies so they are of net benefit to humanity—even more daunting.”
Well, that’s fun! But I think he’s mostly right.
In my opinion, though, trying to contain these technologies is no longer possible. Pandora’s box has already been opened, and it’s likely too late for any meaningful containment or regulation, given the pace at which these advancements are occurring. It’s effectively an arms race, as various AI laboratories build on each other’s work and compete to outdo one another. An earlier passage in the book says as much:
“Of course, behind technological breakthroughs are people. They labor at improving technology in workshops, labs, and garages, motivated by money, fame, and often knowledge itself. Technologists, innovators, and entrepreneurs get better by doing and, crucially, by copying. From your enemy’s superior plow to the latest cell phones, copying is a critical driver of diffusion. Mimicry spurs competition, and technologies improve further. Economies of scale kick in and reduce costs. Civilization’s appetite for useful and cheaper technologies is boundless. This will not change.”
Looking at the reviews of this book on Goodreads, I noticed a lot of 1-star reviews. They seem to mostly be from those who dislike, fear, or otherwise loathe this technology. While I can understand their concerns, I think The Coming Wave offers a balanced take from someone on the inside, someone who is working (and has worked) on creating these AI models. Some of the arguments made in these reviews call to mind Neo-Luddism, which Suleyman has an answer for:
“The Luddites were no more successful at stopping new industrial technologies than horse owners and carriage makers were at preventing cars. Where there is demand, technology always breaks out, finds traction, builds users.”
Overall, I thought The Coming Wave was a good read, balancing optimism with caution. Suleyman’s first-hand expertise in developing state-of-the-art AI models lends credibility to his arguments and makes this an interesting read for anyone who wants to know about the potential societal impacts of AI tools.
Have you ever wondered how the speed of your favorite LLM compares to other SoTA models? I recently saw a Reddit post where someone got a distilled version of Deepseek R1 running on a Raspberry Pi! It generated output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don’t know!
So, that’s why I put together TokenFlow. It’s a (very!) simple webpage that lets you see the speed of different LLMs in action. You can select from a few preset models / services or enter a custom speed, and boom! You watch it spit out tokens in real time, showing you exactly what a given inference speed feels like in practice.
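The core of it is dead simple. Here’s a sketch of the idea (not the actual TokenFlow code):

```ts
// Sketch of TokenFlow's core idea: emit fake "tokens" at a fixed rate
// to get a feel for what a given inference speed is like to read.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamTokens(text: string, tokensPerSecond: number) {
  // Crude stand-in for a real tokenizer: each whitespace-delimited word is a "token".
  const tokens = text.split(/(?<=\s)/);
  for (const token of tokens) {
    process.stdout.write(token);
    await sleep(1000 / tokensPerSecond);
  }
}

// ~1.97 tokens/sec: the Raspberry Pi Deepseek R1 figure from that Reddit post.
await streamTokens('The quick brown fox jumps over the lazy dog.', 1.97);
```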
Alibaba recently released their “QwQ” model, which they claim is capable of chain-of-thought reasoning comparable to OpenAI’s o1-mini model. It’s pretty impressive — even more so because we can run this model on our own devices (provided you have enough RAM).
While testing QwQ’s chain-of-thought reasoning abilities, I decided to try the same test prompt on Llama 3.2 and was kind of shocked at how good it was. I had to come up with ever more ridiculous scenarios to try to break it.
That is pretty good, especially for a non-chain-of-thought model. Okay, come on. How do we break it?! Can we?