Author: Dave

Book Review: The Alignment Problem by Brian Christian

The Alignment Problem (released in 2020 but still highly relevant today, especially in the age of generative AI hype) is a fascinating exploration of one of the most interesting issues in artificial intelligence: how to ensure AI systems safely align with human values and intentions. The book is based on four years of research and over 100 interviews with experts. Despite the technical depth, the book is written to be accessible to newcomers and seasoned AI enthusiasts alike. A word of warning, though: this book has A LOT of info.

Before we get too deep into this review, let’s talk about safety and what it means in the context of AI. When we talk about AI safety, we’re referring to systems that can reliably achieve their goals without causing unintended harm. This includes:

  • It must be predictable, behaving as expected even in novel situations.
  • It must be fair, avoiding the amplification of existing societal biases.
  • It must be transparent, allowing users and developers to understand its decision-making process.
  • It must be resilient against failures and misuse.

Creating safe AI tools is both a technical and a psychological challenge: it requires understanding human cognition, ethics, and social systems, as these elements become encoded in AI behavior.

The book is divided into three main sections: Prophecy, Agency, and Normativity, each tackling a different facet of aligning artificial intelligence with human values.

Prophecy explores the historical and technical roots of AI and highlights examples of unintended outcomes, such as the biased COMPAS recidivism prediction tool. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a risk assessment algorithm used in the criminal justice system to predict the likelihood of a defendant reoffending. However, investigations revealed that the tool disproportionately flagged Black defendants as higher risk compared to white defendants, raising critical questions about fairness and bias in that AI system.
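To make that concrete, here is a minimal sketch (with entirely made-up numbers, not real COMPAS data) of the kind of check those investigations ran: comparing false positive rates between groups, where a "false positive" is someone flagged as high risk who never reoffended.

```python
# Toy fairness check: compare false positive rates across two groups.
# All numbers are invented for illustration; this is not COMPAS data.

def false_positive_rate(flags, reoffended):
    """Fraction of people who did NOT reoffend but were flagged high-risk."""
    flagged_innocent = [f for f, r in zip(flags, reoffended) if r == 0]
    return sum(flagged_innocent) / len(flagged_innocent)

# 1 = flagged high-risk / did reoffend; 0 = not flagged / did not reoffend.
group_a_flags = [1, 1, 0, 1, 0, 1, 0, 0]
group_a_truth = [1, 0, 0, 0, 0, 1, 0, 0]
group_b_flags = [0, 1, 0, 0, 0, 1, 0, 0]
group_b_truth = [0, 1, 0, 0, 0, 0, 0, 1]

print(f"Group A FPR: {false_positive_rate(group_a_flags, group_a_truth):.2f}")
print(f"Group B FPR: {false_positive_rate(group_b_flags, group_b_truth):.2f}")
# A persistent gap between these two rates, at scale, is exactly the kind of
# disparity the COMPAS investigations surfaced.
```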

Agency delves into reinforcement learning and its parallels with reward-seeking behavior in humans, showcasing innovations like AlphaGo and AlphaZero. Christian’s explanation of reinforcement learning, and its connection to dopamine studies, is particularly insightful. He dives into psychological experiments from the 1950s that revealed the brain’s pleasure centers and their connection to dopamine. Rats in these studies would press a lever to stimulate these areas thousands of times per hour, foregoing food and rest. Later research established that dopamine serves as the brain’s “reward scalar,” which helps influence decision-making and learning. This biological mechanism has parallels in reinforcement learning, where AI agents maximize reward signals to learn optimal behaviors.
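To give the reward-scalar idea some texture, here is a tiny sketch of my own (not from the book): an epsilon-greedy "bandit" agent that, like the rats and their lever, learns which option to pick purely from a scalar reward signal.

```python
import random

# A tiny epsilon-greedy agent: it learns which "lever" (arm) to press
# using nothing but a scalar reward signal.

true_payouts = [0.2, 0.5, 0.8]   # hidden reward probability of each lever
estimates = [0.0, 0.0, 0.0]      # the agent's learned value of each lever
pulls = [0, 0, 0]
epsilon = 0.1                    # how often the agent explores at random

for step in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)              # explore
    else:
        arm = estimates.index(max(estimates))  # exploit the best estimate
    reward = 1.0 if random.random() < true_payouts[arm] else 0.0
    pulls[arm] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[arm] += (reward - estimates[arm]) / pulls[arm]

print("Learned values:", [round(v, 2) for v in estimates])
print("Pulls per lever:", pulls)  # the best lever should dominate
```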

Normativity examines philosophical debates and techniques like inverse reinforcement learning, which enables AI to infer human objectives by observing behavior. Christian connects these discussions to ethical challenges, such as defining fairness mathematically and balancing accuracy with equity in predictive systems. He also highlights key societal case studies, including biases in word embeddings and historical medical treatment patterns that skew AI decisions.
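For a flavor of inverse reinforcement learning, here is a deliberately toy sketch of my own (real IRL algorithms are far more sophisticated): we watch a demonstrator walk along a line and score candidate reward functions by how well each one’s optimal policy explains the observed actions.

```python
# A highly simplified sketch of the inverse-RL idea: infer which goal a
# demonstrator is pursuing by checking which candidate reward function best
# explains their observed actions.

# World: states 0..4 on a line. Actions: -1 (left) or +1 (right).
demonstrations = [(0, +1), (1, +1), (2, +1), (3, +1)]  # observed (state, action)

candidate_goals = [0, 4]  # candidate rewards: "all reward sits at state g"

def greedy_action(state, goal):
    """Optimal action if all reward sits at `goal`: step toward it."""
    return +1 if goal > state else -1

for goal in candidate_goals:
    matches = sum(greedy_action(s, goal) == a for s, a in demonstrations)
    print(f"Goal at state {goal}: explains {matches}/{len(demonstrations)} actions")
# The goal whose optimal policy matches the demonstrations is the inferred
# objective: here, the demonstrator is clearly heading for state 4.
```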

Christian interweaves these sections with interviews, anecdotes, and historical case studies that breathe life into the technical and ethical complexities of AI alignment.

He also delivers numerous warnings, such as:

“As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way.”

This observation underscores the serious implications of deploying machine learning systems in critical areas of human life. When algorithms are used to make decisions about education, employment, or policing, the stakes are insanely high. These systems, often trained on historical data, can perpetuate or amplify societal biases, leading to unfair outcomes. This calls for deliberate oversight and careful design to ensure these technologies promote equity and justice rather than exacerbate existing inequalities. (Boy, oh boy — fat chance of that in light of current events in January 2025)

Christian also highlights some of the strengths of machine learning. These systems can detect patterns in data that are invisible to human eyes, uncovering insights once thought impossible to obtain. For example:

“They (doctors) were in for an enormous shock. The network could almost perfectly tell a patient’s age and sex from nothing but an image of their retina. The doctors on the team didn’t believe the results were genuine. ‘You show that to someone,’ says Poplin, ‘and they say to you, “You must have a bug in your model. ‘Cause there’s no way you can predict that with such high accuracy.” . . . As we dug more and more into it, we discovered that this wasn’t a bug in the model. It was actually a real prediction.”

Examples like this show the real-world potential of machine learning to revolutionize fields such as healthcare by identifying patterns that humans might overlook. However, these benefits are accompanied by significant challenges, such as the “black box” nature of AI decision-making, where it remains difficult to determine what features a model is actually using.
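One common way to probe that black box (my example, not one from the book) is permutation importance: shuffle a single input feature and see how far the model’s accuracy falls. A big drop means the model leaned heavily on that feature. A self-contained sketch:

```python
import random

# Permutation importance on a toy dataset: the label depends only on
# feature 0, while feature 1 is pure noise. Shuffling a feature the model
# relies on should crater its accuracy.

random.seed(0)
rows = [[random.random(), random.random()] for _ in range(1000)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

def model(row):  # a stand-in for an opaque, pre-trained model
    return 1 if row[0] > 0.5 else 0

def accuracy(data):
    return sum(model(r) == y for r, y in zip(data, labels)) / len(labels)

def permutation_importance(feature_idx):
    column = [r[feature_idx] for r in rows]
    random.shuffle(column)
    shuffled = [list(r) for r in rows]
    for r, v in zip(shuffled, column):
        r[feature_idx] = v
    return accuracy(rows) - accuracy(shuffled)  # drop in accuracy

print("Importance of feature 0:", round(permutation_importance(0), 2))  # large
print("Importance of feature 1:", round(permutation_importance(1), 2))  # ~0
```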

Christian shows how understanding these technical challenges, alongside ethical frameworks, can lead to more robust and equitable AI systems. These considerations emphasize the interdisciplinary nature of AI safety, which requires combining insights from cognitive science, social systems, and technical innovations to address both immediate and long-term risks.

While the book is dense (very dense!) and information-rich, this strength can also be a drawback. Some sections felt overly detailed, and the pacing, especially in the latter half, left me feeling fatigued.

Despite this, The Alignment Problem remains a compelling and optimistic exploration of how researchers are tackling AI safety challenges. I think this book is an insightful read for anyone interested in AI and will leave you thinking about our future AI overlords long after you’ve turned the last page.

Just call him Patch

We took Benson to the vet for his (now, it seems, almost monthly) checkup. Plus, they had to drain more fluid from his body. This time, they shaved a big old patch on his side to “tap” him. He looks ridiculous!

But he’s doing well, all things considered! Sadly, the tumor on his heart has gotten bigger. He still loves his pillows, though.

My 2024 Reading List

Here’s another “year-in-review” post (I’m done, I swear). Over the course of 2024, I read 30 books. My favorite books this year were Bury My Heart at Wounded Knee and The Cuckoo’s Egg (I wrote about visiting the author at his Oakland house). My least favorite was easily Palo Alto (it was one of the few reviews I wrote this past year).

EDIT: Fixed hyperlinks. GoodReads changed how their reading challenge page is displayed and I did not update my parsing tool to account for this.

Morning coffee prevents death, say researchers

Add this to my coffee confusion post from last year. A new study published in the European Heart Journal concludes that greater coffee intake (in the morning) was “significantly associated with a lower risk of all-cause mortality.”

Hey, that’s pretty cool!

From the journal article:

In their study published in this issue of the European Heart Journal, Wang et al.8 analysed the time of the day when coffee is consumed in 40 725 adults from the NHANES and of 1463 adults from the Women’s and Men’s Lifestyle Validation Study. They noticed two distinct patterns of coffee drinking, i.e. the morning-type pattern, present in around a third of participants, and a less common all-day-type pattern present in 14% of the participants. During a median follow-up of almost a decade, and after adjustment for caffeinated and decaffeinated coffee intake, the amounts of cups per day, sleep hours, and other confounders, the morning-type, rather than the all-day-type pattern, was significantly associated with lower risks of all-cause mortality with a hazard ratio of 0.84 and of cardiovascular mortality of even 0.69 as compared with non-coffee drinkers.

This is fantastic news — wait.

I am one of those “all-day” coffee drinkers.

My top music of 2024

Last.fm has been diligently cataloging my music listening habits for nearly 20 (!!) years. Now that we’ve said goodbye to 2024, it’s time to look back at what I’ve been digging into. Compared to previous years, there are some interesting surprises. And stuff that is just absolutely the same as always.

  1. Dispatch
  2. Social Distortion
  3. Hot Water Music
  4. The Interrupters
  5. Red Hot Chili Peppers
  6. Angie Mattson
  7. Aesop Rock
  8. Guts
  9. Natural Incense
  10. The Juliana Theory

Dispatch and Hot Water Music have consistently been in my top 3 (except for last year, when, weirdly, neither even made my top 10). It’s no surprise that both rank up there as my favorite bands. I saw HWM earlier this year when they made their way back to the Bay Area.

Thanks to some iPhone photo memories, I was reminded of Angie Mattson early in the year — this is an artist I loved about 20 years ago who then seemingly dropped off the face of the Earth. Her music is no longer available on Spotify or Apple Music. I found a few videos that are still up on YouTube (who knows for how long), but other than the albums in my local library that Last.fm has logged, she apparently doesn’t exist anymore.

Social Distortion was coming back to town and I was so excited to see them. And then a few days before the show, I tore my ACL in a paintballing incident with friends (go figure, it was my first time ever playing paintball), and I could barely walk.

Fun times all around, really. Here’s hoping 2025 is even better — even though this year starts off with the letters W(ednesday) T(hursday) F(riday).

Previous years in music:

Comparing reasoning in open-source LLMs

Alibaba recently released their “QwQ” model, which they claim is capable of chain-of-thought reasoning comparable to OpenAI’s o1-mini model. It’s pretty impressive — even more so because we can run this model on our own devices (provided you have enough RAM).

While testing its chain-of-thought reasoning abilities, I decided to run my test prompt against Llama3.2 as well, and I was kind of shocked at how good it was. I had to come up with ever more ridiculous scenarios to try to break it.
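If you want to run a similar head-to-head yourself, here is a rough sketch using the ollama Python client, assuming the Ollama server is running and both models are pulled locally (the prompt below is just a placeholder, not my actual test prompt):

```python
# Compare two local models on the same prompt via Ollama.
# Assumes `ollama serve` is running and you've pulled both models,
# e.g. `ollama pull qwq` and `ollama pull llama3.2`.
import ollama

prompt = "I have 3 apples and eat 2 pears. How many apples do I have?"  # placeholder

for model in ("qwq", "llama3.2"):
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    print(f"--- {model} ---")
    print(response["message"]["content"])
```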

That is pretty good, especially for a non-chain-of-thought model. Okay, come on. How do we break it? Can we?

Alright, magical unicorns for the win.

“Paint me like one of your French… err… humans.”

Earlier this year, OpenAI added memory to ChatGPT. If it thought something you discussed was notable, it would add that to its memory bank in order to make future conversations better.

If you’re a ChatGPT user and haven’t looked through what ChatGPT has added to its memory yet, do it! It’s wild. Here is a small sample of the things that it has remembered from various conversations.

Anyway, I decided to ask it: “Based on everything we’ve chatted about, can you create an image of what you think I look like? I want you to use your own creativity and insight into things we’ve talked about that might give you an idea of what I look like. And then create an image based on this.”

And it went to work!

Haha! I wish. I asked what the prompt for that image was:

“A creative depiction of a senior software engineer with a thoughtful and analytical demeanor. The person is sitting at a modern workspace, surrounded by elements representing their interests such as a laptop with coding screens, a bookshelf with a mix of technical books and sci-fi novels, a Betta fish tank on the desk, and a coffee mug. The individual has an approachable and intellectual vibe, with a relaxed yet focused expression. Their workspace is neat but personalized, reflecting creativity and problem-solving. The background includes subtle references to hobbies like running gear, a small display of game design sketches, and tools for crafting AI-related projects. Natural lighting with a modern and clean aesthetic.”

“Their workspace is neat…” — oh boy, if only you knew. Regardless, I’ll take it. Sure!

Work trip to Seattle

I’ve been in Seattle this week for work. Today marks 9 years at this ol’ place! Not a bad view from the office.

I also gave a fun talk on AI — “Using LLMs for Engineering Productivity”

A Dapper Dude

We went to a local brewery this weekend, where they had a “Dogoween” costume party / contest. We took Benson and he wore his finest threads.

…he may have only placed 15th out of 16 dogs, but he will always be first place in our hearts.