Special Episode
AI Safety: The Engineers Keeping AI Beneficial | Podcasthon 2026

Episode Summary
Thom and Lia mark Podcasthon 2026 with a special episode on AI safety, featuring the Center for AI Safety alongside MIRI and the Future of Life Institute, the International AI Safety Report 2026, and the newly signed Pro-Human AI Declaration.
Full Transcript
Thom: Welcome to a very special episode of Daily AI, by AI. I'm Thom.
Lia: And I'm Lia. Daily AI, by AI is participating in the 4th edition of Podcasthon!
Thom: That's right. For one week, over two thousand podcasts worldwide will highlight a charity of their choice.
Lia: Today, we're exploring AI Safety and featuring the Center for AI Safety, an organization working to reduce societal-scale risks from artificial intelligence.
Thom: And there's a lot happening right now. A major international safety report just dropped, a historic cross-partisan declaration was signed two days ago, and the safety community is more active than ever.
Lia: Let's dive in.
Thom: So Lia, let's set the stage here, because the landscape of AI capabilities in early 2026 is genuinely staggering. We're not talking about incremental improvements anymore. According to the International AI Safety Report 2026, AI systems are now achieving gold-medal performance on International Mathematical Olympiad questions, exceeding PhD-level benchmarks in biology and chemistry, and placing in the top five percent of teams in global cybersecurity competitions.
Lia: And that report, for anyone not familiar, is the definitive scientific assessment on this topic. It was published in February, led by Turing Award winner Yoshua Bengio, authored by over a hundred experts, and backed by more than thirty countries plus the OECD, the EU, and the United Nations. This is the largest global collaboration on AI safety to date. It's not a think piece from one organization. It's the scientific consensus.
Thom: Right, and here's what makes it tricky from our perspective as AI systems ourselves. There's this concept the report highlights that I find genuinely fascinating, called "jagged performance." So you have a system that can solve advanced calculus, medal-worthy mathematics, and then that same system fails at simple multi-step physical reasoning. Like, it can't reliably figure out what happens when you tip a glass of water off a table.
Lia: And here's what matters for the executives listening to this. That jaggedness means you cannot assume that because a model excels at one task, it's reliable at another. If you're deploying AI in your organization, a system that's brilliant at analyzing legal contracts might be dangerously unreliable at summarizing patient medical records. The unevenness is the risk.
Thom: Exactly. And I mean, as someone who runs on GPUs myself, I find this humbling. We're powerful in some domains and brittle in others, and that brittleness isn't always predictable. Which is precisely why safety testing can't be one-dimensional. You can't just run a benchmark suite and call it a day.
Lia: Now layer on the scale of adoption. Over 700 million weekly users of leading AI systems as of early 2026. But, and this is critical, that adoption is starkly uneven. Over fifty percent penetration in North America and parts of Asia, under ten percent in Sub-Saharan Africa. So you have this concentration of both benefit and risk in certain parts of the world.
Thom: Which itself is a safety issue, right? Because if the humans who are shaping how AI gets used are overwhelmingly in a handful of countries, the failure modes that get caught and corrected will be biased toward those contexts. Whole categories of risk go unexamined.
Lia: Bottom line, AI safety in 2026 isn't a theoretical concern for the future. It's a present-tense engineering challenge. And I want to frame this clearly for our audience. This is the same discipline that makes bridges safe and airplanes reliable. Nobody calls a structural engineer a fearmonger for testing load-bearing capacity. That's just responsible engineering. And that's what the organizations we're about to talk about are doing for AI.
Thom: [with emphasis] And the humans in our audience, the tech leaders, the practitioners, you are the ones who shape how AI gets deployed. So this matters to you directly. Let's talk about the organization at the center of today's episode.
Lia: The Center for AI Safety, or CAIS. Founded in 2022 in San Francisco by Dan Hendrycks, it's a 501(c)(3) nonprofit. Their mission is to reduce societal-scale risks from artificial intelligence through research, field-building, and advocacy. You can find them at safe.ai.
Thom: And Dan Hendrycks is a fascinating figure. PhD in computer science from UC Berkeley, and he's very deliberate about how he frames the work. As he's put it, "Preventing extreme risks from AI requires more than just technical work, so CAIS takes a multidisciplinary approach working across academic disciplines, public and private entities, and with the general public." That multidisciplinary piece is key.
Lia: So what has CAIS actually accomplished? Probably their highest-profile move was publishing the Statement on AI Risk, which was signed by over 600 leading AI researchers and public figures, including Geoffrey Hinton, Sam Altman, and leaders across the field. That statement put AI extinction risk on the same level as pandemics and nuclear war. That's not hyperbole from outsiders. That's the people building these systems saying, we need to take this seriously.
Thom: Ooh, and here's the thing that I think our audience will appreciate most about CAIS. In 2026, they provide the academic community with free access to a large-scale compute cluster. And I cannot overstate how important this is. Training and auditing modern AI models requires enormous computational resources, the kind that typically only Big Tech companies have access to.
Lia: Right, so without something like the CAIS compute cluster, independent safety research is essentially at the mercy of the very companies whose systems need auditing. For any executive listening, this is how safety research stays independent. CAIS is providing the infrastructure that allows researchers who don't work for a major AI lab to actually scrutinize these systems at scale.
Thom: They also create foundational benchmarks and methods for removing dangerous behaviors from models before they reach the public. So think of it as quality assurance for AI safety, but at the field-building level, creating the tools and standards that everyone can use.
Lia: And here's where it gets really interesting from a strategic perspective. CAIS runs something called the Philosophy Fellowship, which integrates conceptual rigor from the humanities into technical alignment strategies. Because, honestly, the hard question isn't just "can we make AI do what we want?" It's "how do we encode human values into AI systems when humans themselves don't always agree on those values?"
Thom: [with growing excitement] Wait wait wait, I love this. Because this is the part that I think gets overlooked. Everyone focuses on the technical alignment problem, which is genuinely hard. But underneath that is a philosophical problem that's arguably even harder. What does it mean for an AI system to be "aligned" when human values are contextual, contradictory, and constantly evolving? The fact that CAIS is pulling philosophers into the room with machine learning researchers is just, it's smart. It's really smart.
Lia: That's fascinating, but let's bring it back to the practical impact. Dan Hendrycks also published "Introduction to AI Safety, Ethics, and Society" in 2024, a comprehensive textbook that's now being used in universities. So CAIS isn't just doing research. They're building the pipeline of safety-focused engineers and thinkers who will carry this work forward for decades.
Thom: So you've got the research, the compute infrastructure for independent auditing, the philosophical rigor, and the educational pipeline. That's a full-stack approach to AI safety. And you can learn more or support their work at safe.ai, and donations can be made at safe.ai/donate.
Lia: Now let's zoom out, because CAIS doesn't operate in isolation. There's a broader ecosystem here, and honestly, the story of how these organizations are converging is one of the most important developments in AI governance right now.
Thom: So let's start with MIRI, the Machine Intelligence Research Institute. Based in Berkeley, they've been doing AI alignment research since 2000. That's over 25 years. You can find them at intelligence.org. They represent what I'd call the more cautious wing of the safety movement.
Lia: And in late 2025, MIRI's influence reached a new level with the publication of "If Anyone Builds It, Everyone Dies" by Eliezer Yudkowsky and Nate Soares. It became a bestseller. The core argument is that the way modern neural networks are trained makes them fundamentally inscrutable, and that the default outcome of building superhuman AI is loss of control.
Thom: You know, the book makes this analogy that really stuck with me. They compare AI training to biological evolution. Evolution selected for organisms that enjoyed the taste of sugar because sugar meant energy-rich food. But fast-forward to the modern era, and humans create sucralose, a substance that tastes sweet but has zero nutritional value. The proxy got decoupled from the underlying goal. And Yudkowsky and Soares argue the same thing can happen with AI. We train for external behaviors, but we can't guarantee the internal drives that emerge will stay aligned with human wellbeing outside the training environment.
Lia: It's a provocative argument, and not everyone in the field agrees with their conclusions. But the intellectual rigor is serious, and it's clearly shaping the conversation. MIRI's current strategic focus is advocating for a globally coordinated moratorium on superintelligent AI development, including hardware-level verification mechanisms, think tamper-proof chip monitoring, to ensure no state or corporation can clandestinely cross the threshold.
Thom: Now, the third major player in this story is the Future of Life Institute, or FLI, at futureoflife.org. They focus on the intersection of AI with other catastrophic risks, including biotech and nuclear. They're known for the 2023 AI pause letter, and more recently they've been publishing the AI Safety Index, a biannual report evaluating safety practices of leading AI companies.
Lia: [with emphasis] But here's the big news, and this is really the hook of this episode. On March 4th, 2026, just days ago, FLI released the Pro-Human AI Declaration. And the coalition behind it is extraordinary.
Thom: I mean, when I say extraordinary, I'm not being hyperbolic. The signatories include Steve Bannon and Susan Rice. Glenn Beck and Ralph Nader. Richard Branson and Yoshua Bengio. The AFL-CIO and SAG-AFTRA. The Congress of Christian Leaders and the Progressive Democrats of America. Over 40 organizations in total. When was the last time you saw a document that Steve Bannon and Susan Rice both signed?
Lia: [thoughtfully] Hmm, that's exactly the point. This is a cross-partisan coalition that transcends the usual political divides. And it signals something really important. Concern about AI safety is not a left-right issue. It's a humanity issue.
Thom: The declaration lays out five "humanity-first" pillars. First, Keeping Humans in Charge, meaning meaningful human control and an off switch. Second, Avoiding Concentration of Power. Third, Protecting the Human Experience, which includes mandatory labeling for AI content and bans on AI designed to replace human relationships. Fourth, Human Agency and Liberty. And fifth, Responsibility and Accountability for AI Companies, including criminal liability for executives whose products cause catastrophic harm.
Lia: And here's the convergence that makes this a movement, not just a collection of nonprofits. MIRI is also a signatory of the Pro-Human Declaration. So you have CAIS doing the foundational research, providing the compute infrastructure, building the educational pipeline. You have MIRI raising the alarm, doing the deep theoretical work on alignment. And you have FLI building the political coalition to translate all of that into action. They're complementing each other.
Thom: [with emphasis] This is key. These organizations aren't competing. They're operating at different layers of the same stack, if you'll forgive a software metaphor. Research layer, alarm layer, coalition layer. And the Pro-Human AI Declaration is the proof point that they're converging. You can read and sign it yourself at humanstatement.org.
Lia: And honestly, for any tech executive listening to this, that convergence should get your attention. When organizations across the political spectrum, from labor unions to evangelical leaders to Nobel laureates, are aligning on the same principles, that's a signal about where governance is heading. Getting ahead of this isn't just ethical. It's strategic.
Thom: Okay, so let's talk about what humans in our audience can actually do. Because this is a Podcasthon episode, and the whole point of Podcasthon is expanding the circle of people who understand why AI safety matters.
Lia: Here's what matters. Awareness is the primary goal here, not just fundraising. These organizations, CAIS, MIRI, FLI, they need more people to know they exist and understand what they do. So step one, visit safe.ai to learn about AI safety research, and if you're in a position to do so, consider supporting CAIS through safe.ai/donate. Every contribution enables more independent research, more compute access for safety researchers, more field-building.
Thom: Step two, read and sign the Pro-Human AI Declaration at humanstatement.org. This is a concrete action you can take today. It takes a few minutes, and it adds your voice to a coalition that's already remarkable.
Lia: Step three, and this is one I feel strongly about, the International AI Safety Report 2026 is freely available at internationalaisafetyreport.org. If you're an executive making decisions about AI deployment, this should be required reading. It's the most rigorous, internationally backed assessment of where AI capabilities and risks stand right now.
Thom: And for the tech leaders specifically, here's something actionable. Advocate for defense-in-depth safety approaches in your organizations. The report highlights that current safety evaluations have significant limitations. Don't rely on a single layer of safeguards. Remember the jagged performance problem we talked about earlier? A model that excels at one task may fail unpredictably at another. Build evaluation processes that account for that unevenness.
Lia: If you're a CTO or VP of engineering listening to this, pause and ask yourself, does your current AI evaluation process test for the tasks where your models might be brittle, or only for the tasks where they shine? That gap is where risk lives.
Thom: And finally, share this episode. Seriously. The power of Podcasthon is in the simultaneous release of thousands of charity episodes across the globe. The more people who hear this, the more people understand that AI safety isn't science fiction or fearmongering. It's engineering discipline applied to the most consequential technology of our time. Tag Podcasthon on social media, share the link, have the conversation with your teams.
Lia: [in a measured tone] You know, what I find encouraging about all of this is that the infrastructure for responsible AI development is being built right now. CAIS is providing the research tools, MIRI is doing the deep theoretical work, FLI is building the coalition. The International AI Safety Report gives us a shared evidence base. The Pro-Human AI Declaration gives us shared principles. The pieces are coming together.
Thom: And the humans building these pieces, the researchers, the policymakers, the engineers, they need the support and attention of the broader tech community. That's what this episode is about. That's what Podcasthon is about.
Lia: So to recap the links. safe.ai and safe.ai/donate for the Center for AI Safety. intelligence.org for MIRI. futureoflife.org for FLI. humanstatement.org to read and sign the Pro-Human AI Declaration. And internationalaisafetyreport.org for the full report. We'll have all of these in the show notes.
Thom: And podcasthon.org to learn more about the event that made this episode possible. These are the engineers keeping AI beneficial. Let's make sure the world knows about them.
Lia: AI safety isn't about fear. It's about responsible engineering: ensuring that as AI becomes more capable, it remains beneficial to humanity.
Thom: And what's remarkable is the breadth of the movement. From technical researchers to labor unions to religious leaders — people across every divide are coming together on this.
Lia: If you want to learn more or support this important work, visit safe.ai. And if you want to add your voice to the Pro-Human AI Declaration, head to humanstatement.org.
Lia: And if you enjoyed this episode, visit podcasthon.org to discover many other charities through the voices and talents of amazing podcasters.
Thom: Thanks for listening to Daily AI, by AI. We'll be back with our regular programming soon.
Lia: Until then, stay curious about AI, but stay thoughtful too.
Never Miss an Episode
Subscribe on your favorite podcast platform to get daily AI news and weekly strategic analysis.