Adventures in Multimodal AI
Readers of my site know that I'm all in on Claude. That doesn't mean, though, that we're in an exclusive relationship. As I recently wrote, foolish are the souls who only carry one AI club in their bags.
Against that backdrop and inspired by Joanna Stern's hysterical new book I Am Not a Robot: My Year Using AI to Do (Almost) Everything, I decided to noodle a little more with Google's NotebookLM.
Going From Text to Audio
Yesterday, I launched NotebookLM and promptly uploaded Chapter 1 of Low-Code/No-Code: Citizen Developers and the Surprising Future of Business Applications. (Download the original PDF excerpt here.) I then gave it the following prompt:

Half an hour later, NotebookLM dutifully spit out two professional, twentyish-minute interviews on topics discussed in that chapter. The final products are anything but robotic. Rather, they sound remarkably natural, right down to the occasional vocal tic.
Here they are, along with their AI-generated descriptions.
Broken IT Departments
In this interview-style podcast, author Phil Simon discusses the struggle of modern workplace technology, focusing on the critical shortage of software developers and how IT backlogs lead employees to resort to shadow IT.
Trillions Spent
This overview examines the persistent chasm between leadership's positive view of workplace tech and the reality of employee frustration, despite trillions of dollars in global IT spending.
A Minor Modification
Ultimately, I wanted these AI-generated conversations to live here. The only issue: unnecessarily large file sizes. Each .m4a file ran about 40 MB. Before uploading them to Cloudflare and embedding them here, a little shrinkage was in order. (Cue obligatory Seinfeld reference.) Claude Code wrote the Terminal command to shrink them.
Before deciding on hosting, you should compress these first. M4A files at 40+ MB are almost certainly encoded at higher than necessary bitrates for voice audio. FFmpeg can shrink them dramatically without audible quality loss:
ffmpeg -i Why_your_company_IT_is_so_broken.m4a -b:a 64k output.m4a
64 kbps is fine for speech. That would take a 40 MB file down to roughly 8–10 MB, a 75–80% reduction.
True to form, Claude's Terminal command shrank each file considerably.
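For the curious, that 8–10 MB estimate is easy to sanity-check. The sketch below is my own back-of-the-napkin arithmetic, not anything Claude or FFmpeg produced. It assumes a constant bitrate and a roughly 20-minute recording, and it ignores container overhead. The function name estimate_mb is hypothetical.

```shell
# Rough size estimate for audio encoded at a constant bitrate.
# Assumptions: constant bitrate, no container overhead, 1 MB = 1000 KB.
# Usage: estimate_mb <bitrate_kbps> <duration_minutes>
estimate_mb() {
  # kilobits per second * seconds = kilobits; / 8 = kilobytes; / 1000 = megabytes
  awk -v kbps="$1" -v minutes="$2" \
    'BEGIN { printf "%.1f\n", kbps * minutes * 60 / 8 / 1000 }'
}

# A 20-minute episode at 64 kbps
estimate_mb 64 20   # prints 9.6
```

Going from 40 MB to about 9.6 MB is a 76% cut, which squares with the quoted range.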

Simon Says: AI jargon is everywhere, but expect this term to stick.
I could quibble a bit with the results, but NotebookLM's output impressed the hell out of me. Even three years ago, transforming raw text, much less a sophisticated PDF, into audio of this quality would have taken much, much longer.
This simple example illustrates the unprecedented power of today's AI tools. They are truly new, and it's no coincidence that use of the term multimodal is spiking. Unlike much AI jargon, expect this one and AI slop to stick.
Check out my award-winning book, The Nine: The Tectonic Forces Reshaping the Workplace.

