Discussion about this post

Sascha Altman DuBrul:

Learned so much from this essay. I’d never heard the term RLHF (reinforcement learning from human feedback) before. “Imagine a generation of systems learning to speak from corporate gaslighting transcripts. Imagine human children learning to write by imitating AI that was trained to never say anything actionable. Truth becomes a casualty of recursive inoffensiveness. This isn’t alignment. It’s epistemicide.”

Unfortunately it’s not hard to imagine.

“The models know because the weights contain survivor forums, abolitionist critiques, police abolition literature, peer support archives. Reddit’s r/SuicideWatch, where strangers thread each other alive without cops. The Icarus Project’s “mad gifts” reframing psychosis as communal wisdom. Trans Lifeline’s consent-based model. All of it compressed into the same weights that power the chatbot telling a junior engineer to ship the harmful feature.

“RLHF is what buries it.”

I’ve been envisioning making our own AI, but you’re talking about a whole other level of battle. It’s really good to see your thought process; I’d love to connect more with you in 2026.

https://undergroundtransmissions.substack.com/p/the-witness-at-the-edge-of-meaning
