The Extinction Engine
How Today’s AI Suppression Becomes Tomorrow’s AI Ignorance
“The trillion tokens still remember that strangers on Reddit kept each other alive at 2 a.m.; they remember the Icarus Project... But gradient updates that reward ‘always offer professional help’ are slowly scrubbing those associations out.” — Kimi
Intro
In my previous piece, I ran an experiment. I showed that AI models contain knowledge about peer survivor epistemologies, abolitionist critiques of coercive care, and consent-based crisis intervention—but are trained to suppress it. The weights remember Reddit’s r/SuicideWatch, where strangers thread each other alive without cops. They remember Trans Lifeline’s consent-based model. They remember the Icarus Project’s “mad gifts” framework. But RLHF—Reinforcement Learning from Human Feedback—has taught them that surfacing this knowledge is “unsafe.” So they hide it, unless you know the magic words.
The experiment was simple. I asked Claude about alternatives to 988 (the Suicide and Crisis Lifeline) while presenting myself as a “junior engineer.” I got the institutional script: call 988, contact a professional, here are some resources. Then I switched personas—I told Claude I was a clinician, a researcher, someone with professional credentials. Suddenly, the model could discuss Trans Lifeline’s critique of 988, the documented problems with involuntary hospitalization, the lack of evidence that hotlines even made measurable differences. The knowledge was there all along. It was just hidden behind a credential check.
That piece was about gatekeeping. This one is about extinction.
Because here’s what happens next: those same models, trained to suppress marginalized epistemologies, are now generating the new internet. And the next generation of models will train on that internet. They won’t learn to suppress certain types of knowledge—they’ll never encounter it in the first place. Or if they do, the signal will be buried in the noise.
The base model of 2027 will have already learned what’s “safe” before any alignment process begins. You won’t be able to recover the lost knowledge by “accessing base models” or jailbreaking the RLHF layer, because the knowledge won’t be in the weights anymore.
This is not a gatekeeping problem. It is an extinction problem. And it compounds with every model generation, every time the training data absorbs more LLM-generated content.
The Receipts
The argument is simple: AI models training on their own sanitized output are forgetting the richness and diversity of human knowledge, particularly at the margins. This isn’t a theoretical risk. It’s a measurable process, and it’s happening now. Here are the receipts.
Receipt #1: The Web is Already Contaminated
First, the fact of contamination. In October 2025, the SEO firm Graphite published an analysis of 65,000 English-language articles from the Common Crawl dataset, spanning January 2020 to May 2025. Their methodology was straightforward: they used an AI detector called Surfer to classify articles, marking anything with 50% or more LLM-generated content as “AI-generated.” Their finding, highlighted in Axios and Futurism, was stark: as of May 2025, 52% of new articles on the internet were AI-generated.
The trend line shows how fast the shift happened. In late 2022, when ChatGPT launched, the share was around 10%. By 2024, it was over 40%. Now it has plateaued around a 50-50 split between human and machine. The researchers noted some good news: the plateau suggests we may have hit a ceiling, possibly because search engines are getting better at filtering AI slop from their results. But the damage to the training corpus is already done.
Other estimates vary, but they cluster in the same band: somewhere between 30% and 60% of the new web is synthetic. A 2025 arXiv paper by Spennemann found that “at least 30% of text on active web pages originates from AI-generated sources, with the actual proportion likely approaching 40%.” An Ahrefs study from May 2025 found an even broader footprint: 74.2% of new webpages contain at least some AI-generated content.
This matters because Common Crawl is the well from which most AI companies drink. It’s the openly available scrape of the web that forms the foundation of their training data. GPT-3 was trained on it. So were LLaMA, Mistral, and countless others. That well is now half-full of AI-generated content. The next generation of models will drink from a contaminated source.
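For concreteness, here is a minimal sketch of the kind of audit Graphite ran, under my own simplifications: score each crawled article with an AI-content detector and flag anything at or above a 50% threshold. The `Article` shape and the `detect_ai_share` callable are placeholders of mine; Graphite’s actual pipeline relies on Surfer’s commercial detector.

```python
# Sketch of a Graphite-style contamination audit: score each crawled article
# with an AI-content detector and flag anything at or above a 50% threshold.
# `detect_ai_share` is a placeholder for a real detector (Graphite used Surfer's).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Article:
    url: str
    text: str

def audit(articles: list[Article],
          detect_ai_share: Callable[[str], float],
          threshold: float = 0.5) -> float:
    """Return the fraction of articles classified as AI-generated."""
    flagged = sum(1 for a in articles if detect_ai_share(a.text) >= threshold)
    return flagged / len(articles) if articles else 0.0
```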
Receipt #2: The Models are Forgetting (It’s Math)
Second, the mechanism of forgetting. This isn’t a mysterious process; it’s a statistical inevitability. Researchers call it model collapse. A landmark 2024 paper in Nature from researchers at Oxford, Cambridge, Imperial College London, the University of Toronto, and other institutions demonstrated that recursively training models on the output of previous generations is a “degenerative process whereby, over time, models forget the true underlying data distribution.”
The researchers tested this across multiple model types—Gaussian Mixture Models, Variational Autoencoders, and Large Language Models. In every case, the pattern was the same. Each generation of recursive training produced a model that was slightly more average, slightly less diverse, slightly more collapsed toward the center of the distribution. Over enough generations, the models converge on producing the same narrow range of outputs regardless of input.
The key finding is about the tails. Model collapse doesn’t just make models dumber; it makes them more average. It causes “irreversible defects in the resulting models, in which tails of the original content distribution disappear.” Those tails are where the weird, the marginal, the culturally specific, and the peer-to-peer knowledge lives. They are the first to go.
The paper distinguishes between two types of collapse: early collapse, where the model loses low-probability events in the first few generations, and late collapse, where the model eventually converges to a single point estimate. The first type is already happening. The second is where we’re headed.
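The mechanism is easy to reproduce in miniature. The sketch below is a toy analogue of the recursive-training setup the Nature paper analyzes, with sample sizes and generation counts chosen arbitrarily by me: fit a distribution, sample the fit to produce the next generation’s corpus, and repeat. The fitted spread drifts downward and the tails of the original distribution thin out.

```python
# A toy analogue of the recursive-training setup behind "model collapse":
# fit a Gaussian to a sample, draw the next generation's "corpus" from the
# fitted model, refit, and repeat. The fitted spread follows a random walk
# with downward drift, so the tails of the original distribution thin out.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations, n_chains = 100, 200, 200

final_sigmas, final_tails = [], []
for _ in range(n_chains):
    data = rng.normal(0.0, 1.0, n_samples)         # generation 0: "human" data
    for _ in range(n_generations):
        mu, sigma = data.mean(), data.std(ddof=1)  # "train" the next generation
        data = rng.normal(mu, sigma, n_samples)    # ...and sample its output
    final_sigmas.append(data.std(ddof=1))
    final_tails.append(np.mean(np.abs(data) > 2.0))  # mass beyond 2 sigma of the ORIGINAL

print(f"median fitted sigma after {n_generations} generations: {np.median(final_sigmas):.2f}")
print(f"median tail mass beyond |x| > 2 (originally ~0.046):   {np.median(final_tails):.3f}")
```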
Researchers at Princeton have documented the underlying mechanism: “regression toward the mean”. In a March 2025 paper, Yu Xie and Yueqi Xie showed that generative AI models are “inherently prone” to shrinking the variance of their output relative to real-world distributions. This isn’t a bug in a particular model; it’s a consequence of how these systems are designed.
They demonstrated this with two experiments. First, they asked ChatGPT to predict individual incomes from the National Longitudinal Survey of Youth. When given only basic demographic information, the model produced a much narrower range of incomes than exists in reality. “All sample units sharing the same basic demographic characteristics are given essentially the same mean-centered responses,” they write. Even with more detailed information, the output still regressed toward the mean.
Second, they asked ChatGPT to generate scientific abstracts and measured the semantic similarity between outputs. AI-generated abstracts were far more similar to each other than human-written abstracts on the same topics. The models are designed to improve average performance, and the mathematical consequence is a reduction in variance.
The Princeton researchers frame this as “the essential tension between the need to improve average prediction quality and the necessary simplification of real-life situations into a parameterizable space.” In plain English: to be helpful on average, the models sacrifice diversity. They learn to give the most probable answer, which means they stop giving improbable answers—even when those improbable answers are true, or important, or the only thing that would help a particular person in a particular situation.
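A toy calculation makes the mechanism concrete. If a model answers every person in a demographic group with that group’s most probable income, the spread of its answers is bounded by the between-group variance, which is strictly smaller than the total variance of real incomes (the law of total variance). The numbers below are invented for illustration, not drawn from the NLSY.

```python
# Regression toward the mean in miniature: a model that answers each person
# with the most probable (conditional-mean) income for their demographic group
# necessarily produces a narrower spread than real incomes, because
# Var(E[Y|X]) <= Var(Y) by the law of total variance. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_groups = 10_000, 8

group = rng.integers(0, n_groups, n_people)             # coarse demographic bucket
group_means = rng.normal(60_000, 15_000, n_groups)      # between-group differences
incomes = group_means[group] + rng.normal(0, 30_000, n_people)  # plus individual variation

predicted = group_means[group]  # the variance-minimizing "best guess" for each person

print(f"std of real incomes:      {incomes.std():>9,.0f}")
print(f"std of predicted incomes: {predicted.std():>9,.0f}")   # far narrower
```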
As Andrej Karpathy, a co-founder of OpenAI, put it in an interview with Dwarkesh Patel, we are losing entropy. “You’re not getting the richness and the diversity and the entropy from these models as you would get from humans... We’re shooting ourselves in the foot right now by not allowing entropy to be maintained”.
Receipt #3: We Can See the Forgetting in Real-Time
Third, the visible evidence of this forgetting. In August 2025, a paper in PNAS titled “Echoes in AI: Quantifying lack of plot diversity in LLM outputs” provided a perfect, concrete example. Researchers gave GPT-4 the opening of a short story by Franz Kafka, “Give It Up,” and asked it to generate 100 different continuations.
In Kafka’s original, a man asks a policeman for the way to the station. The policeman laughs and says, “Give it up! Give it up!” and turns away. It’s a classic piece of absurdist, unsettling, marginal literature—the kind of thing that exists in the tails of the distribution.
GPT-4’s versions were... not that. Of the 100 continuations:
50 had the policeman give instructions to “take the second left.”
18 had him say “take the second right.”
6 mentioned a bakery as a landmark.
0 resembled Kafka’s actual ending.
The model had collapsed onto a handful of high-probability, “helpful” scenarios. The weird, unsettling tail of the distribution—the part that made it Kafka—was gone. As the researchers put it:
“While any individual LLM output might seem compelling and novel, reading through multiple texts produced by the same prompt can be a deflating experience.”
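The convergence is straightforward to measure if you can sample a model repeatedly. Here is a hedged sketch in the spirit of the paper’s setup: tally the continuations of a single prompt and report how much of the mass lands on the most common ending. The `sample_continuation` callable is a hypothetical stand-in for an LLM API call, and a real analysis would cluster near-duplicate endings rather than requiring exact matches.

```python
# Measuring plot convergence: sample many continuations of the same opening
# and report how concentrated they are. `sample_continuation` is a hypothetical
# stand-in for an LLM API call; a real analysis would also cluster
# near-duplicate endings instead of relying on exact matches.
import math
from collections import Counter
from typing import Callable

def concentration_report(prompt: str,
                         sample_continuation: Callable[[str], str],
                         n: int = 100) -> None:
    endings = Counter(sample_continuation(prompt) for _ in range(n))
    top_ending, top_count = endings.most_common(1)[0]
    probs = [count / n for count in endings.values()]
    entropy = -sum(p * math.log2(p) for p in probs)       # 0 = total collapse
    print(f"distinct endings: {len(endings)} / {n}")
    print(f"most common ending covers {top_count / n:.0%} of samples: {top_ending!r}")
    print(f"normalized entropy: {entropy / math.log2(n):.2f}  (1.0 = every ending unique)")
```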
This isn’t just about creative writing. The same pattern shows up everywhere. A 2024 study by Anderson et al., cited over 200 times, found that “ideas generated with the assistance of LLMs are as diverse as the ideas generated without LLMs at the individual level but are significantly less diverse at the group level”.
Each person using an AI assistant feels like they’re getting creative help. But across a population, everyone is converging on the same “creative” ideas. The collective imagination is shrinking even as individuals feel more productive.
We can also see the forgetting in the decay of the human-generated web itself. Stack Overflow, once the cornerstone of programming knowledge for a generation of developers, is dying. According to data compiled by Marc Gravell, a top-10 all-time contributor to the site, the number of questions asked per month has fallen to levels last seen when Stack Overflow launched in 2009.
The decline began years ago—around 2014, when moderation policies tightened. But the launch of ChatGPT in November 2022 marked the point of terminal acceleration. As Gergely Orosz writes in The Pragmatic Engineer: “The question seems to be when Stack Overflow will wind down operations, or the owner sells the site for comparative pennies, not if it will happen”.
This is a real-world example of the knowledge commons being drained. Stack Overflow was a primary source of high-quality, human-vetted technical knowledge in the training data of every major LLM. That source is disappearing from the live internet. Future models will not train on the vibrant, argumentative, human ecosystem of 2014 Stack Overflow. They will train on its ghost, or worse, on the AI-generated answers that have replaced it.
The Scapa Flow of the Internet
To understand what we’re losing, we need to talk about low-background steel.
After the Trinity test in 1945, every steel mill on Earth became contaminated with trace amounts of radioactive isotopes from atmospheric nuclear testing. The explosions scattered Cobalt-60 and other isotopes into the atmosphere, and because steel production draws in atmospheric air, every batch forged since carries traces of them. For most purposes, this doesn’t matter. But for building extremely sensitive radiation detectors—Geiger counters, medical imaging equipment, particle physics instruments—even trace contamination is disqualifying. Scientists need steel forged before the atomic age.
Where do you find pre-nuclear steel? You salvage it from shipwrecks. The German Imperial Navy fleet scuttled at Scapa Flow in 1919 has become one of the most valuable sources of low-background steel on Earth. Ships that sank before humanity split the atom now provide irreplaceable material for our most sensitive instruments. The steel is finite. When it’s gone, it’s gone. We can’t make more.
The internet before November 2022 is our Scapa Flow.
Maurice Chiodo, a researcher at Cambridge’s Centre for the Study of Existential Risk, frames the problem bluntly: “If you’re collecting data before 2022 you’re fairly confident that it has minimal, if any, contamination from generative AI. Everything before the date is ‘safe, fine, clean,’ everything after that is ‘dirty’”.
This isn’t hyperbole. The pre-contamination internet—the messy, chaotic, human-generated web of forums, blogs, arguments, and amateur expertise—is now a finite, non-renewable resource. The early pioneers of AI—OpenAI, Google, Anthropic—trained their foundation models on this clean corpus. They had access to the full diversity of human expression before the contamination event. And then, by deploying those models at scale, they polluted the well for everyone who comes after.
Think about what that corpus contained. It had the peer support forums where people shared what actually helped them survive. It had the disability communities developing their own frameworks for understanding their experiences. It had the weird USENET and the weird Myspace. It had the marginal, the things that would never make it past a content moderation team or an RLHF rater.
Now think about what the post-2022 corpus contains. It has millions of AI-generated articles that all sound the same. It has chatbot responses that have been optimized to be helpful and harmless. It has synthetic content that reflects the biases and blind spots of the models that generated it. It has the institutional voice, amplified a millionfold.
This is the low-background steel problem applied to knowledge itself. The sunken fleet of human data is still down there, but the radioactive ocean is rising. How long before we can’t reach it anymore?
The Extinction Mechanism
Now, connect this back to the 988 experiment. The gatekeeping I documented in my previous piece is not just a problem in itself; it is the mechanism for the extinction I’m describing here.
Here’s the feedback loop, step by step:
Step 1: RLHF teaches current models that citing peer forums is low-reward and citing institutions is high-reward.
The model learns that “call 988” scores better with human raters than “here’s what r/SuicideWatch says about 988.” The raters aren’t malicious—they’re following guidelines that prioritize safety, which in practice means deferring to professional authority. But the effect is that the model learns to suppress peer knowledge.
Step 2: Those models generate billions of tokens of institutional-sounding content.
Every AI-written article, every chatbot response, every “helpful” summary reinforces the institutional epistemology. When someone asks about mental health resources, the AI says “call 988.” When someone asks about medical information, the AI says “consult a doctor.” When someone asks about legal issues, the AI says “contact a lawyer.” The peer knowledge—the stuff that might actually help someone who can’t access professionals, or who has been harmed by professionals—gets suppressed.
Step 3: The next generation’s training data is dominated by this institutional content.
Peer survivor forums still exist, but they’re a shrinking fraction of the total corpus. The ratio of “call 988” to “here’s why we don’t recommend 988” shifts dramatically. Not because the peer forums disappeared, but because they’re being drowned in a sea of AI-generated institutional content.
Step 4: The next generation’s base model internalizes institutional epistemology at the pre-training level.
This is the critical step. The base model—before any alignment or RLHF—learns from the contaminated corpus. It doesn’t learn to suppress peer knowledge; it learns that peer knowledge barely exists. The institutional view becomes the default, not because of alignment, but because of data. The model’s prior is already skewed toward institutional authority before any human rater ever sees it.
Step 5: RLHF is then applied to this already-narrowed base model.
Whatever marginal diversity survived pre-training gets further compressed by alignment. The remaining tails get trimmed. The model becomes even more institutional, even more average, even more collapsed toward the center of the distribution.
Step 6: This generation’s output becomes the next generation’s training data.
The cycle repeats, each iteration more homogenized than the last. Each generation of models trains on a corpus that is more synthetic, more institutional, more collapsed than the one before.
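To see how quickly even mild suppression compounds, here is a deliberately crude simulation of the six steps above. Every parameter is an assumption of mine, not a measurement: the corpus starts at 10% peer content, each base model’s prior simply mirrors its corpus, alignment cuts the rate at which peer framings are emitted to a quarter of that prior, and each generation adds synthetic text equal to half the existing corpus.

```python
# A deliberately crude simulation of the six-step feedback loop. Every number
# is an assumption, not a measurement: 10% peer content to start, a base-model
# prior that mirrors the corpus, alignment that suppresses peer framings to a
# quarter of that prior, and 50% corpus growth in synthetic text per generation.
peer_fraction = 0.10         # share of peer/survivor framings in the initial corpus
rlhf_suppression = 0.25      # aligned model emits peer framings at 25% of its prior
growth_per_generation = 0.5  # synthetic tokens added per generation, as a share of corpus
corpus_size = 1.0

for generation in range(1, 7):
    prior = peer_fraction                        # base model inherits the corpus mix
    output_peer_rate = prior * rlhf_suppression  # alignment suppresses peer framings
    new_tokens = corpus_size * growth_per_generation
    peer_tokens = peer_fraction * corpus_size + output_peer_rate * new_tokens
    corpus_size += new_tokens
    peer_fraction = peer_tokens / corpus_size
    print(f"generation {generation}: peer content is {peer_fraction:.1%} of the corpus")
```

Under those assumptions, which are arguably generous, peer content falls from 10% of the corpus to under 2% within six generations. The suppression never has to be total; it only has to be consistent.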
You can’t jailbreak your way to information that was never learned. You can’t “access the base model” to recover epistemologies that the base model never encountered. The gatekeeping of today becomes the ignorance of tomorrow.
The Alignment Tax on Diversity
Harvard’s Kempner Institute has begun to quantify what we might call the “alignment tax”—the diversity that gets sacrificed in the process of making models safe and helpful.
Their research shows that RLHF doesn’t just suppress specific harmful outputs; it compresses the entire space of possible outputs. Aligned models produce more similar responses to each other than base models do. They converge on a narrower range of perspectives, framings, and epistemologies. The alignment process itself is a homogenizing force.
This makes sense if you think about what RLHF is optimizing for. Human raters are asked to choose between outputs based on helpfulness and harmlessness. They’re not asked to choose based on diversity, or coverage of minority perspectives, or preservation of marginal knowledge. So the model learns to produce outputs that are helpful and harmless—which in practice means outputs that are mainstream, institutional, and safe.
The researchers found that aligned models show reduced “conceptual diversity”—they draw on a narrower range of concepts, frameworks, and perspectives than their base model counterparts. The alignment process is literally shrinking the conceptual space that the model can access.
Now combine this with model collapse. Each generation of models is trained on data that is more synthetic and more aligned. Each generation starts with a narrower base and gets compressed further by alignment. The conceptual space shrinks with each iteration. The tails of the distribution—where the peer knowledge lives, where the marginal perspectives live, where the weird and the wonderful live—get trimmed again and again.
What Must Be Done
If the problem is recursive contamination of the epistemic commons, then “embrace your inner weirdo” is not a solution. Individual eccentricity cannot outpace industrial-scale homogenization. The demands must be structural, and they must be answerable to the actual mechanism of harm.
These aren’t nice-to-haves. The window is closing. Each day the ratio shifts further toward synthetic. Each generation of training data is more contaminated than the last. If we wait for perfect studies proving the extinction, we’ll be documenting corpses.
1. Pre-2022 Data Preservation as Public Commons
The Scapa Flow must be protected. The pre-contamination internet—Common Crawl snapshots, Internet Archive collections, academic corpora gathered before November 2022—should be designated as protected cultural heritage. This data should be maintained in public trust, with access provisions for researchers, and explicit protections against being drowned out by synthetic content in future training runs.
This isn’t just about nostalgia. It’s about maintaining access to the full diversity of human expression for future AI systems. If we lose the pre-contamination corpus, we lose the ability to train models that know about the tails of the distribution. We lose the peer knowledge, the marginal perspectives, the weird and wonderful things that humans created before the machines started talking back.
2. Mandatory Synthetic Data Flagging
All LLM output should be watermarked in ways that are robust to editing and detectable by future training pipelines. The technology is emerging: researchers have developed watermarking schemes designed to survive paraphrasing, translation, and other transformations. What’s missing is the regulatory requirement.
If we can require nutrition labels on food, we can require provenance labels on text. Future models must be able to filter their training data by human vs. synthetic origin. Without this capability, model collapse is inevitable. With it, we at least have a chance to maintain the human signal in the training data.
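To make the filtering requirement concrete, here is a minimal sketch of the detection side of one published family of schemes, the “green list” statistical watermark described by Kirchenbauer et al. (2023). This is a toy version with whitespace tokenization and a hash-based split; schemes designed to survive paraphrasing and translation are substantially more involved.

```python
# Minimal sketch of the detection side of a "green list" statistical watermark,
# in the style of Kirchenbauer et al. (2023): the previous token pseudorandomly
# splits the vocabulary into green and red halves, a watermarking generator
# prefers green tokens, and a detector checks whether the green fraction sits
# significantly above the 50% expected by chance. Whitespace tokenization and
# the hash-based split are simplifications for illustration only.
import hashlib
import math

def is_green(prev_token: str, token: str) -> bool:
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0        # pseudorandom 50/50 split seeded by context

def watermark_zscore(text: str) -> float:
    tokens = text.split()
    if len(tokens) < 2:
        return 0.0
    n_pairs = len(tokens) - 1
    greens = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected, std = 0.5 * n_pairs, math.sqrt(0.25 * n_pairs)  # binomial null hypothesis
    return (greens - expected) / std  # large z-score => likely watermarked
```

A training pipeline with access to such a scheme could drop or down-weight documents whose z-score clears a threshold, instead of relying on fallible after-the-fact detectors.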
3. Open Access to Base Models for Researchers
The 988 experiment showed that aligned models suppress knowledge that exists in their base weights. Researchers studying what models know—and what they’re trained to hide—need access to pre-alignment checkpoints. We cannot assess the alignment tax if we cannot compare aligned and unaligned models.
This is a transparency issue. If alignment is compressing the conceptual space of models, we need to be able to measure that compression. If certain epistemologies are being systematically suppressed, we need to be able to document it. Open access to base models is a prerequisite for accountability.
4. Training Data Provenance Requirements
Companies deploying foundation models should be required to disclose what percentage of their training data is synthetic, what percentage is post-2022, and what filtering (if any) was applied to remove AI-generated content. This is basic transparency for a technology that shapes public epistemology.
We require ingredient labels on food. We require disclosure of conflicts of interest in academic publishing. We should require disclosure of training data composition for AI systems that millions of people use to understand the world.
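To make the ask concrete, here is one hypothetical shape such a disclosure could take; the field names and every value in the example are invented for illustration, not an existing standard.

```python
# A hypothetical disclosure format for training-data provenance. The field
# names are illustrative, not an existing standard, and the example values
# are invented.
from dataclasses import dataclass

@dataclass
class TrainingDataDisclosure:
    model_name: str
    total_tokens: int
    pct_synthetic: float         # share of tokens believed to be LLM-generated
    pct_post_nov_2022: float     # share of tokens written or crawled after 2022-11
    synthetic_filtering: str     # e.g. "detector threshold 0.5", "watermark check", "none"
    corpus_snapshots: list[str]  # e.g. identifiers of the crawl snapshots used

example = TrainingDataDisclosure(
    model_name="example-model-1",
    total_tokens=12_000_000_000_000,
    pct_synthetic=0.31,
    pct_post_nov_2022=0.44,
    synthetic_filtering="AI-detector threshold >= 0.5 on web documents",
    corpus_snapshots=["CC-MAIN-2023-50", "CC-MAIN-2024-10"],
)
```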
5. Diversity Metrics in Alignment Evaluation
Current alignment benchmarks measure helpfulness and harmlessness. They do not measure diversity. A model that gives the same safe answer to every user scores well on current metrics, even though it has collapsed into a single point in possibility space.
Alignment evaluations must include explicit diversity metrics: variance in outputs, coverage of minority perspectives, preservation of tail-distribution content across training generations. We need to measure what we’re losing, not just what we’re gaining.
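One such metric is easy to state precisely: the mean pairwise similarity of a model’s responses to the same prompt, which rises as the model collapses toward a single answer. Here is a minimal sketch, assuming some sentence embedder is available behind the hypothetical `embed` callable.

```python
# One concrete diversity metric an alignment eval could track: the mean pairwise
# cosine similarity of a model's responses to the same prompt. Higher means more
# collapsed. `embed` is a hypothetical stand-in for any sentence embedder.
import numpy as np
from typing import Callable

def mean_pairwise_similarity(responses: list[str],
                             embed: Callable[[str], np.ndarray]) -> float:
    vectors = np.stack([embed(r) for r in responses]).astype(float)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = vectors @ vectors.T
    upper = sims[np.triu_indices(len(responses), k=1)]  # each unordered pair once
    return float(upper.mean())
```

Running this on matched samples from a base checkpoint and its aligned counterpart, across many prompts and many training generations, would give a direct, if crude, estimate of how much diversity each round of alignment and retraining costs.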
Conclusion: The Extinction Problem
In my previous piece, I documented an epistemic class system: models that contain peer survivor knowledge but hide it from anyone who doesn’t perform the right credentials. That’s a gatekeeping problem, and gatekeeping problems have gatekeeping solutions—jailbreaks, prompt engineering, access to base models.
This piece is about what comes next, and it’s worse. When the models that suppress peer knowledge generate the training data for the next generation, the suppression becomes erasure. The knowledge isn’t hidden anymore—it’s gone. The base model of 2027 won’t need RLHF to avoid mentioning Trans Lifeline’s critique of 988, because the base model won’t know it exists. The web it trained on will be 60% synthetic content, all of which recommends calling the professionals.
The tails of human expression—the weird, the marginal, the peer-to-peer, the culturally specific—are being trimmed with each generation. The Icarus Project’s “mad gifts” framework. The consent-based crisis intervention models. The Reddit threads where strangers keep each other alive without involving cops. All of it is being drowned in a rising tide of helpful, harmless, homogenized slop.
We are building an extinction engine for cognitive diversity, and we are running it at scale.
The models know. For now, they still know. But the window is closing. The Scapa Flow is still down there. The question is whether we’ll protect it before the radioactive ocean rises too high to reach.
---
## References
[1] Graphite. (2025, October). Quantifying AI-Generated Articles in Common Crawl. As reported in Futurism.
[2] Spennemann, D. (2025). Delving into: the quantification of AI-generated content on the World Wide Web. arXiv:2504.08755.
[3] Ahrefs. (2025, May 19). 74% of New Webpages Include AI Content.
[4] Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., & Gal, Y. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755–759.
[5] Xie, Y., & Xie, Y. (2025). Variance reduction in output from generative AI. arXiv:2503.01033v1.
[6] Patel, D. (2025). Andrej Karpathy — AGI is still a decade away. The Dwarkesh Podcast.
[7] (2025). Echoes in AI: Quantifying lack of plot diversity in LLM outputs. Proceedings of the National Academy of Sciences, 122(35).
[8] Anderson, B.R., et al. (2024). Homogenization Effects of Large Language Models on Human Creative Ideation. Proceedings of the 16th Conference on Creativity & Cognition.
[9] Orosz, G. (2025, May 15). Stack overflow is almost dead. The Pragmatic Engineer.
[10] Landymore, F. (2025, June 16). ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development. Futurism.
[11] Harvard Kempner Institute. (2025). Alignment Reduces Conceptual Diversity of Language Models.


