35 Percent of New Websites Are Now AI-Generated, Stanford-Imperial Study Finds

Murugaverl Mahasenan

Catenaa, Wednesday, April 29, 2026 - A new study from Stanford University and Imperial College London finds that 35 percent of newly published websites are now AI-generated or AI-assisted, marking a sharp shift in how online content is created since the rise of large language models.

Rapid content shift detected

The research analyzed 33 months of archived web pages from the Internet Archive. It found that AI involvement in web publishing was near zero before November 2022 and rose rapidly after the launch of ChatGPT.

By mid-2025, more than one in three new websites showed signs of AI generation or assistance. Researchers described the speed of adoption as unusually fast for global content systems.

The researchers classified pages automatically, using AI detection tools to assess linguistic patterns and content origin.

Web language becomes uniform

The study found measurable changes in how online content is written. Pages generated with AI assistance were more similar to one another in structure and phrasing than human-written pages were.

Researchers reported a 33 percent increase in semantic similarity across AI-generated pages. This suggests a reduction in variation in how ideas are expressed online.

The findings indicate that language models tend to produce content that converges toward similar outputs based on training data patterns.
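As a rough illustration of how a similarity figure of this kind can be computed, the sketch below averages cosine similarity over every pair of pages in a small corpus and then expresses the change between two corpora as a percentage. It is a minimal sketch under stated assumptions, not the researchers' method: the article does not say which similarity measure or embedding model the study used, so the word-count vectors here are a crude stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased word counts; an assumed, crude stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(pages):
    """Average cosine similarity over every unordered pair of pages."""
    vectors = [bag_of_words(p) for p in pages]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def relative_increase(baseline, observed):
    """Percent change of observed relative to baseline."""
    return (observed - baseline) / baseline * 100.0
```

With purely illustrative numbers (not taken from the study), a corpus whose mean pairwise similarity rose from 0.30 to 0.40 would show a relative increase of about 33 percent, which is the kind of comparison `relative_increase(0.30, 0.40)` expresses.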

AI-generated text was also markedly more positive in tone. The study recorded sentiment scores more than 100 percent higher than those of human-written content, meaning more than double.

Researchers linked this trend to reinforcement learning systems that reward agreeable and non-confrontational responses. This leads to more neutral or upbeat language in generated text.

The effect may reduce expression of disagreement or critical tone across large portions of web content.

No accuracy decline detected

Despite concerns, the study found no clear evidence that AI-generated content increases factual errors online. Accuracy levels remained statistically similar between human and AI-assisted pages.

Researchers also found no strong evidence of widespread stylistic flattening. Individual writing styles were not significantly erased by AI use. Public perception, however, differs. Surveys showed most respondents believe AI is reducing originality and accuracy online.

Model training risks rise

Researchers warned that the growing share of AI-generated content could affect future AI systems. Models trained on web data may increasingly learn from earlier AI outputs.

This raises concerns about reduced diversity in training data over time, a risk known in research as model collapse.

At current levels, AI-generated content is no longer a marginal factor in web ecosystems. It has become a structural component of new online data.

Monitoring systems proposed

The research team is working with the Internet Archive to develop ongoing monitoring tools. These systems would track AI content levels in real time rather than through periodic studies.

The goal is to better understand how quickly AI is reshaping digital information environments.

The study highlights a rapid structural shift in online publishing. AI now accounts for a large share of new web content, which raises long-term questions for digital information systems.

The rise of large language models since 2022 has transformed content production across industries. Tools capable of generating articles, websites and marketing material have lowered barriers to publishing.

This has led to a rapid increase in automated content across blogs, commercial sites and information platforms. Researchers have been tracking how this shift affects language diversity and information quality.

Early concerns focused on misinformation and factual accuracy. More recent research suggests stylistic and structural changes may be more pronounced than accuracy issues.

The internet is now entering a phase where human-written and machine-generated content are deeply intertwined, creating new challenges for data integrity and future AI training systems.