Online comment sections are a microcosm of public debate – lively, diverse, and, at times, toxic. Managing the balance between open expression and respectful dialogue has become a key challenge for digital publishers. To understand what works, researchers are now turning to field experiments with real-world partners.
At the latest Brown Bag seminar, Francisco Tomás-Valiente Jordá shared insights from a field experiment with an Austrian newspaper, exploring how pre- and post-moderation strategies affect online comment sections. The findings were clear: pre-moderation, which screens comments before publication, significantly reduces toxic content without discouraging user engagement. By contrast, post-moderation, which removes harmful comments after they appear, leaves room for toxicity to influence the conversation before intervention.
Tomás-Valiente Jordá and a team that brought together Laura Bronner, Nico Berk, Gloria Gennaro, Fabrizio Gilardi, Karsten Donnay, Philip Grech, and Dominik Hangartner ran experiments with three newspapers: one in Austria and two in Switzerland. “Each partner is interested in different interventions. The Austrian newspaper was particularly keen to test pre- versus post-moderation,” Francisco Tomás-Valiente Jordá explained.
This work is part of a broader effort to understand how design choices can shape online behaviour. In related studies, the team explored the effects of moderators’ visible presence and notifications to users when their content is deleted – both interventions reduced toxicity by roughly 30%, showing consistent patterns across settings.
Pre- vs post-moderation: the core idea
“Pre-moderation means screening comments before they’re published,” Tomás-Valiente Jordá explained. “An algorithm evaluates all submissions and determines which ones need to be reviewed by moderators for approval. Post-moderation, on the other hand, allows all comments to appear immediately, but moderators can remove inappropriate ones later, often after user reports.”
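To make the distinction concrete, here is a minimal sketch of what a pre-moderation gate might look like in code. Everything in it – the keyword scorer, the threshold, the routing labels – is an illustrative assumption, not the newspaper’s actual system.

```python
# Minimal sketch of a pre-moderation gate (illustrative assumptions
# throughout; not the Austrian newspaper's actual system).

def toxicity_score(text: str) -> float:
    """Placeholder standing in for the real model's estimated P(toxic)."""
    insults = ("idiot", "stupid", "liar")
    return 1.0 if any(word in text.lower() for word in insults) else 0.1

def route_comment(text: str, review_threshold: float = 0.5) -> str:
    """Publish low-risk comments immediately; hold the rest for review."""
    if toxicity_score(text) < review_threshold:
        return "publish"           # visible on the site right away
    return "moderation_queue"      # waits for a moderator's approval

print(route_comment("Interesting article, thanks!"))  # publish
print(route_comment("What an idiot take."))           # moderation_queue
```

In a post-moderation system, the same scorer would run after publication, flagging already-visible comments for removal rather than holding them back.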
By randomising the moderation style at the article level, the team could compare how the two systems influenced user behaviour. Crucially, the newspaper also granted the researchers access to unpublished comments – data that’s rarely available to academics studying online moderation. The Austrian experiment spanned nine months. “We focused on articles where randomisation was credible,” he noted. Despite topic differences – from politics to local and economic news – pre-moderation consistently lowered toxicity.
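Article-level randomisation is the hinge of the design: every comment under a given article experiences the same condition, which keeps the comparison clean. A minimal sketch of one common way to implement this – deterministic hashing of the article ID – is shown below; the hashing scheme is an assumption for illustration, not taken from the study.

```python
# Illustrative article-level assignment; the hashing scheme is an
# assumption, not the study's actual implementation.
import hashlib

def assign_condition(article_id: str, salt: str = "moderation-experiment") -> str:
    """Deterministically map an article to one of two conditions, so
    every comment under that article sees the same moderation style."""
    digest = hashlib.sha256(f"{salt}:{article_id}".encode()).digest()
    return "pre-moderation" if digest[0] % 2 == 0 else "post-moderation"

for article_id in ["politics-2024-001", "local-2024-017", "economy-2024-042"]:
    print(article_id, "->", assign_condition(article_id))
```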
What the data revealed
The results were striking. “Pre-moderation reduced toxicity by around 25% compared to post-moderation,” said Tomás-Valiente Jordá. “And this effect wasn’t because moderators were stricter – it was because users actually wrote differently.”
When toxic comments are visible, others tend to “tone-match” and respond in kind. But when moderation filters those comments before publication, the conversation remains civil, creating a kind of positive feedback loop of respectful interaction.
“Pre-moderation stops that negative tone from ever becoming visible,” he added. “In post-moderation, by the time something is flagged and removed, the damage is already done.”
Engagement didn’t suffer
One of the most common fears among publishers is that stricter moderation might discourage participation. But this wasn’t the case. “We found no evidence that pre-moderation reduced engagement,” Tomás-Valiente Jordá said. “People kept commenting just as much. That was a big relief, because it shows you can reduce toxicity without silencing users.”
This insight has practical value for media outlets weighing the costs and benefits of moderation systems. The findings also help inform how the Austrian partner approaches sensitive topics and community management.
Behind the data: methods and models
From a methodological standpoint, the research is an example of modern computational social science. “This was a field experiment, with treatment assignment randomised across articles,” explained Tomás-Valiente Jordá. “To measure toxicity, we used a fine-tuned model trained on newspaper comments from a separate online outlet.”
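The fine-tuned model itself is not part of the write-up, so the snippet below stands in with a publicly available toxicity classifier (unitary/toxic-bert) run through the Hugging Face pipeline API – a rough sketch of what the measurement step looks like in practice.

```python
# Sketch of transformer-based toxicity scoring. The team's model was
# fine-tuned on comments from a separate outlet and is not public;
# unitary/toxic-bert is a generic stand-in for illustration.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

comments = [
    "Thanks for the balanced reporting on this.",
    "Anyone who believes this is a complete idiot.",
]
for comment, result in zip(comments, classifier(comments)):
    print(f"{result['label']} ({result['score']:.2f}): {comment}")
```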
The team also measured other dimensions of online discourse, including deliberation quality and topic relevance, using transformer-based models. “We developed a hierarchical transformer to evaluate deliberation quality at the thread level, and a cross-encoder model to assess how on-topic each comment was,” he said. “Interestingly, pre-moderation affected toxicity but didn’t really change deliberation quality or topicality. So it’s a targeted intervention.”
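Neither of those models is published in the seminar material, but the general shape of cross-encoder topicality scoring is easy to illustrate: the comment and the article text are fed to the model jointly, and it emits a relevance score. The checkpoint below (cross-encoder/stsb-roberta-base) is a generic semantic-similarity model used purely as a stand-in.

```python
# Illustrative cross-encoder topicality scoring; the checkpoint is a
# generic similarity model, not the team's own topicality model.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")

article = "The city council approved a new budget for public transport."
comments = [
    "Finally more money for trams and buses.",
    "Unrelated, but did anyone watch the match last night?",
]
scores = model.predict([(article, comment) for comment in comments])
for comment, score in zip(comments, scores):
    print(f"on-topic score {score:.2f}: {comment}")
```

Because the two texts are encoded together rather than separately, cross-encoders tend to be more accurate for pairwise relevance than embed-then-compare approaches, at the cost of scoring each pair individually.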
A step towards healthier online conversations
At a time when digital toxicity feels inescapable, this research offers evidence-based optimism. Pre-moderation – often dismissed as too heavy-handed – may not stifle engagement after all. Instead, it can foster a more civil space for public dialogue, simply by stopping toxicity before it spreads. As Tomás-Valiente Jordá concluded, “The tone of online discussion is contagious. If you can prevent toxicity from surfacing in the first place, you change the whole conversation.”
The study offers practical guidance for media outlets: early intervention matters. By stopping harmful comments before they spread, publishers can foster more civil, constructive online conversations, without losing engagement.
-
Aliya Boranbayeva, Associate Communications and Events | Data Science Lab
-
William Lowe, Senior Research Scientist