New Study on AI-Powered Content Moderation Explores Machine Influence on Human Behavior

A study involving nearly half a million comments across platforms like AOL, Sky Sports, RT, and Newsweek showed that technology can have a positive effect on the quality of online conversations. Perspective API, the tool used in the test, applies machine learning models to detect potentially toxic comments and can ‘nudge’ users to reevaluate their messages.

Yisela Alvarez Trentini
Sep 30, 2020
Image courtesy of Christin Hume.

The extensive study was carried out by OpenWeb and Google’s Jigsaw and involved over 400,000 comments across several publishers. Its goal was to measure the impact of a “nudge” that targets offensive or profane comments on these platforms and encourages their authors to reevaluate them.

OpenWeb is a leading audience engagement and conversation platform founded in 2012 and backed by investors who share its vision, such as Insight Partners, AltaIR Capital, Index Ventures, and ScaleUp. Its mission is to eradicate hostility and toxicity from online environments so that all voices can be heard. It pursues this by building conversation technologies that help publishers offer safer platforms and promote their brands.

Jigsaw is a Google unit that deals with emerging technology threats. It conducts cutting-edge research to identify and neutralize threats that can destabilize the internet and harm society. Its tools help journalists, activists, and civil society fight disinformation campaigns, reduce online toxicity, and create spaces for healthy conversation.

The study used Perspective API, a system developed by Jigsaw, to simplify the moderation process by predicting whether a comment is abusive or toxic. Perspective ML models have been steadily running on millions of comments at OpenWeb.
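
For readers curious about what calling Perspective looks like, here is a minimal sketch using only Python's standard library. It assumes the `comments:analyze` method of Perspective's public v1alpha1 REST API and requests only the `TOXICITY` attribute; the helper names (`build_analyze_request`, `toxicity_from_response`, `score_comment`) are hypothetical, and you would need your own API key. The study doesn't describe how OpenWeb integrated the API, so this is not their code.

```python
import json
import urllib.request

# Endpoint and field names follow Perspective's public docs; helper names are hypothetical.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_analyze_request(text: str, language: str = "en") -> dict:
    """Build a request body asking Perspective for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity_from_response(response: dict) -> float:
    """Pull the summary TOXICITY probability (0.0 to 1.0) out of a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_comment(text: str, api_key: str) -> float:
    """Send one comment to Perspective and return its toxicity score.

    Needs network access and a valid API key.
    """
    body = json.dumps(build_analyze_request(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return toxicity_from_response(json.load(response))
```

The returned value is a probability-like score; deciding where to draw the line between acceptable and toxic is left to each platform.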

Promising Results

Over 50,000 users and almost half a million comments were analyzed over three months (May to July 2020).

The “nudge” concept is based on behavioral sciences, which propose that positive reinforcement and indirect suggestions are the best ways to influence the behavior of groups or individuals.

Every time a comment was submitted on one of the sites, it was run through the moderation algorithm and checked against the publisher’s Community Guidelines. If it appeared to break them, a message was shown to the user asking them to take another look at what they wrote, before the comment was sent to a human moderator.
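
The submission flow described here, in which a flagged comment goes back to its author before reaching a human moderator, can be sketched roughly as follows. This is an illustrative guess, not OpenWeb's implementation: the 0.8 threshold, the message text, and the names `submit_comment` and `SubmissionResult` are all hypothetical, and `score` stands in for any function that returns a toxicity probability, such as a Perspective API call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cutoff: a toxicity score at or above this triggers a nudge.
NUDGE_THRESHOLD = 0.8

# Hypothetical wording, not the message used in the study.
NUDGE_MESSAGE = (
    "Let's keep the conversation respectful. "
    "Would you like to take another look at your comment?"
)


@dataclass
class SubmissionResult:
    nudge: Optional[str]      # message shown back to the author, if any
    queued_for_review: bool   # True if the comment moves on to moderation


def submit_comment(
    text: str,
    score: Callable[[str], float],
    already_nudged: bool = False,
) -> SubmissionResult:
    """Score a comment and nudge its author once before it reaches moderation."""
    if not already_nudged and score(text) >= NUDGE_THRESHOLD:
        # Hand the comment back to its author instead of queuing it.
        return SubmissionResult(nudge=NUDGE_MESSAGE, queued_for_review=False)
    # Clean comments, and comments the author resubmits, go on to moderation.
    return SubmissionResult(nudge=None, queued_for_review=True)
```

Letting a resubmitted comment through to moderation keeps the nudge a suggestion rather than a hard block; whether a platform wants that behavior is a design choice.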

The concept was tested across diverse audiences, interests, and industry verticals, using both a test group and a control group. Several message styles were tried, including one with a motivational tone and another giving direct negative feedback, and the messages were optimized over time based on user responses.
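
The article doesn't say how users were split between the test and control groups. One common approach, shown here purely as an assumption, is to hash each user ID so the same person always lands in the same group across visits. The arm names loosely mirror the message styles mentioned above but are hypothetical labels.

```python
import hashlib

# Hypothetical arms: a control group with no nudge, plus two message styles.
ARMS = ["control", "motivational", "negative_feedback"]


def assign_arm(user_id: str, experiment: str = "nudge-test") -> str:
    """Deterministically map a user to an experiment arm.

    Hashing (experiment, user_id) keeps each user in the same arm every time
    while spreading users roughly evenly across arms.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(ARMS)
    return ARMS[bucket]
```

Including the experiment name in the hash means a new experiment reshuffles users instead of reusing the old split.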

During the study, a third of commenters who received a nudge edited their comments, and more than half of those who edited made their comment immediately permissible (meaning it wasn’t automatically rejected). 45% of users removed or replaced the toxic element, while 8% reshaped their entire comment.
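
These rates stack as a share of a share. As a quick worked example, treating "a third" as exactly 1/3 and "more than half" as a floor of 1/2 (both simplifications of the reported figures), the fraction of nudged commenters who ended up with a permissible comment is at least 1/3 × 1/2 = 1/6. The function name below is hypothetical.

```python
from fractions import Fraction

# Simplified figures from the study: a third of nudged commenters edited,
# and more than half of those edits became immediately permissible.
EDIT_RATE = Fraction(1, 3)
PERMISSIBLE_SHARE_FLOOR = Fraction(1, 2)  # "more than half", so 1/2 is a floor


def permissible_floor(nudged_commenters: int) -> Fraction:
    """Lower bound on nudged commenters whose edited comment became permissible."""
    return nudged_commenters * EDIT_RATE * PERMISSIBLE_SHARE_FLOOR
```

For a hypothetical 6,000 nudged commenters, that works out to about 2,000 editors and more than 1,000 permissible comments.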

Overall, the study reported a 12.5% lift in civil and thoughtful comments being published. The nudges also increased engagement and retention at the individual level, lifting community approval rates by 2.5% to 4.5% and safety rates by 5% to 6%.
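
The article doesn't specify whether the 12.5% lift is relative to the control group or an absolute difference in percentage points, and the two can differ a lot. This sketch computes both; the example rates are made-up numbers, not data from the study.

```python
def absolute_lift(treatment_rate: float, control_rate: float) -> float:
    """Difference in rates (percentage points when rates are given in percent)."""
    return treatment_rate - control_rate


def relative_lift(treatment_rate: float, control_rate: float) -> float:
    """Change relative to the control rate, as a fraction (0.125 means 12.5%)."""
    if control_rate == 0:
        raise ValueError("control_rate must be non-zero")
    return (treatment_rate - control_rate) / control_rate
```

With a hypothetical control rate of 40% and treatment rate of 45%, `absolute_lift(45.0, 40.0)` gives 5.0 points while `relative_lift(45.0, 40.0)` gives 0.125, a 12.5% relative lift.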

The AI-Powered Content Moderation Market

In recent years, advances in artificial intelligence (AI) have been driven mainly by machine learning: the ability of computer systems to predict outcomes by learning common features from complex data inputs. As businesses expand their content offerings and connect more closely with users, moderation has become an essential part of any online strategy.
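
To make "predicting outcomes from common features" concrete, here is a deliberately tiny text classifier: multinomial naive Bayes over word counts, trained on four made-up comments. It is a textbook technique swapped in for brevity; production systems such as Perspective rely on far larger models and datasets, and nothing here reflects their internals.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict = {}            # label -> Counter of word frequencies
        self.doc_counts: Counter = Counter()   # label -> number of training texts
        self.vocab: set = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior: how common this label was in training.
            score = math.log(self.doc_counts[label] / total_docs)
            # Log likelihood of each word, smoothed so unseen words don't zero it out.
            denominator = sum(counts.values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, for illustration only.
TOY_TEXTS = ["you are an idiot", "idiot go away", "thanks for sharing", "great point thanks"]
TOY_LABELS = ["toxic", "toxic", "civil", "civil"]

model = NaiveBayesTextClassifier().fit(TOY_TEXTS, TOY_LABELS)
```

The model never sees a rule like "idiot is rude"; it infers from word frequencies which features tend to go with which label, which is the core idea behind far more capable systems.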

In 2019, the Global Content Moderation Solutions Market size was $9.3 billion. By the end of 2026, it’s expected to reach $18.3 billion, with a compound annual growth rate (CAGR) of 9.9% for the period 2021–2026. The largest region is Europe, followed by North America and Asia-Pacific, and top players include Besedo, Viafoura, Appen, Microsoft Azure, Cogito, and others.
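
The growth figures can be sanity-checked with the standard CAGR formula, (end / start)^(1 / years) − 1. Going from $9.3 billion in 2019 to $18.3 billion in 2026 spans seven years and works out to about 10.2% a year. The quoted 9.9% covers 2021–2026, a different window, so the two numbers aren't directly comparable.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at a constant annual rate."""
    return start_value * (1 + rate) ** years


# Reported figures: $9.3B in 2019 and $18.3B expected by 2026 (seven years).
implied_rate = cagr(9.3, 18.3, 2026 - 2019)  # roughly 0.10
```

`project` is the inverse check: compounding $9.3 billion at `implied_rate` for seven years returns $18.3 billion.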

There’s a broad range of harmful content that needs moderation — including but not restricted to hate speech, child abuse material, extreme or cruel content, and spam. This presents several challenges.

Much of the moderation has to happen before submissions are approved, but checks are also needed after content is posted. Because online language evolves constantly and users change their wording to get around moderation, these systems need to be kept up to date.
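
One small part of keeping up with evasive wording is normalizing text before matching it against a list of banned terms. The sketch below undoes a few common tricks (digit-for-letter swaps, spaced-out letters, stretched letters). The substitution map and two-word blocklist are hypothetical, and keyword matching like this cannot capture context on its own.

```python
import re

# Hypothetical character substitutions users might use to disguise words.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})

# Hypothetical blocklist; a real one would be much larger and updated often.
BLOCKLIST = {"idiot", "stupid"}

# Runs of single characters split by separators, e.g. "i.d.i.o.t" or "i d i o t".
SPACED_LETTERS = re.compile(r"\b(?:\w[.\-*_ ]){2,}\w\b")


def normalize(text: str) -> str:
    """Undo common evasion tricks before matching against the blocklist."""
    text = text.lower().translate(LEET_MAP)
    # Join letters that were deliberately spaced out.
    text = SPACED_LETTERS.sub(lambda m: re.sub(r"[.\-*_ ]", "", m.group(0)), text)
    # Collapse letters repeated three or more times: "stuuupid" -> "stupid".
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    return text


def contains_blocked_word(text: str) -> bool:
    """True if any normalized word appears in the blocklist."""
    return any(word in BLOCKLIST for word in re.findall(r"\w+", normalize(text)))
```

Every new trick users invent means another rule like these, which is one reason rule lists alone don't scale and learned models are used alongside them.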

Some of the harmful content that populates the web can be identified by analyzing it alone, but the majority of it requires an understanding of the context. Interpreting this context (and doing so in a variety of formats such as text, video, and audio) can be challenging for both humans and machines.

Pros and Cons of AI Moderation

AI-powered systems are highly capable, but they can still make errors.

Even as the technology advances thanks to new algorithms and increasingly cheap computing power, neural networks still suffer from a lack of transparency. Many of these algorithms can’t fully explain the reasoning behind their decisions, and society as a whole doesn’t yet trust AI systems to make complex decisions.

There is also a risk of introducing bias into these systems when their training data is unrepresentative or reflects the unconscious prejudices of the people who build them.
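
One common way to check a moderation model for bias is to compare its error rates across groups, for instance how often acceptable comments associated with each group are wrongly flagged. This sketch computes per-group false positive rates; the record format and group labels are assumptions for illustration.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per-group false positive rate of a moderation model.

    Each record is a (group, flagged_by_model, actually_harmful) tuple, where
    actually_harmful comes from human review. A false positive is an
    acceptable comment the model flagged anyway.
    """
    flagged_ok = defaultdict(int)  # acceptable comments the model flagged
    total_ok = defaultdict(int)    # all acceptable comments
    for group, flagged, harmful in records:
        if not harmful:
            total_ok[group] += 1
            if flagged:
                flagged_ok[group] += 1
    return {group: flagged_ok[group] / total_ok[group] for group in total_ok}
```

A large gap between groups, with one group's acceptable comments flagged far more often than another's, would be a signal to look more closely at the training data.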

This week, for example, Twitter had to apologize for an image-cropping algorithm that favored white faces over Black ones. Although Twitter said it had tested the system for bias, the company acknowledged it ‘hadn’t gone far enough.’ Concerns like these are being actively addressed in an effort to improve algorithms.

One advantage of AI moderation is that it can make human moderators more effective by prioritizing the content they need to review. It can also limit their exposure to harmful material, for example by blurring parts of images. This matters because studies have found that moderators can develop PTSD-like symptoms from their work.
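
Prioritization can be as simple as a priority queue keyed on the model's score, so the comments most likely to be harmful reach a human first. Here is a minimal sketch using Python's `heapq`, assuming a higher score means more urgent; `ReviewQueue` and `review_order` are hypothetical names.

```python
import heapq
import itertools


class ReviewQueue:
    """Serves the highest-scoring (most likely harmful) comments first."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal scores keep arrival order

    def push(self, comment_id: str, score: float) -> None:
        # heapq is a min-heap, so store the negated score to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._seq), comment_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def review_order(scored_comments):
    """Return comment IDs in the order a moderator would see them."""
    queue = ReviewQueue()
    for comment_id, score in scored_comments:
        queue.push(comment_id, score)
    return [queue.pop() for _ in range(len(queue))]
```

The sequence counter keeps ties in arrival order, so two equally risky comments are handled first-come, first-served.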

Online platforms are fast-paced and highly competitive, so businesses need moderation systems, yet building them is costly and time-consuming. Smaller organizations may have trouble accessing skilled AI developers and datasets, which is why the growing content moderation services sector can offer solutions to a wide range of sites.

OpenWeb and Jigsaw’s study shows that machines and humans can collaborate effectively in moderation, and that they perform better when each is limited to a certain range of tasks. If automated tools are guided by human decisions, we can look forward to a healthier internet for everyone.

This article was originally published in Startup Savant on Sunday, September 27, 2020. Link:



Yisela Alvarez Trentini

Anthropologist & User Experience Designer. I write about science and technology. Robot whisperer. VR enthusiast. Gamer. @yisela_at