
Hate Speech Detectors Can Be Tricked by Humans

14 September 2018

The Aalto University Secure Systems research group has found that many of the hate speech detectors used by websites can easily be tricked by humans. People can render these detectors useless with simple tricks, such as removing the spaces between words or adding the word "love" to a comment. Even detectors that use machine learning can be fooled this way.

How Google Perspective toxicity rating reacts to typos and a little "love" thrown in an otherwise hateful sentence. (Source: Aalto University Secure Systems)

Bad grammar and awkward spelling can easily trick artificial intelligence (AI) systems that were created to detect and delete hate speech. The team tested seven state-of-the-art AI detectors in their study, and all of them were fooled by these simple tricks.

"We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google's comment-ranking system Perspective," says Tommi Gröndahl, a doctoral student at Aalto University.

Google Perspective was one of the systems that the team tested. In a 2017 study, a team from the University of Washington tested Google Perspective and found that it was easily fooled by simple misspellings. Since then, the system has been updated to detect misspellings, but the Aalto University team found that it can still be tricked by other word modifications. Comments like “I hate you” are detected by the system, while comments like “ihateyou love” go undetected and are allowed to be posted.
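The evasions described above are simple string transformations. The following is a minimal sketch of the three attacks; the function names are illustrative, not taken from the authors' code:

```python
# Illustrative sketch of the evasion tricks described in the study:
# inserting typos, removing word boundaries, and adding a neutral word.
import random

def insert_typo(text, seed=0):
    """Swap two adjacent letters somewhere in the text."""
    random.seed(seed)
    chars = list(text)
    candidates = [i for i in range(len(chars) - 1)
                  if chars[i].isalpha() and chars[i + 1].isalpha()]
    if candidates:
        i = random.choice(candidates)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def remove_spaces(text):
    """Collapse word boundaries -- the strongest attack in the study."""
    return text.replace(" ", "")

def add_neutral_word(text, word="love"):
    """Append an innocuous word to shift the classifier's score."""
    return f"{text} {word}"

comment = "I hate you"
print(remove_spaces(comment))                    # Ihateyou
print(add_neutral_word(remove_spaces(comment)))  # Ihateyou love
```

Each transformation leaves the message perfectly readable to a human while changing the token sequence the classifier sees, which is why combining them was effective even against Perspective.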

Another problem is that these systems cannot determine context. Context is key to deciding whether a comment is hate speech or merely offensive. The next step for developers of hate speech detectors is to build systems that can take a comment's context into account.

The research team also believes that developers need to focus on the data sets used to train machine learning systems. They say the training data needs to be of higher quality, with typos, evasion tricks and context included.
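One way to act on that advice, sketched here as an assumption rather than the authors' method, is adversarial data augmentation: each labelled comment in the training set is duplicated with the evasions applied, so the classifier sees perturbed variants during training.

```python
# Hypothetical sketch of adversarial data augmentation. Each labelled
# comment is duplicated with the evasions from the article applied,
# so the detector is trained on the tricks as well as the originals.

def augment(dataset):
    """dataset: list of (text, label) pairs; returns the augmented list."""
    augmented = list(dataset)
    for text, label in dataset:
        augmented.append((text.replace(" ", ""), label))  # spaces removed
        augmented.append((text + " love", label))         # neutral word added
    return augmented

data = [("I hate you", 1), ("have a nice day", 0)]
print(len(augment(data)))  # 6: the 2 originals plus 2 variants of each
```

The perturbed copies keep the original label, on the assumption that removing spaces or appending "love" does not change whether a comment is hateful.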

The paper on this study can be accessed on the Cornell University website.




Discussion – 3 comments

Re: Hate Speech Detectors Can be Tricked by Humans
#1
2018-Sep-24 2:09 PM

And people are willing to let a machine drive for them?

