Right-leaning political figures fuel online hate
Different approaches to addressing labeling bias in hate speech datasets. The traditional machine learning approach enlarges the training dataset by adding more rows labeled under the same definition, which reinforces the bias toward that labeling criterion. Our novel multi-task learning approach instead increases the number of datasets and definitions in the training pipeline, giving a more general representation. Credit: Computer Speech & Language (2024). DOI: 10.1016/j.csl.2024.101690

Researchers have developed a way to automatically detect hate speech on social media platforms more accurately and consistently using a new multi-task learning (MTL) model, a type of machine learning model that works across multiple datasets.

The spread of abusive hate speech online can deepen political divisions, marginalize vulnerable groups, weaken democracy and trigger real-world harms, including an increased risk of domestic terrorism.

Associate Professor Marian-Andrei Rizoiu, Head of the Behavioural Data Science Lab at the University of Technology Sydney (UTS), is working on the frontline in the fight against online misinformation and hate speech. His research combines computer and social sciences to better understand and predict human attention in the online environment, including the types of speech that influence and polarize opinion on digital channels.

"As social media becomes a significant part of our daily lives, automatic identification of hateful and abusive content is vital in combating the spread of harmful content and preventing its [consequences]," said Associate Professor Rizoiu.

"Designing effective automatic detection of hate speech is a significant challenge. Current models are not very effective in identifying all the different types of hate speech, including racism, sexism, harassment, incitement to violence and extremism.

"This is because current models are trained on only one part of a dataset and tested on the same dataset. This means that when they are faced with new or different data, they can struggle and don't perform consistently."

Associate Professor Rizoiu outlines the new model in the paper, "Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures," published in Computer Speech & Language, with co-author and UTS Ph.D. candidate Lanqin Yuan.

A multi-task learning model is able to perform multiple tasks at the same time and share information across datasets. In this case, it was trained on eight hate speech datasets from platforms like Twitter (now X), Reddit, Gab, and the neo-Nazi forum Stormfront.
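To make the idea of sharing information across datasets concrete, here is a minimal sketch of one common multi-task setup, often called hard parameter sharing: a single shared encoder feeds a separate classification head for each dataset. This is not the architecture from the paper, which the article does not describe. The two toy datasets, the vocabulary, the class name `MultiTaskClassifier`, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy data: two "datasets" labeled under different definitions.
DATASETS = {
    "abusive": [
        ("you are a stupid idiot", 1.0),
        ("what a stupid take you idiot", 1.0),
        ("thanks for the helpful reply", 0.0),
        ("have a lovely day", 0.0),
    ],
    "hate": [
        ("those people are stupid vermin", 1.0),
        ("ban those people they are vermin", 1.0),
        ("those people are lovely neighbours", 0.0),
        ("thanks to those helpful people", 0.0),
    ],
}
VOCAB = sorted({w for rows in DATASETS.values() for text, _ in rows for w in text.split()})


def featurize(text, vocab=VOCAB):
    """Binary bag-of-words vector over the toy vocabulary."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


class MultiTaskClassifier:
    """Shared tanh encoder plus one logistic head per dataset (task)."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                       for _ in range(n_hidden)]
        self.heads = {t: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for t in tasks}
        self.bias = {t: 0.0 for t in tasks}

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.shared]

    def predict_proba(self, x, task):
        h = self.encode(x)
        z = sum(w * hi for w, hi in zip(self.heads[task], h)) + self.bias[task]
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, task, lr=0.1):
        """One SGD step on log loss; updates the task head and the shared encoder."""
        h = self.encode(x)
        head = self.heads[task]
        err = self.predict_proba(x, task) - y  # dLoss/dz for log loss
        for j, hj in enumerate(h):
            grad_pre = err * head[j] * (1.0 - hj * hj)  # backprop through tanh
            head[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.shared[j][i] -= lr * grad_pre * xi
        self.bias[task] -= lr * err


def mean_log_loss(model):
    """Average log loss over every example in every toy dataset."""
    losses = []
    for task, rows in DATASETS.items():
        for text, y in rows:
            p = min(max(model.predict_proba(featurize(text), task), 1e-12), 1 - 1e-12)
            losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


model = MultiTaskClassifier(len(VOCAB), n_hidden=4, tasks=list(DATASETS))
loss_before = mean_log_loss(model)
rng = random.Random(1)
examples = [(task, text, y) for task, rows in DATASETS.items() for text, y in rows]
for _ in range(300):
    rng.shuffle(examples)  # interleave tasks so every dataset shapes the encoder
    for task, text, y in examples:
        model.train_step(featurize(text), y, task)
loss_after = mean_log_loss(model)
print(f"mean log loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Because every training step also updates the shared encoder, an example from one dataset changes the representation that the other dataset's head sees. That is the sense in which information is shared across labeling definitions in this sketch.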

The MTL model was then tested on a unique dataset of 300,000 tweets from 15 American public figures—such as former presidents, conservative politicians, far-right conspiracy theorists, media pundits, and left-leaning representatives perceived as very progressive.

The analysis revealed that abusive and hate-filled tweets, often featuring misogyny and Islamophobia, primarily originated from right-leaning individuals. Specifically, of 5,299 abusive posts, 5,093 (about 96%) were generated by right-leaning figures.

"Hate speech is not easily quantifiable as a concept. It lies on a continuum with offensive speech and other [abusive content] such as bullying and harassment," said Rizoiu.

The United Nations defines hate speech as "any kind of communication in speech, writing or behavior, that attacks or uses pejorative or discriminatory language concerning a person or a group based on who they are," including their religion, race, gender or other identity factor.

The MTL model was able to separate abusive speech from hate speech, and to identify particular topics, including Islam, women, ethnicity and immigrants.

More information: Lanqin Yuan et al, Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures, Computer Speech & Language (2024). DOI: 10.1016/j.csl.2024.101690

Citation: Multi-task learning model enhances hate speech identification (2024, October 14) retrieved 14 October 2024 from https://techxplore.com/news/2024-10-multi-task-speech-identification.html
