Detecting interethnic relations with the data from social media

The ability of social media to rapidly disseminate judgements on ethnicity and to influence offline ethnic relations creates demand for methods of automatically monitoring ethnicity-related online content. In this study we seek to measure the overall volume of ethnicity-related discussion in Russian-language social media and to develop an approach that automatically detects various aspects of attitudes toward ethnic groups. We develop a comprehensive list of ethnonyms and related bigrams that covers 97 Post-Soviet ethnic groups. We then collect, from all Russian-language social media, every message posted over a two-year period that contains at least one of these terms (N=2,660,222 texts). We hand-code 7,181 messages in which rare ethnicities are over-represented and train a number of classifiers to recognize different aspects of authors' attitudes and other text features. Judging by a number of standard quality metrics, the classifiers perform well in detecting intergroup conflict, positive intergroup contact, and overall negative and positive sentiment. Relevance to the topic of ethnicity and general attitude to an ethnic group are predicted least well, while some aspects, such as calls for violence against an ethnic group, are too rare in the data to be predicted.
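The collection step described above selects messages by matching a list of ethnonyms and related bigrams. A minimal sketch of such a filter in Python follows. It assumes simple word tokenization, exact matching of word forms without lemmatization, and a tiny hypothetical lexicon; `UNIGRAMS`, `BIGRAMS`, and the helper functions are illustrative names, not the study's actual list or code.

```python
import re
from typing import Iterable

# Hypothetical mini-lexicon for illustration only; the study's actual list
# covers 97 Post-Soviet ethnic groups and is not reproduced here.
# No lemmatization is done, so every inflected form that should match
# must be listed explicitly.
UNIGRAMS = {"татары", "татарин", "чеченцы", "якуты"}
BIGRAMS = {("кавказской", "национальности")}

# \w matches Unicode word characters (including Cyrillic) for str patterns.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def mentions_ethnicity(text: str) -> bool:
    """Return True if the text contains any lexicon unigram or bigram."""
    tokens = tokenize(text)
    if any(tok in UNIGRAMS for tok in tokens):
        return True
    # Check each pair of adjacent tokens against the bigram list.
    return any(pair in BIGRAMS for pair in zip(tokens, tokens[1:]))


def filter_messages(messages: Iterable[str]) -> list[str]:
    """Keep only the messages that mention at least one lexicon entry."""
    return [m for m in messages if mentions_ethnicity(m)]
```

Because matching is on whole tokens, "Татарстан" does not match "татары", and a bigram matches only in the listed word order. Handling Russian inflection would require either listing every word form or lemmatizing the input first, which this sketch does not do.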

Representation of ethnicities in Russian-language social media

The paper presents the results of a study conducted within the Big Data analysis paradigm. The study aims to identify the features of ethnic discourse in Russian-language social media and the place of North Caucasus ethnicities in this discourse. The study is based on 2,659,849 social media publications containing ethnonyms. The author concludes that ethnic discourse is full of problematic topics, which are discussed mainly by male participants. The study shows that ethnonyms related to the North Caucasus peoples are often used in the context of crime and terrorism.