Oleg Nagornyy - Articles by Olessia Koltsova

Detecting interethnic relations with the data from social media

The ability of social media to rapidly disseminate judgements on ethnicity and to influence offline ethnic relations creates demand for methods of automatic monitoring of ethnicity-related online content. In this study we seek to measure the overall volume of ethnicity-related discussion in Russian-language social media and to develop an approach that would automatically detect various aspects of attitudes to ethnic groups. We develop a comprehensive list of ethnonyms and related bigrams that covers 97 post-Soviet ethnic groups and obtain all messages containing at least one of those terms over a two-year period from all Russian-language social media (N=2,660,222 texts). We hand-code 7,181 messages in which rare ethnicities are over-represented and train a number of classifiers to recognize different aspects of authors’ attitudes and other text features. After calculating a number of standard quality metrics, we find that we reach good quality in detecting intergroup conflict, positive intergroup contact, and overall negative and positive sentiment. Relevance to the topic of ethnicity and general attitude to an ethnic group are least well predicted, while some aspects, such as calls for violence against an ethnic group, are not sufficiently present in the data to be predicted.
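The message-selection step described above (keeping every text that contains a term from the ethnonym list) can be illustrated with a minimal sketch. This is not the authors' pipeline: the three-item `ETHNONYMS` list, the English example messages, and the helper `mentions_ethnonym` are all hypothetical, and the suffix wildcard is only a crude stand-in for proper handling of inflected forms.

```python
import re

# Hypothetical, tiny stand-in for the study's list of ethnonyms and bigrams.
ETHNONYMS = ["tatar", "uzbek", "chechen"]

# Match any listed stem at the start of a word and allow an arbitrary suffix.
# This is an assumption made for illustration; it also matches derived words
# such as place names, so a real list would need more careful matching.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, ETHNONYMS)) + r")\w*",
    re.IGNORECASE,
)


def mentions_ethnonym(text: str) -> bool:
    """Return True if the text contains at least one listed term."""
    return PATTERN.search(text) is not None


# Made-up messages, used only to show the filter in action.
messages = [
    "Met a Tatar family at the market today",
    "Weather is awful this week",
    "Uzbek plov is the best",
]

# Keep only the messages that mention a listed ethnonym.
selected = [m for m in messages if mentions_ethnonym(m)]
```

In a setup like this, the selected texts would then be the pool from which a sample is drawn for hand-coding and classifier training.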


Digital Inequality in Russia through the Use of a Social Network Site: A Cross-Regional Comparison

The important role of digital inequality in hindering the development of civil society is increasingly acknowledged. At the same time, differences in the availability and practices of use of social network sites (SNS) may be considered major manifestations of such a digital divide. While SNS are in principle highly convenient spaces for public discussion, lack of access or domination by socially insignificant small talk may indicate underdevelopment of the public sphere. Likewise, agenda differences between regions may signal local problems. In this study we seek to find out whether a regional digital divide exists in a country as large as Russia. We start from a theory of uneven modernization of Russia and use data from its most popular SNS, “VK.com”, as a proxy for measuring digital inequality. By analyzing user activity data from a sample of 77,000 users and texts from a carefully selected subsample of 36,000 users, we conclude that the regional level explains an extremely small share of the variance in behavioral user data. A notable exception is attention to the topics of Islam and Ukraine. However, our data reveal that historically the geographical penetration of “VK.com” proceeded from the regions considered the most modernized to those considered the most traditional. This finding supports the theory of uneven modernization, but it also shows that digital inequality is subject to change over time.
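One simple way to express "the share of variance explained by the regional level" is eta-squared, the ratio of the between-group sum of squares to the total sum of squares. The sketch below is an illustration of that quantity only; the abstract does not say which model the authors used, and the function name `eta_squared`, the region labels, and the activity counts are all made up.

```python
from collections import defaultdict


def eta_squared(values, groups):
    """Share of total variance in `values` accounted for by group membership."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each group (here: each region).
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Between-group sum of squares: group size times the squared distance
    # of the group mean from the grand mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    # If there is no variance at all, report zero explained share.
    return ss_between / ss_total if ss_total else 0.0


# Made-up activity counts for six users in three hypothetical regions.
posts = [10, 12, 11, 9, 13, 11]
regions = ["A", "A", "B", "B", "C", "C"]
share = eta_squared(posts, regions)
```

A value near 0 would mean that users differ about as much within regions as across them, which is the pattern the abstract reports for most behavioral measures.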


Mining media topics perceived as social problems by online audiences: use of a data mining approach in sociology

Media audiences that represent a significant part of a country’s public may hold opinions on media-generated definitions of social problems that differ from those of media professionals. The proliferation of user-generated content makes such opinions available, but simultaneously demands new automatic methods of analysis that media scholars still have to master. In this paper, we show how topics regarded as problematic by media consumers may be revealed and analyzed by social scientists with a combination of data mining methods. Our dataset consists of 33,877 news items and 258,121 comments from a sample of regional newspapers. With a number of new but simple indices, we find that issue salience in media texts and its popularity with the audience diverge. We conclude that our approach can help communication scholars effectively detect both popular and negatively perceived topics as good proxies of social problems.


Semantic and Geospatial Mapping of Instagram Images in Saint-Petersburg

The availability of large urban social media data creates new opportunities for studying cities. In our paper we propose a new direction for this research: a joint analysis of geolocations of shared images and their content as determined by computer vision. To test our ideas, we use a dataset of 47,410 Instagram images shared in the city of St. Petersburg over one year. We show how a combination of semantic clustering, image recognition and geospatial analysis can detect important patterns related to both how people use a city and how they represent it in social media.