In recent years, news about migration has been part of our everyday lives. Our analysis studies 10,330 pictures extracted from 42,845 articles published in Hungarian online media between 27th September 2014 and 11th June 2016. The project aims to identify the most significant topics of these images automatically.
To build a topic model from the images, the information they carry had to be converted into text. To accomplish this, we applied the labelling service of Clarifai. The labels were then treated as documents, from which topics were extracted with Latent Dirichlet Allocation (LDA). Based on the results of the topic model, the images were organized into seven groups. The first interactive visualization (Topics and their labels) shows how strongly the topics are related. In other words, it gives an idea of how often the topics appeared in the top 3 for each image.
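One possible reading of the "top 3" measure is a count, for every pair of topics, of the images in which both topics rank among the three most probable ones. The sketch below illustrates that reading only. The function names and the example weights are hypothetical, and the actual visualization may have been computed differently.

```python
from collections import Counter
from itertools import combinations


def top_k_topics(weights, k=3):
    """Return the indices of the k highest-weighted topics, in ascending order."""
    ranked = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
    return sorted(ranked[:k])


def co_occurrence(doc_topic_weights, k=3):
    """Count how often two topics are both among the top k topics of an image."""
    counts = Counter()
    for weights in doc_topic_weights:
        for pair in combinations(top_k_topics(weights, k), 2):
            counts[pair] += 1
    return counts


# Made-up topic weights for three images (4 topics for brevity; the project used 7).
example = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.40, 0.30, 0.20],
    [0.45, 0.05, 0.30, 0.20],
]
pair_counts = co_occurrence(example)
```

Pairs with a high count would then be drawn as strongly related topics.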
We were interested not only in the main topics of the images, but also in the sex and age of the people they depict. Therefore, after studying many freely available tools, we decided to create our own sex and age classifier. The results of age and sex identification by topic are shown in interactive diagrams. Furthermore, the correlation between our findings and Eurostat demographic data is also visualized.
Finally, we wanted to identify which emotions of the refugees the images document, so we trained our own emotion recognition algorithm. Two visualizations illustrate the most dominant emotions per topic. The project targets a wide audience, namely the general public, since migration is a serious contemporary issue affecting millions of people.
What makes this project innovative?
On the one hand, this is the only webpage that makes a significant number of online media articles and images on the Hungarian migration crisis accessible and searchable. It can be considered an open repository.
On the other hand, what makes our venture exceptional is that images relating to migration are not only gathered, but also processed with a variety of machine learning techniques. The data is analyzed with several methods (topic modelling, automatic face detection, gender and age identification, automatic emotion detection), which give the reader a nuanced picture of the issue. Moreover, readers are not left alone with the outcomes of the analysis: the results are interpreted and contrasted with those of other studies. The figures (i.e. interactive dashboards, charts, static visualizations, collections of images) help one comprehend the data with less effort and see the correlations. Consequently, the project encourages one to think critically about the migration crisis.
What was the impact of your project? How did you measure it?
Source and methodology
1) Topic model
To build a topic model from the images, the information they carry had to be converted into text. To accomplish this, we applied the labelling service of Clarifai. It matches each image with relevant labels on the basis of its content and provides the level of relevance of each label, expressed as a certainty value. Only labels whose certainty value was above 0.75 were considered. Labels that belonged to fewer than 25 images were excluded.
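The two filtering rules can be sketched as follows. This is a minimal illustration, not the project's actual code: the input layout (a dictionary mapping an image id to a list of label and certainty pairs), the function name and the example data are assumptions, as is the order of the steps, where images per label are counted after the certainty filter.

```python
from collections import Counter

MIN_CERTAINTY = 0.75  # certainty threshold stated in the text
MIN_IMAGES = 25       # labels found on fewer images are excluded


def filter_labels(image_labels, min_certainty=MIN_CERTAINTY, min_images=MIN_IMAGES):
    """Turn raw labels into one label list ("document") per image.

    image_labels: dict mapping image id -> list of (label, certainty) pairs.
    """
    # Step 1: keep only labels whose certainty is above the threshold.
    confident = {
        image_id: [label for label, certainty in labels if certainty > min_certainty]
        for image_id, labels in image_labels.items()
    }
    # Step 2: count the number of images in which each remaining label occurs.
    image_counts = Counter(
        label for labels in confident.values() for label in set(labels)
    )
    # Step 3: drop labels that occur in fewer than min_images images.
    return {
        image_id: [label for label in labels if image_counts[label] >= min_images]
        for image_id, labels in confident.items()
    }


# Made-up example; min_images is lowered to 2 so the tiny sample is not emptied.
example = {
    "img1": [("people", 0.98), ("fence", 0.80), ("sky", 0.60)],
    "img2": [("people", 0.91), ("child", 0.77)],
    "img3": [("fence", 0.90), ("people", 0.70)],
}
documents = filter_labels(example, min_images=2)
```

Each resulting label list can then be passed to the topic model as one document.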
2) Automatic face detection
We tested many freely available face detection tools, but in the end we decided to use the pre-trained Haar Cascade model of the OpenCV library. Our decision was based on the fact that most of our images capture spontaneous moments, so people do not look into the camera and their heads may be covered by a hat, hood or headscarf. Compared with other algorithms, it produced far fewer false positives, which convinced us that it was the best choice.
3) Automatic sex and age classification
After studying many freely available tools, we decided to create our own sex and age classifier. Our training data was the IMDB-WIKI-500k dataset, which contains images and data (sex, age) of the actors in the IMDB data set. Using the Keras deep learning framework, we trained a convolutional neural network for sex and age identification.
4) Automatic emotion detection
After getting familiar with the freely available emotion recognition tools, we decided to train our own algorithm, for which we used the fer2013 dataset. We trained a convolutional neural network for this task with the Keras deep learning framework. It performs at 70% accuracy, which means that 70% of the images it identifies as a certain emotion are classified correctly. Furthermore, its recall is 58%, which means that 58% of the images belonging to a certain emotion are recognized.
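The two figures correspond to what is usually called precision (the share of images assigned to an emotion that truly show it) and recall (the share of images of an emotion that the model finds). A minimal sketch of how both can be computed for one emotion is given below. The function name and the example labels are made up and are not the project's evaluation data.

```python
def precision_recall(true_labels, predicted_labels, emotion):
    """Compute precision and recall of the predictions for a single emotion."""
    pairs = list(zip(true_labels, predicted_labels))
    # Images that show the emotion and were also predicted as that emotion.
    true_positives = sum(1 for t, p in pairs if t == emotion and p == emotion)
    # All images predicted as the emotion, and all images that truly show it.
    predicted = sum(1 for _, p in pairs if p == emotion)
    actual = sum(1 for t, _ in pairs if t == emotion)
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Made-up ground truth and predictions for six images.
truth = ["happy", "happy", "happy", "happy", "sad", "neutral"]
predictions = ["happy", "happy", "sad", "neutral", "happy", "neutral"]
precision, recall = precision_recall(truth, predictions, "happy")
```

In this example, two of the three images predicted as "happy" are correct, and two of the four truly "happy" images are found.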