You are what you tweet…pic! Gender prediction based on semantic analysis of social media images
Abstract
We propose a method to extract user attributes, specifically gender, from the pictures posted in social media feeds. Traditional approaches rely on text analysis, or exploit visual information only from the user's profile picture or profile colors. We instead propose to estimate gender from the distribution of semantics across the pictures in a person's entire feed. To compute this semantic distribution, we trained models on existing visual taxonomies to recognize objects, scenes, and activities, and applied them to the images in each user's feed. Experiments on a set of ten thousand Twitter users and their collection of half a million images revealed that the gender signal can indeed be extracted from users' image feeds (75.6% accuracy). Furthermore, the combined visual cues proved almost as strong as textual analysis for predicting gender. They also provide complementary information that, when combined with textual data, further boosts gender prediction accuracy to 88%. As a byproduct of our investigation, we were also able to identify the semantic categories of posted pictures most strongly correlated with males and with females.
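The abstract does not state how per-image predictions are pooled into a user-level semantic distribution, which classifier maps that distribution to gender, or how visual and textual predictions are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each image is scored by a set of concept detectors with non-negative outputs, and that these scores are sum-pooled over the feed and normalized. A logistic model with hypothetical weights then maps the distribution to a probability, and a simple convex combination performs late fusion with a text-based probability. All function names, weights, and the fusion rule are hypothetical.

```python
import math
from typing import List, Sequence


def feed_semantic_distribution(image_scores: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-image concept scores over a user's feed into one distribution.

    image_scores[i][k] is a hypothetical detector's non-negative score for
    concept k (an object, scene, or activity) on image i of the feed.
    Scores are summed per concept and L1-normalized; summing and averaging
    give the same distribution after normalization.
    """
    if not image_scores:
        raise ValueError("feed has no images")
    num_concepts = len(image_scores[0])
    totals = [0.0] * num_concepts
    for scores in image_scores:
        if len(scores) != num_concepts:
            raise ValueError("all images must be scored on the same concepts")
        for k, s in enumerate(scores):
            totals[k] += s
    mass = sum(totals)
    if mass == 0:
        # No concept fired anywhere in the feed: fall back to a uniform distribution.
        return [1.0 / num_concepts] * num_concepts
    return [t / mass for t in totals]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def visual_gender_probability(distribution: Sequence[float],
                              weights: Sequence[float],
                              bias: float) -> float:
    """Probability of the positive gender class under a hypothetical linear model."""
    if len(weights) != len(distribution):
        raise ValueError("one weight per concept is required")
    z = bias + sum(w * p for w, p in zip(weights, distribution))
    return sigmoid(z)


def late_fusion(p_visual: float, p_text: float, alpha: float = 0.5) -> float:
    """Convex combination of visual and textual probabilities (assumed fusion rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_visual + (1.0 - alpha) * p_text


if __name__ == "__main__":
    # Two images scored on three hypothetical concepts.
    dist = feed_semantic_distribution([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
    print(dist)  # [0.25, 0.5, 0.25]
    p_vis = visual_gender_probability(dist, weights=[0.0, 0.0, 0.0], bias=0.0)
    print(p_vis)  # 0.5
    print(late_fusion(p_vis, p_text=0.9, alpha=0.5))
```

Pooling over the whole feed, rather than scoring a single profile picture, reflects the abstract's central idea: the gender signal comes from what a user posts overall. In practice the weights would be learned from labeled users and the fusion weight tuned on held-out data.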