Inferring traits from profiles - Introduction to Social Media Investigation: A Hands-on Approach, 1st Edition (2015)

Chapter 24. Inferring traits from profiles

Abstract

People reveal a lot about themselves on social media, including information they would prefer to keep private. This information is discovered using advanced computer programs or algorithms, which examine the huge amounts of data people share online and look for subtle patterns within it. From that, they are able to predict or infer a wide range of personal attributes (including things like sexual orientation, personality traits, drug and alcohol use, and political leanings). In some instances, they can even predict a person's future behavior. This chapter describes some of the work happening in this area and insights that may come within reach in the future.

Keywords

Social media

Social networks

Inference

Artificial intelligence

The technology to do everything in this chapter already exists. Most of it is in the hands of lab researchers, and not available for commercial use. However, it is likely to become available in the very near future, and it will have a profound impact on the way we investigate people online.

So far, we have looked at what people are sharing on social media, what they say about themselves, who they interact with, where they post from, and what is reflected in their posts.

But it turns out that people reveal a lot more about themselves in those posts, including information they would prefer to keep private. This information is discovered using advanced computer programs or algorithms, which examine the huge amounts of data people share online and look for subtle patterns within it. From that, they are able to predict or infer a wide range of personal attributes (including things like sexual orientation, personality traits, drug and alcohol use, and political leanings). In some instances, they can even predict a person's future behavior.

This chapter will describe some of the work happening in this area and insights that may come within reach in the future.

Since these algorithms tend to be used on one particular site, the analysis is broken down by social media site.

Twitter

Twitter is a great source of information for computer programs, because almost all of its data is public. Furthermore, most of it is text—and computer scientists know a lot about how to analyze texts.

Your words can reveal quite a lot about you, in more subtle ways than you think. The little words we use, which we often dismiss as meaningless, are chosen unconsciously. Our unconscious choices reveal information about how our mind works. Research has shown that these words, called “particles,” which include pronouns (“you,” “I,” and “we”), prepositions, articles, and auxiliary verbs (“could,” “should,” and “may”), are inextricably linked to a person's personality, emotional state, and social identity.1

Computer scientists can use this by grouping words into categories and then analyzing people's social media posts to find connections between the word categories and individual traits. The analysis starts with a long list of words and each word's category. This is fed into an algorithm, along with a file that has text a person has written online; it could be, for example, a bunch of a person's tweets. The algorithm then compares that text to the lists of words and outputs a report. The report can then be used in statistical analysis to find connections between word use and personal attributes.

Here are some examples:

• As people get older, they tend to use more positive emotion words (like “happy” or “joy”), more future tense verbs, and fewer past tense verbs.

• Men tend to curse far more than women.

• People who rate higher in neuroticism (indicating they are less emotionally stable) tend to use more anxiety words (like “worry”).

• Agreeable people tend to use more positive emotion words.

• People who are anxious use more explainer words (like “because” and “since”).

The list goes on and on.
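The word-category analysis described above can be sketched in a few lines of Python. This is only an illustration: the category names and the tiny word lists are made up for the example (real category lexicons hold thousands of words), and the function name `category_report` is hypothetical rather than part of any published tool.

```python
import re
from collections import Counter

# Hypothetical, tiny word-category lexicon used only for illustration.
CATEGORIES = {
    "positive_emotion": {"happy", "joy", "love", "great"},
    "anxiety": {"worry", "worried", "nervous", "afraid"},
    "explainer": {"because", "since"},
    "first_person": {"i", "me", "my", "we", "our"},
}

def category_report(text, categories=CATEGORIES):
    """Return the share of words in `text` that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for name, vocab in categories.items():
            if word in vocab:
                counts[name] += 1
    # Shares rather than raw counts, so long and short texts are comparable.
    return {name: (counts[name] / total if total else 0.0) for name in categories}

# Made-up sample text standing in for a person's collected tweets.
tweets = "I worry because we are late. I am so happy since you came!"
report = category_report(tweets)
for name, share in report.items():
    print(f"{name}: {share:.2f}")
```

A report like this, computed for many people whose traits are already known, is what would then go into the statistical analysis that links word use to personal attributes.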

On Twitter, this kind of analysis has been used to predict social media users' big five personality traits.2 The traits include

• whether someone is extroverted or introverted,

• their openness to new experiences,

• how conscientious they are (which deals with planning or procrastination),

• how agreeable they are, and

• their emotional stability (neuroticism).

Algorithms have been able to guess a person's scores on a big five personality test with accuracy of close to 90%. Similar techniques have been used to measure the strength of people's interpersonal relationships, too.3

Tool: AnalyzeWords

Most of the computer programs that discover people's personality traits from social media only exist within the research community and aren't publicly available. However, there is one online tool you can use to analyze someone's Twitter personality right now. It's called AnalyzeWords, and you can find it at http://analyzewords.com. Enter your (or anyone else's) Twitter handle, and see a report of some high-level personality traits.

As an example, here are analyses of two Twitter accounts from AnalyzeWords: the author's account, and President Obama's account (Figures 24.1 and 24.2).

FIGURE 24.1 AnalyzeWords analysis of the author’s tweets.

FIGURE 24.2 AnalyzeWords analysis of President Obama’s tweets.

Facebook

Because people share so much information on Facebook, there are numerous algorithms that use its data to infer user traits. None of these tools are currently available for public use. We will look at two studies that highlight the breadth and depth of information that such tools can uncover from Facebook.

Facebook Likes

One of the studies covering the widest range of traits started with a narrow set of data: Facebook likes.4 From this data, researchers attempted to predict dozens of personal traits, including race, religion, gender, sexual orientation, intelligence, personality, and even whether someone has used drugs and alcohol. They were able to guess all of these things (and more) just by looking at the Facebook pages a person liked. This works because, among the tens of thousands of people they analyzed, subtle patterns emerge in likes.

The liked pages may have nothing to do with the attribute they help reveal. For example, one of the top predictors of high intelligence was liking the Facebook page for “Curly Fries.” The reason the algorithms work anyway relies on advanced patterns and analysis of how the likes spread through networks, the nature of people's social structures, and other attributes shared by the analyzed persons. From the outside, this research certainly seems mysterious. But when working with very large data, statistics can uncover many unexpected things.
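To give a feel for how unrelated likes can still carry a statistical signal, here is a minimal sketch in Python. It is not the method used in the study; it swaps in a much simpler technique (a smoothed log-odds weight per page, in the spirit of a naive Bayes classifier). The page names, the toy users, and the function names `page_log_odds` and `trait_score` are all invented for the example.

```python
import math
from collections import defaultdict

def page_log_odds(users, smoothing=1.0):
    """users: list of (set_of_liked_pages, has_trait) pairs.

    Returns page -> smoothed log-odds weight. A positive weight means the
    page is liked more often by people with the trait than by people without.
    """
    with_trait = defaultdict(int)
    without_trait = defaultdict(int)
    n_with = n_without = 0
    for likes, has_trait in users:
        if has_trait:
            n_with += 1
        else:
            n_without += 1
        for page in likes:
            if has_trait:
                with_trait[page] += 1
            else:
                without_trait[page] += 1
    weights = {}
    for page in set(with_trait) | set(without_trait):
        p_with = (with_trait[page] + smoothing) / (n_with + 2 * smoothing)
        p_without = (without_trait[page] + smoothing) / (n_without + 2 * smoothing)
        weights[page] = math.log(p_with / p_without)
    return weights

def trait_score(likes, weights):
    """Sum the weights of the pages a person liked; unseen pages count as 0."""
    return sum(weights.get(page, 0.0) for page in likes)

# Made-up training data: (liked pages, whether the person has the trait).
users = [
    ({"page_a", "page_b"}, True),
    ({"page_a"}, True),
    ({"page_a", "page_c"}, True),
    ({"page_c"}, False),
    ({"page_b", "page_c"}, False),
    ({"page_c"}, False),
]
weights = page_log_odds(users)
print(trait_score({"page_a"}, weights))  # positive: leans toward the trait
print(trait_score({"page_c"}, weights))  # negative: leans away from it
```

Nothing in the code knows what the pages are about. The weights come entirely from who liked what, which is why a page with no obvious connection to a trait can still end up as a strong predictor.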

Analyzing Significant Others

A different study used only the social network of a user, ignoring likes and all other profile information. The original goal of this study was to figure out which person among all of a user's friends was that user's spouse or significant other.5

A first guess at how to do this might be to find the person who had the most friends in common with the user. However, it turns out that a better indicator is finding the person who knows people from the greatest number of the user's different social circles. Recall the chapter on network analysis that showed different clusters of friends. Ideally, someone's spouse will know people from many different groups; for example, they will know at least one high school friend, college friend, coworker, family member, teammate, etc.

This turned out to be a very good predictor of someone's significant other. However, when the algorithm was wrong, an even more interesting result came about. Researchers went back two months later to look at the people for whom they had guessed incorrectly. In those cases where the algorithm was wrong, people were 50% more likely to have broken off their relationships.

Not only did a person's social network often reveal who their partner was, but it also gave hints about whether their current relationship would last.
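The difference between the "most friends in common" guess and the "most social circles" guess can be sketched in Python as follows. This follows the simplified description given in this chapter, not the exact measure defined in the published study. It also assumes that each friend has already been assigned to a circle (for example, by the clustering discussed in the network analysis chapter). The toy network, the circle labels, and the function names are invented for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected friendship graph as a dict of neighbor sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def mutual_friends(graph, user, friend):
    """Friends that `user` and `friend` have in common."""
    return (graph[user] & graph[friend]) - {user, friend}

def circles_spanned(graph, circles, user, friend):
    """Number of the user's distinct circles in which `friend` knows someone."""
    return len({circles[m] for m in mutual_friends(graph, user, friend) if m in circles})

# Made-up network: "u" is the person under investigation.
friends_of_u = ["a1", "a2", "a3", "a4", "b1", "c1", "p", "s"]
edges = [("u", f) for f in friends_of_u]
edges += [("s", f) for f in ["a1", "a2", "a3", "a4"]]  # s knows only school friends
edges += [("p", f) for f in ["a1", "b1", "c1"]]        # p knows one person per circle
graph = build_graph(edges)

# Hypothetical circle labels for each of u's friends.
circles = {"a1": "school", "a2": "school", "a3": "school", "a4": "school",
           "s": "school", "b1": "work", "c1": "family", "p": "other"}

best_by_mutual = max(sorted(graph["u"]), key=lambda f: len(mutual_friends(graph, "u", f)))
best_by_circles = max(sorted(graph["u"]), key=lambda f: circles_spanned(graph, circles, "u", f))
print(best_by_mutual, best_by_circles)
```

In this toy network, the friend "s" shares the most friends with "u", but they all come from one circle. The friend "p" shares fewer friends, yet they are spread across school, work, and family, so the circle-based ranking picks "p" instead.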

Sexual Orientation

A person's social network, when combined with a bit of additional information, can also be revealing. One of the earliest studies in this area actually began as a class project at MIT. Students analyzed people's Facebook friends with the goal of determining their sexual orientation. Their project, called Gaydar, looked at what percentage of a person's friends self-identified as gay on Facebook (recall that a Facebook profile has a section where people can indicate their orientation). They found that, for men, they could set a threshold on the percentage of gay friends. If someone's percentage was above that threshold, there was a good chance they were gay, too.

This relies on a principle we all know, often stated as “birds of a feather flock together.” People tend to be friends with people who are similar to them more than would be expected if they chose friends randomly. This applies to not only sexual orientation but also wealth, education level, race, politics, and many other traits. While projects like Gaydar are not online, this insight can be useful in an investigation. If you find someone has, for example, many politically conservative friends, it would be logical to guess that they, too, might be conservative.
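The "birds of a feather" reasoning can be written down as a very small Python sketch, using the political example above. It simply reports the most common label among a person's friends when that label is common enough. The cutoffs `min_share` and `min_known`, the sample labels, and the function name `infer_from_friends` are assumptions made for this illustration; a real investigation would treat the result as a lead to verify, not a conclusion.

```python
from collections import Counter

def infer_from_friends(friend_labels, min_share=0.5, min_known=5):
    """Guess a person's label from their friends' self-reported labels.

    friend_labels: one entry per friend; None means the friend did not say.
    Returns (label, share). The label is None when too few friends state
    a label, or when no label exceeds `min_share` of those who do.
    """
    known = [label for label in friend_labels if label is not None]
    if len(known) < min_known:
        return None, 0.0
    label, count = Counter(known).most_common(1)[0]
    share = count / len(known)
    return (label if share > min_share else None), share

# Made-up example: 12 friends, 9 of whom state a political leaning.
friends = ["conservative"] * 7 + ["liberal"] * 2 + [None] * 3
guess, share = infer_from_friends(friends)
print(guess, round(share, 2))
```

Requiring a minimum number of friends with a stated label keeps the function from drawing a guess out of two or three profiles, where chance alone could produce a lopsided share.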

Offline

Obviously, this book is concerned with social media, but these same techniques are being used offline. One example worth mentioning comes from big-box retailer Target.6 In 2012, The New York Times reported a story of a 15-year-old girl who received coupons for baby items like diapers and bottles in a flyer sent by Target. Her father was very upset by this until two weeks later when his high school daughter told him she was indeed pregnant. Target found out before her parents did.

How did they do this? By analyzing her purchases. Target calculates a score for their female customers that not only guesses whether or not they are pregnant but also attempts to pinpoint their due dates. They do this by analyzing patterns in what people buy.

As with some of the examples above, the patterns are not obvious ones. To find out someone is pregnant, it is not the purchase of a pregnancy test or of baby items that reveals it most clearly. Instead, strong indicators are things like buying more vitamins than normal, buying a large handbag, or buying a brightly colored rug.

If you saw those purchases in the checkout line ahead of you, it is unlikely that you would guess the woman buying them was pregnant. However, when considered in the context of data from tens of thousands of other customers, they happen to reveal a common attribute.

Target isn't the only company doing this kind of analysis, but this example reveals the kind of power that comes from analyzing the vast amounts of data in this digital age.

Conclusions

Computer scientists are actively developing technology that reveals all kinds of secrets about social media users. Although people say a lot online, there are things they want to keep secret, and that's becoming harder to do. From personalities to basic demographic information, a lot comes through in subtle or unconscious ways. When data from millions of people can be analyzed, statistical analysis and advanced computational techniques can detect patterns that indicate that a person has a particular attribute.

While most of this technology is not yet available to the public, it is coming in the next five to ten years. Some tools, like AnalyzeWords, are already available online. Technologies like these will become increasingly important tools for social media investigation.


1 Pennebaker, J.W., Mehl, M.R., Niederhoffer, K.G. 2003. Psychological aspects of natural language use: our words, our selves. Annual Review of Psychology 54(1): 547–577.

2 Golbeck, J., Robles, C., Edmondson, M., Turner, K. 2011. Predicting personality from Twitter. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT), and 2011 IEEE Third International Conference on Social Computing (Socialcom) (pp. 149–156). IEEE.

3 Gilbert, E., Karahalios, K. 2009. Predicting tie strength with social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 211–220). ACM.

4 Kosinski, M., Stillwell, D., Graepel, T. 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110(15): 5802–5805.

5 Backstrom, L., Kleinberg, J. 2014. Romantic partnerships and the dispersion of social ties: a network analysis of relationship status on Facebook. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM.

6 Duhigg, C. February 16, 2012. How Companies Learn Your Secrets. New York Times. www.nytimes.com/2012/02/19/magazine/shopping-habits.html