Big Data Bootcamp: What Managers Need to Know to Profit from the Big Data Revolution (2014)
Chapter 12. Capstone Case Study: Big Data Meets Romance
Products, Services, and Love: How Big Data Plays Out in the Real World
When it comes to developing new applications to help people meet, connect, and interact with each other, the opportunities are limitless. You can combine interesting new sources of data, such as people’s interests in music, movies, travel, or restaurants, with new technologies like mobile applications. For example, you could build an application that aggregates and analyzes data about people’s interests and then alerts them when they are near each other so they can meet up in person. Or you could build an application that monitors people’s activity levels, such as their sleep schedules, how active they are, and how many meetings they are in, and warns them if their stress level is increasing, which could have a negative impact on their personal relationships.
Calendar-based applications could use data to figure out if you’ve been neglecting to spend time with your friends, while new social networking applications could make it easier to connect with people you care about while on the go.
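The proximity alert described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the half-kilometer radius, and the 30% interest-overlap threshold are all hypothetical choices, not taken from any real product.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def jaccard(a, b):
    """Share of interests two people have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def should_alert(user_a, user_b, max_km=0.5, min_overlap=0.3):
    """Alert only when two users are both nearby and similar in interests.

    Both thresholds are illustrative assumptions.
    """
    distance = haversine_km(*user_a["location"], *user_b["location"])
    overlap = jaccard(user_a["interests"], user_b["interests"])
    return distance <= max_km and overlap >= min_overlap
```

Each user here is a plain dictionary with a `location` tuple of (latitude, longitude) and a set of `interests`. A real application would also need consent, privacy controls, and a far more efficient way to find nearby users than comparing every pair.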
A number of companies are taking advantage of data and mobile technologies to help us be more connected and to keep users more engaged. Social networking site Facebook uses Big Data to figure out what content to show, which ads to display, and who you should become online friends with. Dating site Match.com stores some 70 terabytes of data about its users.1 Recent mobile dating entrants Tinder and Hinge use Big Data to connect people based on their Facebook profiles.2
Online dating web site OkCupid, which was acquired by Match.com in 2011, conducted a series of research studies on what makes for the most successful online dating profiles. In a post entitled “The 4 Big Myths of Profile Pictures,” the OkCupid researchers described the results of analyzing more than 7,000 profile photos to figure out which ones produced the best results, as measured by the number of messages each user received.3
Of course, just receiving more messages isn’t necessarily a good thing. Quality is often more important than quantity. But before we explore that issue, let’s first take a look at what the researchers found.
The researchers classified each photo into one of three categories: flirty, smiling, and not smiling.
The researchers discovered that for women, eye contact in online profile pictures was critical. Those with a flirty look got slightly more messages each month than those without and significantly more messages than those who were characterized as not smiling.
Without eye contact, the results were significantly worse. Regardless of facial expression, those who didn’t make eye contact received fewer messages overall than those who did.
In contrast to women, men had the most success in meeting women when they used profile pictures in which they looked away from the camera and didn’t smile. However, men who used photos that showed them looking away from the camera and flirting had the least success in meeting women. For both men and women, then, looking away from the camera and flirting produces the worst results.
The researchers also found that for men, wearing “normal clothes” in a photo or being “all dressed up” didn’t make much of a difference when it came to meeting a potential mate.
In perhaps one of their most interesting findings, the researchers discovered that whether or not someone’s face appears in a photo doesn’t have a big impact on how many messages they receive.
Photos of people in scuba gear, walking across the desert, or with their faces out of the frame completely were about as likely to help users of the site generate interest as those photos in which users showed their faces. Women who used photos showing their faces received, on average, 8.69 contacts per month, while those who didn’t show their faces received 8.66 messages per month. Men who used facial photos met 5.91 women for every 10 attempts, while those not showing their faces met 5.92 women for every 10 attempts. This remarkable data, as shown in Figure 12-1, dispels the myth that you should make sure your face is showing in online dating profile photos.
Figure 12-1. Dispelling the myth that you should make sure your face is showing in online dating photos
The researchers’ conclusion? That the photos do all the work: they “pique the viewer’s curiosity and say a lot about who the subject is (or wants to be).” It’s important to note that, as the authors of the study stated, “we wouldn’t recommend that you meet someone in person without first seeing a full photo of them.”
What all that means is that Big Data isn’t just a tool for business. Given the right data sources it can also give us insights into how best to portray ourselves to find the right mate. The other takeaway, of course, is that you shouldn’t underestimate the importance of choosing the right photo—based on the data.
How Big Data Outs Lies, Damn Lies, and Statistics
Of course, people are known to stretch the truth when it comes to online dating. In another study, OkCupid co-founder Christian Rudder looked at data from some 1.51 million active users of the dating site.
In “The Big Lies People Tell In Online Dating,”4 Rudder discovered that when it comes to height, “almost universally, guys like to add a couple inches.” In fact, as guys “get closer to six feet,” they round up more than usual, “stretching for that coveted psychological benchmark.” Women also exaggerate their height, “though without the lurch toward a benchmark height.”
Note People often exaggerate when on social media sites, with height and income among the qualities users choose to inflate. It won’t be long before astute product and service developers find ways to test the veracity of claims, saving those looking for romance from heartache, at least occasionally.
What about income? Do people who claim to make $100,000 a year or more really make that much? When it comes to online dating, Rudder found that people are 20% poorer than they say they are. As we get older, we tend to exaggerate more, with those in their 40s and 50s exaggerating by 30% or more. People don’t just alter their income; they also upload photos that are out of date. Rudder’s research found that “the more attractive the picture, the more likely it is to be out of date.”
So how do the dating sites decide who to show you as potential matches?
Match.com, which has some 1.8 million paying subscribers, introduced a set of new algorithms codenamed Synapse to analyze “a variety of factors to suggest possible mates,” according to an FT.com article, “Inside Match.com.”5
What we say we’re looking for isn’t always what we’re actually looking for. Although the algorithm takes into account people’s stated preferences, it also factors in the kind of profiles that users actually look at. For example, if a user states a preference for a certain age range but looks at potential mates outside that range, the algorithm takes that into account and includes people outside the range in future search results.
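The idea of blending stated and observed preferences can be illustrated with a small Python sketch. This is not Match.com’s Synapse algorithm, whose details the source does not give; the function name, the minimum number of views, and the 20% tolerance are hypothetical assumptions chosen only to make the example concrete.

```python
def effective_age_range(stated, viewed_ages, min_views=5, tolerance=0.2):
    """Widen a stated (low, high) age range when browsing contradicts it.

    stated      -- the age range the user says they want
    viewed_ages -- ages of the profiles the user actually looked at
    min_views   -- assumed minimum evidence before overriding the user
    tolerance   -- assumed share of out-of-range views needed to widen
    """
    low, high = stated
    if len(viewed_ages) < min_views:
        return stated  # too little behavior to second-guess the user
    outside = [age for age in viewed_ages if age < low or age > high]
    if len(outside) / len(viewed_ages) < tolerance:
        return stated  # the occasional stray click is ignored
    # Stretch the range just far enough to cover what the user looks at.
    return (min(low, min(outside)), max(high, max(outside)))
```

For a user who asks for ages 28 to 35 but spends half of their time on profiles aged 38 to 40, this sketch would start including people up to 40 in future results, while a single out-of-range view among many would change nothing.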
What makes the challenge of predicting preferences even more complicated is that unlike with movie or book recommendations, algorithms that match human beings together have to take into account mutual preferences. “‘Even if you like The Godfather, The Godfather doesn’t have to like you back,’” Amarnath Thombre, head of analytics at Match.com was quoted as saying.
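One common way to express a mutual preference in code is to combine the two one-directional scores so that a low score on either side pulls the result down. The sketch below uses a harmonic mean for that purpose; this is an illustrative choice, not something the source attributes to Match.com, and the function name and the 0-to-1 score scale are assumptions.

```python
def reciprocal_score(a_likes_b, b_likes_a):
    """Combine two one-way interest scores (each in 0..1) into one.

    The harmonic mean is high only when both sides are interested,
    unlike a simple average, which one enthusiastic side can inflate.
    """
    if a_likes_b <= 0 or b_likes_a <= 0:
        return 0.0
    return 2 * a_likes_b * b_likes_a / (a_likes_b + b_likes_a)
```

With this measure, two people who each score the other at 0.9 get a combined 0.9, while a pairing of 0.9 and 0.1 drops to 0.18, well below the 0.5 that a plain average would report.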
The challenge with such algorithms is that although Match.com has data on the 75 million users who have registered on the site since its founding in 1995, it doesn’t have much data on which dates are successful and which ones aren’t. This inability to close the loop is an important missing element in creating the ultimate matchmaking algorithm.
That’s why when people cancel their subscriptions, dating sites often ask whether the reason is that they’re dissatisfied with their online dating experience or that they met someone. Not only is such data useful for marketing, it can also factor into predictive algorithms.
Using Big Data to Find the Missing Data
But it is just that kind of missing data that may be causing machine-matching algorithms to fail. As with all Big Data analytics and predictive engines, such algorithms are only as good as the data that engineers used to develop them and the data they’re fed. If key data is missing, predictive algorithms won’t work.
Eli J. Finkel from Northwestern University and Benjamin R. Karney at the University of California, Los Angeles, the authors of a study published in the journal Psychological Science in the Public Interest, point out in their study and in a corresponding New York Times opinion piece that what really matters is how people interact when they meet each other in person, not what they say online.6
“Things like communication patterns, problem-solving tendencies, and sexual compatibility are crucial for predicting the success or failure of relationships,” say the two professors.
They also point out that the way in which “couples discuss and attempt to resolve disagreements predicts their future satisfaction and whether or not the relationship is likely to dissolve.” These, however, are not the sorts of characteristics that are easily evaluated in the context of an online dating site. Moreover, dating sites don’t take into account the environment of a relationship, including stresses such as “job loss, financial strain, infertility, and illness.”
The authors also point out that while dating sites may collect a lot of information, such information is a very small piece of the pie when it comes to figuring out what will make two people a good long-term match. While many sites claim to match people based on common interests, a 2008 study of 313 other studies found that “similarity on personality traits and attitude had no effect on relationship well-being in established relationships.”
A 2010 study of more than 23,000 married couples showed that having major personality attributes in common, such as neuroticism, impulsivity, and extroversion, only accounted for half a percent of marriage satisfaction, meaning that 99.5% was due to other factors.
The conclusion of the studies’ authors? Online dating isn’t any better or worse than any other way to meet potential mates. While the algorithms online dating sites use may make for good marketing, such algorithms may ultimately just be a way to help users of such sites get started with their online dating experience, and to provide a manageable pool of potential mates in densely populated areas such as New York City.
What such research really points out from a Big Data perspective is the importance of having complete data. Whether algorithms are trying to recommend movies, tell sales people which customers to call next, or suggest potential mates, the algorithms are only as good as the data. Without sufficient data and a way to close the loop by knowing whether an algorithm made a correct prediction, it’s difficult to create accurate algorithms. Like throwing darts at a dartboard, getting all the darts in the same area doesn’t matter if that area isn’t the bull’s-eye. So capturing complete data, and being able to close the loop about what works and what doesn’t, is key.
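To make “closing the loop” concrete, here is a minimal Python sketch that scores predictions only where the real outcome is known. The function name and the convention of using `None` for an unknown outcome are assumptions made for illustration.

```python
def evaluate(predictions, outcomes):
    """Score predictions against known outcomes.

    predictions -- list of True/False predictions (e.g., "good match")
    outcomes    -- list of True/False results, or None when unknown
    Returns None when no outcome is known, since nothing can be measured.
    """
    scored = [(p, o) for p, o in zip(predictions, outcomes) if o is not None]
    if not scored:
        return None
    correct = sum(1 for p, o in scored if p == o)
    return {
        "coverage": len(scored) / len(predictions),  # share with a known outcome
        "accuracy": correct / len(scored),           # share of those predicted right
    }
```

The `coverage` figure is the important one here: if outcomes are known for only a tiny fraction of predictions, even a high accuracy number says little about how well the algorithm works overall.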
Predicting Marriage Success with Big Data
When it comes to prediction, one scientist is able to predict with astounding accuracy which matches are likely to succeed. Professor of psychology Dr. John Gottman is famous for running a physiology lab known as the “Love Lab” at the University of Washington.
Gottman has studied more than 650 couples over a period of 14 years. Based on his research, Gottman can predict with over 90% accuracy, from a half-hour interview with a recently married couple, whether their marriage will last.7
Gottman refers to what he calls mental map-making as the basis for romance. Mental map-making, in the context of a relationship, is the process of finding out about our partners. One simple example of mental map-making Gottman provides is of men who have an interest in what their wives are going to do on a given day.
An active mental map-maker not only gathers information but also thinks about that information during the day and follows up on it later. That means asking about a spouse’s meeting, lunch, or an event they talked about in the morning.
Remarkably, the process of gathering data, thinking about that data, and following up on that data is useful not only in computing, but in maintaining a healthy relationship as well.
Gottman and his colleague, Professor James Murray, took things even further and developed Big Data models of biological research. They created mathematical models of human behavior that they used to analyze and predict marriage success. The two professors, along with several others, even published a book entitled The Mathematics of Marriage: Dynamic Nonlinear Models.
Gottman and his fellow authors believe that “the development of marriage is governed, or at least can be described, by a differential equation.”8 Gottman and his colleagues are able to describe a marriage using the mathematics of calculus and simulate how couples will act under various conditions.
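To give a feel for what simulating a couple with equations can look like, here is a toy Python sketch. It is not the model from The Mathematics of Marriage; it is a simplified, discrete-time illustration in which each partner’s mood at the next step depends on a baseline, their own previous mood, and the other partner’s previous mood. Every parameter name and value is a hypothetical assumption.

```python
def simulate_couple(steps=200, w0=0.0, h0=0.0,
                    a=0.2, b=0.1, r1=0.5, r2=0.4, k1=0.2, k2=0.3):
    """Iterate a toy two-person influence model and return the history.

    a, b   -- each partner's baseline mood (assumed values)
    r1, r2 -- how much of their own previous mood each partner carries over
    k1, k2 -- how strongly each partner is influenced by the other
    """
    w, h = w0, h0
    history = [(w, h)]
    for _ in range(steps):
        w_next = a + r1 * w + k1 * h  # partner W reacts to partner H
        h_next = b + r2 * h + k2 * w  # partner H reacts to partner W
        w, h = w_next, h_next
        history.append((w, h))
    return history


def steady_state(a=0.2, b=0.1, r1=0.5, r2=0.4, k1=0.2, k2=0.3):
    """Solve for the moods at which neither partner changes any further."""
    det = (1 - r1) * (1 - r2) - k1 * k2
    w = (a * (1 - r2) + k1 * b) / det
    h = (b * (1 - r1) + k2 * a) / det
    return w, h
```

With the default values, the simulated couple settles toward the point returned by `steady_state`, regardless of where they start. Changing the influence parameters shifts where the couple ends up, which is the kind of “what if” question such models are meant to explore.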
Gottman’s research discovered four negative behaviors that frequently predict divorce: criticism of a partner’s personality, contempt, defensiveness, and emotional withdrawal from interaction.9 Gottman describes contempt, in particular, as “sulfuric acid for love.”
By combining an evaluation of a couple’s responses to various questions with analysis of body language and biological data via mathematical models, Gottman can not only predict whether a marriage is on course to succeed, but can also suggest immediate changes for course correction if it isn’t.
Given all this talk of mathematics and differential equations, it might seem like Gottman is taking the magic out of romance, but as Gottman puts it, it’s important for scientists and researchers to have an objective model for measuring human relationships. And the information gleaned from such Big Data models can help keep relationships strong.
Gottman studied those who remained happy with their marriages to figure out what kept their relationships happy. His conclusions were that couples that focused on preserving knowledge of each other, mutual admiration, and affection for each other were more likely to be happy than those who did not. In particular, he found that couples had to be five times as positive with each other as negative.
Can failing marriages be saved? According to Gottman’s analytical research, they can. Many couples that go through marriage counseling relapse after two years, falling back into the same old habits.
However, based on his studies, Gottman concluded that there are two active ingredients that can produce a lasting positive effect on a marriage. First, reduce negativity in conflict resolution. Second, increase overall positivity by focusing on helping partners in a marriage “be better friends.” Most importantly, Gottman’s research isn’t just a lot of data points. As with all of the best Big Data analyses, Gottman has converted his research into simple, yet powerful actionable insights. In this case those insights can lead us to happier, more fulfilling relationships.
Note The best Big Data analyses, and those that can be turned into successful products most easily, are those that convert heavy-duty research into simple, powerful, actionable insights.
How Facebook Is Shaping Relationships
While Gottman and his colleagues are helping couples in the offline world based on their years of research, perhaps no online company is having a bigger impact on relationships than social networking giant Facebook.
The company, which now has more than a billion users worldwide, is the go-to destination for photo sharing, status updates, and a timeline of your life, both on your own and with others.
Social networking sites like Facebook represent such relationships in something called the social graph. Unlike in a one-to-one relationship, a social graph consists of many interconnected relationships. If Joe knows Fred and Fred knows Sarah, then in the context of the social graph, Joe has a link to Sarah via Fred. But such interconnections extend even further with the inclusion of interests, places, companies, brands, birthdays, and status updates, among other non-human elements. In the context of the social graph, people have connections not just with other people, but with activities, events, companies, and products.
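The Joe, Fred, and Sarah example can be written as a small graph in Python. This is a minimal sketch for illustration only; the class name, the `person:` and `interest:` label prefixes, and the method names are assumptions, and real social graphs use specialized storage rather than an in-memory dictionary.

```python
from collections import defaultdict


class SocialGraph:
    """A tiny undirected graph of people and the things they connect to."""

    def __init__(self):
        self.edges = defaultdict(set)

    def connect(self, a, b):
        """Record a two-way link between any two nodes."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def second_degree(self, node):
        """Nodes reachable through exactly one intermediary."""
        direct = self.edges.get(node, set())
        reachable = set()
        for neighbor in direct:
            reachable |= self.edges.get(neighbor, set())
        return reachable - direct - {node}


graph = SocialGraph()
graph.connect("person:Joe", "person:Fred")
graph.connect("person:Fred", "person:Sarah")
graph.connect("person:Sarah", "interest:jazz")  # a non-human node
```

Here `graph.second_degree("person:Joe")` returns Sarah, whom Joe reaches via Fred, and `graph.second_degree("person:Fred")` returns the jazz interest, which Fred reaches via Sarah. People and non-human elements sit in the same structure, which is what lets one query span both.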
Tip Just as some companies are focused on building databases for structured and unstructured data, there are also companies that build databases specifically for graph-based data. For example, Neo4j has focused on building a database specifically to represent connections. The company has raised some $24 million to date and counts companies like eBay, HP, and Walmart among its customers.
The concept of six degrees of separation has popularized the notion that two people are at most six steps away from each other in terms of social connections. Business networking site LinkedIn takes advantage of this concept by showing business people how they’re connected to other people through intermediary relationships. These professionals then take advantage of this connectivity when they want to talk with people they don’t know directly but who are a friend of a friend or a colleague of a colleague.
At Facebook, people are even closer to each other than the six degrees concept might suggest. Sanjeev Kumar, an engineering manager at the company, indicated that Facebook users are highly connected, with an average of 4.74 degrees of separation between them.10 Relationships in the social graph are made denser due to connections with places, interests, and other elements.
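Degrees of separation can be computed with a breadth-first search, which explores a graph one step outward at a time. The sketch below is a small illustration, not how Facebook arrived at its figure; the function names and the dictionary-of-sets graph format are assumptions, and a graph with a billion users would require very different techniques.

```python
from collections import deque
from itertools import combinations


def degrees_of_separation(graph, start, goal):
    """Fewest links between two people, or None if they are unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


def average_separation(graph):
    """Mean degrees of separation over all connected pairs of people."""
    distances = []
    for a, b in combinations(sorted(graph), 2):
        distance = degrees_of_separation(graph, a, b)
        if distance is not None:
            distances.append(distance)
    return sum(distances) / len(distances) if distances else None


friends = {
    "Joe": {"Fred"},
    "Fred": {"Joe", "Sarah"},
    "Sarah": {"Fred"},
}
```

In this three-person network, Joe and Sarah are two degrees apart, and the average across all pairs is about 1.33.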
By representing the relationships between a very large number of connections, the social graph can answer lots of interesting Big Data questions. That includes not only questions posed by data analysts, but questions actively or passively posed by hundreds of millions of users, such as “Who should I connect with?”, “What photos should I look at?”, and “Which information is important to me?”
Although most of these users don’t think of themselves as performing queries against the Big Data that is the social graph, that is exactly what they’re doing, or at least what social networking services are doing on their behalf.
Social graphs are equally interesting in Big Data from a technical perspective. Answering questions such as the ones above consumes a lot of computing resources. Each query involves working with a very large subset of the overall graph (known in technical terms as the working set) and is highly customized to each user.
Tip Social graphs are the digital representation of our human connections. They may be one of the most powerful sources of insight into human relationships.
What’s more, the social graph represents lots of actual data, not just the interconnections in the graph itself, but the photos, videos, status updates, birthdays, and other information associated with each user. A query has to return the right set of relationships. It also has to return the data associated with those relationships and do so near instantaneously.
The social graph represents so much data that, to keep up with it, Facebook has had to develop custom servers, build out its own data centers, and design special software for querying the graph and storing and retrieving its associated data efficiently.
So what does all this mean for our relationships?
Of the many ways to express ourselves on Facebook, “relationship status is the only one that directly involves another person.”11 We commonly announce engagements, marriages, breakups, and divorces on the social networking site.
In 2010, about 60% of Facebook users set a relationship status on their profile, and as of December of that year, women outnumbered men by a ratio of 1.28 to one.12 As of January 2014, about 53.3% of the site’s users were female while 45.6% were male.13 In 2011, a third of all divorce filings mentioned the word Facebook, up from 20% the previous year.14
Announcing our relationship status online can strengthen what researchers refer to as “feeling rule reminders.”15 Such rules are the social norms that tell us when and what to feel, and how strong our emotions should be. By declaring our relationship status online we reinforce such rules.
Social networking sites like Facebook can also affect one’s health and personality. Some people present themselves authentically on the site, while others present themselves in an improved light due to feelings of insecurity.16
About 24% of Americans and 28% of Brits have admitted to lying or exaggerating on a social network about what they’ve done and/or who they’ve met, according to statistics cited by Cara Pring of social networking statistics site The Social Skinny.17 Meanwhile, visiting our profiles too frequently can make us overly aware of ourselves, causing stress and anxiety.
Having more digital friends than others—the average Facebook user has 229 friends on the social network—may give us a shot of self-confidence due to receiving additional social support. Some 25% of people believe that social networks have boosted their confidence. Having fewer friends than others may make us feel self-conscious, but the smaller digital circle may also lead to more genuine interactions.
In case there is any doubt about how big an impact online socializing is having on our lives, 40% of people spend more time socializing online than they do having face-to-face interactions. Internet users spend 22.5% of their online time engaged in social networking activities, with more than half of all Facebook users returning to the site on a daily basis.
Note If you’re worried you may not have enough of a market for your social media application, bear in mind that nearly half of the people who socialize online spend more time there than interacting with human beings in the real world.
How Big Data Increases—And Decreases—Social Capital
We value our online identities and the relationships they entail not just because they are a way to express ourselves but also because we associate them with social capital. Social capital is the benefit we receive from our position in a social network and the associated connections and resources that are a part of that network, both online and offline.
Such capital comes in two forms, bonding and bridging, according to researchers at the Human-Computer Interaction Institute at Carnegie Mellon University and at Facebook.18 Bonding refers to social capital derived from relationships with family members and close friends, while bridging refers to that derived from acquaintances.
Both have value, but bonding typically enables emotional support and companionship. Bridging, which comes from a looser and more diverse set of relationships, typically enables access to new communication and opportunities such as job openings, because very close ties are likely to share redundant information.
The researchers classified activities on social networks into three categories. The first kind is directed communication with individual friends via chat or wall posts. The second involves passive consumption of social news by reading updates from others. Finally there is broadcasting, which consists of updates not directed at any particular person.
The researchers concluded that directed communication has the potential to improve both bonding and bridging capital due to the rich nature of the content and the strength of the relationship between the two communicators. In the case of one-to-one communication, both offering and receiving information increases the strength of relationships. Moreover, simply communicating one to one, due to the effort required relative to broadcast communication, signals the importance of a relationship.
For bridging social capital, only one-to-one communication increases social capital for senders on Facebook. Other forms of communication increase social capital only for the receiver. As the researchers point out, undirected broadcasts and passive consumption of news and updates may increase knowledge for the recipients and consumers of the news, but they don’t further the development of relationships.
Broadcast communication is useful as a source of information about others that we can then use to increase the strength of relationships or to develop new friendships by referencing shared interests. In terms of casual acquaintances, putting out broadcasts doesn’t buy us much, but receiving broadcasts and consuming news may. (Ironically, of course, recipients can only derive such value by nature of senders making the broadcasts.) The benefit of receiving broadcast information is greater for those with lower existing social communication skills.
The researchers also found that only a narrow set of major life changes have a significant impact on bridging social capital. Moving, for example, has a positive impact on bridging social capital, likely due to adding new relationships that diversify our access to information and resources. Losing one’s job, however, has a negative impact due to losing the social context associated with former coworkers.
Where does this leave us when it comes to the impact of changes in our personal relationships on our broader networks? Much as we worry about the impact of life changes such as marriage, divorce, death, family additions, new jobs, and illness on our broader social networks, the reality is that while such events have a big impact on us personally, they have relatively little impact on our broader social networks.
How does this compare to increases and decreases in social capital via Facebook? Every time we double our one-to-one communication in the network, the resulting change in bridging social capital is about the same as that from moving to a new city. It’s equivalent to about half the impact associated with losing a job. In other words, a one-to-one digital message that establishes a new connection can have as much impact on bridging relationships as moving to a new city.
The implication is that, if we choose to take advantage of one-to-one communication online, social networks can greatly reduce the friction normally associated with expanding our broader networks. This may be one reason why when it comes to job searches, one-to-one messages on professional business networking site LinkedIn, even messages that come by way of a connection, are so effective.
While communication clearly plays a huge role in the development of relationships, so does information. Although its role may be somewhat less obvious than that of Facebook, Google impacts our relationships as well.
Ask nearly anyone who has been on a date that started online and they will tell you that they’ve Googled their potential mate. According to It’s Just Lunch, a dating service, 43% of singles have Googled their date before going out with them. And of the 1,167 singles surveyed, 88% said they wouldn’t be offended if their date Googled them.19
Type virtually anyone’s name into the search engine and you’ll see a series of results related to that person. They range from web sites providing background information to articles someone has been mentioned in or written, to work profiles on LinkedIn.
The relative newness of Facebook, Twitter, and other forms of digital social communication means that academic research on the impact of such media is relatively limited. What is clear, however, is that one’s online identity is a massive form of Big Data.
When it comes to humans, Big Data is the information we share about ourselves in the form of photos, videos, status updates, tweets, and posts, and about our relationships, not to mention the digital trail we leave behind in the form of web site clicks and online purchases.
Big Data and Romance: What the Future Holds
Enhancements to digital social communication, such as virtual gifts, may seem an unlikely way for people to communicate romantic interest, but such forms of expression are becoming more and more commonplace. Virtual items like flowers that look like actual real-world goods but exist only in digital form have become quite popular both in social games and on social networks. People have shown a willingness to pay for such digital items.
One can imagine a future in which recommendation engines can accurately predict which gifts will be most positively received and make suggestions to the senders. Of course, social networks might want to start by ensuring that partners simply don’t forget each other’s anniversaries!
Whether we want such insight or not, Big Data is both collecting and providing more information about us and our relationships. Algorithms such as those developed by Match.com to try to recommend better matches may be missing important data about how relationships evolve over time, but we can foresee a time when services integrated with Facebook and other platforms provide high-quality recommendations and predictions.
Indeed, Facebook may have more insight into our relationships over time than just about any other web site. Blog site All Facebook cites a passage from The Facebook Effect: The Inside Story of the Company That Is Connecting the World, saying, “By examining friend relationships and communications patterns [Zuckerberg, Facebook’s co-founder and CEO] could determine with about 33% accuracy who a user was going to be in a relationship with” a week in advance. Zuckerberg was apparently able to use data about who was newly single, who was looking at which profiles, and who was friends with whom to glean such insights.20
Online dating and social communication are becoming more and more socially acceptable and are producing vast amounts of data in the process. Mass forms of communication are no substitute for one-to-one communication. But social networks that make such communication easier have the ability to reduce dramatically the amount of effort required to create new relationships and that required to maintain existing ones. Today algorithms such as those Match.com has developed may simply be a way to start communicating. But in the future, the opportunity for Big Data to improve our relationships, both personal and professional, is large.
As more people use smartphones like iPhone and Android devices, we’re seeing mobile applications spring up for dating and as a more general way to meet new people. Applications such as Skout help users discover new friends wherever they are, whether at a local bar, at a sporting event, or while touring a new city. The application provides integrated ways to chat, exchange photos, and send gifts. Mobile applications will continue to reduce communication barriers and help us stay in touch with people we care about, all the while generating immense amounts of data that can help further refine interactions and introduce new connections.
Big Data may not yet be able to help an ailing relationship, but it can give us insight into the context surrounding our relationships, such as when the most difficult times of year are for relationships, based on the number of breakups that happen at those times. By learning from such data we can take proactive steps to improve our relationships.
Big Data may also help us determine if a friend or relative is headed for trouble, by determining if he or she is communicating less frequently than normal, has an increased level of stress, or has experienced multiple major life events such as divorce, job loss, or the death of a relative or close friend that could lead to depression or other issues. At the same time, by continuing to make communication easier, Big Data may very well strengthen existing relationships and support the creation of new ones.
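Spotting someone who is “communicating less frequently than normal” amounts to comparing recent activity with a personal baseline. The Python sketch below shows one simple way to do that; the function name, the two-week window, and the threshold of two standard deviations are illustrative assumptions, and any real use of such a signal would need to be handled with care and consent.

```python
from statistics import mean, stdev


def is_unusually_quiet(weekly_messages, recent_weeks=2, threshold=2.0):
    """Flag a drop in messaging relative to a person's own history.

    weekly_messages -- message counts per week, oldest first
    recent_weeks    -- assumed size of the "recent" window
    threshold       -- assumed drop, in standard deviations, worth flagging
    """
    if len(weekly_messages) < recent_weeks + 3:
        return False  # not enough history to define "normal"
    baseline = weekly_messages[:-recent_weeks]
    recent = weekly_messages[-recent_weeks:]
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly steady history: any drop at all stands out.
        return mean(recent) < mean(baseline)
    return (mean(baseline) - mean(recent)) / spread >= threshold
```

Someone who usually sends about 20 messages a week and then drops to 4 or 5 would be flagged, while ordinary week-to-week variation would not.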
Our desire to create new bonds and strengthen existing ones—our inherently social nature as human beings—is one of the biggest drivers of the creation of new technologies to help us communicate more easily.
Tip The market for match-making apps is big. But the market for helping existing relationships thrive using Big Data may be even bigger.
We may not think of web sites like Facebook or Google as Big Data applications because they are wrapped in easy-to-use, consumer-friendly interfaces. Yet in reality, they represent the enormous potential of Big Data.
Applications like Facebook are just the tip of the iceberg when it comes to applying Big Data to improve our relationships and form new friendships. By combining data with mobile, a variety of new applications are possible. Why stop at just matching people up based on general interests? Why not go a step further and recommend matches based on the music we’ve listened to recently, the movies we like to watch, or the food we like to eat? When it comes to building new relationship applications, today’s entrepreneurs have the opportunity to be as disruptive to existing matchmaking and social networking services as the iPod was to the Walkman.
In all the discussion of the size and economics of Big Data, of the talk of real-time data streams and the lower costs of storing and analyzing data, it is easy to lose sight of the positive impact that Big Data has on our daily lives, whether we’re talking about online matchmaking or taking steps to improve our relationships. Not only can Big Data open new avenues for you to find the love of your life, the good news for singles the world over is that in the future Big Data is also likely to help them hold onto that love once they find it.
9 http://en.wikipedia.org/wiki/John_Gottman and http://psycnet.apa.org/?&fa=main.doiLanding&doi=10.1037/0893-3188.8.131.52
18 “Social Capital on Facebook: Differentiating Uses and Users” (May 2011)