Data Science For Dummies (2016)
Part 5
Applying Domain Expertise to Solve Real-World Problems Using Data Science
IN THIS PART …
Explore the impact of data science on journalism.
Use data science techniques to illuminate environmental issues.
Turbocharge your e-commerce growth with the help of data science.
Look at data science’s crime-busting potential.
Chapter 18
Data Science in Journalism: Nailing Down the Five Ws (and an H)
IN THIS CHAPTER
Defining the who, what, when, where, why, and how of a data-driven story
Sourcing data to use for a story
Finding and presenting your data-driven story
For as long as newsrooms have been around, reporters have been on a mission to uncover answers to questions about the “Five Ws and an H” (or 5W1H): the who, what, when, where, why, and how of a given topic. The tools have changed over the years, and the data sources (such as data generated on social media networks) have grown, but these changes only provide journalists with deeper and more insightful answers to their questions. In this era of digital media, traditional journalists can’t survive if they cannot quickly find sufficient answers to these questions; they simply won’t be fast enough or relevant enough to compete with other, more data-savvy journalists. Indeed, any journalist who wants to stay competitive in her field has to develop at least basic data science skills and hone those skills to help her develop, design, and publish content that consistently demonstrates competitive readership and engagement rates (metrics that indicate the content’s popularity).
In a quest to stay relevant, many traditional media venues have turned to the same technological advances that previously threatened to annihilate them. This adoption of more advanced digital technologies resulted in the birth of data journalism as an independent field. Data journalism — also known as data-driven journalism — is proving to be a potent marriage between traditional journalism and the power of data.
Modern data journalists — the experts who craft all the cool data-driven stories you see out there — must be masters at collecting, analyzing, and presenting data. In its simplest form, data journalism can be described as a process involving these three distinct steps:
1. Data collection: This step may involve web-scraping (setting up automated programs to scour and extract the data you need straight from the Internet) and configuring automated data feeds.
2. Data analysis: Spot trends, analyze outliers, and evaluate context.
3. Data presentation: Design the data visualization and draft concise, well-written story narratives.
Let me emphasize here, at the beginning of the chapter, that data journalists have an ethical responsibility to always represent data accurately. Data journalists should never distort the message of the data to fit a story they want to tell. Readers rely on data journalists to provide honest and accurate representations, thus amplifying the level of ethical responsibility that the journalist must assume. Data journalists must first find out what the data really says and then either tell that story (even if it wasn’t the journalist’s first idea for the story) or, in the alternative, drop the story altogether.
Who Is the Audience?
When most people think of data, questions about who (as it relates to the data) don’t readily come to mind. In data journalism, however, answers to questions about who are profoundly important to the success of any data-driven story. You must consider who created and maintains the sources of your datasets to determine whether those datasets are a credible basis for a story. If you want to write a story that appeals to your target readership, you must consider who comprises that readership and the most pressing needs and concerns of those people.
Who made the data
The answer to the question “Who made your data?” is the most fundamental and important answer to any of the five W questions. No story can pass the litmus test unless it’s been built upon highly credible sources. If your sources aren’t valid and accurate, you could spend countless hours producing what, in the end, amounts to a worthless story.
You must be scrupulous about knowing who made your data because you need to be able to validate those sources’ accuracy and credibility. You definitely don’t want to go public with a story you generated from noncredible sources, because if anyone questions the story’s validity, you have no ground on which to stand.
News is only as good as its source, so protect your own credibility by reporting on data from only credible sources. Also, it’s important to use as many relevant data sources as can be acquired, to avoid bias or accusations of cherry-picking.
If you want to create a meaningful, credible data-driven story that attracts a maximum amount of attention from your audience, you can use the power and clout of reputable data sources to make your stories and headlines that much more compelling. In any type of data journalism piece you publish, it’s critical that you disclose your data sources. You don’t have to provide a live web link back to those sources, but you should at least make a statement about where you found your information, in case people want to investigate further on their own.
Who comprises the audience
Research your target audience and get to know their interests, reading preferences, and even their aesthetic preferences (for choosing the best images to include in your story) before planning your story so that you can craft something that’s of maximum interest and usefulness to them. You can present the same interesting, high-quality story in countless different lights, and some of those lights are much more compelling than others.
To present your story in a way that most attracts readers’ attention, spend some serious time researching your target audience and evaluating what presentation styles work well with readers of that group. One way to begin getting to know your readers is to gather data on stories that have performed well with that audience in the recent past.
If you search social bookmarking sites such as StumbleUpon (http://stumbleupon.com), Digg (http://digg.com), or Delicious (http://delicious.com), or if you just mine some Twitter data, you can quickly generate a list of headlines that perform well with your target audience. Just get in there and start searching for content that’s based on the same topic as your own. Identify which headlines seem to have the best performance (the highest engagement counts, in other words) among them.
After you have a list of related headlines that perform well with your target audience, note any similarities between them. Identify any specific keywords or hashtags that are getting the most user engagement. Leverage those as main draws to generate interest in your article. Lastly, examine the emotional value of the headlines — the emotional pull that draws people in to read the piece, in other words.
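As a rough sketch of this keyword check, the following Python snippet adds up engagement counts for each word that appears in a list of headlines. The headlines, the engagement numbers, the stop-word list, and the function name keyword_engagement are all made-up placeholders for illustration; in practice you would substitute the headlines and counts you gathered yourself.

```python
import re
from collections import Counter

# Hypothetical (headline, engagement count) pairs; replace with your own data.
HEADLINES = [
    ("How Safe Is Your Data? What the Numbers Say", 1200),
    ("Your Data, Your Money: Five Charts", 800),
    ("City Budget Explained in Five Charts", 300),
]

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "is", "in", "a", "your", "how", "what", "say"}


def keyword_engagement(headlines, stop_words=STOP_WORDS):
    """Sum engagement counts for each keyword across all headlines."""
    totals = Counter()
    for text, engagement in headlines:
        # Lowercase, split into words, and count each word once per headline.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for word in words - stop_words:
            totals[word] += engagement
    return totals


totals = keyword_engagement(HEADLINES)
for word, score in totals.most_common(3):
    print(word, score)
```

Words that rise to the top of this tally are candidates for your own headline, but treat the ranking as a starting point for your judgment rather than a rule.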
Speaking of emotions, news articles generally satisfy at least one of the following core human desires:
· Knowledge: Often, but not always, closely tied to a desire for profit.
· Safety: The desire to protect one’s property, income, and well-being, or that of friends and family.
· Personal property: A person’s innate desire to have things that bring him comfort, safety, security, and status.
· Self-esteem: People are sometimes interested in knowing about topics that help them feel good about themselves. These topics often include ideas about philanthropy, charity, service, or grassroots causes for social change.
Ask yourself what primary desires your headlines promise to satisfy. Then craft your headlines in a way designed to appeal most strongly to that desire. Try to determine what types of articles perform the best with your target audience or what your target audience most strongly seeks when looking for new content to consume. With that info in hand, target your writing and headlines precisely so that they clearly meet a core desire among your target audience.
What: Getting Directly to the Point
The what, in data journalism, refers to the gist of the story. In all forms of journalism, a journalist absolutely must be able to get straight to the point. Keep it clear, concise, and easy to understand.
When crafting data visualizations to accompany your data journalism piece, make sure that the visual story is easy to discern at a moment’s glance. If it takes longer than that, the data visualization is not focused enough. The same principle applies to your writing. No one wants to drag through loads of words trying to figure out what you’re trying to say. Readers appreciate it when you make their lives easier by keeping your narrative clear, direct, and to the point.
The more people have to work to understand your content, the less they tend to like it. If you want to provide readers with information they enjoy consuming, make your writing and data visualizations clear and to the point.
Bringing Data Journalism to Life: The Black Budget
Any chapter on data science in journalism would be utterly incomplete without a solid case study to demonstrate the power of data journalism in action. The Washington Post story “The Black Budget” is one incredible example of such a piece. (Check it out for yourself at www.washingtonpost.com/wp-srv/special/national/black-budget.)
When former NSA contractor Edward Snowden leaked a trove of classified documents, he unleashed a storm of controversy not only among the public but also among the data journalists who were tasked with analyzing the documents for stories. The challenge for data journalists in this case was to discover and disclose data insights that were relevant to the public without compromising the safety of ordinary citizens.
Among the documents leaked by Snowden was the so-called Black Budget for fiscal year 2013, a 178-page line-by-line breakdown of the funds that were earmarked for 16 U.S. federal intelligence agencies. Through the Washington Post’s “The Black Budget,” the American public was informed that $52.6 billion in taxpayer money had been spent on mostly covert federal intelligence services in 2013 alone.
The Washington Post did a phenomenal job in its visual presentation of the data. The opening title is a somber visual pun: The words The Black Budget are written in a huge black box contrasted only with gray and white. This layout visually implies the serious and murky nature of the subject matter. The only touch of color is a navy blue, which conjures a vaguely military image and barely contrasts with the black. This limited palette is continued throughout the visual presentation of the data.
Washington Post data journalists used unusual blocky data graphics — an unsettling, strangely horizontal hybrid of a pie chart, a bar graph, and a tree map — to hint at the surreptitious and dangerous nature of the topic, as well as the shady manner in which the information was obtained.
The data graphics used in the piece exhibited a low data-to-ink ratio — in other words, only a little information is conveyed with a lot of screen space. Although normally a low data-to-ink ratio indicates bad design, the data-to-ink ratio here effectively hints that mountains of data lie underneath the layers being shown, and that these layers remain undisclosed so as not to endanger intelligence sources and national security.
Traditional infographic elements used in this piece include stark, light gray seals of the top five intelligence agencies, only three of which the average person would have ever seen. Simple bar charts outlined funding trends, and people-shaped icons represented the army of personnel involved in intelligence gathering.
A lot of thought went into the collection, analysis, and presentation of this story. The result is an unsettling, yet overwhelmingly informative, piece of data journalism. Although this sort of journalism was in its infancy even just a decade ago, the data and tools required for this type of work are now widely available, so journalists can quickly develop high-quality data journalism articles.
When Did It Happen?
As the old adage goes, timing is everything. It’s a valuable skill to know how to refurbish old data so that it’s interesting to a modern readership. Likewise, in data journalism, it’s imperative to keep an eye on contextual relevancy and know the optimal time to craft and publish a particular story.
When as the context to your story
If you want to craft a data journalism piece that really garners a lot of respect and attention from your target audience, consider when — over what time period — your data is relevant. Stale, outdated data usually doesn’t help the story make breaking news, and unfortunately you can find tons of old data out there. But if you’re skillful with data, you can create data mashups (described a little later in this chapter) that take trends in old datasets and present them in ways that are interesting to your present-day readership.
For example, take gender-based trends in 1940s employment data and do a mashup — integration, comparison, or contrast — of that data and employment data trends from the five years just previous to the current one. You could then use this combined dataset to support a truly dramatic story about how much things have changed or how little things have changed, depending on the angle you’re after with your piece.
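Here is a minimal sketch of such a mashup in Python, assuming you have already reduced each dataset to a simple mapping from occupation category to the percentage of workers who are women. The category names and every number below are placeholders, not real statistics, and the function name mashup is hypothetical.

```python
# Placeholder values only: NOT real statistics. Each dict maps an
# occupation category to the share of workers who are women (percent).
historical = {"clerical": 50.0, "manufacturing": 20.0, "teaching": 70.0}
recent = {"clerical": 60.0, "manufacturing": 30.0, "engineering": 15.0}


def mashup(old, new):
    """Join two datasets on their shared keys and compute the change."""
    rows = []
    # Only categories present in BOTH datasets can be compared.
    for category in sorted(old.keys() & new.keys()):
        rows.append({
            "category": category,
            "old": old[category],
            "new": new[category],
            "change": new[category] - old[category],
        })
    return rows


combined = mashup(historical, recent)
for row in combined:
    print(row)
```

Joining only on the shared categories is a deliberate choice: a category that exists in one dataset but not the other (because definitions changed over the decades, for instance) can’t support an honest comparison.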
Returning once again to the issue of ethical responsibilities in journalism, as a data journalist you walk a fine line between finding datasets that most persuasively support your storyline and finding facts that support a factually challenged story you’re trying to push. Journalists have an ethical responsibility to convey an honest message to their readers. When building a case to support your story, don’t take things too far — in other words, don’t take the information into the realm of fiction. There are a million facts that could be presented in countless ways to support any story you’re looking to tell. Your story should be based in reality, and not be some divisive or fabricated story that you’re trying to promote because you think your readers will like it.
You may sometimes have trouble finding interesting or compelling datasets to support your story. In these situations, look for ways to create data mashups that tie your less-interesting data into some data that’s extremely interesting to your target audience. Use the combined dataset as a basis for your data-driven story.
When does the audience care the most?
If your goal is to publish a data journalism piece that goes viral, then you certainly want to consider the story’s timeliness: When would be the prime time to publish an article on this particular topic?
For obvious reasons, you’re not going to do well by publishing a story in 2017 about who won the 1984 election for U.S. president; everyone knows, and no one cares. On the other hand, if a huge, present-day media scandal has already piqued the interest of your readership, it’s not a bad idea to ride the tailwinds of that media hype and publish a related story. The story would likely perform pretty well, if it’s interesting.
As a recent example, you could have created a data journalism piece on Internet user privacy assumptions and breaches thereof and then published it in the days just after news of the Edward Snowden/NSA controversy broke. Keeping relevant and timely publishing schedules is one way to ensure that your stories garner the attention they need to keep you employed.
Where Does the Story Matter?
Data and stories are always more relevant to some places than others. From where is a story derived, and where is it going? If you keep these important facts in mind, the publications you develop are more relevant to their intended audience.
The where aspect in data journalism is a bit ambiguous because it can refer to a geographical location or a digital location, or both.
Where is the story relevant?
You need to focus on where your story is most relevant so that you can craft the most compelling story by reporting on the most relevant trends.
If your story is location independent (you’re reporting on a trend that’s irrelevant to location), of course you want to use data sources that most clearly demonstrate the trend on which you’re reporting. Likewise, if you’re reporting a story that’s tied to a specific geographic location, you probably want to report statistics from the regions that show the greatest extremes, either the largest fluctuations in value or the largest differences in value for the parameters on which you’re reporting.
Sometimes you find multiple geographic or digital locations that exemplify extreme trends and unusual outliers. In other words, you find more than one excellent information source. In these cases, consider using all of them by creating and presenting a data mashup — a combination of two or more data sources that are analyzed together in order to provide readers with a more complete view of the situation at hand.
Where should the story be published?
Another important question to consider in data journalism is, “Where do you intend to publish your story?” This where can be a geographical place, a particular social media platform, or certain series of digital platforms that are associated with a particular brand — Facebook, Twitter, Pinterest, and Instagram accounts, as well as blogs, that are all tied together to stream data from one branded source.
Just as you need to have a firm grasp on who your audience is, you should clearly understand the implications of where your publication is distributed. Spelling out where you’ll be publishing helps you conceptualize to whom you’re publishing, what you should publish, and how you should present that publication. If your goal is to craft high-performing data journalism articles, your headlines and storylines should cater to the interests of the people who are subscribed to the channels in which you’re distributing. Because the collective interest of the people on each channel may differ slightly, make sure to adapt to those differences before posting your work.
Why the Story Matters
The human capacity to question and understand why things are the way they are is a clear delineation point between the human species and other highly cognitive mammals. Answers to questions about why help you to make better-informed decisions. These answers help you to better structure the world around you and help you develop reasoning beyond what you need for mere survival.
In data journalism, as in all other types of business, answers to the question why help you predict how people and markets respond. These answers help you know how to proceed to achieve an outcome of most probable success. Knowing why your story matters helps you write and present it in a way that achieves the most favorable outcomes — presumably, that your readers enjoy and take tremendous value from consuming your content.
Asking why in order to generate and augment a storyline
No matter what topic you’re crafting a story around, it’s incredibly important to generate a storyline around the wants and needs of your target audience. After you know who your audience is and what needs they most often try to satisfy by consuming content (which I talk about in the section “Who comprises the audience,” earlier in this chapter), use that knowledge to help you craft your storyline. If you want to write a story and design a visualization that precisely targets the needs and wants of your readership, take the time to pinpoint why people would be interested in your story, and create a story that directly meets that desire in as many ways as possible.
Why your audience should care
People care about things that matter to them and that affect their lives. Generally, people want to feel happy and safe. They want to have fulfilling relationships. They want to have good status among their peers. People like to learn things, particularly things that help them earn more money. People like possessions and things that bring them comfort, status, and security. People like to feel good about themselves and what they do. This is all part of human nature.
The desires I just described summarize why people care about anything, from the readers of your story to the person down the street. People care about something because it does something for them: It fills one of their core desires. Consequently, if your goal is to publish a high-performing, well-received data journalism piece, make sure to craft it in a way that fulfills one or two core desires of your target readership. To better understand your audience and what they most desire in the content they consume, flip back to the section “Who Is the Audience,” earlier in this chapter.
How to Develop, Tell, and Present the Story
By thinking through the how of a story, you are putting yourself in position to craft better data-driven stories. Looking at your data objectively and considering factors like how it was created helps you to discover interesting insights that you can include in your story. Also, knowing how to quickly find stories in potential data sources helps you to quickly sift through the staggering array of options.
And, how you present your data-driven story determines much about how well that story is received by your target audience. You could have done everything right — really taken the time to get to know who your audience is, boiled your story down so that it says exactly what you intend, published it at just the right time, crafted your story around what you know about why people care, and even published it to just the right venue — but if your data visualization looks bad, or if your story layout makes it difficult for readers to quickly gather useful information, then your positive response rates are likely to be low.
Integrating how as a source of data and story context
You need to think about how your data was generated because that line of thinking often leads you into more interesting and compelling storylines. Before drawing up a final outline for your story, brainstorm about how your source data was generated. If you find startling or attention-grabbing answers that are relevant to your story, consider introducing those in your writing or data visualization.
Finding stories in your data
If you know how to quickly and skillfully find stories in datasets, you can use this set of skills to save time when you’re exploring the array of stories that your datasets offer. If you want to quickly analyze, understand, and evaluate the stories in datasets, then you need to have solid data analysis and visualization skills. With these skills, you can quickly discover which datasets to keep and which to discard. Getting up to speed in relevant data science skills also helps you quickly find the most interesting, relevant stories in the datasets you select to support your story.
Presenting a data-driven story
How you present your data-driven story determines much about whether it succeeds or fails with your target audience. Should you use an infographic? A chart? A map? Should your visualization be static or interactive? You have to consider countless aspects when deciding how to best present your story. (For much more on data visualization design, check out Chapter 9.)
Collecting Data for Your Story
A data-journalism piece is only as good as the data that supports it. To publish a compelling story, you must find compelling data on which to build. That isn’t always easy, but it’s easier if you know how to use scraping and autofeeds to your advantage.
Scraping data
Web-scraping involves setting up automated programs to scour and extract the exact and custom datasets that you need straight from the Internet so you don’t have to do it yourself. The data you generate from this process is commonly called scraped data. Most data journalists scrape source data for their stories because it’s the most efficient way to get datasets for unique stories. Datasets that are easily accessible have usually already been exploited and mined by teams of data journalists who were looking for stories. To generate unique data sources for your data-driven story, scrape the data yourself.
If you find easy-to-access data, be aware that most of the stories in that dataset have probably already been told by a journalist who discovered that data before you.
To illustrate how you’d use data scraping in data journalism, imagine the following example: You’re a data journalist living in a U.S. state that directly borders Mexico. You’ve heard rumors that the local library’s selection of Spanish-language children’s books is woefully inadequate. You call the library, but its staff fear negative publicity and won’t share any statistics with you about the topic.
Because the library won’t budge on its data-sharing, you’re forced to scrape the library’s online catalog to get the source data you need to support this story. Your scraping tool is customized to iterate over all possible searches and keep track of the results. After scraping the site, you discover that 25 percent of children’s books at the library are Spanish-language books. Spanish-speakers make up 45 percent of the primary-school population; is this difference significant enough to form the basis of a story? Maybe, maybe not.
To dig a little deeper and possibly discover a reason behind this difference, you decide to scrape the catalog once a week for several weeks, and then compare patterns of borrowing. If you find that a larger proportion of the Spanish-language books are checked out than of the other books, this indicates that there is, indeed, a high demand for children’s books in Spanish. This finding, coupled with the results from your previous site scrape, gives you all the support you need to craft a compelling article around the issue.
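The library example can be sketched in a few lines of Python using only the standard library. Everything about the page here is an assumption made for illustration: the markup, the data-language and data-status attributes, and the four sample records are invented, and a real catalog will be structured differently and would have to be fetched page by page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical catalog markup; a real library site will look different,
# and you would download each results page instead of using a string.
SAMPLE_PAGE = """
<ul>
  <li class="book" data-language="es" data-status="out">Libro uno</li>
  <li class="book" data-language="en" data-status="in">Book two</li>
  <li class="book" data-language="en" data-status="out">Book three</li>
  <li class="book" data-language="en" data-status="in">Book four</li>
</ul>
"""


class CatalogParser(HTMLParser):
    """Collect (language, status) pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "book":
            self.records.append(
                (attributes.get("data-language"), attributes.get("data-status"))
            )


def share(records, language):
    """Fraction of catalog records in the given language."""
    matches = [r for r in records if r[0] == language]
    return len(matches) / len(records)


def checked_out_rate(records, language):
    """Fraction of titles in the given language that are checked out."""
    matches = [r for r in records if r[0] == language]
    return sum(1 for r in matches if r[1] == "out") / len(matches)


parser = CatalogParser()
parser.feed(SAMPLE_PAGE)
records = parser.records
print(share(records, "es"), checked_out_rate(records, "es"))
```

Running the same parser on a snapshot saved each week would give you one checked-out rate per language per week, which is the borrowing pattern the example compares.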
Setting up data alerts
To generate hot stories, data journalists must have access to the freshest, newest data releases that are coming from the most credible organizations. To stay on top of what datasets are being released where, data journalists subscribe to alert systems that send them notifications every time potentially important data is released. These alert systems often issue notifications via RSS feeds or via email. It’s also possible to set up a custom application like DataStringer (https://github.com/pudo/datastringer) to send push notifications when significant modifications or updates are made to source databases.
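To give a feel for how a feed-based alert works, the sketch below parses an RSS document and reports any items it hasn’t seen before. It is not how DataStringer or any particular alert service is implemented; the feed content, the guid values, and the function name new_releases are invented for illustration, and a real script would download the feed from the publisher on a schedule.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; in practice you would download the feed
# from the publisher's RSS URL at regular intervals.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example data releases</title>
    <item><title>Quarterly report</title><guid>release-2</guid></item>
    <item><title>Annual survey</title><guid>release-1</guid></item>
  </channel>
</rss>
"""


def new_releases(feed_xml, seen_guids):
    """Return titles of feed items whose guid has not been seen before."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            fresh.append(item.findtext("title"))
            seen_guids.add(guid)  # remember it so it is reported only once
    return fresh


seen = {"release-1"}
alerts = new_releases(SAMPLE_FEED, seen)
print(alerts)
```

Keeping the set of seen identifiers between runs (in a file, for example) is what turns a simple feed reader into an alert: only genuinely new releases trigger a notification.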
After you subscribe to data alerts and form a solid idea about the data-release schedule, you can begin planning for data releases in advance. For example, if you’re doing data journalism in the business analytics niche and know that a particularly interesting quarterly report is to be released in one week, you can use the time you have before its release to formulate a plan on how you’ll analyze the data when it does become available.
Many times, after you’re alerted to important new data releases, you still need to scrape the source site in order to get that data. In particular, if you’re pulling data from a government department, you’re likely to need to scrape the source site. Although most government organizations in Western countries are legally obligated to release data, they aren’t required to release it in a format that’s readily consumable. Don’t expect them to make it easy for you to get the data you need to tell a story about their operations.
Finding and Telling Your Data’s Story
Every dataset tells a story, but not every story is newsworthy. To get to the bottom of a data-driven story, you need an analytical mind, plus basic skills in data science and a solid understanding of journalistic procedure for developing a story.
Spotting strange trends and outliers
One quick way to identify interesting stories in a dataset is to do a quick spot-check for unusual trends or extreme outliers. These anomalies usually indicate that an external force is causing a change that you see reflected in the data.
If you want to do a quick-and-dirty spot-check for easy-to-identify stories, you can simply throw your data into an x-y scatter plot and visually inspect the result for obvious trends and outliers. After you spot these anomalies, look into the reasons why the data behaves oddly. In doing so, you can usually uncover some juicy stories.
To illustrate, consider the World Bank Global Development Indicator (GDI) open dataset, available for review at http://data.worldbank.org. Looking at this data, you can easily see a clear correlation between a country’s gross domestic product and the life expectancy of its citizens. A likely reason for this correlation is easy to guess: More affluent people can afford better healthcare.
But say you’re searching through the hundreds of GDI indicators for the year 2013 and you come across something less obvious — the survival rate of newborns is reasonably well-correlated with the percentage of employed females who receive wages or salaries instead of only performance-based remuneration, as illustrated in Figure 18-1.
FIGURE 18-1: A scatter plot of the inverse correlation between two GDI indicators.
The relationship in this data is a little murky. Although you naturally expect two metrics based on health and economic well-being to be related, a Pearson correlation coefficient of 0.86, which is what you find after analyzing your data a bit, is quite high. Is there a story here? Does this qualify as a newsworthy trend? An effective and time-efficient way to explore answers to this question is to try to find the exception that proves the rule. In Figure 18-2, the simple least-squares best-fit line is in black, and the two data points that most differ (horizontally and vertically) from this line are indicated with light gray lines.
FIGURE 18-2: A least-squares line of the inverse correlation between two GDI indicators.
A Pearson coefficient is a statistical correlation coefficient that measures the linear correlation between two variables. A high (nearer to 1) or low (nearer to -1) Pearson correlation coefficient indicates a high degree of correlation between the variables. The closer the coefficient is to 0, the smaller the correlation between the variables. The maximum value for a Pearson coefficient is 1, and the minimum is -1.
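If you’d like to see what goes into that number, the following is a small, self-contained Python sketch of the standard Pearson formula (the covariance of the two variables divided by the product of their spreads). The input values are toy numbers chosen to produce the two extreme cases; they are not World Bank figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values chosen for illustration, not World Bank figures.
rising = pearson([1, 2, 3, 4], [2, 4, 6, 8])    # y rises in step with x
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])   # y falls in step with x
print(rising, falling)
```

The first pair lies on a perfectly rising line and the second on a perfectly falling one, so they land at the two ends of the range; real-world data, like the GDI indicators, falls somewhere in between.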
You could also look for exceptions at the largest perpendicular distance from the line, but it’s a little more difficult to calculate. The topmost point in Figure 18-2 would fulfill that criterion.
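The perpendicular-distance calculation is less daunting with a few lines of code. The sketch below fits an ordinary least-squares line and then measures each point’s shortest distance to that line; the five labeled points are invented toy values (with point "C" deliberately placed off the trend), and the function names are hypothetical.

```python
import math


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def perpendicular_distance(x, y, slope, intercept):
    """Shortest distance from the point (x, y) to the line y = slope*x + intercept."""
    return abs(slope * x - y + intercept) / math.sqrt(slope ** 2 + 1)


# Toy points for illustration only; "C" is deliberately far from the trend.
points = {"A": (1, 1), "B": (2, 2), "C": (3, 6), "D": (4, 4), "E": (5, 5)}
xs = [p[0] for p in points.values()]
ys = [p[1] for p in points.values()]
slope, intercept = least_squares(xs, ys)

distances = {
    name: perpendicular_distance(x, y, slope, intercept)
    for name, (x, y) in points.items()
}
outlier = max(distances, key=distances.get)
print(outlier)
```

Keep in mind that a perpendicular distance mixes the units of both axes, so with real indicators you would usually rescale the two variables to comparable ranges before ranking points this way.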
Examining context to understand the significance of data
By pinpointing strange trends or outliers in your dataset, you can subsequently focus in on those patterns and look for the interesting stories about the external factors that cause them. If you want to cultivate the most thought-provoking story about what’s happening in your source dataset, you need to further investigate, compare, and contrast these factors you’ve identified. By examining the context in which competing causative factors are creating extreme trends and outliers, you can then begin to get a solid understanding of your data’s significance.
For example, think about the World Bank Global Development Indicator (GDI) in the preceding section. The topmost point in Figure 18-2 represents the country of Jordan. For a given level of child mortality, Jordan has an unusual number of women with predictable income. If you dig a little deeper into factors that might account for this outlier, you see that the overall employment rate for women in Jordan is among the lowest in the world. In a country where few women work, the women who do work in Jordan are earning relatively stable wages or salaries. This might indicate that the precariously paid work is being given mostly to men. Maybe the underlying story here is about gender roles in Jordan? If so, what conclusions could you draw by looking at another outlier country in the dataset — Peru, for example?
Well, only 41 percent of Peruvian women are employed with a stable income, which places Peru near the bottom of the rankings on that measure. Perhaps this is to be expected in a country with so much manual agricultural work. And, in all honesty, Peruvian men aren’t much better off, as indicated by the fact that only 51 percent of them report stable employment. But the Peruvian neonatal mortality rate is unusually low. Does Peru have an exceptionally well-funded healthcare system? Not really: It spends less, per capita, on healthcare than most of its neighbors. So what could be the cause of the low neonatal mortality rate?
Introducing a contextually relevant dataset can help. The Social Institutions and Gender Index (SIGI) ranks Peru low on the scale for economic gender equality but quite high, at 17th, on an overall scale for gender equality that includes legal, societal, and educational metrics (http://genderindex.org/country/peru). Perhaps there’s a story there!
Although the possible stories identified in this exercise are not exceptionally dramatic or earth-shaking, that’s okay — not all stories have to be.
You shouldn’t expect to find a groundbreaking story in a dataset as accessible as the World Bank’s open data. Again, the more accessible a dataset is, the more likely it is to have been thoroughly picked over and exploited by other data journalists.
Emphasizing the story through visualization
When it comes to data-visualization design, you always want to pick the colors, styles, and data graphic types that most dramatically convey the visual story you’re trying to tell. You want to make your visual stories as clear, concise, and easy to understand as possible. In making selections among visual elements for your data visualization, find the best way to convey your story without requiring your audience to strain to understand it.
Looking back to the World Bank Global Development Indicator (GDI) example from the preceding sections, imagine that you decide to go with the Peru story. Because the SIGI dataset is so relevant to the Peru story, you need to make a thorough study of that dataset, as well as any other datasets you’ve identified to be relevant. Time-series data on different statistics is likely to be quite informative because it should indicate several relevant metrics — the proportion of total income earned by women, survival rates of pregnant mothers, legal-based gender equality metrics, and so on.
After you gather and appraise the most relevant metrics that are available to you, pick and choose the metrics whose datasets show the most extreme trends. Subtle changes in data don’t lend themselves to dramatic and easy-to-understand visual stories. After you select the most dramatic metrics to use in telling your story, it’s time to decide the best way to represent that story visually.
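One simple way to compare how extreme different trends are is to rank each metric by its relative change over the time period. The following Python sketch assumes that measure (many others are possible), and the metric names and values are hypothetical placeholders, not real figures for Peru.

```python
def relative_change(series):
    """Change from the first to the last value, as a fraction of the first value."""
    first, last = series[0], series[-1]
    if first == 0:
        raise ValueError("relative change is undefined when the series starts at 0")
    return (last - first) / abs(first)

def rank_by_trend(metrics):
    """Sort metric names from the largest absolute relative change to the smallest."""
    return sorted(
        metrics,
        key=lambda name: abs(relative_change(metrics[name])),
        reverse=True,
    )

# Hypothetical metric names and placeholder values, for illustration only
metrics = {
    "metric_a": [50, 51, 52, 53],   # rises 6 percent
    "metric_b": [40, 30, 22, 10],   # falls 75 percent
    "metric_c": [10, 12, 13, 14],   # rises 40 percent
}
print(rank_by_trend(metrics))
```

A first-to-last comparison ignores what happens in the middle of the series, so treat the ranking as a starting point and look at the full time series before you commit to a metric.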
For the Peruvian example, imagine that its legal system’s gender-based equality metric is the most striking statistic you’ve found in all the data that covers the topic of women’s status in that country. The dramatic impact of that metric makes it an excellent choice for the fundamental basis of your visual story. With that impact in mind, you decide to use a static infographic to show how the Peruvian legal system is more equitable when it comes to gender than those of its neighboring countries. Whatever story you decide to cover, just be sure that the visualizations you create are well branded: Pick a common color palette, font type, and symbol style to unify your visualizations.
Solid visual branding practices dictate that you make subtle associations between each of your related visualizations by using color, font type, or symbol style. If you choose to brand with color choices, for instance, then regardless of which data graphic you use to display data on a metric, make sure to always use the same color to represent that metric across every graphic you employ.
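If you build your graphics in code, one way to keep colors consistent is to define the palette once and look up every color from it. The following Python sketch shows the idea. The metric names, hex colors, and helper functions are all hypothetical, and the styling dictionary stands in for whatever options your charting tool actually expects.

```python
# Hypothetical palette: one fixed color per metric, reused in every graphic
METRIC_COLORS = {
    "legal_equality": "#1b9e77",
    "income_share": "#d95f02",
    "maternal_survival": "#7570b3",
}

def color_for(metric):
    """Look up the brand color for a metric; fail loudly if none is assigned."""
    try:
        return METRIC_COLORS[metric]
    except KeyError:
        raise KeyError(f"no brand color assigned to metric {metric!r}") from None

def style_series(chart_type, metric):
    """Build the styling options for one data series in a given chart type."""
    return {"chart": chart_type, "metric": metric, "color": color_for(metric)}

# The same metric gets the same color, whatever the chart type
bar = style_series("bar", "legal_equality")
line = style_series("line", "legal_equality")
print(bar["color"] == line["color"])
```

Because the lookup raises an error for any metric without an assigned color, a new metric can’t slip into a graphic with an off-brand default color.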
Creating compelling and highly focused narratives
As you well know, no one wants to wade through a bunch of needless, complicated words to try to figure out what your story says. It’s frustrating, and it simply takes too much work. Presuming that your purpose for creating a data-driven story is to publish something that has impact and value in the lives of your readers, you must work hard to whittle your narrative down to its simplest, most-focused form. Failure to do so decreases the impact and performance of your data-driven story.
Narrow each of your stories down to its hook and lede before going any further into the process of writing a full narrative. In journalism, a hook is a dramatic angle that cultivates interest in potential readers and draws them into your piece. A lede is the first sentence of your story — it introduces the story and shows readers why the story is newsworthy. After you go through your story and flesh out a hook, a lede, and a full narrative, you always need to go back through the piece once or twice and cut unnecessary words or restructure sentences so that they most directly express the ideas you seek to convey.
Referring back to the Peru example from the preceding sections, gender equality is quite a broad subject. So, start by creating a hook. In this case, the hook could be a description of the most dramatic metrics indicating that Peruvian women are far better off than women in neighboring South American countries. Next, get to work on the lede, perhaps something like, “In Peru, economically disadvantaged women experience the highest chance of dying during childbirth, yet their children have among the highest chances of surviving.” Lastly, go back through and clean things up so that the lede is as clear and easy to understand as possible. For this example, you might rewrite the lede so that it reads, “Poor Peruvian mothers have the highest childbirth mortality rates of any South American women, yet, oddly, Peruvian infants demonstrate the highest chances of surviving childbirth.”