The Google Guys: Inside the Brilliant Minds of Google Founders Larry Page and Sergey Brin

Chapter 9 The Ruthless Librarians

Everybody believes in something, and everybody, by virtue of the fact that they believe in something, uses that something to support their own existence.

—Frank Zappa

You can’t create the world’s greatest library without being a bit ruthless. In order to build the Library at Alexandria, the Ptolemies would confiscate all scrolls found on ships that entered the port of Alexandria, returning copies to the owners. They sent emissaries to every corner of the Mediterranean, the Middle East, and India to collect documents, bought or stolen. Legend has it that Ptolemy II brought Jewish scholars from all twelve tribes of Israel to translate the Torah into Greek, the preferred language of the region. Scholars also believe that the resulting text, the Septuagint, became the foundation of early Christians’ understanding of the Old Testament. Even cookbooks were collected for the library. The Ptolemies also tried to collect the oldest version of every book they could, on the theory that those versions would be less corrupted by copying errors and later editors.

At that time, the Greeks were also adamant about preserving texts in their original form. Around 330 B.C., Athenian officials were disturbed to discover that actors were taking liberties with the works of Aeschylus, Sophocles, and Euripides, all dead for at least half a century and considered by Athenians to be the greatest writers who ever lived. So official versions of these works, as close to the originals as they could determine, were placed into the government’s records office, and it was mandated by law that performers stick to the proper text.

A hundred years or so later, Ptolemy III decided he wanted the Library at Alexandria to possess the official versions of these plays from Athens, so he asked to borrow them to have them copied. To ensure their return, the Athenians made him pay an enormous deposit, the equivalent of millions of dollars today. But Ptolemy was greedier for great books than for money. He sent back the copies he had made instead, keeping the originals for the library and forfeiting his deposit. It was the biggest library fine ever paid. At least, until October 2008, when Google reached a settlement with book publishers.

Larry and Sergey have wanted to build an electronic library for books since they were at Stanford getting paid by the Digital Libraries Initiative. That concept was put on hold while they started up Google and focused on Web search, but it was never abandoned. The Internet does not have all the world’s information, and it has almost none of the text that was created before 1995. But physical libraries are a rich storehouse of information created through the ages. Libraries are the obvious source to tap for books, since 95 percent of published works are now out of print. The largest libraries have spent decades or even hundreds of years building their collections, and some 75 percent of their inventory consists of books that are no longer being published, some of them very old. Sergey and Larry wanted to access that treasure every bit as much as the Ptolemy clan. The plan to realize that dream started coming together in 2002.

The first problem they had to face was how to get the books into digital form. They wondered how long it would take to scan and digitize every book in the world. So Larry decided to find out by scanning one book. He and Marissa Mayer, then a product manager, took a camera, a three-hundred-page book, and a metronome into his office. Using the metronome to keep time, Marissa turned the pages while Larry photographed each one. It took them forty minutes to capture all three hundred pages.

Larry and a small team then started visiting other book digitization projects, including one at his undergraduate alma mater, the University of Michigan. There he learned that the university estimated it would take one thousand years to digitize all seven million books it owned. He told University of Michigan president Mary Sue Coleman that Google could do it in six.

He also made what was then an unusual move for a software company by hiring robotics engineers to create a robotic page turner and scanner that could replace Marissa and Larry and their metronome. Such devices already existed on the market from other companies, but Larry thought that a good team at Google could do a better job by building a very gentle device that could handle older books with fragile pages. Programmers at Google created page-recognition software that could read odd type sizes and unusual fonts in 430 different languages.

The team then started visiting large libraries to discuss their plan. At the Oxford University library, they examined centuries-old books that are stored carefully away and rarely brought out—and then only for qualified scholars. The Googlers talked enthusiastically about digitizing them and making them available to anyone. After more than a year of discussion, Oxford ended up becoming Google’s first partner in the Google Print initiative (later renamed Google Book Search) with an agreement to digitize its collection of more than one million nineteenth-century books within three years.

The Publisher’s Dilemma

That part was easy. Two-hundred-year-old books are no longer under copyright. But Larry also wanted more recent titles, some in print, some not. For this he needed the help of somebody who knew his way around the publishing business. He found a young man working at Random House named Adam Smith.

Smith is not your typical Google geek. Tall, thirtyish, athletically slim with neatly trimmed hair, he was loquacious on the day I met him, sporting the self-satisfied smile of someone who has been very successful and looking like the up-and-coming young publishing executive he once was. In 2003 Smith wrote an article in the New York Times about Random House’s interest in going digital. It caught the attention of the boys at Google. In August of that year, he agreed to meet with Larry, corporate counsel David Drummond, and advertising executive Susan Wojcicki, who were going to be in New York in a couple of days. It wasn’t to be a job interview, but Smith, a California native, hoped it would turn out that way. “I was thinking, this is my ticket back to California,” he says.

The meeting did not come off as planned. The Googlers showed up in New York just as a huge blackout shut down Manhattan. Google called Smith in for a formal interview in November. That’s when he finally met Larry.

As usual, the meeting was more of an information dump than an interview. “Larry was intellectually curious about the publishing industry, how it worked, and what motivated them. He wanted to know what drives them and what they’re interested in. He wanted very much to get insight to the industry in a way I would call ‘problem solving.’ He said, ‘Publishers must have problems, so what can we do for them? This is a big industry, and the Internet is going to be playing some role. So how can Google look at this from a product standpoint?’ ”

Larry was impressed enough with Smith’s answers to hire him. Smith officially joined Google in December 2003. A year later Google announced its partnership to digitize the books of five libraries: those at Harvard, the University of Michigan, Oxford, Stanford, and the New York Public Library. In each case, Google would cover all the expenses. It was an important step toward fulfilling Larry and Sergey’s long-standing desire to build the world’s biggest library. “From their days at Stanford, Larry and Sergey never really gave up on their dream of digitizing books,” says Smith.

The controversy started as soon as that initiative was announced. Officials at the French National Library immediately complained of bias: just as the librarians of Alexandria had heavily favored Greek volumes, they argued, Google’s program favored English-language books. That controversy was quelled relatively easily. CEO Schmidt traveled to Paris to explain the program, and Google expanded it to foreign sources. This was Larry’s intent from the start, his goals typically as ambitious as those of Ptolemy I. “We want all the world’s books, in every language,” says Smith. “And we want to be able to search across the full text of all books.”

At first, they thought the idea of digitizing books would be easily palatable. For one thing, all publishers were watching the written word going digital and were struggling to find a way to join the revolution. “The industry was just coming off its first wave of ebook strategy,” recalls Smith. “They didn’t get the traction they were expecting. It wasn’t clear that, left to their own devices, they would be able to make electronic books available to the public.”

Smith and his team found a receptive audience. They met with major publishers and offered the same deal they had given the libraries: the publishers would supply the books, and Google would digitize them, put them online, make short excerpts available to the public, and provide links to sites where people could buy the books, if interested. Google would cover all expenses and wouldn’t even collect any fees for referring buyers, although publishers could also run ads along with the displayed text, paying for clicks in a bidding auction like any other advertiser. In the case of books under copyright, people would be able to search through them to find a piece of information, but Google would provide only short snippets of text at a time. For publishers that wanted it, access would be further limited by restricting the amount of text available to just 20 percent of the book per month.

Google argued that the main benefit to publishers would be to raise the profile of books for sale. Most people don’t want to read an entire book online, and would rather buy an easier-to-read paper version if they found the book interesting enough. Google’s approach, says Smith, is to offer an experience that “is akin to walking into a bookstore and flipping through books.”

In each publishing house, they found someone to champion Google’s idea. Many of them said yes, some of them on a provisional basis, to see how well the program worked. Some balked at the concept that offering free access to books would cause people to buy them.

Google also makes the program available to anyone with a book for sale. Book authors can put searchable copies of their work on their blogs with a link to Amazon, collecting Amazon’s 10 to 15 percent kickback every time someone clicks on the Amazon link and buys the book.

The program has been a success. Today it has digitized and posted about seven million books. Smith says that Google has found that the more book pages people view, the more likely they are to end up buying the whole book.

The Rest of the Books

But controversy over the program lingered. Books still in print were no problem. Books out of print were, because many of them are still under copyright, and the copyright owners are often difficult to find, especially when it comes to older published works.

Over the years, copyright law in the United States has become increasingly generous toward copyright owners. Modern copyright laws started a couple hundred years after Gutenberg invented the printing press in the fifteenth century. In 1662, Charles II, the king of England, Scotland, and Ireland, established the Licensing Act, creating a register of licensed books. This didn’t protect the creators of the work, but provided a monopoly to the Stationers’ Company to sell the works. In 1709, under the reign of Queen Anne, Britain established the first true copyright, known as the Statute of Anne. That act granted exclusive rights for fourteen years for new works, and twenty-one years for books already in print, but this time to the authors instead of the publishers. Since then, copyright laws have been an issue of great controversy, revision, and debate.

In the United States, the founding fathers granted Congress the right to set copyright laws in the Constitution. Congress did so in 1790, stipulating that copyright holders should hold the rights to their works for fourteen years and had the right to renew once for a second fourteen-year term. After that, the works were in the public domain. Since then, the length of copyright has been repeatedly expanded. In 1831 it was extended to twenty-eight years with a fourteen-year renewal. In 1909 the renewal term was also expanded, to twenty-eight years. In 1976 it was expanded to the life of the creator of the work plus fifty years. In 1998 it grew to seventy years after the death of the work’s creator, and to ninety-five years from publication for works made for hire. By this time, it had become difficult to find just who owned the copyright, especially for obscure and out-of-print works. And copyrights have been applied to more and more types of works, including photographs, film, music, dramatic compositions, and eventually all published works, including computer programs, semiconductor chips, and online works. Nobody has to apply for copyright anymore: their works’ mere publication gives them all rights automatically.

Stanford professor and legal scholar Lawrence Lessig, who has advised Larry and Sergey on the matter, argues that this is unreasonable and that copyright laws have reached onerous levels that do more harm than good. He points out that in 1930, 10,027 books were published, and that 9,853 are now out of print, putting them out of reach of the public. But it’s almost impossible to determine which heirs, descendants, or corporations own the copyright to the vast majority of those books, which prevents them from being republished. Google, Lessig says, is trying to bring these books back to life. “Publishers don’t have a moral position to stand on,” he says. He describes Google Print this way: “The project promises to radically enhance our access to the past—to remind us of forgotten information. It is the greatest gift to knowledge since, well, Google.”

The big problem with current copyright law, Lessig argues, is that people can no longer create derivative works, a long-standing approach to creating new work. Shakespeare’s works are repeatedly used as inspiration, such as for the Broadway musical West Side Story, a retelling of Romeo and Juliet. Disney has been a powerful force in lobbying for extended copyright, preventing anyone from reproducing Mickey Mouse or creating a derivative character. Ironically, Mickey Mouse was itself a derivative work. The first Mickey cartoon, Steamboat Willie, from 1928, was a parody of a Buster Keaton movie, Steamboat Bill, Jr., which was created the same year, even though U.S. copyrights had been extended to films in 1912.

Book publishers have a problem with Google’s ambitions to index all the world’s books. Google promised that it would respect any copyright, as long as it could figure out who held it. There is no comprehensive registry of copyright, but the copyright owner could come forward and claim the work, subjecting it to the same terms as works still in print. This wasn’t good enough for publishers. On October 19, 2005, a month after the Authors Guild had filed its own suit, five major publishers backed by the Association of American Publishers filed a lawsuit against Google, arguing that the plan to digitize, search, and show snippets of copyrighted books was illegal. The plaintiffs included Penguin Group (USA) Inc., the parent company of the publisher of this book. The argument was that Google should find the copyright holders itself, a task that Google said would be prohibitively expensive.

Google’s view has been that the Online Copyright Infringement Liability Limitation Act, which became Title II of the Digital Millennium Copyright Act of 1998, protects its efforts, since the act states that online publishers are obligated to remove copyrighted material only after copyright owners ask them to do so. But many experts disagree with this view, insisting that it applies only to sites where users post copyrighted works on a site owned by someone else, such as YouTube.

At Google’s annual shareholders meeting in 2004, Google lawyer David Drummond said, “We do run into a lot of areas where our innovation bumps up against laws that were not designed for the world we now live in. Sometimes others don’t share our commitment [to change]. Trademark and copyright laws are two areas where we have legitimate disagreements [with existing companies and lawmakers].”

In October 2005, Eric Schmidt wrote an op-ed piece in the Wall Street Journal titled “Books of Revelation.” In it, he argued that for most books, the amount of information available through Google Book Search would be comparable to a library card catalog. The lawsuit by the publishers would keep 60 percent of existing books out of Google’s library. “We find it difficult to believe that authors will stop writing books because Google Print makes them easier to find, or that publishers will stop selling books because Google Print might increase their sales,” wrote Schmidt.¹ He argued that book sales would increase and that people in developing countries would have access to information they could not possibly get any other way.

Larry and Sergey see their task as nothing less than creating a new Hellenistic Age. “We did not think necessarily we could make money” off Book Search, says Sergey. “We just feel this is part of our core mission. There is fantastic information in books. Often when I do a search, what is in a book is miles ahead of what I find on a Web site.”²

This Time, Compromise

But Larry and Sergey are getting older, and they have developed more willingness to compromise; some say they are compromising their idealism. On October 28, 2008, they reached a settlement with the publishers over Google Book Search. In exchange for the right to provide free full-text versions of out-of-print books, Google agreed to make payments totaling $125 million. The money, which comes from revenue Google generates from publishing the works, goes to authors and publishers. Google has also agreed to create a not-for-profit Book Rights Registry to try to locate the copyright holders, to collect and maintain details of their rights, and to provide a way for the copyright holders to request inclusion in or exclusion from the project. “The boys have grown up,” says Schmidt. “The young men I started with seven years ago are now seasoned executives. They’re no longer the stereotype [of computer geeks]. It’s offensive to them to treat them any other way.”

David Drummond was the one who worked out the settlement. But Larry, Sergey, and Schmidt all signed on immediately. “When David came in and briefed us, it was a no-brainer,” says Schmidt. “There was never any significant dissent. It was a clever and innovative approach to the problem. We have to pick our battles.” And, one might note, $125 million is no longer a prohibitive amount of money to Google in exchange for ending a costly and time-consuming lawsuit (despite its pared-down daycare system).

Sergey enthusiastically praised the settlement when it was announced. “Google’s mission is to organize the world’s information and make it universally accessible and useful. Today, together with the authors, publishers, and libraries, we have been able to make a great leap in this endeavor. While this agreement is a real win-win for all of us, the real victors are all the readers. The tremendous wealth of knowledge that lies within the books of the world will now be at their fingertips.”

Some people, however, do want Larry and Sergey to remain the idealistic geeks defending their causes. Criticism of the settlement surfaced online. Some bloggers complained that by selling out instead of pursuing the case in court, Google gave up the opportunity to establish the legality of providing short passages from published works, which should fall under the “fair use” provisions of copyright laws. And, in fact, Larry and Sergey still believe that they are covered by fair use laws, even without the settlement. “The agreement we came to was not about lack of confidence in the legal case,” says Smith, “but about what we and publishers could do together.” The prevailing view at Google is that the settlement serves the interests of all parties involved. In response to the criticism over the settlement, Smith says, “Google isn’t in the business of establishing legal precedents.”

What About the Future?

Further into the future, the problem becomes even more complex. As technology improves, it will become easier to read entire digitized books online or through specially designed digital book readers. As books are digitized, it will be easy for pirates to copy and share them. People are used to lending books to friends, and electronic publishing will make it possible to share books with a few million of them. And given the nature of the Internet, people are likely to create book “mashups,” combining excerpts from many different books, videos, and music into new works. Who collects revenue from such projects?

Google does not create the problem; it simply adds fuel to the fire of a trend that has already been made inevitable by the Internet. When there are pirated books and mashups floating about the Internet, Google’s search engine will find them.

Existing industries still have to figure out how to deal with the Internet and its changes. The Internet, with Google pushing it along, is simply and significantly a new alternative to existing forms of media and entertainment. Blogs, for example, have become a huge form of publishing. Social networks pull young people away from older forms of entertainment and socializing. Google has figured out how to make money off all these new forms of entertainment, while every competitor is struggling with that issue.

Online news is rapidly replacing the local newspaper. Google and companies such as Craigslist are providing a better and more cost-effective form of classified advertising, eating up a huge portion of newspapers’ revenue. Many news publishers don’t like this fact. Belgian news companies have successfully sued Google just for offering snippets of their articles before sending readers off to the source for the full story.

All publishers will have to come to terms with the Internet and will need to be deeply involved in the coming changes, including figuring out how to make a buck in an age of rampant piracy—or they could be sideswiped by these rapidly moving vehicles of change.

Most likely, Google will play a huge role in the transformation of these industries, and publishers and creators will have to follow its lead. Their best bet is to start thinking more like Larry and Sergey. They need to create economic models that accept a certain level of piracy. The best way to do that is to figure out how to serve the needs of their customers, rather than suing them to keep them from sharing works through the Internet. Google has already begun experimenting with, and tracking, the value of free distribution as a marketing tool.

But the solutions to these problems will not come easily, and many of the answers will be determined in court, often with Google as a defendant. It is likely that the supreme courts of many countries will end up defining the rights of publishers and, perhaps, redefining the scope of copyright law. The difference is that Larry and Sergey may not fight as hard to defend their views. They’re becoming less ruthless than the Ptolemy clan.