Writeprints - They will identify those other anonymous netizens... but not you - How to be Anonymous Online: Step-by-Step Anonymity with Tor, Tails, Bitcoin and Writeprints (2016)


Section: Writeprints - They will identify those other anonymous netizens... but not you

As I stated before, my goal in writing this manual is to provide you with a means of being anonymous. It is not intended to be a comprehensive book about anonymity technology, so please forgive me for the brevity on this topic. Still, I will try to give you a clear understanding of writeprints, how they can be used as a weapon against your anonymity and how to counter the attack. A sharp, good-looking person like you can keep the writeprinters off your ass with just a little bit of knowledge and effort. Let's get to it.

Writeprints are a means of identifying an author solely from the characteristics of her written work. Writeprinting is a separate discipline from handwriting analysis and digital forensics. Because individuals can mask IP addresses and minimize digital fingerprints, writeprinting is often the only method available to identify the author.

The field of writeprinting is far from perfect; however, the accuracy of some writeprinting analysis is scary. Bloggers, tweeters, chatters, and posters are identified often enough to warrant concern. I do not rank writeprints as high as fingerprints, digital fingerprints, handwriting analysis or DNA when it comes to evidence. Writeprinting is more comparable to a witness telling the police that the thief was "around 5'5, 300lbs, round-faced, oddly tanned skin, had short brown hair shaved on the sides, was smiling, waving, wearing a Dennis Rodman jersey and riding a white stallion into the sunset." That does not give the police a name or address, but it does allow them to focus their search.

A number of methods are available for writeprint analysis. Most seek to identify an author by combining a variety of features, such as average word length, vocabulary complexity, favorite words, topics, grammar, punctuation, capitalization and sentence length. These nuances vary enough among individuals that they can be compiled to create a unique writeprint. To clarify, when I talk about the author, I do not only mean someone who has written a book, article or some other large body of work. The term also applies to individuals who post on message boards, tweet, blog or email. Ten 50-word tweets can be as useful as one 500-word email.
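To make the idea of "features" concrete, here is a minimal Python sketch that measures a few of the ones listed above. The function name, the regular expressions and the particular features chosen are assumptions made for illustration; real writeprinting software combines far more features than this.

```python
import re
from statistics import mean


def writeprint_features(text):
    """Return a few simple style measurements for one piece of text."""
    # Crude sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters, optionally with one inner apostrophe (don't).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_word_length": mean(len(w) for w in words),
        "avg_sentence_length": len(words) / len(sentences),
        # Share of distinct words: a rough stand-in for vocabulary complexity.
        "vocabulary_richness": len(set(words)) / len(words),
        "commas_per_100_words": 100 * text.count(",") / len(words),
    }


if __name__ == "__main__":
    sample = "I like cats. I like dogs, too."
    for name, value in writeprint_features(sample).items():
        print(f"{name}: {value:.2f}")
```

Run it on two samples of your own writing and the numbers will usually sit close together, which is exactly what a writeprinter counts on.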

As part of China's 12th Five-Year Plan, funding was provided for extensive research into identifying bloggers with writeprints. Apparently, they find it imperative to identify people who exercise free speech. A China-funded study at their own Wuhan University had an 80% success rate when attempting to single out different authors from a pool of 50 Amazon.com reviews [1]. Of course, China is not the only beneficiary of writeprinting. Corporations and other governments can use writeprints to identify whistleblowers.

The most famous case study in writeprinting involves the Federalist Papers. The Federalist Papers are 85 anonymously authored articles, published in the late 1780s, to promote the ratification of the United States Constitution. Speculation by scholars, as well as contradictory claims by various Founding Fathers, narrowed the field to only a handful of potential authors. Researchers writeprinted the 85 articles and determined there were three authors. Author #1 wrote 51 articles, Author #2 wrote 26 articles, Author #3 wrote five articles, and Authors #1 and #2 collaborated on three articles. By matching the writeprints of each article with the writeprints of the Founding Fathers, it was determined that Author #1 was Alexander Hamilton, Author #2 was James Madison, and Author #3 was John Jay.

Investigators are not always so lucky as to have a list of suspects from which to identify an author. This scenario was tested by a team of researchers who mined writeprints from an email database in which all emails were anonymous, and no suspects were given. The researchers extracted each email's writeprint. With that, they grouped the emails by their respective authors (for example, given 100 emails, they might determine that 30 were by one author, 20 by another and 50 by a third). Once each author's emails were grouped, creating a larger body of work per author, a more accurate writeprint was extracted [2]. No further attempt was made to identify each author's true identity, but one can only imagine what an entity with large resources and supercomputers scouring the web could do with the “more accurate” writeprints.
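The grouping step can be sketched in a few lines of Python. This is not the method used in [2]; it is a toy illustration that assumes a tiny hand-picked list of common words (`FUNCTION_WORDS`), cosine similarity, and an arbitrary threshold of 0.9. Every name and number in it is hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny word list; real systems use many more features.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "but", "so")


def profile(text):
    """Count how often each function word appears in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_style(texts, threshold=0.9):
    """Greedily put each text into the first group whose pooled profile it resembles."""
    groups = []  # each entry: (pooled_profile, [indices of member texts])
    for i, text in enumerate(texts):
        vec = profile(text)
        for pooled, members in groups:
            if cosine(vec, pooled) >= threshold:
                members.append(i)
                # Pool the counts: a larger body of work per presumed author.
                for j, value in enumerate(vec):
                    pooled[j] += value
                break
        else:
            groups.append((vec, [i]))
    return [members for _, members in groups]


if __name__ == "__main__":
    emails = [
        "the cat and the dog and the bird",
        "so to go or to stay but so what",
        "the sun and the moon and the stars and the sky",
        "but to win so to say so",
    ]
    print(group_by_style(emails))
```

The pooled profile of each group plays the role of the "more accurate" writeprint: the more texts land in a group, the more data there is behind it.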

In another project, researchers extracted unknown authors' writeprints from individual, anonymous blog posts. The writeprints were matched against a database of 100,000 non-anonymous blogs (2.4 million blog posts in total). With no further personal investigative work and strictly using open source software, the researchers were able to successfully identify the authors 7.5% of the time. When the researchers extracted an unknown author's writeprint using three anonymous blog posts, the success rate grew to 25%. Even when writeprinting failed to match unknown authors to their non-anonymous blogs, the field of possibilities was often narrowed from 100,000 to 20 [3].
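Matching one anonymous text against a pool of known authors, and keeping a shortlist rather than a single answer, might look roughly like the Python sketch below. It is not the software from [3]; the character-trigram profile, the cosine score and the names `trigram_profile`, `similarity` and `shortlist` are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping 3-character sequences after normalizing case and spacing."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def similarity(p, q):
    """Cosine similarity between two trigram profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(
        sum(c * c for c in q.values())
    )
    return dot / norm if norm else 0.0


def shortlist(unknown_text, known_texts, k=20):
    """Return the k known authors whose writing is most similar to the unknown text."""
    target = trigram_profile(unknown_text)
    scored = [
        (similarity(target, trigram_profile(text)), name)
        for name, text in known_texts.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    known = {
        "alice": "honestly, I reckon the whole thing is daft, honestly",
        "bob": "u r wrong lol. u r so wrong lol",
    }
    print(shortlist("honestly, I reckon the plan is daft", known, k=1))
```

Even when the top match is wrong, the true author often sits somewhere in the first `k` names, which is the "narrowed from 100,000 to 20" effect described above.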

Given the previous research, let's play out a scenario that applies existing technology:

· An anonymous whistleblower named Sam is one of ACME Corporation's 100,000 employees.

· Sam anonymously writes an open letter to The News Network Times detailing the evil doings of ACME Corporation.

· Sam is thrilled to find The News Network Times publishes his open letter. The whistle is officially blown!

· ACME Corporation hires investigators to identify the whistleblower.

· Using existing software, the investigators writeprint Sam's letter and run it against ACME Corporation's database of emails from its 100,000 employees.

· Depending on the length of Sam's letter, we can project that there is a 7.5% to 20% chance the writeprinting software identifies Sam as the whistleblower.

Realistically, the investigators are going to be more thorough. The scenario will probably go more like this:

· An anonymous whistleblower named Sam is one of ACME Corporation's 100,000 employees.

· Sam anonymously writes an open letter to The News Network Times revealing ACME Corporation's secret toxic waste dump in Tumangang City.

· Sam is thrilled to find The News Network Times publishes his open letter. The whistle is officially blown!

· ACME Corporation hires investigators to identify the whistleblower.

· From the subject of the letter, it is clear that Sam is one of the 900 employees who have worked in Tumangang City since 2006.

· Using existing software, the investigators writeprint Sam's letter and run it against ACME Corporation's database of emails from those 900 employees.

Given this scenario, the chance that the software correctly identifies Sam from the 900 employees is probably much greater than 7.5% to 20%. There is probably a near 100% chance that the writeprinting software can narrow the field to 20 suspects, one of whom is Sam. From these 20 suspects, traditional investigative techniques can probably single out Sam as the whistleblower.
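The arithmetic behind the scenario is worth spelling out. The Python sketch below uses only the pool sizes from the story (100,000 employees, 900 who worked in Tumangang City, a shortlist of 20); the function names are made up for illustration, and the "blind guess" figure is simply one divided by the pool size, not a measured success rate.

```python
def blind_guess_chance(pool_size):
    """Chance of naming the right person by picking at random from the pool."""
    return 1 / pool_size


def shrink_factor(before, after):
    """How many times smaller the suspect pool became."""
    return before / after


stages = [
    ("all ACME employees", 100_000),
    ("worked in Tumangang City", 900),
    ("writeprint shortlist", 20),
]

previous = None
for label, pool in stages:
    line = f"{label}: {pool} suspects, blind guess {blind_guess_chance(pool):.3%}"
    if previous is not None:
        line += f", pool shrunk {shrink_factor(previous, pool):.0f}x"
    print(line)
    previous = pool
```

Each step that shrinks the pool does the investigators' work for them before any writeprinting software is even started.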

This scenario might be hypothetical, but it is not unrealistic.

You have an advantage over Sam in that you know writeprints exist. As such, you will not be naive when you send that letter exposing the toxic waste dump in Tumangang City. Moreover, since you are a genius, you can wear your leather writer gloves, so your prints do not end up all over the net.

IMPORTANT – You should not mask your writeprints in your daily life. You only mask them when you need to be anonymous. You do not want to alter your non-anonymous writeprints and end up with them matching your anonymous ones.

A few techniques you can apply to anonymize writing:

· Put your writing through online language translators. DO NOT rely on translators as the sole method of masking writeprints. Writeprinters can detect the use of online translators and the languages used. From there, it is occasionally possible to reverse-translate the writing to extract a writeprint [4,5].

· Write in short sentences to minimize the expression of your personality and the ability of writeprinters to match average sentence length to your daily writing [6].

· Write at an average level of intelligence, not at your physics Ph.D. level of intelligence [6].

· Do not use fancy words or elaborate descriptions. Favorite adjectives, rarely used words and consistent choices among words with many synonyms can be matched to your real-life writing (Example: don't say he is mindless, dopey, idiotic, nonsensical, rash or half-baked. Just say he's fucking stupid) [6].

· Do not write in paragraphs. Writeprinters will compare average paragraph length to that of your daily writing [5].

· Write fewer than 250 words. Writeprinters want at least 500 words to pull an accurate writeprint. The 500 words could be from an email, compiled tweets, chatroom conversations, etc. [2].

· If the short sentences and robotic writing style do not appeal to you, try to observe and mimic another author. This is often successful [6].

· Do not make spelling mistakes or typos. Writeprinters will try to match those errors to your daily writing [2].

· Use perfect punctuation or do not use it at all (you can still use 'periods'... it's the 'commas' that'll kill ya). Again, writeprinters will try to match those errors to those of your daily writing [2].

You do not necessarily need to follow all of these methods. You can pick and choose those that you feel work best for you.

Applying the previous techniques can make you stand out as someone trying to hide your writeprints. Feel free to use them as a guide to fake your writeprints instead of following them exactly to obscure your writeprints.
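A few items from the checklist above can be checked mechanically before you hit send. The Python sketch below is only a rough aid: the 250-word limit comes from the list, but the cutoffs for a "long" sentence (12 words) and a "fancy" word (more than 9 letters) are arbitrary assumptions, as are all the names. It cannot check spelling or tell you whether your style matches your daily writing.

```python
import re

MAX_WORDS = 250          # from the checklist: write fewer than 250 words
MAX_SENTENCE_WORDS = 12  # assumed cutoff for a "short" sentence
MAX_WORD_LETTERS = 9     # assumed cutoff for a "fancy" word

WORD = re.compile(r"[A-Za-z']+")


def style_warnings(text):
    """Return a list of human-readable warnings for a draft message."""
    warnings = []
    words = WORD.findall(text)
    if len(words) >= MAX_WORDS:
        warnings.append(f"too long: {len(words)} words")
    # Crude sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        count = len(WORD.findall(sentence))
        if count > MAX_SENTENCE_WORDS:
            warnings.append(f"long sentence ({count} words): {sentence[:40]}")
    long_words = sorted({w.lower() for w in words if len(w) > MAX_WORD_LETTERS})
    if long_words:
        warnings.append("fancy words: " + ", ".join(long_words))
    if re.search(r"\n\s*\n", text.strip()):
        warnings.append("more than one paragraph")
    if "," in text:
        warnings.append("commas present: check them or remove them")
    return warnings


if __name__ == "__main__":
    for warning in style_warnings("He is nonsensical, I think."):
        print(warning)
```

An empty list does not mean you are safe. It only means the draft passed these few crude checks.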

Above all else, writeprinting relies on your consistency as a writer. Fortunately, you can change your style anytime without too much hassle.

For fun, visit http://www.textalyser.net to have stuff writeprinted.