Tuesday, August 29, 2017

How real stylometry works

https://medium.com/@amuse/how-the-nsa-caught-satoshi-nakamoto-868affcef595

"Using stylometry, one is able to compare texts to determine authorship of a particular work.... According to my source, the NSA was able to use the ‘writer invariant’ method of stylometry to compare Satoshi’s ‘known’ writings with trillions of writing samples from people across the globe. By taking Satoshi’s texts and finding the 50 most common words, the NSA was able to break down his text into 5,000-word chunks and analyse each to find the frequency of those 50 words. This would result in a unique 50-number identifier for each chunk. The NSA then placed each of these numbers into a 50-dimensional space and flattened them into a plane using principal components analysis. The result is a ‘fingerprint’ for anything written by Satoshi that could easily be compared to any other writing."

Note: real stylometry uses 5,000-word chunks, but the Interpreter thinks 300-word chunks are probative.
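The writer-invariant procedure described in the quote (the 50 most common words, one frequency vector per chunk, principal components analysis down to a plane) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the NSA's software or any specific stylometry package: the regex tokenizer, the helper names (`tokenize`, `top_words`, `chunks`, `profile`, `principal_components`, `project`, `writer_invariant_plane`) and the power-iteration stand-in for a library PCA routine are all hypothetical choices. Chunk length is the `chunk_size` parameter, so the 5,000-word and 300-word settings in the note above can both be tried.

```python
import math
import re
from collections import Counter

# Sketch of the "writer invariant" method described above. The tokenizer,
# helper names, and power-iteration PCA are assumptions made for illustration.


def tokenize(text):
    """Lowercase word tokens (a deliberately simple regex tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(tokens, n=50):
    """The n most frequent words in the reference author's text."""
    return [w for w, _ in Counter(tokens).most_common(n)]


def chunks(tokens, size=5000):
    """Consecutive non-overlapping chunks of `size` words; a short tail is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def profile(chunk, vocab):
    """Relative frequency of each vocabulary word in one chunk: one n-number vector."""
    counts = Counter(chunk)
    return [counts[w] / len(chunk) for w in vocab]


def _orthogonalize(v, basis):
    """Subtract the component of v along each unit vector already in basis."""
    for u in basis:
        dot = sum(a * b for a, b in zip(v, u))
        v = [a - dot * b for a, b in zip(v, u)]
    return v


def _unit(v, tol=0.0):
    """Scale v to length 1, or return None if its length is at or below tol."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > tol else None


def principal_components(rows, k=2, iters=500):
    """Mean and top-k principal axes of rows (needs at least 2 rows and k <= columns).

    Power iteration on the covariance matrix, re-orthogonalizing against earlier
    axes at every step, stands in here for a library PCA routine."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    tol = 1e-12 * sum(cov[i][i] for i in range(d))  # scale-aware "no variance left"
    axes = []
    for _ in range(k):
        # Fixed, non-uniform starting vector, made orthogonal to earlier axes.
        v = _unit(_orthogonalize([1.0 / (j + 1) for j in range(d)], axes))
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _unit(_orthogonalize(w, axes), tol)
            if w is None:  # remaining directions carry (numerically) zero variance
                break
            v = w
        axes.append(v)
    return means, axes


def project(row, means, axes):
    """Coordinates of one chunk profile in the plane spanned by the axes."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(a * b for a, b in zip(centered, axis)) for axis in axes]


def writer_invariant_plane(text, n_words=50, chunk_size=5000):
    """Vocabulary, PCA frame, and one 2-D point per chunk of the reference text."""
    tokens = tokenize(text)
    vocab = top_words(tokens, n_words)
    profiles = [profile(c, vocab) for c in chunks(tokens, chunk_size)]
    means, axes = principal_components(profiles, k=2)
    return vocab, means, axes, [project(p, means, axes) for p in profiles]
```

Calling `writer_invariant_plane(text, n_words=50, chunk_size=300)` gives the 300-word version. Each word rate is estimated only from the words in its own chunk, so under a simple binomial model its standard error shrinks with the square root of chunk length: rates from a 300-word chunk come out roughly √(5000/300) ≈ 4.1 times noisier than rates from a 5,000-word chunk.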
"The NSA then took bulk emails and texts collected from their mass surveillance efforts. First through PRISM (a court-approved front-door access to Google and Yahoo user accounts) and then through MUSCULAR (where the NSA copies the data flows across fiber optic cables that carry information among the data centers of Google, Yahoo, Amazon, and Facebook), the NSA was able to place trillions of writings from more than a billion people in the same plane as Satoshi’s writings to find his true identity. The effort took less than a month and resulted in a positive match.

"This wasn’t the first time efforts had been made to unearth the identity of Satoshi using stylometry. Various reporters and members of the Bitcoin community have used various open source stylometry tools to attempt to uncover the true identity of Bitcoin’s creator. Their problem? They didn’t have access to trillions of emails from a billion people and they weren’t able to plug them into a supercomputer. The NSA’s proprietary software, bulk email collection ability, and computing power made it possible for them to conclusively identify Satoshi."
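The comparison step in the second quote (placing other people's writings in the same plane as Satoshi's and looking for the closest) can be reduced, in its simplest form, to ranking candidates by distance in that plane. The sketch below assumes every candidate's chunks have already been projected using the reference author's vocabulary and axes; the function names, the candidate labels and the centroid-distance rule are illustrative assumptions, not a description of what the NSA actually ran.

```python
import math

# Hypothetical comparison step: every candidate's chunks are assumed to have been
# projected already into the same plane as the reference author's chunks.


def centroid(points):
    """Mean position of a list of 2-D points."""
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def rank_candidates(reference_points, candidates):
    """Return (distance, label) pairs, nearest candidate centroid first.

    `candidates` maps an illustrative author label to that author's 2-D points;
    distance is Euclidean between centroids (one simple choice among many)."""
    ref = centroid(reference_points)
    scored = [(math.dist(ref, centroid(pts)), label) for label, pts in candidates.items()]
    return sorted(scored)


if __name__ == "__main__":
    ranked = rank_candidates([(0.0, 0.0), (2.0, 0.0)],
                             {"candidate_a": [(1.0, 0.0), (1.0, 2.0)],
                              "candidate_b": [(4.0, 4.0)]})
    print(ranked)  # [(1.0, 'candidate_a'), (5.0, 'candidate_b')]
```

The output only orders the candidates that were supplied. Deciding when the nearest distance counts as a "positive match" would need some threshold, and the quoted account does not say what one was used.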