Friday, October 2, 2015

Black boxes, openness, and your own experiments

Okay, some people wonder what I meant by the black box reference yesterday. A few weeks ago I posted a comment about that on the Interpreter but they censored it, so I'll explain it here.

The Roper/Fields articles on stylometry described their methodology in a generalized way. They don't give enough details for anyone to replicate their work, much less experiment with different databases, assumptions, etc. If anyone has details about their database, software, and parameters, let me know. Their work is a black box; we're expected to simply accept their data on its face, along with their conclusions.

Frankly, I don't think anyone but a few die-hard Mesoamericanists believes the Roper/Fields stylometry analysis. However, as long as the Interpreter publishes their work without rebuttal (or even peer review), I suppose I have no choice but to point out how deeply flawed the Roper/Fields work is. In the next few days I'll post the problems with their methodology as well as their historical assumptions.

For now, here's a simple online analysis anyone can use.

http://www.aicbt.com/authorship-attribution/online-software/

It's not sophisticated, but it does perform function word, lexical, and punctuation analysis. True, you don't get all the data you would want if you were going to apply a variety of statistical methods. But at least it's open, and you can try it for yourself with different authors and see how accurate you think it is. For example, compare speeches by Barack Obama and George Washington and see if it accurately determines the author. Or compare Stephen King to George Washington. You'll see that the greater the difference in the authors' work, the more the needle will point to the green.

Then use the software to compare Joseph Smith with Benjamin Winchester (and/or W.W. Phelps) as candidate authors of the 900 words. Make Joseph Author 1, Winchester Author 2, and the 900 words the Unknown Author.
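
If you'd rather see what a function word comparison involves under the hood, here is a minimal sketch in Python. To be clear, this is not the code behind the site linked above, and it is not the Roper/Fields method. The word list, the distance measure, and the function names (`profile`, `distance`, `closer_author`) are all assumptions picked for illustration; a serious study would use a much longer word list and proper statistics.

```python
import re
from collections import Counter

# Assumed, deliberately short list of common English function words.
# Real stylometric studies typically use far longer lists.
FUNCTION_WORDS = [
    "the", "and", "of", "to", "a", "in", "that", "it",
    "but", "for", "with", "as", "by", "which", "not",
]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Mean absolute difference between two profiles (smaller = more alike)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def closer_author(author1, author2, unknown):
    """Return 1 or 2 for whichever known sample is nearer to the unknown text."""
    u = profile(unknown)
    d1 = distance(profile(author1), u)
    d2 = distance(profile(author2), u)
    return 1 if d1 <= d2 else 2
```

To use it, paste each writing sample into a string and call `closer_author(joseph_text, winchester_text, unknown_text)`, where those three variable names are placeholders for whatever samples you choose. Like the online tool, it only tells you which of two candidates is closer; it cannot tell you that neither one wrote the text.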

Undoubtedly some Mesoamericanists will complain that this isn't a fair comparison because while Winchester and Phelps wrote about Central America and ruins and extracts and the rest, there is not a single writing by Joseph Smith that mentions any of those topics.

But isn't that exactly the point?

I'll discuss that particular issue in the next few days in more detail. For now, experiment yourself and see what you get.

Here are the results I got when I tried out the 3 columns I posted yesterday (after I corrected the spelling). Joseph Smith was author 1, Winchester was author 2, and the 900 words were the unknown author. The software, like your lying eyes, is pretty confident Winchester was the author.




Experiment to your heart's content.



