This is not Big Data. This isn’t even medium-sized data. This is two versions of the same novel, one from 1994 and the other from 2012. The book is Chronique de la derive douce, by Dany Laferriere. I spent a month digitizing and cleaning up the text, and today, I was finally able to run it through both Juxta and Voyant.
I was most interested in Juxta because it would highlight exactly what has changed in the text and what hasn’t. What shocked me, however, was just how much the new version changed. I knew it had doubled in size, so a significant portion had been added, but Laferriere also went in and tinkered with the original text, something he hadn’t done in the other “new” versions of his work.
What did I learn from Juxta? First off, the text wasn’t as clean as I thought. Because I was working on the document on different platforms (a Mac at home and a PC at work), some of the commas and quotation marks were off. Plus, in the original version of the book, certain parts had words broken up at the ends of lines, divided by a hyphen, while the new version didn’t. I made the executive decision that formatting was less interesting to me than the actual words themselves. So I went back and cleaned up the text some more.
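For anyone facing the same cleanup, here is a minimal sketch of how it could be scripted. This is not what I actually ran; the function name `normalize_text` and the choice to flatten all typographic quotes and dashes to plain ASCII are assumptions for illustration.

```python
import re
import unicodedata

# Typographic characters mapped to plain equivalents (an assumed choice,
# made so that Mac and PC variants of the same mark compare as equal).
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u00ab": '"', "\u00bb": '"',   # French guillemets
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u00a0": " ",                  # non-breaking space
})


def normalize_text(raw: str) -> str:
    """Return a copy of raw with punctuation, line-end hyphens and spacing unified."""
    # Compose accented letters so "e" + combining accent equals the single character.
    text = unicodedata.normalize("NFC", raw)
    text = text.translate(PUNCT_MAP)
    # Rejoin words split at a line end: "cham-\nbre" becomes "chambre".
    # Caveat: a real compound that happens to break at a line end
    # (for example "peut-\netre") would lose its hyphen too.
    text = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    # Collapse every run of whitespace, including line breaks, to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Running both editions through the same function before comparing them means the comparison only sees differences in the words, which was the point of the executive decision above.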
What I ended up with was a great side-by-side comparison of the text. But first, how much had the text changed? A lot.
While Juxta was processing the text, it reported over 10,000 changes. Great. That’s what a close-reading textual scholar wants to hear.
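To get a rough feel for what a number like that means, a word-level comparison can be sketched with Python’s standard `difflib` module. This is only an illustration under my own assumptions: it is not Juxta’s collation algorithm, the helper name `count_changes` is hypothetical, and its totals should not be expected to match Juxta’s.

```python
import difflib


def count_changes(old_text: str, new_text: str) -> dict:
    """Count changed regions between two texts, compared word by word."""
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False keeps very frequent words (le, la, de) in the comparison.
    # Note that SequenceMatcher can be slow on novel-length inputs.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    counts = {"replace": 0, "delete": 0, "insert": 0}
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            counts[tag] += 1  # each opcode is one contiguous changed region
    return counts
```

Each counted item is a contiguous run of replaced, deleted or inserted words, so a single rewritten phrase counts once no matter how long it is.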
You can click on the picture to see a larger version, but this is just how much the first page changed. The epigraph changed. The first verse changed. And I have to say that I probably wouldn’t have noticed the changes in the first verse (clearly, I noticed the epigraph) if I hadn’t been able to visualize them like this. The content of the verses is still pretty much the same, but subtly changed. It’s not a wholesale addition or subtraction. Just…different.
Now we come to the heat map to see where the changes in the text have taken place. Conclusion? EVERYWHERE.
The last line of the book didn’t change. There isn’t anywhere else in the book that hasn’t changed somehow. This is going to take a lot longer than I thought. But what does Voyant have to say about my piece? Quite a bit, actually.
First off, the number of words? Almost literally doubled.
I was really interested in the peaks and in some of the words that appeared more frequently in one version than in the other.
Relative to the length of the text, the words “filles,” “fille,” and “femme” (girls, girl, woman) and “chambre” (room) decrease in density from the first version to the second, while the word “temps” (time) increases. Here it is put another way:
This is actually REALLY interesting and possibly significant. I’d have to look more closely at when and where these words appear and how they map within the two texts, but this shows me that his relations (as well as his room) become less significant, while the concept of time becomes more important. This is a suspicion that I had long had about the revision, and this supports it.
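Because the second version is roughly twice as long, raw counts would be misleading, which is why density matters here. A minimal sketch of that kind of calculation follows. It is not Voyant’s own computation; the function name `relative_frequencies` and the per-10,000-words scale are assumptions chosen for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text: str, terms: list, per: int = 10_000) -> dict:
    """Return occurrences of each term per `per` words of text."""
    # Lowercase and split on word characters; an elided form such as
    # "l'heure" therefore counts as two words, "l" and "heure".
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    if total == 0:
        return {term: 0.0 for term in terms}
    counts = Counter(words)
    # Terms are expected in lowercase, since the text was lowercased above.
    return {term: counts[term] * per / total for term in terms}
```

Calling it once on each edition with the same terms, for example `["fille", "filles", "femme", "chambre", "temps"]`, gives two sets of numbers that can be compared directly even though one text is twice the size of the other.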
I’m pretty excited, and I am grateful to Stefan Sinclair for helping me with some of the pickier aspects of Voyant. I’m going to be doing more work here to study the two texts, but this is certainly a great place to start.