Lab #8 – Distant Reading

“It was the best of times, it was the worst of times…” That opening line and the author’s name were literally all that I knew about A Tale of Two Cities by Charles Dickens. It struck me as strange that the title was so recognizable to me and yet I had no preconceptions about the story at all. However, it was this completely blank slate that made this particular novel seem like a good choice for the distant reading exercise.

As I tried to make predictions about the story, I realized that I really didn’t know anything about it. I didn’t have any inkling as to what the characters’ names might be, I didn’t really know what the setting of the book was beyond the fact that there are two cities, and the plot could have been about herding sheep and it would not have surprised me at all. Therefore, my prediction was understandably very vague: The story would involve two different cities and there would be people in those cities and they would have some sort of conflict. I shudder at the thought of how many stories I probably just described, but I guess at least it proves that I am not familiar with this book at all! So, I found the text online, copied, pasted, and was on my way.

A Tale of Two Cities Wordle

This is the word cloud I created on Wordle with the text of A Tale of Two Cities. I thought it was interesting that “Mr” appeared to be the most used word (besides the obvious common words), and I could see that “miss”, “monsieur”, “madame”, and even “monseigneur” also made the cut into the word cloud. I figured that this must mean that the characters often address each other formally, which certainly fits with Victorian literature and culture. It also suggested that the characters might be of a higher class, although not necessarily. The French words also made me suspect that one of the two cities in this tale must be in France. In addition, there seemed to be a fair number of names, specifically last names: “Defarge”, “Manette”, “Carton”, “Darnay”, etc. At first I thought “lorry” might have been used as a term for a carriage, the way it is used today as a word for a truck, but that didn’t make much sense. Then I realized it was capitalized, so I figured it must be another last name. All of these last names reinforced my idea that characters were addressed formally, and also suggested that there are a great number of characters in the story. Another category I could create with the words in the cloud was body parts, such as “eyes”, “face”, “hand”, and “head”. To me this implied that there would be a lot of physical description in the story. It would certainly make sense if Dickens chose to show his characters’ emotions through their features and actions, which could possibly account for the high usage of these words.
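Under the hood, a word cloud like Wordle’s is essentially a word-frequency count with common “stop words” thrown out, where each remaining word is drawn at a size based on its count. I don’t know Wordle’s exact rules, so this is only a minimal Python sketch of the idea, assuming a tiny, made-up stop-word list (`STOP_WORDS`) and a hypothetical helper called `top_words`:

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; real tools ship much
# longer lists, and they don't all agree (e.g. on whether "said" is included).
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "at", "it", "was", "he", "his",
    "i", "that", "you", "her", "she", "had", "with", "said",
}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words with their counts."""
    # Lowercase, then pull out runs of letters (allowing inner apostrophes).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


sample = ("Mr. Lorry looked at the doctor. Mr. Lorry said nothing, "
          "and Madame Defarge knitted with her hands in her lap.")
print(top_words(sample, n=2))
```

Note that this sketch lowercases everything, which would erase exactly the capitalization clue that told me “Lorry” was a name rather than a vehicle.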

The word cloud was giving me a slight idea about the style of the book, but I was still completely lost about the plot. So, I created word clouds for a few random chapters, hoping that this might shed some light on it. First, I did Chapter III and got words like “big”, “night”, “passenger”, “still”, and “shadow”. This created a picture in my mind of someone traveling through the night, perhaps secretly. Then, I skipped ahead and created a word cloud for Chapter XIV. The words that stood out in this chapter were “father”, “mother”, “wife”, “funeral”, “hearse”, “dead”, and three parts of a name: “Mr”, “Jerry”, “Cruncher”. This made me think that Mr. Jerry Cruncher (whether those three parts actually went together or not) had a death in the family, but I had no way of knowing which member of his family it was. Then, I skipped ahead some more to Chapter XXIII and found words like “village”, “roads”, “chateau”, “monsieur” and “monseigneur” again, “fire”, “officers”, and “smoke”. This created the image of a building on fire and also confirmed in my mind that one of the titular two cities is in France. These three chapters probably gave me some plot points, but they had no common threads, so they didn’t clear up the plot at all for me. It seems like I would have to go chapter by chapter if I really wanted to learn the plot only from word clouds, at which point I should probably just read a summary.
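Making a cloud for one chapter just means running the same count on a slice of the text. Here is a rough sketch of how that slicing might work, assuming a plain-text edition where each chapter begins on its own line with “CHAPTER” and a Roman numeral (the pattern and the function names `split_chapters` and `chapter_top_words` are hypothetical). Because the novel is divided into three books that each restart their chapter numbering, the sketch keeps chapters as an ordered list instead of looking them up by numeral:

```python
import re
from collections import Counter

# Assumes each heading sits on its own line, e.g. "CHAPTER III." or
# "CHAPTER XIV. The Honest Tradesman"; other editions may need another pattern.
HEADING = re.compile(r"^CHAPTER ([IVXLC]+)\b.*$", re.MULTILINE)


def split_chapters(text):
    """Split a novel into an ordered list of (numeral, body) pairs."""
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, m in enumerate(matches):
        # A chapter's body runs until the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group(1), text[m.end():end]))
    return chapters


def chapter_top_words(body, n=5, stop_words=frozenset({"the", "and", "of", "a", "to", "in"})):
    """Most frequent non-stop-words in one chapter's text."""
    words = re.findall(r"[a-z]+", body.lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)


book = ("CHAPTER I.\nIt was the best of times.\n"
        "CHAPTER II.\nThe mail coach went on through the night, night, night.\n")
for numeral, body in split_chapters(book):
    print(numeral, chapter_top_words(body, n=1))
```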

A Tale of Two Cities Voyant word cloud

I then moved on to a more sophisticated tool, Voyant. Above is the word cloud that Voyant created, and, as one would expect, the two word clouds are very similar. The most striking difference was that Voyant must not include “said” in its list of stop words (the common words it leaves out), because it is the most used word in the cloud, appearing 661 times. This means that there is definitely dialogue in the book, perhaps even a lot of it. However, I bet books without extraordinary amounts of dialogue still use the word “said” quite often (this could be the start of a different project using these programs!). Also, although it’s not one of the larger words in the cloud, the word “prisoner” caught my eye. Was this the key to figuring out the plot? Looking at the other tools on Voyant, I could see that “prisoner” was used much more at the beginning of the story than at the end, so maybe the story begins with someone getting out of prison. However, this is pure speculation and still doesn’t reveal much about the story. So, I removed the words that I had already encountered from the word cloud to see if I could get anything new. Now the words that jumped out at me were “doctor”, “father”, and “business”. Maybe Dad is having health issues and someone needs to take over the family business? Again, these words didn’t tell me anything concrete about the story. Other words that stood out came in pairs: “night” and “day”, “old” and “young”, “came” and “went”. Maybe there is some meaning in the juxtaposition of these ideas, or maybe this story just covers a lot of people in many diverse situations. Or maybe even both. Either way, I really had no way of knowing without actually reading the book or an in-depth written analysis.
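As far as I can tell, Voyant’s trend graph works by splitting a document into segments and showing how often a word appears in each one, which is how I could tell that “prisoner” clusters near the beginning. Here is a rough sketch of that calculation, assuming equal-sized segments of words (Voyant’s exact segmentation may differ, and `segment_frequencies` is a hypothetical name):

```python
import re


def segment_frequencies(text, term, segments=10):
    """Relative frequency of `term` in each of `segments` nearly equal slices of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words)
    # Integer boundaries divide the word list into `segments` slices whose
    # sizes differ by at most one word.
    bounds = [i * n // segments for i in range(segments + 1)]
    freqs = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = words[lo:hi]
        # Share of this slice's words that are the term (0.0 for an empty slice).
        freqs.append(chunk.count(term.lower()) / len(chunk) if chunk else 0.0)
    return freqs


text = "prisoner prisoner door door door door door door door door"
print(segment_frequencies(text, "prisoner", segments=2))  # [0.4, 0.0]
```

Using relative rather than raw counts keeps segments of slightly different lengths comparable.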

A Tale of Two Cities word network

The next stop was using the text to create a network on Textexture. Most of the words in the network were words that I had already encountered in the previous steps. They were all fairly well connected to each other, but there still seemed to be four distinct groups whose words were most closely connected. One group had words that seemed to relate to travel, another was body parts, the next was mostly names and French/France-related words, and the last was a miscellaneous grouping containing words such as “Mr” and “Lorry”, but also “hope”, “life”, and “answer”. I had already made these categories on my own, though, so the network didn’t help as much as I thought it would.
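A network like Textexture’s is built by linking words that appear near each other in the text. I don’t know its exact settings, so this sketch assumes a simple sliding window: two words get an edge each time they occur within `window` words of each other after stop words are removed (`cooccurrence_edges` and the stop-word list are hypothetical):

```python
import re
from collections import Counter

# Hypothetical stop-word list; stop words are dropped before the window is applied.
STOP = frozenset({"the", "and", "of", "a", "to", "his", "her"})


def cooccurrence_edges(text, window=2, stop_words=STOP):
    """Count how often each pair of words appears within `window` words of each other."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]
    edges = Counter()
    for i, w in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != w:
                # Sort the pair so ("mr", "lorry") and ("lorry", "mr") count as one edge.
                edges[tuple(sorted((w, other)))] += 1
    return edges


edges = cooccurrence_edges("Mr Lorry shook hands. Mr Lorry smiled.", window=1)
print(edges.most_common(1))
```

The weighted edges could then be loaded into a graph library such as NetworkX, whose community-detection functions (for example `greedy_modularity_communities`) are one way to find clusters like the four groups above.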

A Tale of Two Cities ngram - names
A Tale of Two Cities ngram - titles
A Tale of Two Cities ngram - body parts

I then went to Google’s Ngram Viewer to search some of the words that are popular in the novel. I began with some of the names that I assume belong to the main characters: Defarge, Lorry, Cruncher, and Manette. All of the names see a spike in usage in 1855, four years before A Tale of Two Cities began appearing as a series of weekly installments in 1859. Then, in 1859 all of the names plateau except for Lorry, which has another major spike. It seemed strange to me that these names would become popular years before Dickens began publishing the story, instead of when he began publishing it. Furthermore, the names don’t have a gradual ascent over a few years but a dramatic spike in 1855 that continues until the story is published. If there were a gradual rise in popularity, I would assume that Dickens just chose popular names for his characters. However, this rise comes seemingly out of nowhere and I have no explanation for it. Next, I searched for “Mr” and the other popular titles used in the story and found that “miss” was without a doubt the most used of these terms at that time. This differs from A Tale of Two Cities, in which “Mr” is the most used of the titles. This suggests to me that there are more male characters in this story than female characters and that these characters, for whatever reason, must be referred to by their titles. Lastly, I searched some of the body-part words that seemed popular in the book. These words definitely had the highest usage out of all of the words I searched, but they aren’t especially popular around 1859. In fact, their usage does not fluctuate much, apart from a slight rise around 1900 followed by a small decline. Of these words, “hand” is the most popular both in the story and in general use, according to Ngrams. I guess describing hands is just an all-around popular practice!

Finally, I read the first chapter. The chapter is quite short, but it does confirm my suspicion that one of the two cities is in France while the other is in England. What surprised me is that the story is set in 1775, more than 80 years before the book was published. I didn’t really have any reason to think that the setting would be 1859, but I still assumed it was and never considered that it was historical fiction. Also, the first chapter was really more like an introduction to the story, giving the setting and describing what the novel was generally going to be about (two different cities and how they’re both equally terrible). Since the first chapter didn’t give me much information, I decided to read the second chapter too. This chapter reinforces the idea that these places are horrible, because the characters assume that everyone is a depraved criminal. It also introduces us to a Mr. Lorry, a banker traveling from England to France. There also seems to be some sort of secret, coded message that will be important to the plot. However, I still don’t have a great idea of what this story is about.

I tried to be brief in recounting all of the different types of analysis we had to engage in for this distant reading exercise, but this is already much longer than any of the previous lab reports (sorry!) because analysis in itself is not short. Sure, doing this was much quicker than reading the book itself, but I didn’t get the same thing out of it that I would have gotten from reading the book. Analysis showed me tiny parts of this book in relation to a bigger picture and larger outside context, but it didn’t help me learn what the actual story is. Therefore, I think all of these types of analysis are much more useful if you already know and are familiar with the text. If there’s something specific that you want to look into, such as my earlier suggestion of examining whether using the word “said” a lot actually correlates with a lot of dialogue, these tools can help. However, simply trying to find out what the book is about is not a good use of these tools, as I still do not even know the major theme of this book, let alone the plot. These textual analysis programs can be used in a lot of exciting and helpful ways, but discovering the plot of a story does not seem to be one of them.
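Here is one way the “said” question could be tested: for each chapter, measure how often “said” appears and what share of the words sits inside quotation marks, then see whether the two numbers rise and fall together. This is only a sketch, assuming dialogue is always marked with double quotation marks (straight or curly); `said_and_dialogue` is a hypothetical helper:

```python
import re

# Assumes dialogue is enclosed in double quotation marks, straight or curly.
QUOTED = re.compile(r'["“](.*?)["”]', re.DOTALL)


def said_and_dialogue(text):
    """Return ('said' per 1,000 words, share of words that fall inside quotes)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0, 0.0
    # Count the words found inside every quoted passage.
    quoted = sum(len(re.findall(r"[a-z]+", q.lower())) for q in QUOTED.findall(text))
    said_rate = 1000 * words.count("said") / len(words)
    return said_rate, quoted / len(words)


rate, share = said_and_dialogue('"Good day," said Mr. Lorry. He bowed.')
print(f"{rate:.1f} per 1,000 words, {share:.0%} dialogue")
```

Running this on each chapter and passing the two resulting lists to `statistics.correlation` (Python 3.10+) would give one rough number for how closely “said” tracks dialogue.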

Of course, all of this leads to the question: Why even look at text this way? What do we get from text analysis? Some may think that because it doesn’t give us the plot, it can’t help us understand the story better. However, by analyzing the language and its patterns, it helps us understand the story in a different way than we might be used to. You might be able to write pages and pages about the major symbols of a story, but doesn’t knowing that a symbol is only used in certain parts of the story help you understand it better? And doesn’t knowing that the symbol is highly uncommon in books of that era, or in books about that particular subject, help put it into context and reveal a whole other layer of meaning? It certainly does, and that’s exactly what text analysis does. We already do some of this analysis in our heads when we’re reading, but these programs let us do it much faster and at both a much smaller and a much larger scale. I remember when J.K. Rowling first began publishing under the pseudonym Robert Galbraith, there were reports of people using text analysis to show that the books were most likely written by J.K. Rowling because of similarities in word choice, sentence structure, etc. That’s incredible, and a completely different use of these programs from the ones I have described, even though it’s the same idea. You really can do so much with text analysis; it just depends on what you’re looking for. And that’s the important thing: you need to know what you’re looking for. It was interesting to use all of these different programs and see how they work, but there is so much (probably irrelevant) information that you don’t want to read in this post because I didn’t have a specific thing that I was looking for. If anything, the thing I was looking for most was probably the plot, but that’s not something you can really get from text analysis. You still have to actually read (whether the story itself or a summary) to get that!
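Authorship studies like the Galbraith case generally compare the writing habits of a disputed text with those of candidate authors. I don’t know exactly which measures those reports used, so this is a deliberately simplified stand-in: it compares how often a handful of common function words appear, using cosine similarity (the word list and the names `profile` and `cosine` are assumptions for illustration):

```python
import math
import re

# An illustrative handful of function words; real stylometric studies usually
# use hundreds of the most frequent words and more careful statistics.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "but", "on", "with"]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [words.count(fw) / total for fw in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two profiles: 1.0 means the words are used in the same ratios."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


known = profile("It was the best of times and it was the worst of times.")
disputed = profile("It was a dark night, and the wind was in the trees.")
print(round(cosine(known, disputed), 3))
```

Real stylometric work uses far more features and more careful statistics (Burrows’s Delta is a well-known method), so a single similarity score like this is suggestive at best.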
