Lab #8: Distant Reading

The idea for this lab was adapted from my colleague Paul Fyfe of North Carolina State University. Prof. Fyfe describes his version of the assignment in “How Not to Read a Victorian Novel,” Journal of Victorian Culture 16, no. 1 (April 2011). Here’s how Prof. Fyfe introduces the assignment for his students:

Franco Moretti was dissatisfied with how literary scholars accept just a handful of possible texts as representative of cultural eras. Even if those texts are diverse and interesting, how can they possibly represent broader trends at scale? Moretti wants to change our sense of literary history by enlarging it, or by increasing our critical distance from it. He coined the phrase “distant reading” as an approach to analyzing lots and lots of texts instead of an unrepresentative few. Distant reading uses other modes of analysis and models of interpretation than the “close reading” we are familiar with. In his own work, Moretti compiles textual information from lots and lots of novels into maps, graphs, and logical trees. Seen this way, texts can reveal new patterns and language trends than we could otherwise discover close up. An array of digital visualization and text analysis tools now make Moretti’s methods more accessible to the casual user. The first paper will be an experiment in using these tools. We will consider “distance” not only as the subject of our course but also as a potential mode of reading and interpretation. What does literary criticism and analysis look like if we accept distance “as a condition of knowledge”?

Distance is a pretty good approach to the Victorian novel, considering that 40,000+ books of prose fiction were published in the last two-thirds of the nineteenth century. No one can read them all. But perhaps we can learn how to not read them. As Moretti and others have demonstrated, digital technology provides lots of interesting ways of doing this. Using some selected tools, you will analyze a big Victorian novel and then write a paper explaining your questions and insights. There’s one catch: it has to be a book you have never read.

English classes more typically emphasize close reading than “not reading.” This exercise will be new to many of you. So will the technology and the interfaces. The paper requires thinking about texts in a very different way than you might be used to. There may be dead ends; on the other hand, there will be no wrong answers. This preludes two important points:

  • Play. Experiment. This assignment is as much about testing the methods as it is learning about the text. The goal here is not to reconstruct a missing story, but to “read” the novel in a fundamentally different way, and to think about the implications of doing so.
  • Ask for help. Please don’t struggle with the technology, or tear hair in confusion about the assignment. Visit my office hours or email for an appointment if you’d like to go over this, work out a problem, or discuss how to talk about your results.
  • Use frustration creatively. This is perhaps the hardest and most essential trick. If you hit a dead end, feel frustrated, or get null results, how can you use that to learn? In other words, what might be the values of that frustration or failure in thinking about your critical approach? Try to take any moment of frustration as instead an opportunity to reflect on the kinds of questions you are asking and how you might change them.

Ready to get started?


Okay—got all that? Here’s how we’ll be engaging in the kinds of experimentation, computational analysis, and play that Prof. Fyfe describes in today’s lab:


    1. Choose a work to not read.

      I’m trusting you here (kind of; it will probably be obvious if you know the work when you write this up), but for this assignment you need to pick something you’ve never read. More specifically, I’m requiring you to pick a nineteenth-century novel. Pick a big triple-decker that you’ve always felt guilty for not reading: e.g., Jane Eyre, Anna Karenina, Les Misérables, Great Expectations, or even Moby-Dick. Other than the date of original publication, the only real requirements are that you’ve never personally read the book and that you can find its full text on a site like Project Gutenberg.

    2. Make some predictions

      What do you think this work is about? You’ve never read it, but if it’s a well-known book you probably have some idea what it’s about. Before you begin your computational analysis, then, list some predicted themes, characters, plot elements, or stylistic characteristics of the text. Be sure to write your ideas down in a document you can refer back to later.

    3. Create word clouds

      When provided with a bunch of text, tag cloud or word cloud engines will return you a graphical representation of the most common words: the more frequently a word appears in the text, the larger it appears relative to other words on the screen. On the ProfHacker blog, Julie Meloni called word clouds a “gateway drug” to textual analysis. Wordle is nice for making word clouds because, once your word cloud gets generated, you can toggle common English words (e.g. and, the, if) on or off, and you can customize or even “randomize” the display, allowing you different visualizations of the data. Using the text of your chosen work, experiment with Wordle until you get comfortable with the interface. Then run a couple of different tests with Wordle, making notes of your observations along the way:

      • Generate a cloud for your whole novel. Be sure to copy and paste only the actual words of the novel, not any of the metadata in the Project Gutenberg text file.
      • How might you “read” this? Come up with a few different observations. What kinds of words are there? Are there patterns or in/consistencies in the words? What is relatively more or less frequent?
      • Try breaking the book into chapters or sections. Paste individual sections in, generate word clouds, and see what you can regenerate from a “distant” perspective.
      • Play with stoplists: in Wordle, toggle on/off the common English words. (You can also create your own custom stoplist, which is a little more advanced.)
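
      If you are curious what a word cloud engine is counting, the steps above can be sketched in a few lines of Python. This is a minimal sketch, not Wordle’s actual code: the function names, the tiny stoplist, and the sample text are all made up for illustration, and the “*** START OF” / “*** END OF” marker pattern is an assumption about how Project Gutenberg files are laid out that you should check against your own file.

```python
import re
from collections import Counter

# Assumption: the Gutenberg file wraps the book in marker lines beginning
# with "*** START OF" and "*** END OF". Wording varies, so check your file.
START_MARKER = re.compile(r"^\*\*\* ?START OF.*$", re.MULTILINE)
END_MARKER = re.compile(r"^\*\*\* ?END OF.*$", re.MULTILINE)

# A tiny illustrative stoplist; real stoplists are much longer.
STOPLIST = {"the", "and", "if", "a", "of", "to", "in", "it", "was", "i"}


def strip_gutenberg_metadata(raw_text):
    """Return only the text between the START and END marker lines."""
    start = START_MARKER.search(raw_text)
    end = END_MARKER.search(raw_text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw_text)
    return raw_text[begin:finish].strip()


def tokenize(text):
    """Lowercase the text and pull out words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def word_frequencies(text, stoplist=frozenset()):
    """Count every word that is not in the stoplist."""
    return Counter(word for word in tokenize(text) if word not in stoplist)


# Invented sample standing in for a downloaded text file.
sample = """Title: An Example
*** START OF THE PROJECT GUTENBERG EBOOK AN EXAMPLE ***
The whale and the sea and the ship.
*** END OF THE PROJECT GUTENBERG EBOOK AN EXAMPLE ***
License text goes here."""

novel = strip_gutenberg_metadata(sample)
print(word_frequencies(novel).most_common(3))            # stoplist off
print(word_frequencies(novel, STOPLIST).most_common(3))  # stoplist on
```

      Comparing the two printed lists shows why toggling the stoplist matters: without it, the most frequent words are the common English ones. A word cloud simply draws each word at a size proportional to counts like these.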

    4. Reveal your texts


      Word clouds are a first step. Next, you will run (slightly) more sophisticated text analysis software on your novel using tools provided by Voyant. (Voyant has had server troubles lately; if that link doesn’t work, use this link to the software on another server.) Copy and paste the text of your chosen novel (again, just the novel, and not the metadata) and click “reveal.” Initially Voyant’s results will look much like Wordle’s. You’ll see a word cloud in the top left corner of the screen (you can turn on stoplists for the word cloud by clicking the gear icon at the top of the word cloud window), a summary of results below it, and the full text of your chosen work in the center. If you click “more…” in the summary window, however, another window will open below it showing the “words in the entire corpus.” “Corpus” means “a collection of written works,” and Voyant can be used to analyze many texts together; in this case, however, your corpus is one work.

      Look at the words by frequency. You might have to scroll through a few pages before you get past common words such as “the,” “and,” and so on. What are the first few less common words that appear most frequently in your novel? Double-click one of the words listed, and a new set of tools will open on the right side of the window. You can look at “word trends,” which plots the relative frequency of words at different points in your novel. Below this you can click to open “Keywords in context,” which shows the words that appear around the word you’re analyzing within the text. If you look at the text in the center of the window, you’ll see that there’s now a “heat map” running along its left-hand margin, which shows where your chosen word appears most frequently within the text. Jot down some notes about this word, and then compare those results with several other words in the “Words in the Entire Corpus” menu.
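
      To make “keywords in context” and “word trends” less of a black box, here is a rough Python sketch of both ideas. It is not Voyant’s implementation: the function names, the window size, the number of segments, and the sample sentence are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def keywords_in_context(tokens, keyword, window=3):
    """Return (left words, keyword, right words) for every occurrence."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


def word_trend(tokens, keyword, segments=10):
    """Split the text into equal segments and return the keyword's
    relative frequency (occurrences / words) in each segment."""
    total = len(tokens)
    if total == 0:
        return []
    segments = min(segments, total)  # never ask for more segments than words
    trend = []
    for s in range(segments):
        chunk = tokens[s * total // segments:(s + 1) * total // segments]
        trend.append(chunk.count(keyword) / len(chunk))
    return trend


# Invented sample text; in practice this would be your whole novel.
tokens = tokenize("The sea was calm and the sea was wide but the ship was small")

for left, word, right in keywords_in_context(tokens, "sea", window=2):
    print(f"{left:>10} [{word}] {right}")

print(word_trend(tokens, "sea", segments=2))
```

      The trend list is the raw material for a line graph: a word that is frequent in early segments and absent later on is concentrated in the opening of the book.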

      Some questions to consider as you play with Voyant: does more focused attention to word frequency change your opinions about your book? What about scarce or infrequent words? What still don’t you know? In other words, what additional information might you need to gain insights? What insights, if any, do these tools provide? What keywords or patterns did you pursue and why? What might you suspect are the values and/or limitations of “not reading” this way? Where might it be useful in future research projects or in analyzing other kinds of texts?

    5. Visualize Your Novel as a Network

      Next, copy and paste your novel (again, without the Gutenberg metadata) into Textexture, which will visualize the words in the novel as a network. As John Handel writes in his useful introduction to Textexture,

      The gist of textexure’s process is that account of words that occur near each other. That is not to say words have to be directly next to one another, paragraph and sentence structures are both considered. In the network, the nodes are the words themselves while the edges (or connections from node to node) are determined by the co-occurrence between words either directly, in paragraphs or sentences. In terms of the visualization, this effectively does two things: 1. organizes groups of words into the communities they appear (color coded) so that with a quick glance at the network, you can pick out certain themes and 2. You can also easily see what the primary theme that links the various aspects together.

      In short, Textexture tries to automatically discern and visualize relationships between words, based on their usage, within a text. Looking at the network graph of your novel, what relationships seem most interesting? Experiment with the graph by clicking through different words. As Handel notes,

      the most useful part of Textexture is the ability to read your text through your network. Start by clicking on any node within your network, (often times the main connector nodes are useful but don’t always yield the most interesting results in this regard) now on the left, every section of your text with that word has been brought up and highlighted, so you can easily read the immediate passages that correspond to “Justification” for example. Additionally, your graph will now only show the other nodes that are connected to “Justification.”

      How do these relationships help you think about the characters, plot, settings, themes, or other aspect of your novel?
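
      The network idea described above can also be sketched in Python. This is a deliberately simplified illustration, not Textexture’s algorithm: it assumes two words are connected whenever they share a sentence, it skips the color-coded community detection entirely, and the stoplist, function names, and sample text are invented.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny illustrative stoplist so common words do not dominate the network.
STOPLIST = {"the", "and", "a", "of", "to", "in", "was", "is", "her", "his"}


def sentence_words(text, stoplist):
    """Yield the sorted, de-duplicated content words of each sentence."""
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence)) - stoplist
        if len(words) > 1:
            yield sorted(words)


def cooccurrence_edges(text, stoplist=STOPLIST):
    """Count how many sentences each pair of words shares.

    Words are the nodes; each counted pair is a weighted edge.
    """
    edges = Counter()
    for words in sentence_words(text, stoplist):
        edges.update(combinations(words, 2))
    return edges


def neighbors(edges, word):
    """Return the words linked to `word`, roughly like clicking a node."""
    linked = Counter()
    for (first, second), weight in edges.items():
        if first == word:
            linked[second] += weight
        elif second == word:
            linked[first] += weight
    return linked


# Invented sample text; in practice this would be your whole novel.
text = "The captain watched the whale. The whale struck the ship. The captain chased the whale."
edges = cooccurrence_edges(text)
print(edges.most_common(3))
print(neighbors(edges, "whale"))
```

      Even in this toy version, the word with the most neighbors plays the role of a main connector, which is one way to think about what the larger nodes in your graph are telling you.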

    6. Explore Ngrams

      Google’s Ngram Viewer displays the frequency of words over time by drawing on the massive Google Books corpus, which includes the text of more than 15 million books. For more on Ngrams, check out the Culturomics site. Choose several of the words you’ve concentrated on in your previous analyses and enter them into the Ngram Viewer. Look at the frequency of those words through time, paying particular attention to their frequency when your chosen novel was published. Do any of them stand out, either as particularly common words during their time or, perhaps as interestingly, as particularly uncommon words during their time? Try a few more words from the frequency lists you generated in Voyant earlier. Then, try comparing some of the keywords from your chosen work with some keywords from your key work. Do any interesting comparisons emerge? The big question here: can a tool like the Ngram Viewer, which analyzes so many texts, help you understand anything about the historical place of a book you’ve never read?
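
      The number behind an Ngram-style graph is a relative frequency: how often a word appears in a given year divided by all the words printed that year. The Python sketch below shows that arithmetic on a toy example. The three “books” and their dates are invented purely for illustration, and the function name is hypothetical; this is not how Google processes its corpus.

```python
from collections import Counter


def yearly_relative_frequency(dated_texts, word):
    """Map each year to (occurrences of word) / (all words) for that year.

    dated_texts is an iterable of (year, text) pairs.
    """
    hits = Counter()
    totals = Counter()
    for year, text in dated_texts:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented toy corpus: (publication year, text) pairs.
toy_corpus = [
    (1850, "the railway reached the town"),
    (1850, "a letter came by post"),
    (1860, "the railway and the railway station"),
]

frequencies = yearly_relative_frequency(toy_corpus, "railway")
for year, share in frequencies.items():
    print(year, round(share, 3))
```

      Dividing by the yearly total is what lets you compare decades fairly: more books were printed in some years than others, so raw counts alone would mostly track the size of the corpus.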

    7. Read the first chapter


      Now that you’ve not read the entire work, go back and actually read its first chapter or section. Did the textual analyses you performed prepare you to understand the themes, character, setting, or any other aspects of this first chapter? Did the trends you studied through “distant reading” cause you to focus on things in the chapter you would not otherwise have been paying attention to? Are there ideas you expected to encounter based on your textual analysis, but didn’t? Were there ideas in the first chapter that seem entirely unrelated to the analyses you performed beforehand? If you have time, read further, keeping the same questions in mind.

    8. Write a Short Reflection

      Finally, you’ll write a report describing what you did and what you learned. Please keep the emphasis on what you learned (a) about your chosen text and (b) about this kind of “distant reading.” I’m interested in your speculations, your thoughtful reflections on text analysis. Grades will be based on how thoughtfully you engage with the assignment and how clearly those thoughts are expressed in prose. You do not need a central argument (although it’s fine if you have one). The goal of this assignment is to ruminate on what kinds of knowledge a distant reading can or cannot produce. In other words, it encourages you to think about how textual analysis changes our attention to texts. A good paper can have lots of unanswered questions. Good questions are evidence of thoughtfulness.
