Revisiting the Social Networks of Daniel Deronda

My twitterstream overflowed, in the past few days, with tweets about the uses, misuses and limits of social networking.* Coincidentally (or perhaps not, given the identity of at least one retweeter), we discussed the role of social network graphs in humanistic inquiry in this week’s session of Alan Liu’s “Intro to Digital Humanities” class. For those of you following along, we are #engl236 on Twitter and, last week, we made graphs. So I am going to interrupt my glacial progress through the possible uses of R** and put the longer-form meditation on what I am trying to do with these experiments in statistical programming on hold in order to talk about my latest adventures in social network graphing.

As longtime readers of this blog will remember, this is not my first foray into Social Network graphing. Nor is it my second. This gave me a huge advantage over many of my colleagues (sorry!) because I had already spent hours collecting and formatting the data necessary to graph these kinds of social networks. Since I wasn’t going to map new content, I thought I would at least learn a new program to handle the data. So I returned to Gephi, the network visualization tool that I had failed to master 18 months ago.

And promptly failed again.

PSA: If you have Apple’s latest OS installed, Gephi will not work on your machine. I and two of my classmates discovered this the hard way. Fortunately, the computers in the Transcriptions Lab are–like most institutional machines–about an OS and a half behind and so I resigned myself to only doing my work on my work computer. After some trial and error, I figured out how I needed to format the csv file with all my Daniel Deronda data and imported it into Gephi. After some more trial, more error, and going back to the quickstart tutorial, I actually produced a graph I liked.

Daniel Deronda in Gephi

In this graph, size signifies “betweenness centrality”, a measure of how important a node is according to how often it is necessary for getting places in the network (i.e., how often the shortest path between two other nodes runs through it). A node’s size therefore indicates how vital that person is to other people’s connections, which tends to go along with, though is not the same as, how many connections they themselves have. Color signifies grouping: nodes of the same color have been grouped together by Gephi’s modularity algorithm, which is Gephi’s function for dividing graphs into groups.
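For readers who want to see what “how often the shortest path between two other nodes is through this node” means in practice, here is a minimal sketch of a Brandes-style betweenness calculation. It assumes an unweighted, undirected graph, which is a simplification of my directed data, and it is not necessarily what Gephi runs internally. The toy network and the names `betweenness` and `toy` are made up for illustration.

```python
from collections import deque

def betweenness(adjacency):
    """Raw (unnormalised) betweenness for an unweighted, undirected graph.

    adjacency: dict mapping each node to a set of neighbouring nodes.
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes, crediting the nodes in between.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2.0 for node, value in scores.items()}

# Hypothetical toy network, not the real Daniel Deronda data.
toy = {
    "Deronda": {"Mirah", "Mordecai", "Gwendolen"},
    "Mirah": {"Deronda", "Mordecai"},
    "Mordecai": {"Deronda", "Mirah"},
    "Gwendolen": {"Deronda", "Grandcourt"},
    "Grandcourt": {"Gwendolen"},
}
centrality = betweenness(toy)
```

In this toy network Deronda sits on four shortest paths between other pairs and Gwendolen on three, while Mirah, despite her two connections, sits on none. That gap between being connected and being necessary is what the measure is after.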

So here we see three groups, which can be very roughly divided into Gwendolen’s social circle, Deronda’s social circle and Mirah’s social circle. There’s something delightful about the fact that the red group is made up entirely of the members of the Meyrick family and the girl they took in (Mirah). So Mirah truly becomes a member of the Meyrick family.

As this is a comparative exercise, I’m less interested in close-reading this graph and more interested in thinking through how it compares to yEd.

Gephi is certainly more aesthetically pleasing than yEd, especially given the settings I was using on the latter. And, unlike yEd, Gephi can very easily translate multiple copies of the same interaction into more heavily weighted lines, which helps provide a better idea of who speaks to whom how often in the novel (something I had been struggling with last year). At the same time, yEd’s layout algorithms seem far more interesting to me than Gephi’s “play around with Force Atlas until it looks right” approach. So while the layout does, I think, do a decent job of capturing centrality and periphery, it is less interestingly suggestive than yEd.
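The “multiple copies of the same interaction become a heavier line” idea can be sketched outside of either program. This is only an illustration of the bookkeeping, assuming one spreadsheet row per conversation; the rows and the function name `weighted_edges` are hypothetical, and Gephi’s own import logic may differ.

```python
from collections import Counter

# Hypothetical rows: (speaker, listener), one per recorded conversation.
interactions = [
    ("Gwendolen", "Grandcourt"),
    ("Gwendolen", "Grandcourt"),
    ("Deronda", "Mirah"),
    ("Gwendolen", "Grandcourt"),
    ("Deronda", "Gwendolen"),
]

def weighted_edges(rows):
    """Collapse repeated (source, target) rows into one weighted edge each."""
    counts = Counter(rows)
    return [(source, target, weight) for (source, target), weight in counts.items()]

edges = weighted_edges(interactions)
```

Direction is preserved here, so a conversation from A to B and one from B to A would count as two separate edges.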

The other failing that Gephi has is the lack of an undo button. This might seem trivial to some of you, but being able to click on a node, delete it from the graph and then quickly undo the deletion was what made it so easy for me to do “Daniel Deronda without Daniel (and, erm, Gwendolen)”. With Gephi, I have this paranoid fear that I will lose the data forever and it will automatically save and I’ll have to do all this work over again. After a while, I finally screwed my courage to the sticking place and deleted our main characters to produce the following three graphs.

Daniel Deronda without Daniel in Gephi

Daniel Deronda without Daniel

Daniel Deronda without Gwendolen

Daniel Deronda without Either

Daniel Deronda without Daniel or Gwendolen

The results are interesting, although perhaps less interesting than the disk-shaped diagrams from yEd that demonstrated changes in grouping. yEd allowed for some rather fine-grained analysis about who was regrouped with whom. On the other hand, Gephi makes it clear that both Gwendolen and Deronda tie together groups that, otherwise, are more distinct, as shown by the sudden proliferation of color in the first and third graphs particularly. Gephi makes it easy to see Deronda’s importance in tying many of the characters together. His influence on the networks is far stronger than Gwendolen’s.
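A cruder cousin of this experiment can be run in a few lines: delete a character from a copy of the network and count how many disconnected pieces are left. This is a sketch under heavy assumptions. It looks at connected components rather than Gephi’s modularity groups, it treats edges as undirected, and the toy network and function names are invented for illustration.

```python
def without_nodes(adjacency, removed):
    """Return a copy of the graph with the given nodes deleted."""
    removed = set(removed)
    return {
        node: neighbours - removed
        for node, neighbours in adjacency.items()
        if node not in removed
    }

def components(adjacency):
    """Split an undirected graph into its connected pieces."""
    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        group = set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical toy network, not the real Daniel Deronda data.
toy = {
    "Deronda": {"Mirah", "Mordecai", "Gwendolen"},
    "Mirah": {"Deronda", "Mordecai"},
    "Mordecai": {"Deronda", "Mirah"},
    "Gwendolen": {"Deronda", "Grandcourt"},
    "Grandcourt": {"Gwendolen"},
}
pieces_without_deronda = components(without_nodes(toy, ["Deronda"]))
```

Because `without_nodes` builds a new dictionary, the original data is untouched, which takes care of the missing undo button, at least in a sketch.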

Now, for the sake of comparison, here are the Gephi and yEd graphs side by side.

Daniel Deronda Gephi and yEd Comparison

I have not yet performed a more complete observational comparison of the layout, centrality measures and grouping algorithms in Gephi versus yEd (which, I admit, would begin with researching what they all mean) and the relationship between how data is presented and what questions the viewer can ask, but here are my preliminary reactions. Gephi does a far better job of pointing to Deronda’s importance within the text while yEd is better at portraying the upper-class social network in which Gwendolen is enmeshed. And while Gephi’s layout invites the viewer to think of its nodes in terms of centrality and periphery, yEd’s circular layout structures one’s thought along the lines of smaller groups within networks. Different avenues of inquiry appear based on which graph I look at.

This comparison produces three different questions.

  1. How do you know when to use which program? Can one tell at the outset whether the data will be more interesting and approachable in Gephi, e.g., or is this the perfect application of the “guess and check” approach where you always run them both and then decide which graph is more useful for the kinds of questions you want to ask? Are my conclusions here, about Gephi’s focus on centrality versus yEd’s focus on group dynamics, representative?
  2. How meaningful are the visual relationships one perceives in the network?
    1. Let’s take the graph above as an example and go for the low-hanging fruit. Young Henleigh, the illegitimate son of Grandcourt, is way down at the bottom of the graph, connected unidirectionally to his father (his father speaks to him, but he does not speak back) and bidirectionally to his mother, with whom he converses. Gephi has colored him blue, indicating that, at least according to Gephi’s grouping algorithm, he is more closely associated with the other blue characters (a group made up predominantly of those who show up in Daniel’s side of the story and whom I am valiantly resisting calling the Blue Man Group). Arguably, this is because those in Deronda’s circle talk slightly more about the boy since they have heard rumors of his existence, while those in Grandcourt’s social circle have not. And Henleigh’s repulsion distance is another indicator of how Grandcourt ignores his son and keeps his family at a distance.
    2. That is, I think, a fair reading of the book Daniel Deronda. My conclusions are borne out in the text itself and are justifiable within the larger narratives of Grandcourt’s treatment of others, a topic that I’ve written about several times over the course of my graduate career. But is it a fair reading of the graph? Am I taking accidents of layout as purposeful signals? Or are my claims, grounded as they are in edge distance and modularity, reasonable?
  3. In addition, did the graph actually tell me this information in a way that the book did not or did it simply remind me to look at what I already knew? This is part of an old and still unanswered question of mine – will the viewing of the social network graph ever really be useful or is it the decisions and critical moves that go into making the graph that produce results?

Obviously, this last question only applies to work like mine, where the graph is hand-coded and viewed as a model of an individual text. In cases where this work is mostly automated and several hundreds of novels are being studied for larger patterns of interactions, the question of whether the graph or the making thereof produces the information is irrelevant.

But the question of what kinds of meaning can be located in layout and pattern is still crucial, especially when one is comparing how different networks “look”. This may be a particularly pernicious problem in literary criticism and media studies: we’re trained to look at texts and images and treat them as…intentional. Words have meaning, pictures have meaning and we talk about this larger category of “media objects” in a way that assumes that their constituent parts have interpretable significance. This is not the same as claiming authorial intentionality; it’s simply an observation that, when we encounter a text, we take it as given that we can make meaning using any element of that text that impinges on our consciousness. There are no limits regarding what we can read into word choices, provided we can defend our readings and make sense out of them. Is that true of graphs? Are we entitled to make similar claims by reading interpretations into features of the layout, with the only test of said interpretation’s veracity being our rhetorical ability to convince someone else to buy it? For example, could I claim that Juliet Fenn’s position on the graph between Deronda and Gwendolen shows that she, and all that she stands for, comes between them? My instinct is to say no. But the same argument about place applied to a different character makes perfect sense. Mordecai’s place between Deronda and the group of Jewish philosophers on the far right is emblematic of how he connects Deronda to his nation and how he is the one who rouses Deronda’s interest in Zionism.

I can think of three off-the-cuff responses to this problem. The first is to say that location is a fluke and, when it corresponds to meaning, that’s an accident. This feels unsatisfying. The second is to say that there is something about Juliet Fenn that I’m missing and, were I to apply myself to the task, I could divine the reason behind her placement. This is differently unsatisfying, not because I don’t think I can come up with a reason, but because I am afraid that I can.*** And if I succeed in making a convincing argument, is that because I unearthed something new about the book or because I’m a human being who is neurologically wired to find patterns, a tendency exacerbated by my undergraduate and graduate training in the art of rhetorical argument? In short, the position that all claims that “can” be made can be taken seriously is only marginally less absurd than the claim that all layout elements are always meaningless and, consequently, any meaning we make or find is insignificant. The third response heads off in a different direction. Perhaps my discomfort with reading these networks lies not in the network, but in my own lack of knowledge. I have not been trained in network interpretation and I need to stop thinking like a literary theorist and start thinking like a social scientist. I need to learn a new mode of reading. This, while perhaps true, also leaves me dissatisfied. I am not, fundamentally, a social scientist. I am not looking for answers, I’m looking for interesting questions/interpretive moves/ideas worth pursuing. While it would be very cool to show, in graph form, how Mordecai’s ideology spreads to Daniel and how ideas act as a kind of positive contagion in this novel, that theory is not stymied if there is insufficient data to prove it. I can take imaginative leaps that social scientists responsible for policy decisions must absolutely eschew.

Which means it is time to think about a fourth position. If we, as scholars of media in particular, are going to continue doing such work, then we need a set of protocols for understanding these visualizations in a manner that both embraces the creativity and speculative nature of our field and articulates the ways in which this model of the text corresponds to the actual text. Such a set of guidelines would be useful not only as a series of trail markers for those of us, like me, who are still new to this practice and unsure of where we can step, but also as a touchstone that we can use to justify (mis)using these graphs. If the sole framework currently in existence is one that does not account for our needs, we may find ourselves accused of “doing it wrong” and, without an articulated, alternative set of guidelines, it becomes exponentially more difficult to respond. On the most basic level, this means having resources like Ted Underwood’s explanation of why humanists might not want to follow the same steps that computer scientists do when using LSA available for network analysis. Underwood explains how the literary historian’s goal differs from the computer scientist’s and how that difference affects one’s use of the tool. Is there a similar post for networks? Is there an explanation of how networks within media differ from networks outside of media and advice on how to shift our analytic practice accordingly? Do we even have a basic set of rules or best practices for this act of visualizing? And, if not, can we even claim these tools as part of our discipline without actually sitting down and remaking them in our image?

I don’t want to spend the rest of my scholarly career just borrowing someone else’s tools. I want Gephi and yEd…and MALLET and Scalar and, yes, even R to feel like they belong to us. Because right now, for all that I’ve gotten Gephi to do what I want and even succeeded in building a dynamic graph of the social network of William Faulkner’s Light in August (which told me nothing I did not already know from reading the book), I still feel like I’m playing in someone else’s sandbox.

*Granted, this is Twitter and so three posts, each retweeted several times, can make quite a little waterfall.

**I will say that the R learning curve made figuring out Gephi seem nearly painless by comparison.

***In the interest of proving a point, a short discussion of Juliet Fenn: Juliet Fenn’s location between Deronda and Gwendolen and at the center of the graph is significant precisely because she is the character who represents what each of them is not. Juliet is of the more aristocratic circle defined by Sir Hugo and his peers and, unlike Daniel, actually belongs there by birth. She beats Gwendolen in the archery contest, which proves her authenticity both in terms of talent and, again, aristocracy. Were either Daniel OR Gwendolen authentically what they present themselves as (and, coincidentally, who their co-main-character perceives them to be), Juliet Fenn would be Gwendolen’s mirror and Deronda’s ideal mate. As neither Gwendolen nor Daniel are, in fact, who they seem to be, Juliet is neither. She is merely a short blip during the early chapters of the book who can be easily ignored until her graphic location discloses the subtle purpose of her character–the idea of a “real” who Gwendolen cannot be and Deronda cannot have. Of course, neither character explicitly wants or wants to be Juliet. This isn’t meant to be explicit, merely to color our understanding of the otherness of Deronda and Gwendolen. It’s not that Juliet Fenn keeps them apart per se, but that the discrepancies between who she is and who they are, as illustrated by the graph, are what make any relationship between Gwendolen and Deronda impossible.

The Limits of Social Networks

Though we have mostly gone our separate ways over the past year, I find that I am attached to the idea of the LuAn collective and want to keep it going just a bit longer. After all, you never know when you might need a data viz blog that you co-run.

As a second year student in the English department at UCSB, I am gearing up to take (i.e. reading madly for) my qualifying exams this June. As luck would have it, I am also finishing up my course requirements this quarter, so I find myself in the…unenviable position of writing a paper on a topic that would ordinarily lie far outside my interests in the 19th century English novel: William Faulkner. So I did what any digital humanist with an unhealthy interest in visualization would do in my situation – I made a graph.

I wanted to write a final paper for this course that reflects my theoretical interests and would allow me to continue developing a subset of my digital skills. Of course, trying to get all of my interests to move in more or less the same directions is like herding kittens, but I had been seeking another opportunity to think through a novel using a social network graph and, well, I wouldn’t have to start from scratch this time. I knew how my graphing software, yEd, worked and I knew how long it took to turn a book into a collection of Excel cells denoting conversations (20% longer than you think it will take, for those of you wondering). So why not create a social network graph of one story in Yoknapatawpha?

Don’t answer that question.

Light in August is widely considered to be the most novel-like of Faulkner’s novels, which made it a good choice for my project. After all, I had experience turning a novel-like novel into a social network graph and no experience whatsoever with a text like The Sound and the Fury. Much as I was intrigued by and even enjoyed The Sound and the Fury and Absalom, Absalom!, the prospect of figuring out the rules for graphing them was…intimidating to say the least.

For all its novelistic tendencies, Light in August is still decidedly Faulknerian and, in order to work with it, I found myself either revising some of my previous rules or inventing new ones. When I worked on George Eliot’s Daniel Deronda, I had used a fairly simple set of two rules: “A bidirectional interaction occurs when one named character speaks aloud (that is, with quotation marks) to another named character. A unidirectional interaction occurs when a named character speaks aloud about another named character.”

Here are the Faulkner rules:

  1. When one character speaks to another, that interaction is marked with a thicker, dark grey arrow.
  2. When one character speaks about another, that interaction is marked with a thin, dark blue arrow.
  3. When one character speaks to another within another character’s narration (i.e. X is telling a story and, in it, Y talks to Z), that interaction is marked with a thicker, light grey arrow
  4. When one character speaks about another within another character’s narration, that interaction is marked with a thin, green arrow.

There are several changes of note here. First, I learned more about yEd and figured out how to put properties like line size and color in the spreadsheet itself so that the software would automatically map color and line weight as appropriate. This meant I could make finer and clearer distinctions than last time, at least in terms of showing kinds of communication. Second, I changed the rule about quotation marks because quotation marks don’t necessarily connote audible speech in Faulkner, nor does their absence connote internal monologue. I relied entirely on the dialogue tags in the text to decide whether a sentence was spoken aloud or not. Finally, I changed the rule about named characters. All speaking characters are represented in the graph, regardless of whether or not we are ever told their names. Had I not changed this rule, the number of characters of color represented in this graph would have fallen from 15 to 3. There are 103 distinct nodes in this graph, which means 103 characters speak in this text.
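To make the “put line size and color in the spreadsheet itself” step concrete, here is a minimal sketch of how the four rules could be turned into styled rows. Everything specific in it is an assumption: the numeric widths, the column names and the sample rows (X, Y, Z) are placeholders, and this is not a format that yEd requires.

```python
import csv
import io

# One entry per rule: (kind of speech, inside another character's narration?).
# Widths are placeholder numbers standing in for "thicker" and "thin".
STYLES = {
    ("to", False): {"width": 3, "color": "dark grey"},
    ("about", False): {"width": 1, "color": "dark blue"},
    ("to", True): {"width": 3, "color": "light grey"},
    ("about", True): {"width": 1, "color": "green"},
}

FIELDS = ["source", "target", "width", "color"]

def edge_row(source, target, kind, narrated=False):
    """Build one spreadsheet row for a single interaction."""
    style = STYLES[(kind, narrated)]
    return {
        "source": source,
        "target": target,
        "width": style["width"],
        "color": style["color"],
    }

def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical interactions between placeholder characters.
rows = [
    edge_row("X", "Y", "to"),
    edge_row("Y", "Z", "about", narrated=True),
]
sheet = to_csv(rows)
```

Keeping the rules in one lookup table means that changing my mind about a color is a one-line edit rather than a pass through the whole spreadsheet.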

Jeffrey Stayton, in an article entitled “Southern Expressionism: Apocalyptic Hillscapes, Racial Panoramas, and Lustmord in William Faulkner’s Light in August” (which, in the interest of full disclosure, I am still in the middle of reading), discusses how Faulkner figures racial landscapes in Light in August as a kind of Southern Expressionism. It is fitting, of course, that one of Willem de Kooning’s expressionist paintings is based on and entitled “Light in August”. But this graph highlights the relationship between fading into the background and remaining unnamed; it shows how easily racial landscapes can become racial backgrounds and how easy it is to elide the unnamed. In the Victorian novel, a certain charactorial parsimony seems to ensure that everyone who speaks is named. Daniel Deronda is 800 pages long and contains 62 character nodes. Light in August is 500 pages long and contains 103. If you remove all the unnamed characters, there are 44 character nodes. (For those of you counting, that’s 38/88, close to half of the white characters, and 12/15 or four fifths of the black characters. The other 8 are groups of people, who seem to speak and are spoken to fairly often in this text.)

There are several ways to interpret this difference and I am loath to embrace any of them without, frankly, having done more work both with Faulkner and with the Victorian novels. One of the things I find striking, though, is that Light in August seems to be making visible (though only just) things that are either not visible or entirely not-present in Daniel Deronda. Light in August is told from different characters’ viewpoints and the narration always locates itself in their perspective and confines itself to what they know. So the graph becomes a record not only of what they have seen, but also of how they have seen it.

I can hear some of you grumbling “What graph? You haven’t shown us a graph yet!”

My apologies. For that, I will give you three. Anything worth doing is worth overdoing.

1) The first graph.

Light in August Social Network Organic Disk

Click to see it in full size.

In this graph, color corresponds to importance, as determined by number of interactions. The darker the color, the more interactions that character has had. That dark red mark in the middle is Joe Christmas.
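The mapping from “number of interactions” to “darkness of color” can be sketched in a few lines. This is an illustration only, assuming that each edge counts once for both of its endpoints and that the shade runs linearly from white to full red; yEd’s own color mapping may work differently, and the sample edges are hypothetical.

```python
from collections import Counter

def interaction_counts(edges):
    """Count how many edges touch each character."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts

def shade(count, busiest):
    """Hex color running from white (no interactions) to full red (the busiest node)."""
    # The green and blue channels shrink together as the count rises.
    level = round(255 * (1 - count / busiest))
    return "#{:02x}{:02x}{:02x}".format(255, level, level)

# Hypothetical edges, not the real Light in August data.
edges = [
    ("Joe Christmas", "Joanna Burden"),
    ("Joe Christmas", "Joe Brown"),
    ("Lena Grove", "Byron Bunch"),
    ("Joe Christmas", "Byron Bunch"),
]
counts = interaction_counts(edges)
busiest = max(counts.values())
colors = {name: shade(total, busiest) for name, total in counts.items()}
```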

2) The graph without the unnamed characters

Light in August Social Network Organic Disk Sans Unnamed

Click for full size.

Colors mean the same thing here that they did in the previous graph.

There are several differences between the two graphs. Obviously, the second is legible in a way that the first one is not, which is not entirely a virtue. When it comes to graphing, legibility and completeness tend not to walk hand in hand. The more you leave out, the more you can see, so, contra-positively, the less you can see, the less you have left out. The best-of-both-worlds solution is to use both images.

Interestingly enough, there are no unconnected nodes in the second image, even though I deleted half of the nodes in the graph. That surprised me. I expected to find at least one person who was only connected to the network through one of the unnamed characters, but there’s no such person. And many of the people who remain are not characters I would consider to be important to the story (Why has the entire history of the Bundren family remained more or less intact? Who is Halliday, anyway?)
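The check I was doing by eye here, looking for anyone left stranded once the unnamed characters are gone, is easy to sketch. The example assumes undirected connections, and the node labels are deliberately generic placeholders rather than characters from the novel.

```python
def drop_nodes(adjacency, removed):
    """Return a copy of the graph without the removed nodes."""
    removed = set(removed)
    return {
        node: neighbours - removed
        for node, neighbours in adjacency.items()
        if node not in removed
    }

def isolated(adjacency):
    """List the nodes that have no connections left."""
    return sorted(node for node, neighbours in adjacency.items() if not neighbours)

# Placeholder network: C is connected only through an unnamed character.
toy = {
    "A": {"B", "unnamed-1"},
    "B": {"A"},
    "C": {"unnamed-1"},
    "unnamed-1": {"A", "C"},
}
unnamed = [node for node in toy if node.startswith("unnamed")]
stranded = isolated(drop_nodes(toy, unnamed))
```

In this placeholder network the check does find a stranded node, C. In my Light in August graph, the equivalent list would come back empty.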

These are questions to be solved, or at least pondered. They are, at any rate, questions worth asking. If the network remains intact without these characters, what does their presence signify? What has changed between the first graph and the second?

After all, I do have a paper to write from all of this.

I promised you a third graph, did I not? This one moves in a rather different direction. As part of its ability to organize and rearrange your graph, yEd has a grouping functionality and will divide your graph into groups based on the criteria you choose. I had it use natural clustering.

A grouping into natural clusters should fulfill the following properties:

  • each node is a member of exactly one group,
  • each node should have many edges to other members of its group, and
  • each node should have few or even no edges to nodes of other groups.
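These three properties can be checked against any proposed grouping, which is a different thing from computing the grouping in the first place. The sketch below only does the checking. It assumes undirected edges and a hypothetical `group_of` dictionary, which by construction gives each node exactly one group, and it makes no claim about how yEd finds its clusters.

```python
def edge_balance(edges, group_of):
    """For each node, count edges staying inside its group and edges leaving it."""
    balance = {node: {"inside": 0, "outside": 0} for node in group_of}
    for a, b in edges:
        key = "inside" if group_of[a] == group_of[b] else "outside"
        balance[a][key] += 1
        balance[b][key] += 1
    return balance

# Placeholder network with two proposed groups.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
group_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
balance = edge_balance(edges, group_of)
```

A node like C, with two edges inside its group and one leaving it, fits the description well. A node like D, split evenly, is the kind of case where the algorithm is presumably doing the best it can.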

yEd gave me 8 distinct groups, two of which had only two nodes in them.

Light in August Social Network Grouped

As always, click for full-size.

I assume that when yEd said that the groups would have few or no edges to nodes in other groups, it was doing the best it could with the material I gave it. I then had yEd rearrange the positions of the nodes so that the centrality of a node’s position within a group indicates how many connections it has.

What I love about this graph is how it divides Light in August into a set of six interconnected but distinct narratives. Each group larger than two centers around a specific character or group of characters involved in one thread of narrative. Joe Christmas, who is arguably the main character, has one section (along with a plurality of the other characters of color); Lena Grove, Byron Bunch and Joe Brown are all grouped together in another and, while they talk about the characters in Joe Christmas’s section quite often, they have only three conversations with the characters in that group. Those are the two largest groups. Percy Grimm, for all that he only appears in one chapter, manages to collect 7 other nodes around himself and does seem, in his own way, to be the protagonist of his own story who just walked into this one for one chapter and then left again. He is also the only named character in his section.

Social network graphs are, for me, a way of re-encountering a text. They strip away most of the novel and model only a small portion of what is present in the text, but that portion becomes both visible and analytically available in a new way. (I think seeing and visibility will become a theme in this paper, once I write it.) The title of this course is “Experimental Faulkner”. I like to think that this qualifies.

What Are We Doing With Our Visualizations?

A colleague of mine pointed me towards the following post about Shock and Awe Graphs in the Digital Humanities. The author, Adam Crymble, makes some decidedly thought-provoking points about what graphs are meant to be doing and how data visualization can sometimes work as a tool of intimidation as well as elucidation.

So before you publish a visualization, please take a moment and step back. As in the cult classic, Office Space, ask yourself: Is this Good for the Company?

Is this Good for Scholarship?

Or am I just trying to overwhelm my reviewers and my audience?

The authors of the blog Clioviz respond to Crymble’s question with a post In Praise of Shock and Awe, which also (and unsurprisingly) has some very good points to make about the value of disseminating information via visualization. They note that a certain amount of “shock and awe” is inevitable in fields like ours where the mere existence of plotted data points is enough to give some scholars palpitations. The main thrust of their argument, however, is that complex, beautiful and awe-inspiring graphs are not inherently a bad thing when they are usable. If a graph is complex to the point of unreadability, that is usually because the graph-er was attempting a kind of elegant complexity and failed. (This, of course, returns us to one of the basic problems of DH: we’re doing things we were never trained to do and the success of being able to do them at all blinds us to the necessity of doing them well.)

Both pieces make certain assumptions that I think we, as the Ludic Analytics group, are not willing to make. The first is that visualizations exist to convey information to the reader and the second that visualizations must have some immediately identifiable utility. The visualization presented at the beginning of Crymble’s piece is meant as a joke, but because a) he doesn’t provide any more serious examples and b) the point I’m trying to make works just as well, I am going to pretend it is real and assume that if I can answer his reductio ad absurdum with logic, then said logic can surely be applied to more reasonable work. Like others of its ilk, this image is a piece of art I would frame and hang on my wall rather than a readable graph. It offers very little in the way of interpretation to the untrained viewer and is, as Crymble says with his tongue firmly in his cheek, about 18th century cattle’s preference for south-facing barns. Crymble is frustrated when asked to view graphs like this as proof. However, were there no image whatsoever–had he merely read a paper that claimed to have looked at the data and found that cattle preferred south-facing barns–I would imagine he would have had less trouble with the assertion. This visualization exists because it can, not because it makes any particular point. It is there to be beautiful. And were it real, it would also show that the researchers engaged with the data to the extent necessary to produce such a graph. It would not be proof of point, but proof of process. And I would imagine that the task of creating such a visualization and dealing with the information would give the researchers a better understanding of their data, even if the visualization lacks a trickle-down effect of understanding to the reader.

This brings me back to something I had discussed earlier, which is that our data always seems to be more useful for ourselves than for our readers. This may explain why the scholarly article and book have had such a long life; they don’t simply convey understanding, they enact it as well. Close reading recreates, in the article, the process through which we imbue texts with meaning. The act of applying historical research to a volume of literature mimics the act of research and the flash of understanding that comes when one grasps how a specific historical fact is relevant to the text at hand. Articles are processes, they are a temporal movement towards the end of an argument. Visualizations, however, lack that sense of journey. They are always, already, at the end even when you, the reader, are still at the beginning.

I can think of several possible solutions to this problem. One is to accompany visualizations with detailed descriptions of their genesis (Stephen Ramsay does this to good effect in his article “In Praise of Pattern”). Another is to create dynamic visualizations that can operate on a temporal as well as spatial scale. For example, imagine a social network graph where you can watch the edges build up between the different nodes while the nodes move around to create different groupings as the networks grow over the course of a novel. You could even have edges fade slightly if a connection has not been mentioned for over ten chapters, for example. As might be evident, I find this idea truly exciting and would love to imagine a novel performed as a network graph. A third option would be to use the visualization not as proof of theory, but as a starting point for the reader to form her own conclusions about the topic. The visualization becomes a way to share data rather than results and the reader is invited to tell her own story with it (I am drawing this idea from N. Katherine Hayles’s new book, How We Think). The data, and database, are an interface where textual exploration can happen rather than a static image of exploration someone else has already done.
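The fading-edge idea is simple enough to sketch, even without any drawing code. What follows is a thought experiment in Python rather than a feature of Gephi or yEd: the ten-chapter grace period comes from the paragraph above, while the fade rate, the minimum opacity, the function names and the sample mentions are all assumptions of mine.

```python
def edge_opacity(last_mentioned, current_chapter, grace=10, fade_step=0.1, floor=0.2):
    """Opacity of an edge, given the chapter in which it was last mentioned."""
    silent = current_chapter - last_mentioned
    if silent <= grace:
        return 1.0
    # Lose a little opacity for every chapter of silence beyond the grace period.
    return max(floor, 1.0 - fade_step * (silent - grace))

def snapshot(mentions, chapter):
    """State of the network as of a given chapter.

    mentions: list of (chapter, source, target) tuples.
    Returns a dict mapping each edge seen so far to its current opacity.
    """
    last_seen = {}
    for mentioned_in, source, target in mentions:
        if mentioned_in <= chapter:
            edge = (source, target)
            last_seen[edge] = max(mentioned_in, last_seen.get(edge, mentioned_in))
    return {edge: edge_opacity(seen, chapter) for edge, seen in last_seen.items()}

# Placeholder mentions: (chapter, speaker, listener).
mentions = [(1, "A", "B"), (3, "B", "C"), (14, "A", "C")]
frames = [snapshot(mentions, chapter) for chapter in range(1, 16)]
```

Each entry in `frames` is one chapter’s worth of network, so stepping through the list is the “novel performed as a network graph” in miniature. By chapter 15 the edge between A and B, silent since chapter 1, has faded well below the others.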

These last two solutions require a somewhat radical rethinking of data presentation. Putting the visualization in as “(fig. 3)” on page 6 of the printed article is no longer going to cut it. Articles are very good at what they do, which is provide a forum in which to recreate traditional practice so that the reader can experience it along with the author. If we want our readers to experience our non-traditional readings along with us, we’re going to need non-traditional modes of delivery to do it.

Animation and Information

So I have found myself increasingly drawn to the idea of these phrase nets. There’s something about the way they ask me to engage with the text on a decontextualized level that I love. Certain specific words make me wonder about how they are being used in the text and I try to remember when they might be deployed, but others just distract me and intrigue me.

Like the networks of body parts here. Perhaps if I animate it… (click image for animation)

Animated Phrase Net

I suppose my question here is “What is the difference between the animated version and the earlier ones?” Is there something more compelling about a dynamic visualization? On the simplest level, I find myself spending more time staring at things that move, but does that make this a “better” visualization if I don’t even know what I’m trying to convey with it? Then again, the reason I don’t know what it means is because I haven’t returned to the text yet to think about it. If visualizations are a tool for analysis as well as a form of…art, I suppose (my artistic skills leave something to be desired, but art nonetheless), then I need to think about their implications. Or perhaps I don’t. Perhaps my next read-through of Daniel Deronda will be more illuminating even if I’m not directly thinking about this network. Perhaps someone else will see a connection. Or perhaps this gif will just remain here, fading slowly into color.

Questions of usefulness bring me to my second point, which is an article I saw that I felt resonated with something Meaghan had brought up before. There are several people in the field of Digital Humanities (Stephen Ramsay comes to mind immediately) who insist that you cannot be a digital humanist if you do not know how to code. (Full disclosure–I can handle basic HTML and have once or twice actually uttered the phrase “Stand back, I know regular expressions“.) But aside from the fact that this more or less relegates people working with Facebook and Twitter and doing really interesting things with technology in the classroom to something else (and maybe they should have a different title, but they seem to be part of the club these days), I am bothered by this assertion and have spent some time trying to work out why…other than the fact that I find marginalization disturbing, especially when aimed at me. Then a friend of mine posted the following article to his Facebook page and I got it.

Please Don’t Learn to Code

The author makes several good points, chief among them that we don’t need any more (bad) code in the world and, I have to be honest, most of the software I’ve seen produced by those in the humanities has been just that. We don’t need “good enough” coding, we need excellent coding done by professionals who are willing to share and maintain and update their software so that we, as scholars, can have equally excellent results. Which is not to say that Digital Humanists shouldn’t know a bit about code or shouldn’t decide to make it their “skill” and become just as good as a professional. I have met amateurs in almost every field who can beat the pants off the professionals, but still only do what they do as a hobby. And if you’re that good, please go for it! But here’s my plea. If you’re just going to learn enough to hack something together to get you through a project, a clunky thing that needs you to coax it along and that can’t really be used with any reliability by your colleagues, then perhaps you should think about whether the discipline as a whole will benefit more from your code or from you teaming up with someone who really knows what they’re doing.

Attempting to Play

Over the course of the past week, I discovered something about myself. I am very bad at directionless, ludic interaction. I feel like Wendy Darling, having forgotten how to be young and play with Peter Pan. But there it is; I find it extraordinarily difficult to think about this text without some goal in mind, some way of imagining its usefulness (however broadly I define that term). I had to believe I was dealing with the text in a potentially useful interpretive manner before I could think of anything to do to it. Once I got to dealing with the actual visualizations, however, I found “fun” came a bit more easily.

First, I take on the problem of the graph. The following graphs are the results of the Craig Zeta text analysis Excel macro and what they show, briefly, is where sections of Daniel Deronda (coded by me as relating either to Deronda or Gwendolen) fall in relation to one another based on the words they use. The rest of this paragraph is skippable if you don’t actually care about the graph and want to skip to the visualizations. The X axis refers to the percentage of words in that section that are judged (by the macro itself) as being more relevant to Deronda, while the Y axis refers to the percentage of words in that section that are judged as being more relevant to Gwendolen. So a section that falls around .2 for Deronda and .05 for Gwendolen is probably about Deronda because almost 20% of the words it uses (and, I should note, it counts all the instances of words like “the” as one word) are Deronda words, what it calls marker words. Conveniently, it also provides a list of the words it thought most relevant. But we don’t care about the list. We care about the graphs. They follow; feel free to click for larger images.

First Graph

Startlingly ugly and kinda useless, isn’t she? It looks like slides I remember viewing from 9th grade biology.

Second Graph

Well, it looks less like the mating dance of the hairball and you can see where the individual sections fall (and notice the weird stuff going on in the middle…it’s not actually that weird, by the way, those are just chapters where both Gwendolen and Deronda are the viewpoint characters). But it’s very pale and still not pretty.

Third Graph

This is what is known as good-enough graphing. It’s still not pretty, but it’s legible and the colors don’t clash, so I’d say we’re moving up in the world. The larger data points also make it easier to see clustering in the places that they overlap. It’s easier to see the broader shapes made by the sections and, while the graph doesn’t say much overall, it finally provides a decent macro-view of the division of the book.

Usefulness: Craig Zeta’s charts are mostly useful because, when they separate, you (meaning I) get some validation for your hypothesis. I assumed there would be specific words that show up more frequently around Deronda than Gwendolen and I was right. (Anyone who knows this novel should be able to think of about four off the top of her head.) But CZ gives me a list of 200 words that it found distinctive between the two and those words are good starting points for further exploration into the text.

Also, the actual clustering of the text is interesting. Why does Gwendolen appear to cluster more tightly than Deronda? Who is encroaching further on whose territory?

So those are the graphs. As I said, the process is useful but the actual visualizations are just there so that you have something to show for the process unless you’re using it for its intended purpose (trying to figure out which of two authors wrote a disputed text–whichever author’s cluster the disputed text falls into is probably the author). But using things beyond their intended purposes is fun, and I am trying to have fun, after all.
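For anyone who skipped the axis explanation earlier, the two coordinates of each data point can be sketched in a few lines of Python. This is not the macro’s own code, which I have not looked inside; it is my reading of what the axes mean, and it assumes the two marker-word lists are already in hand (the macro derives them itself). The function names and the sample marker words below are invented for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_coordinates(section_text, deronda_markers, gwendolen_markers):
    """Return (x, y) for one section of the novel.

    x is the share of the section's distinct words found in the Deronda
    marker list, y the share found in the Gwendolen marker list. Every
    repetition of a word such as "the" counts only once, because the
    calculation runs over distinct words rather than over every token.
    """
    types = set(tokenize(section_text))
    if not types:
        return (0.0, 0.0)
    x = len(types & deronda_markers) / len(types)
    y = len(types & gwendolen_markers) / len(types)
    return (x, y)


# Hypothetical marker lists and a toy "section", purely for illustration.
deronda_words = {"scroll", "synagogue"}
gwendolen_words = {"mirror", "roulette"}
point = marker_coordinates(
    "The the the scroll and the synagogue and the mirror",
    deronda_words,
    gwendolen_words,
)
```

The toy section has five distinct words, two of them on the made-up Deronda list and one on the Gwendolen list, so it would sit nearer the Deronda cluster.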

Another possibly useful analysis I came across was the Phrase Net. The Phrase Net is available as part of IBM’s wonderful Many Eyes web visualization to(ol | y) and it works by taking the plain text you put in and creating a network of words connected to one another in that text. You can define the parameters for connection, though the default is two words connected by “and”. Other options are “of the,” “the,” “a” and simply a space. The following phrase nets were made using the word “and” and I found certain elements in them interesting. Click on the links in the captions to go to Many Eyes and play with the originals.
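The basic idea is simple enough to approximate outside of Many Eyes. What follows is a rough sketch in Python, not IBM’s implementation: the function name and the crude tokenizer are my own assumptions, and it only counts the word pairs, leaving the drawing to whatever graphing tool you prefer.

```python
import re
from collections import Counter


def phrase_net(text, connector="and"):
    """Count ordered word pairs joined by `connector` in `text`.

    The connector may be one word ("and"), several ("of the"), or an
    empty string, in which case adjacent words are paired directly.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    link = connector.lower().split()
    span = len(link)
    edges = Counter()
    # Slide along the tokens, checking whether the connector sits
    # immediately after each candidate left-hand word.
    for i in range(len(tokens) - span - 1):
        if tokens[i + 1 : i + 1 + span] == link:
            edges[(tokens[i], tokens[i + span + 1])] += 1
    return edges


# A made-up line, not a quotation from the novel.
net = phrase_net("hands and eyes, eyes and lips; hands and eyes and lips")
```

Each key in the result is an edge of the network (left word, right word) and each count is how often that pairing occurs, which is roughly what the thickness of the arrows conveys in the images.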

So those are my phrase nets. They’re definitely an odd way of looking at a really large book, but I have to say what struck me the most about them was certain repeated usages of body parts. So let’s look at those sections more closely.

Phrase Nets with only repeated words in color

But, of course, the other interesting way to look at the groups is to look at words that appear in both, words which are similar in both and words that are Gwendolen or Deronda specific…at least in their usage involving the word “and”.

Phrase Net with Coloring

Words in red appear in both, words in purple are similar and words in blue only appear in one. I personally find the lips versus mouth dichotomy to be kinda cool.

So what have I learned so far from this project? Well, I learned that playfulness comes in plenty of forms and that having an original goal can actually be conducive to playing around with the results. I also learned that I can spend hours on Photoshop for no good reason. And I think…I hope…I’m getting a broader sense of how visualizations can change the way I think about a text…but more on that later.

In the spirit of ludic interaction, I offer one more visualization of our project.