
Monday, April 24, 2017

Voyant


After navigating Gephi last week, this week's Digital Humanities Tool of the Week (DHTW?) is Voyant! Whereas Gephi is a more complicated, more intensive program, Voyant is readily accessible, and the perfect tool for those who want an easier introduction to the world of DH tools. Voyant is new to me as well, so I'm going to play around with it a bit and share my results.

You might know Voyant from seeing word clouds around the internet. These can be made with the Voyant program by inputting a text and adjusting settings to see the words Voyant spits back. For example, in a work where the word "cat" appears 200 times and the word "silly" appears 150 times, "cat" will be the biggest word in the cloud. This sounds silly, but isn't simplification the best?
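Under the hood, a word cloud like Voyant's starts from a simple frequency count. Here's a minimal Python sketch of that idea; the sizing rule at the end is my own illustration, not Voyant's actual algorithm:

```python
from collections import Counter
import re

def word_frequencies(text, stopwords=()):
    """Count how often each word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

def font_sizes(frequencies, min_size=10, max_size=72):
    """Scale each word's display size in proportion to its count (illustrative rule only)."""
    top = max(frequencies.values())
    return {word: min_size + (max_size - min_size) * count / top
            for word, count in frequencies.items()}

sample = "The cat sat on the mat while the silly cat napped."
freqs = word_frequencies(sample, stopwords={"the", "on", "while"})
print(freqs.most_common(3))        # [('cat', 2), ('sat', 1), ('mat', 1)]
print(font_sizes(freqs)["cat"])    # the most frequent word gets the largest size (72.0)
```

The most frequent word ends up with the biggest font size, which is exactly the "cat" versus "silly" situation above.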

As I've covered in readings throughout the semester, DH is complicated. There's a lot of coding, numbers, and data involved, and this kind of work isn't suited to everyone's skill set. Sometimes it's helpful to have a preexisting program into which you can input information and then use the results. Voyant fits this need.

Here are a few examples of word clouds, from this list I found on Buzzfeed.com:




Isn't it interesting to see what the frequency of words in a text can reveal about the text?

For this exercise, in keeping with my dystopian flair, I've chosen to put Brave New World by Aldous Huxley through Voyant, to see the kind of word cloud that emerges. Here is the link to the full-text document of the book-- gotta love public domain!


This first image is a screengrab of everything that shows up on the screen when the corpus loads. Each individual box shows a different way of graphing the data in the story:


So, there's a lot to unpack here. First and foremost, the word cloud:


You can do a lot to edit the word cloud, such as expanding it to include more words, reformatting it to take a different shape, and editing the font and colors of the words.

There are a lot of other visualizations that can be applied to the data set; another option that jumped out at me was "Bubbles":

"Bubbles" took a while to sort through the 8000 words of Brave New World that were input into the program, so it took a while to work, but it was interesting to see the results. Here's a screengrab of the program running:


Here's another example of something you can do with Voyant. This tool is simply called "Link," and for this example I used a pre-existing corpus within the Voyant program-- a selection of eight Jane Austen novels. I thought that this corpus would be the perfect way to show how the "Link" tool works with a wider selection of works. Because screengrabbing capabilities are limited, let me explain: when you place your cursor on one of the words, the pathways "link" to other words that are connected. For example, the word "Mr." links to "said," "Mrs," "Knightley," "Weston," "Darcy," "Elton," and "Crawford," within the parameters of this data set.
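For the curious, this "linking" is essentially collocation: counting which words tend to occur near a target word across the corpus. Below is a rough, hedged Python sketch of that idea-- not Voyant's actual implementation, and the file name is hypothetical:

```python
from collections import Counter
import re

def collocates(text, target, window=5):
    """Count the words that appear within `window` tokens of each occurrence of `target`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(word.lower() for word in nearby)
    return counts

# "emma.txt" is a hypothetical local plain-text copy of an Austen novel.
with open("emma.txt", encoding="utf-8") as f:
    text = f.read()
print(collocates(text, "Mr").most_common(10))
```

Run over an Austen corpus, a count like this is what surfaces neighbors such as "Knightley" and "Darcy" next to "Mr."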


For a fairly user-friendly program, there is so much that can be done with Voyant. If, like me, you're just getting into DH, I highly recommend playing around with this program. It's approachable, and it shows you some of the cool things that can be done with a DH tool, without the complicated addition of coding. Whether this is the extent of your travels in DH, or just a stepping stone to learn more, it's worth perusing.





Sunday, April 23, 2017

DiRT Directory & Gephi

I believe I mentioned this in my introductory comments at the beginning of the semester, but I am obsessed with (read: interested in) dystopian literature, and plan on making it the topic of my thesis.

My interest in this field of literature is two-fold. First of all, dystopian worlds are particularly fascinating because they are the manifestation of people's fears of the unknown future-- usually this unknown future is filled with government control and thought-policing. These fears become all the more frightening when people start to recognize the doom of Orwell's 1984 encroaching on our own society. The second reason I am drawn to dystopian literature is because I have a great love of children's literature, and the dystopian genre has taken off in young adult and children's literature. It is interesting to me that children have always been a part of dystopian stories (for example, in 1984, children turn in their parents for wrong-think), but now they are becoming the main characters.

This is where DH comes in. I'm interested in finding programs that will help me pinpoint references to children and the theme of childhood in dystopian novels. To do this, I will need a program that takes the text I put into it, and spits out visualizations. This is where DiRT Directory comes in.

DiRT Directory is something I learned about at THATcampDC, and it has changed the course of my research. DiRT means "Digital Research Tools," and this website serves as a collection of resources that are organized by category. Each entry on the site has an about page where a synopsis of the tool is given, as well as the link to download the tool. Here's an example.

First, on the home page, you must decide the kind of tool you need for your work:


For my purposes, I looked up Visualization. This next image shows the further options that appear when a category is selected. For "Platform," I chose "Windows." For "Cost," I chose "Free."


Here are some of the programs that are listed in the results for "Visualization+Windows+Free." There are many more than are pictured; this is just a sampling.


Gephi and Weave stuck out to me as potentially helpful to my work, and so I clicked them both. Here's what the description pages look like:



DiRT Directory is an excellent resource because it offers links to a multitude of programs that can be used in many different ways. I decided to download Gephi because it seemed like it would be helpful, and I had heard the name tossed around at THATcampDC. With the help of DiRT Directory, I was able to pinpoint a resource-- something that, prior to this point, had been a challenge.

Although Gephi looks daunting, I was able to turn to the internet for tutorials and examples of how best to harness the program's power. Gephi's website is pretty straightforward in explaining the goals and uses of the program, and served as a helpful jumping-off point.

Here is a fantastic step-by-step tutorial that I found, which imports the text of Les Miserables in order to analyze connections between characters. This link was particularly helpful to me because, although Gephi can be used to visualize all kinds of data, this is the kind of data I will be working with.
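To give a concrete sense of what that kind of workflow looks like in code, here is a hedged sketch (not the tutorial's own script) that builds a small character co-occurrence network with the networkx library and exports it as a GEXF file, a format Gephi opens directly. The character list, file name, and co-occurrence rule are simplified illustrations:

```python
import itertools
import re
import networkx as nx  # pip install networkx

# A tiny, illustrative character list; a real project would use a much fuller one.
CHARACTERS = ["Valjean", "Javert", "Cosette", "Marius", "Fantine", "Thenardier"]

def cooccurrence_graph(text, names, chunk_size=1000):
    """Link two characters (weighted by count) whenever both names appear in the same chunk of text."""
    graph = nx.Graph()
    graph.add_nodes_from(names)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    for chunk in chunks:
        present = [name for name in names if re.search(r"\b" + name + r"\b", chunk)]
        for a, b in itertools.combinations(present, 2):
            current = graph.get_edge_data(a, b, default={"weight": 0})["weight"]
            graph.add_edge(a, b, weight=current + 1)
    return graph

# "les_miserables.txt" is a hypothetical plain-text copy of the novel.
with open("les_miserables.txt", encoding="utf-8") as f:
    novel = f.read()

G = cooccurrence_graph(novel, CHARACTERS)
nx.write_gexf(G, "les_miserables.gexf")  # this file can be opened directly in Gephi
```

Once the GEXF file is loaded, Gephi handles the layout, coloring, and filtering, which is where the real visual exploration happens.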

Here is a slightly more complex tutorial which includes information about the coding behind the program.

This next link is also a tutorial, but I am including it to show the kinds of visualizations that can be achieved by Gephi.

"Visualizing Historical Networks" is a group of projects hosted by the Center for History and Economics at Harvard University, which utilized Gephi to "map the way people in the past interacted with each other and their surroundings." I encourage you to peruse the site, the work is fascinating!

Gephi appears to be an incredibly helpful tool, and I'm excited to play around with it in my own research!






Monday, April 17, 2017

"That" Point in the Semester: Regrouping

Friends, it has reached that point in the semester where the struggle is setting in hard. There are 24 days until the end of the semester (not that I'm wearily counting), and I've hit a wall in my independent study. Excuses aside, here's the long and short of it-- I missed my blog post last week and it's time to regroup.

During the first half of the semester, I extensively studied the reasons behind the question "Why DH?" and I think I answered them pretty conclusively. As the second half of the semester rolled around, particularly after the amazing THATcampDC, I started seeking out methodologies that I could employ in my own work. Thankfully, I have a list of tools and resources that should prove helpful in the next step-- and it's time to take that next step.

I missed my blog last week because I was desperately trying to reformat these last few weeks of the semester, so as to get the most out of them in terms of DH practice. I'm going to be using some of the programs I've learned about in my thesis work next year, and I think it will be cool to play around with some of them at the end of this semester.

Websites such as DiRT Directory and The Programming Historian have proven to house invaluable resources for learning about the kinds of tools that suit my interests. I was also pointed toward tools such as Voyant, OpenRefine, and Gephi. Now that I have a new focus, I'm excited to finish off this semester strong!

In conclusion, this is the first of two blog posts this week. It's time to put the theory where the work is (Is that a saying? I guess it is now.) and put some practice into this Intro to DH course. Additionally, Hailey Carone (another graduate student here at Kean) and I will be presenting our studies and practice in the digital humanities to some of our peers in the Writing Studies M.A. program on May 1st, which will be a fun culmination of the semester.

See you guys in a few days! 

Monday, April 3, 2017

Virtual Stacks?

Source
Isn't that a cool picture? I love the digital, technological representation of the library in this picture, books made of tiny pixels, sharp and glowing as the rows lead down the hallway and into the light. In a way, this is quite an apt concept of the library throughout time. As the student wanders through the rows, their knowledge increases and they step into the light of learning. Isn't this all the more accurate with the added bonus of the virtual library?

This week's chapter of A Companion to Digital Literary Studies is Chapter 29, "The Virtual Library" by G. Sayeed Choudhury and David Seaman. Choudhury and Seaman highlight the vast amount of resources that we now have at our disposal, thanks to the internet, so that anyone interested in learning is welcome to learn. Libraries now have access to digital copies of books that they might only have dreamed of holding in the past. We have scans of magazines and periodicals at our fingertips that date back to long before the internet was even a far-off dream. A student in a tiny school in western Pennsylvania can have access to a multitude of works that might totally shape their field of interest, or bring them into a new world. Stepping away from academia, children around the world need only find a computer and they can learn anything-- we are no longer bound by the physical.

This also ties in nicely to my readings on digital humanities because the people who make these books available are digital humanists. The books available online were once painstakingly typed, letter by letter, by a person who wanted to catalog them online for future learners. That is an incredibly selfless, and incredibly tedious, task to take on.

So, why do it? Why bother?  

We bother because books are important and the internet is where the future is going. If we want history recorded and accessible, if we want to make the most of the tools we have, it's our responsibility to keep the virtual library in existence as a living, breathing organism.

This chapter was written 9 years ago, and it's interesting to see how much things have changed in that time. Choudhury and Seaman speak about the prevalence of online journals, and these have grown even more prevalent as the years have gone on. Now, if scholars want their articles to be read, they publish them online, and they certainly make their books available online as well. Much as we all love (and should preserve) traditional libraries, the traditional building simply cannot hold the vast amount of knowledge that is being produced every day by scholars with the world at their fingertips.

Freely available collections are another novel feature of the virtual library. Although copyright laws complicate matters, many works have become legally available online. Within a few clicks, anyone, anywhere in the world, can have any information that they need. Even if a book is within copyright and cannot be obtained online, we now have applications like Google Books which allow one to search for, locate, and obtain the book in a matter of minutes.
The library tends to keep up with such developments and is a natural and willing partner with the humanities departments as they explore the possibilities such tools have for data mining and the display of results. Add to these software packages the blogs, wikis, and virtual communities that are being adopted, the digital tools for collaborative scholarship, for innovative ways of interrogating text, and for new teaching possibilities, and it is not difficult to see increasing potential for transformative change in the way that literary scholars research, publish, and teach.
Although I am fully in support of the idea of "the library as laboratory," I have learned that there is a specific way in which this should be achieved. In sitting in on a session during THATcampDC with a group of librarians, I came to understand that just as the digital world is changing, the perception of the librarian's role must change as well. Librarians must be equipped with the skill set and help that they need in order to move the library along with the times. Enough librarians must be hired to help in all aspects of the virtual library, and this requires a restructuring of skills and strength.

It is no longer acceptable for the humanities to be considered "data poor," but it's up to the humanists to change this perception. As Choudhury and Seaman note, "the humanities are rich with content that is difficult to extract into digital form." This becomes all the more notable when you consider that the humanist's "data" has gone uncharted for all of history. We have generations of data to work with, and the sooner we start, the better.
As for literary studies, the "traditional" form of publishing — the monograph — influenced the way in which research has been conducted and conveyed. With new avenues for publishing, it is possible, even probable, that humanists will begin to explore new forms of research and dissemination.
The natural flow of life is going digital. Shouldn't we follow in its path?

Monday, March 27, 2017

Multiplying Knowledge

Now that THATcamp has passed, it's time to get back to my articles!
If you click this link to my syllabus, you'll see that I'm reaching the end of the track I laid out at the beginning of the semester. As I thought might happen, this independent study has taken me far beyond where I expected, and introduced me to people and resources I didn't know about at the start. At THATcampDC I learned about several tools and methodologies that I think it would be very interesting to spend the end of the semester exploring. If you're reading this and have any ideas for readings that would be beneficial for me to check out, please drop me a line in the comments!

This week, my reading selection is "'Knowledge Will Be Multiplied': Digital Literary Studies and Early Modern Literature" by Matthew Steggle.

In his chapter, Steggle defends the interpretation of data gathered through the use of DH methodologies. As I've summarized in previous posts, many scholars are wary of using digital tools in the English classroom, and this should not be the case. At one point, it may have been argued that obtaining electronic copies of literature was difficult and the texts themselves inaccessible, but at this point in time, websites such as Project Gutenberg exist and allow scholars to access a huge number of texts online. With some of the challenge gone, isn't it worth looking into the knowledge that could be mined through a new methodology?
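As a small illustration of just how low that barrier has become, here is a hedged Python sketch that downloads a plain-text book from Project Gutenberg. The URL follows Gutenberg's usual file pattern, but the exact path for any given book should be checked against that book's page on gutenberg.org:

```python
from urllib.request import urlopen

# Illustrative URL pattern for a Project Gutenberg plain-text file; copy the
# real path from the book's own Gutenberg page before running.
URL = "https://www.gutenberg.org/files/84/84-0.txt"  # assumed path for Frankenstein (eBook #84)

with urlopen(URL) as response:
    text = response.read().decode("utf-8")

print(f"Downloaded roughly {len(text.split()):,} words")
```

A few lines like this, and an entire novel is ready to be fed into whatever analysis tool you like.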

DH could also bring a new kind of student into the English department. Steggle quotes Risa Bear: "I became interested in producing texts for internet distribution as an alternative to writing term papers." If we can keep students interested in literature and allow them to venture into new disciplines within the field, the study of English literature will only grow in strength.

It's also worth noting that DH is a community effort. In the case of electronic literature, scholars depend on one another to type up and format entire books, so that they can input the typed files into programs for their own purposes. Steggle notes Bear's transcription of The Faerie Queene, completed in 1995, as one of many huge additions to the transcribed canon. Much of the academic world appears to be "every man for himself," and perhaps things don't have to be reduced to that. Perhaps DH can unify people and help us to work together to meet our goals. I think it's worth a shot.

The Internet Shakespeare Editions (ISE) is a prime example of the good intentions that can lead to books being made available online. Its leader, Michael Best, defined the goal of its creation as follows:
to create a website with the aim of making scholarly, fully annotated texts of Shakespeare's plays freely available in a form native to the medium of the internet. A further mission was to make educational materials on Shakespeare available to teachers and students: using the global reach of the internet, I wanted to make every attempt to make my passion for Shakespeare contagious.
Another example of the good that can arise from this movement is the Interactive Shakespeare Experiment, which contains hotlinked annotations that appear in a separate window. The reader has the choice to click on these links as they appear, in order to read notes of criticism on the text.

The people who work on DH projects, especially those transcribing texts and composing lists, do selfless and labor-intensive work which deserves to be recognized and hailed for the treasure that it is. In ensuring that documents are available online to the average scholar, they have opened up the academic world.

Toward the end of his article, Steggle speculates that blogs may be the next jump in the academic community. Perhaps this is a bit meta of me, but I think he may be onto something. Looking at the unexpected trajectory that academia has taken, perhaps it's true that blogs may one day be used as tools, or mined to tell the future about the past. After the developments we've seen in academia since the dawn of DH, I wouldn't be surprised. All in all, the title of this chapter is perfect-- "Knowledge Will Be Multiplied." It certainly appears that this is the case!

THATcampDC


This past weekend, I had the privilege of attending THATcampDC at George Washington University in Washington, D.C. The weather was beautiful, the cherry blossoms were blooming, the conversations were stimulating, and I am so grateful I had the opportunity to meet so many interesting people who were also passionate about DH!

After meeting up in the morning and deciding on a schedule of sessions for the day, based on what people were interested in discussing, the ~75 attendees went off to pursue our respective fields of interest. Excluding an hour set aside for "Dork Shorts," in which attendees were given 2.5 minutes to talk about their current passion projects, the day was broken into four hour-long sessions. Based on my interests, I attended sessions on using Wordpress.com in academic settings and on DH training and support for librarians, as well as a much-needed session on DH tool sharing.

What impressed me more than anything about this day was how open people were to discussion and how willing they were to share their knowledge. As a total newcomer to DH, I walked in knowing the bare minimum but wanting, desperately, to learn. In the morning idea-generation session, I expressed the desire to learn about distant reading in particular and, although a session wasn't formed, two kind individuals offered to help me out and I was able to make valuable connections.

I left GWU with a list of tools to check out, as well as the email addresses of a few people who seem absolutely wonderful. It was a fantastic experience, and I would be excited to attend another THATcamp in the future!

Monday, March 20, 2017

Digital Humanities: "Tools to Think With"

This week I'll be turning back to look at what can be achieved through use of Digital Humanities methodologies, and the first article I'll be reading is "Stylistic Analysis and Authorship Studies" by Hugh Craig. 

The concept of using technology to identify stylistic patterns in a corpus is fascinating, and until recently it was unheard of; such a feat would have taken considerable work without computer programs, in a time when, "with no way of assembling and manipulating counts of word-variables, researchers in earlier periods assumed that very common words varied in frequency only in trivial ways among texts." Now, however, a computer can be programmed to detect intricacies that go unnoticed by human beings, and DH methodologies introduce a whole new world of possibilities to linguistic study. To this note, Craig defines computational stylistics as follows:
Computational Stylistics aims to find patterns in language that are linked to the processes of writing and reading, and thus to "style" in the wider sense, but are not demonstrable without computational methods.
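To make the "counts of word-variables" idea concrete, here is a minimal, hedged Python sketch that builds a function-word frequency profile for two texts, the sort of table a stylometric comparison starts from. The word list and file names are my own illustrations, not Craig's data:

```python
from collections import Counter
import re

# A handful of very common "function" words, the kind of word-variables Craig describes.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "not", "but"]

def function_word_profile(text, words=FUNCTION_WORDS):
    """Return each function word's frequency per 1,000 tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    scale = 1000 / max(len(tokens), 1)
    return {w: round(counts[w] * scale, 1) for w in words}

# Hypothetical file names; any two plain-text works would do for a comparison.
for path in ["author_a.txt", "author_b.txt"]:
    with open(path, encoding="utf-8") as f:
        print(path, function_word_profile(f.read()))
```

Once profiles like these exist for many texts, statistics and visualization can take over to look for the patterns Craig describes.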
In other classes in the Kean English and Writing Studies M.A. program, we have had extensive discussions on the tragedy that occurs when writing is limited to the English classroom, as well as the misfortune English departments suffer in being compartmentalized into "literature people." This chapter offers stylistics as the perfect counter to this issue. In the same breath that Craig explains how stylistics can be used to study associations in Shakespeare's plays, he notes that the field is also important in courtrooms for the purpose of deciphering authorship. Who says that English majors sit around analyzing literature all day?

That being said, I'm an English literature geek and analyzing literature is important to me. Craig exemplifies one potential use of stylistics by analyzing Shakespeare's plays in order to study associations and differences based on dialogue. I found this study fascinating, and I'm excited to learn more about DH methodologies because this is exactly how I'd like to tie the tools into my own work. I don't know if I'm communicating how cool I think this is-- this is SO COOL. Shakespeare lived roughly 450 years ago. His works have been analyzed a million different ways and it's only now, in our lifetime, that we have the technology to come up with entirely new scholarship. And this isn't just about Shakespeare-- as I mentioned earlier, we can use the computer to make so many things happen. This might just revolutionize the English department.

As is the case with everything, there is always room for error, especially considering that all of this work still comes down to human interpretation. Craig allows the consideration that "at best, [this methodology provides] a powerful new line of evidence in long-contested questions of style; at worst, an elaborate display of meaningless patterning, and an awkward mismatch between words and numbers and the aesthetic and the statistical." Something worth considering, but not something that should keep us from trying nonetheless!

This all being said, every point has a counterpoint, so enter the challenger: Stanley Fish. Ah, Stanley Fish. Is it an English department discussion without Fish? I'd argue not. Craig mentions Fish in conjunction with a discussion of the challenges leveled against stylistics. Scholars such as Fish see a fundamental flaw in considering only stylistics when analyzing a text, as he feels that this divorces meaning from the text:
[Fish believes that] when abstracted from this setting [stylistics] refer to nothing but themselves and so any further analysis of patterns within their use, or comparison with the use of others or in other texts, or relating of them to meaning, is entirely pointless. Fish does not deny that there may be formal features which can enable categorical classification such as authorship (and so he is less hostile to authorship studies using computational stylistics), but he insists that these features cannot offer any information of interest about the text – just as a fingerprint may efficiently identify an individual but reveal nothing about his or her personality.
Craig respectfully nods to Fish's genius as a humanist and scholar, and recommends Fish as crucial reading for anyone interested in stylistics, so that the young scholar is made aware of potential pitfalls along the way of research. However, he also notes that stylistics is not necessarily anti-humanist. Stylistic clues can reveal interesting facts about a text, but it's important that the researcher doesn't get caught up in the pieces of the puzzle and forget the big picture.

In general, stylistics can reveal trends and ideas worthy of study. Craig points out that "its methods allow general conclusions about the relationship between variables," and this isn't something that should be dismissed easily. Likewise, stylistics shouldn't be thought of as merely quantitative: as Craig argues throughout his chapter, qualitative research can be derived from the numbers. I liked how he explained the field in this particular paragraph:
There is a strong instinct in human beings to reduce complexity and to simplify: this is a survival mechanism. Rules of thumb can save a great deal of time and effort. Stylistics is born of this instinct. What seems hard to explain in the individual case may be easier to understand when it is seen in a larger context. If these elements can be put in tabular form, then one can harness the power of statistics and of numerous visualizing and schematizing tools to help in the process of finding patterns.
Craig splits off in the latter part of the chapter to discuss humanities computing in relation to the question of authorship, and he uses Shakespeare, Homer, and the Bible to illustrate the kinds of questions that may be clarified through computing. Any English major, whether they agree or not, can tell you about the debate surrounding Shakespeare, which questions whether he wrote every work that is attributed to his name. Although the methodology is not infallible, humanities computing may provide answers, or at least new ways of studying the question. Looking into such questions through a new lens is fascinating, and who knows what ideas may be derived from the new tools we have at hand!

--

The second chapter I've chosen for this week is "Print Scholarship and Digital Resources" by Claire Warwick. In this article, Warwick gives the best term I've yet found to describe the DH methodologies: "tools to think with." (I liked it enough to make it the title of this post!)

DH methodologies certainly are tools, and it's for this reason that scholars should not be afraid. One of the fears that permeates the literary world is the fear that DH is going to knock out the old methods of study, and Warwick argues that this is not, and will never be, the case. To support her case, she refers back to the 1990s, when people feared that the book was in danger of going extinct. I myself remember in the mid-2000s when e-books became popular and people, myself included, fretted that this was the end of paper books. In 2017, we see that this is not the case. The world is changing, chain booksellers are struggling, but the paperback is not going anywhere. In the same way, Warwick argues, we should not worry about the traditional methodologies.

The computer is a tool and should be used accordingly, but sadly, many critics refuse to try. In a particularly interesting paragraph, Warwick documents how scholars are presented with the wonders that technology can produce, but these wonders don't speak to their own work, and she suggests that this may be due more to disinterest and misunderstanding than to fear. Rather than "apparent conservatism," it may be that "Users have been introduced to all sorts of interesting things that can be done with computer analysis or electronic resources, but very few of them have been asked what it is that they do, and want to keep doing, which is to study texts by reading them." If reading is still important, the technology should serve the reader, just as it serves the programmer and the coder; "computers are of most use when they complement what humans can do best." Computers cannot do everything, and programs are limited. As Warwick points out, a computer cannot recognize figurative use of language, and different people may interpret figurative language differently. This is where human interaction with data becomes of utmost importance.

Throughout the rest of her article, Warwick continues to defend both DH and the importance of human interpretation, and she makes a compelling case for marrying the two to successfully move English departments into the future-- that is, if scholars are brave enough to take the plunge.