Monday, March 27, 2017

Multiplying Knowledge

Now that THATcamp has passed, it's time to get back to my articles!
If you click this link to my syllabus, you'll see that I'm reaching the end of the track I laid out at the beginning of the semester. As I thought might happen, this independent study has taken me far beyond where I expected, and introduced me to people and resources I didn't know about at the start of the semester. At THATcampDC I learned about several resources that I may spend the end of the semester exploring. I think it would be very interesting to end this semester by exploring some of the tools and methodologies I've learned about. If you're reading this and have any ideas for readings that would be beneficial for me to check out, please drop me a line in the comments!

This week, my reading selection is "'Knowledge Will Be Multiplied': Digital Literary Studies and Early Modern Literature" by Matthew Steggle.

In his chapter, Steggle defends the interpretation of data gathered through the use of DH methodologies. As I've summarized in previous posts, many scholars are wary of using digital tools in the English classroom, and this should not be the case. At one point, it may have been argued that obtaining electronic copies of literature was difficult, but today websites such as Project Gutenberg give scholars access to a huge number of texts online. With some of the challenge gone, isn't it worth looking into the knowledge that could be mined through a new methodology?

DH could also bring a new kind of student into the English department. Steggle quotes Risa Bear: "I became interested in producing texts for internet distribution as an alternative to writing term papers." If we can keep students interested in literature and allow them to venture into new disciplines within the field, the study of English literature will only grow in strength.

It's also worth noting that DH is a community effort. In the case of electronic literature, scholars depend on one another to type up and format entire books, so that they can input the typed file into a program for their own purposes. Steggle notes Bear's transcription of The Faerie Queene, completed in 1995, as being one of many huge additions to the transcribed canon. Much of the academic world appears to be "every man for himself," and perhaps things don't have to be reduced to that. Perhaps DH can unify people and help us to work together to meet our goals. I think it's worth a shot.

The Internet Shakespeare Editions (ISE) is a prime example of the good intentions that can lead to books being made available online. Its founder, Michael Best, defined the goal of the project as follows:
to create a website with the aim of making scholarly, fully annotated texts of Shakespeare's plays freely available in a form native to the medium of the internet. A further mission was to make educational materials on Shakespeare available to teachers and students: using the global reach of the internet, I wanted to make every attempt to make my passion for Shakespeare contagious.
Another example of the good that can arise from this movement is the Interactive Shakespeare Experiment, which contains hotlinked annotations which appear in another screen. The reader has the choice to click on these links as they appear, in order to read notes of criticism on the text.

The people who work on DH projects, especially those transcribing texts and composing lists, do selfless, labor-intensive work that deserves to be recognized and hailed for the treasure that it is. In ensuring that documents are available online to the average scholar, they have opened up the academic world.

Toward the end of his article, Steggle speculates that blogs may be the next jump in the academic community. Perhaps this is a bit meta of me, but I think he may be onto something. Looking at the unexpected trajectory that academia has taken, perhaps it's true that blogs may one day be used as tools, or mined to tell the future about the past. After the developments we've seen in academia since the dawn of DH, I wouldn't be surprised. All in all, the title of this chapter is perfect: "Knowledge Will Be Multiplied." It certainly appears that this is the case!


This past weekend, I had the privilege of attending THATcampDC at George Washington University in Washington, D.C. The weather was beautiful, the cherry blossoms were blooming, the conversations were stimulating, and I am so grateful I had the ability to meet so many interesting people who also were passionate about DH!

After meeting up in the morning and deciding on a schedule of sessions for the day, based on what people were interested in discussing, the ~75 attendees went off to pursue our respective fields of interest. Excluding an hour set aside for "Dork Shorts," in which attendees were given 2.5 minutes to talk about their current passion projects, the day was broken into 4 hour-long sessions. Based on my interests, I attended sessions on using in academic settings, DH training and support for librarians, as well as a much-needed session on DH tool sharing.

What impressed me more than anything about this day was how people were open to discussion and willing to share their knowledge. As a total newcomer to DH, I walked in knowing the bare minimum but wanting, desperately, to learn. In the morning idea generation session, I expressed the desire to learn about distant reading in particular and, although a session wasn't formed, two kind individuals offered to help me out and I was able to make valuable connections.

I left GWU with a list of tools to check out, as well as the email addresses of a few people who seem absolutely wonderful. It was a fantastic experience, and I would be excited to attend another THATcamp in the future!

Monday, March 20, 2017

Digital Humanities: "Tools to Think With"

This week I'll be turning back to look at what can be achieved through use of Digital Humanities methodologies, and the first article I'll be reading is "Stylistic Analysis and Authorship Studies" by Hugh Craig. 

The concept of using technology to identify stylistic patterns in a corpus is fascinating, and it would have been unheard of in earlier periods. Such a feat would take considerable work without computer programs; as Craig puts it, "with no way of assembling and manipulating counts of word-variables, researchers in earlier periods assumed that very common words varied in frequency only in trivial ways among texts." Now, however, a computer can be programmed to detect intricacies that go unnoticed by human beings, and DH methodologies introduce a whole new world of possibilities to linguistic studies. On this note, Craig defines computational stylistics as follows:
Computational Stylistics aims to find patterns in language that are linked to the processes of writing and reading, and thus to "style" in the wider sense, but are not demonstrable without computational methods.
In other classes in the Kean English and Writing Studies M.A. program, we have had extensive discussions on the tragedy that occurs when writing is limited to the English classroom, as well as the misfortune English departments suffer in being compartmentalized as "literature people." This chapter offers the perfect counter to this issue: stylistics. In the same breath that Craig explains how stylistics can be used to study associations in Shakespeare's plays, he notes that the field is also important in courtrooms for the purpose of deciphering. Who says that English majors sit around analyzing literature all day?

That being said, I'm an English literature geek and analyzing literature is important to me. Craig exemplifies one potential use of stylistics by analyzing Shakespeare's plays in order to study associations and differences, based on dialogue. I found this study fascinating, and I'm excited to learn more about DH methodologies because this is exactly how I'd like to tie the tools into my own work. I don't know if I'm communicating how cool I think this is-- this is SO COOL. Shakespeare lived roughly 450 years ago. His works have been analyzed a million different ways and it's only now, in our lifetime, that we have the technology to come up with entirely new scholarship. And this isn't just about Shakespeare-- as I mentioned earlier, we can use the computer to make so many things happen. This might just revolutionize the English department.

As is the case with everything, there is always room for error, especially considering that all of this work still comes down to human interpretation. Craig concedes that "at best, [this methodology provides] a powerful new line of evidence in long-contested questions of style; at worst, an elaborate display of meaningless patterning, and an awkward mismatch between words and numbers and the aesthetic and the statistical." This is something worth considering, but not something that should keep us from trying!

This all being said, every point has a counterpoint, so enter the challenger: Stanley Fish. Ah, Stanley Fish. Is it an English department discussion without Fish? I'd argue not. Craig mentions Fish in a discussion of the challenges to stylistics. Fish sees a fundamental flaw in considering only stylistics when analyzing a text, as he feels that this divorces meaning from the text:
[Fish believes that] when abstracted from this setting [stylistics] refer to nothing but themselves and so any further analysis of patterns within their use, or comparison with the use of others or in other texts, or relating of them to meaning, is entirely pointless. Fish does not deny that there may be formal features which can enable categorical classification such as authorship (and so he is less hostile to authorship studies using computational stylistics), but he insists that these features cannot offer any information of interest about the text – just as a fingerprint may efficiently identify an individual but reveal nothing about his or her personality.
Craig respectfully nods to Fish's genius as a humanist and scholar, and recommends Fish as crucial reading for anyone interested in stylistics, so that the young scholar is made aware of potential pitfalls along the way of research. However, he also notes that stylistics is not necessarily anti-humanist. Stylistic clues can reveal interesting facts about a text, but it's important that the researcher doesn't get caught up in the pieces of the puzzle and forget the big picture.

In general, stylistics can reveal trends and ideas worthy of study. Craig points out that "its methods allow general conclusions about the relationship between variables," and this isn't something that should be dismissed easily. Likewise, stylistics shouldn't be thought of as merely quantitative: as Craig argues throughout his chapter, qualitative research can be derived from the numbers. I liked how he explained the field in this particular paragraph:
There is a strong instinct in human beings to reduce complexity and to simplify: this is a survival mechanism. Rules of thumb can save a great deal of time and effort. Stylistics is born of this instinct. What seems hard to explain in the individual case may be easier to understand when it is seen in a larger context. If these elements can be put in tabular form, then one can harness the power of statistics and of numerous visualizing and schematizing tools to help in the process of finding patterns.
Craig splits off in the latter part of the chapter to discuss humanities computing in relation to the question of authorship, and makes note of Shakespeare, Homer, and the Bible to illustrate the things that may be clarified through computing. Any English major, whether they agree or not, can tell you about the debate surrounding Shakespeare, which questions whether he wrote every work that is attributed to his name. Although the methodology is not infallible, humanities computing may provide answers, or at least new ways of studying the question. Looking into such questions through a new lens is fascinating, and who knows what ideas may be derived from the new tools we have at hand!
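
To make the "tabular form" idea concrete for myself, here is a minimal sketch in Python of the kind of counting that computational stylistics starts from. It is not Craig's method or any particular tool's: the list of common words, the function name relative_frequencies, and the two one-line samples are all my own assumptions, chosen only for illustration. Real studies use whole plays and far more careful statistics.

```python
import re
from collections import Counter

# Hypothetical list of very common "function" words (an assumption for this sketch).
FUNCTION_WORDS = ["the", "and", "of", "to", "not", "is"]


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each word's share of all tokens in the text."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in words}


# Two toy samples; a real comparison would use much longer texts.
samples = {
    "sample_a": "To be, or not to be, that is the question.",
    "sample_b": "The quality of mercy is not strained.",
}

# One row per word, one column per sample: the tabular form.
table = {name: relative_frequencies(text) for name, text in samples.items()}

print("word".ljust(8) + "".join(name.rjust(12) for name in table))
for word in FUNCTION_WORDS:
    row = "".join(f"{table[name][word]:12.3f}" for name in table)
    print(word.ljust(8) + row)
```

Once the counts sit in a table like this, the statistical and visualizing tools Craig mentions have something to work with. Interpreting what the differences mean is still up to the human reader.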


The second chapter I've chosen for this week is "Print Scholarship and Digital Resources" by Claire Warwick. In this article, Warwick gives the best term I've yet found to describe the DH methodologies: "tools to think with." (I liked it enough to make it the title of this post!)

DH methodologies certainly are tools, and it's for this reason that scholars should not be afraid. One of the fears that permeates the literary world is the fear that DH is going to knock out the old methods of study, and Warwick argues that this is not, and will never be, the case. To support her case, she refers back to the 1990s when people feared that the book was in danger of going extinct. I myself remember in the mid-2000s when e-books became popular and people, myself included, fretted that this was the end of paper books. In 2017, we see that this is not the case. The world is changing, chain booksellers are struggling, but the paperback is not going anywhere. In the same way, Warwick argues, we should not worry about the traditional methodologies.

The computer is a tool, and should be used accordingly but, sadly, many critics refuse to try. In a particularly interesting paragraph, Warwick documents how scholars are presented with the wonders that technology can produce, but these wonders aren't personal, and she suggests that this may be due more to disinterest and misunderstanding, than to fear. Rather than "apparent conservatism," it may be that "Users have been introduced to all sorts of interesting things that can be done with computer analysis or electronic resources, but very few of them have been asked what it is that they do, and want to keep doing, which is to study texts by reading them." If reading is still important, the technology should serve the reader, just as it serves the programmer and the coder; "computers are of most use when they complement what humans can do best." Computers cannot do everything, and programs are limited. As Warwick points out, a computer cannot recognize figurative use of language, and different people may interpret figurative language differently. This is where human interaction with data becomes of utmost importance.

Throughout the rest of her article, Warwick continues to defend both DH and the importance of human interpretation, and makes a compelling case for marrying the two to successfully move English departments into the future, that is, if scholars are brave enough to take the plunge.

Friday, March 17, 2017

Text Markup and THATcamps

As I mentioned in last week's blog post, text markup is incredibly complicated and incredibly technical. Having a very basic understanding of coding, I understand the theory behind the work that goes into building markup tools. For my purposes, however, I believe that examples of practical application would be most helpful.

I found myself running into a wall in the past week as I attempted to figure out the best way to learn more about TEI and text markup. I'm excited about these methodologies and I've already considered using them in my thesis next semester. However, understanding theory will only go so far-- I need practice. There are several articles that teach theory but, when it comes to learning the programs that implement it, the field is pretty DIY. That all being said, to take a step toward solving my problem I've registered for a THATcamp taking place in Washington, D.C. on the weekend of March 25th. It'll be great to meet others interested in DH, and I'll certainly blog about my experience afterwards!

This blog is going to be two parts this week. Because I'm piggybacking off of my last two blogs, the reading material I have for this post is one main article and two resources, which have been helpful to me in understanding the building blocks of text markup.

First up, the article! My main reading for this week was “The Text Encoding Initiative and the Study of Literature” by James Cummings.

Cummings introduces his article with a brief history of the Text Encoding Initiative (TEI) by introducing some of the guidelines and sponsors that together make up the initiative. He states the chapter's thesis as follows:
This chapter will examine some of the history and theoretical and methodological assumptions embodied in the text-encoding framework recommended by the TEI. It is not intended to be a general introduction...nor is it exhaustive in its consideration of issues...This chapter includes a sampling of some of the history, a few of the issues, and some of the methodological assumptions...that the TEI makes.
It is still fascinating to me that TEI is such a young endeavor. According to Cummings, it was formed at a conference at Vassar College in 1987, and very few of the principles established at that time have changed. This is exciting because the field is new and accessible-- the people who dive in are free to determine how the tools are used.

I've chosen this article because I feel that it's important to not only have a grasp of the technologies, but also to understand the history. The article includes technical language relating to different markup languages, SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language), explains the history of these languages, and describes how they are used. I was interested by Cummings's explanation of the transition from GML (Generalized Markup Language), a noted "milestone system based on character flagging, enabling basic structural markup of electronic documents for display and printing," to SGML, which was "originally intended for the sharing of documents throughout large organizations." As time went on, SGML proved not to be universal enough, and XML was adopted; it is still used because of its flexible nature.
XML has been increasingly popular as a temporary storage format for web-based user interfaces. Its all-pervasive applicability has meant not only that there are numerous tools which read, write, transform, or otherwise manipulate XML as an application-, operating-system- and hardware-independent format, but also that there is as much training and support available.
Throughout the article Cummings highlights the key points and goals of the TEI. The design goals section examines the standards set for the TEI to be as straightforward and accessible as possible for anyone interested in learning the text encoding methodology. He examines the community-centric nature of the TEI, and the emphasis on keeping the field open and collaborative. I'm excited to be coming into the academic world at this time because, although the DH field has a distinct technological learning curve, I'd rather face the curve in a community setting than in the traditional closed-off world of academic hazing.

Cummings also discusses the user-centric nature of the TEI. Due to the community-based nature of the field, it must deliver what users of all different disciplines need. This can be a challenge, but it also exemplifies the versatile nature of the beast. As I have explained, I'm interested in using text markup and the TEI in order to see what it can uncover about texts that have been close-read to death. In the field of literature, we all know close reading, and we all know how to compare elements of a book. I want to take this to the next level: I want to see what technology can show me, and I want to learn how to use the programs.

Cummings explains that the TEI may have been influenced by New Criticism, a school of literary criticism with which I am quite familiar. He argues that the TEI, instead of reacting against this structuralism as many poststructuralists might desire, is in fact compatible with New Criticism, as "the TEI's assumptions of markup theory as basically structuralist in nature as it pairs record (the text) with interpretation (markup and metadata)." This is something I would like to delve into further, because I can understand both sides of the New Criticism comparison argument.
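
Since I keep saying I need practice rather than theory, here is a small sketch of what that pairing of record and interpretation can look like. The fragment is a simplified, TEI-style encoding of a couplet that I wrote myself for illustration; it is not taken from Cummings or from any TEI project, and a real TEI document would also declare the TEI namespace and include a header. The Python code uses the standard library's xml.etree.ElementTree module to pull the text (the record) apart from the markup (the interpretation).

```python
import xml.etree.ElementTree as ET

# A simplified TEI-style fragment (an assumption for illustration only):
# <lg> marks a group of verse lines and <l> marks a single line.
fragment = """
<lg type="couplet">
  <l n="1">So long as men can breathe or eyes can see,</l>
  <l n="2">So long lives this, and this gives life to thee.</l>
</lg>
""".strip()

root = ET.fromstring(fragment)

# The interpretation: what the encoder has claimed about the passage.
print("Encoded as:", root.get("type"))

# The record: the words themselves, paired with their line numbers.
lines = [(line.get("n"), "".join(line.itertext()).strip())
         for line in root.iter("l")]

for number, text in lines:
    print(number, text)
```

The words of the poem never change, but the tags around them record a decision about what kind of thing each stretch of text is. That decision is where the encoder's interpretation comes in.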

I highly suggest reading this article, as Cummings successfully accomplishes his proposed thesis statement. I came away feeling as if I had learned the key points in the history of the TEI, without being drowned in technical conversation. I am increasingly interested in learning to code, as I am amazed by the things we can achieve with computer programs.

If you are interested in the technical side, the Stanford University Digital Humanities website includes many helpful resources, particularly "Metadata and Text Markup," which further explains buzzwords and phrases in the field, and "Content Based Analysis," which explains more about text content mining.

Additionally, the TEI website has several helpful links that may take one down many rabbit holes. I got stuck for a long while going through project examples which use TEI encoding.