Tuesday, February 21, 2017

Data and Dimension - Pt. 2

Welcome back! If by some odd chance you've ended up here first, please click here to see part 1 of this blog post.


The second text that I'll be exploring this week is "Marking Texts of Many Dimensions" by Jerome McGann. Within the first few paragraphs of this article, I'm already lost in the meta nature of the text:
Consider the phrase "marked text"...How many recognize it as a redundancy? All text is marked text, as you may see by reflecting on the very text you are now reading. As you follow this conceptual exposition, watch the physical embodiments that shape the ideas and the process of thought. Do you see the typeface, do you recognize it? Does it mean anything to you, and if not, why not? Now scan away (as you keep reading) and take a quick measure of the general page layout: the font sizes, the characters per line, the lines per page, the leading, the headers, footers, margins. And there is so much more to be seen, registered, understood simply at the documentary level of your reading: paper, ink, book design, or the markup that controls not the documentary status of the text but its linguistic status. What would you be seeing and reading if I were addressing you in Chinese, Arabic, Hebrew – even Spanish or German? What would you be seeing and reading if this text had been printed, like Shakespeare's sonnets, in 1609?
Alright McGann, you have my attention.

This chapter in Blackwell's Companion is, much like "Databases," incredibly technical, although I expected this when I selected and paired the two together. As I mentioned above, DH is a technical field and I think it's important to have an introductory backbone that addresses how technical it can be.

Text markup involves the breakdown of language into words and even smaller units, in order to analyze the bits that work together to communicate ideas. This may sound a bit scientific and, make no mistake, it is. In fact, McGann compares it to physics.
Words can be usefully broken down into more primitive parts and therefore understood as constructs of a second or even higher order. The view is not unlike the one continually encountered by physicists who search out basic units of matter. Our analytic tradition inclines us to understand that forms of all kinds are "built up" from "smaller" and more primitive units, and hence to take the self-identity and integrity of these parts, and the whole that they comprise, for objective reality.
I might even compare it to chemistry, studying the molecules that make up compounds in order to understand why the compounds act as they do. How interesting, to compare that to words.

In text markup, the objective is to instruct the computer to identify basic elements of natural language text, in order to understand the redundancy and ambiguity that is inherently tied to language. This is not unlike the example database Ramsay analyzes in the first article. He spends a good bit of time examining the redundancies caused by errors in the programming of his data. In this article, however, we are finally introduced to the TEI, or Text Encoding Initiative, which I've come to learn is a major part of DH methodologies. The TEI system is, according to McGann, "designed to 'disambiguate' entirely the materials to be encoded."

This is still very murky and confusing. Luckily, McGann backtracks a bit and explains what he calls "traditional textual devices," in order to later unpack the intricacies of TEI and SGML (Standard Generalized Markup Language), the overarching standard for defining markup languages. The power of traditional textual devices lies in their ability to make records of their progress and process these records without the use of technology.
A library processes traditional texts by treating them strictly as records. It saves things and makes them accessible. A poem, by contrast, processes textual records as a field of dynamic simulations. The one is a machine of memory and information, the other a machine of creation and reflection... Most texts – for instance, this chapter you are reading now – are fields that draw upon the influence of both of those polarities.
SGML, on the other hand, looks at texts through the scope of data and coding and uses these tools to process and record, although the use of the tools requires a humanist to curate the work. TEI, more specifically, can be used to focus directly on things that stand apart and mark them as different, so the humanist can later come in and analyze the meanings that may be tied to them.
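
To make the idea of "marking things as different" a little more concrete for myself, here is a minimal sketch in Python. It is not McGann's example and it is not a complete TEI document. I'm assuming a tiny fragment that borrows the TEI element names lg (line group), l (verse line), and seg (an arbitrary segment), applied to the opening of Shakespeare's Sonnet 18. The type="image" label and the function names are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET

# A TEI-flavored fragment (assumption: simplified, no TEI header or namespace).
# The <seg type="image"> tag is a hypothetical editorial choice, not a TEI rule.
SAMPLE = """
<lg type="quatrain">
  <l n="1">Shall I compare thee to a <seg type="image">summer's day</seg>?</l>
  <l n="2">Thou art more lovely and more temperate:</l>
</lg>
"""


def marked_segments(xml_text, seg_type):
    """Return the text of every <seg> whose type attribute matches seg_type."""
    root = ET.fromstring(xml_text.strip())
    return [seg.text for seg in root.iter("seg") if seg.get("type") == seg_type]


def line_texts(xml_text):
    """Return each verse line as plain text, with the inner markup stripped."""
    root = ET.fromstring(xml_text.strip())
    return ["".join(line.itertext()) for line in root.iter("l")]


if __name__ == "__main__":
    print(marked_segments(SAMPLE, "image"))
    print(line_texts(SAMPLE))
```

The point of the sketch is only that the markup lets a program pull out whatever the encoder chose to flag, while the plain reading text stays recoverable. Deciding what deserves a tag is still the humanist's job.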

It's at this point I realize I'm going to need to dial it back and come back to this article. The syllabus that I've been compiling for this course is an ever-changing being and, after getting about halfway through this article, I see that I need to find a more basic explanation of TEI and SGML. Despite reading through the article several times, I feel completely lost-- which just means I need to learn more, and go down another rabbit hole.

McGann does pull my interest back when he applies markup to the poem, "The Innocence," by Robert Creeley. Although I am murky on how markup is done, McGann's six readings through the text showed the different elements that come to light through markup, which wouldn't be immediately obvious otherwise.

Of his choice of this poem, McGann explains:
I choose "The Innocence" because it illustrates what Creeley and others called "field poetics." As such, it is especially apt for clarifying the conception of the autopoietic model of textuality being offered here. "Composition by field" poetics has been much discussed, but for present purposes it suffices to say that it conceives poetry as a self-unfolding discourse. "The poem" is the "field" of action and energy generated in the poetic transaction of the field that the poem itself exhibits. "Composition by field", whose theoretical foundations may be usefully studied through Charles Olson's engagements with contemporary philosophy and science, comprised both a method for understanding (rethinking) the entire inheritance of poetry, and a program for contemporary and future poetic discourse (its writing and its reading).
As I came to the end of this chapter, I found the appendices to be helpful in unraveling some of the more complex parts of the discussion but, overall, I think that I need to find a more basic reading that will start from square one of markup, so that I'll be able to build a stronger base understanding of the methodology. I knew I was in for a lot in this field of study, and the intricacies haven't scared me off yet!

As was the case with "Databases," I'm going to need some examples or practical application because the theory is quite dense, but it's incredible to know the things that can be accomplished when technology is married to the humanities. It certainly seems that the digital humanities use both sides of the brain, fusing logical and creative, to create something entirely new.


As a side note to this blog, it's going to be really funny when I read this later on and have a deeper understanding of DH, and see how much I'm struggling to unpack all of the theory.

Monday, February 20, 2017

Data and Dimension - Pt. 1

Going into the field of DH, one must be just as aware of the "Digital" side, as the "Humanities." The digital end of digital humanities is no small matter and there's quite a bit of work that goes into tools for textual analysis. This week, I'll be walking through two articles from Blackwell's A Companion to Digital Humanities, both of which talk about the technological side.

A few semesters back, I decided to learn about coding. I took lessons at Codecademy and learned a little about HTML and CSS, and I loved it. It was amazing to learn a little about the "language" of computers, and I honestly wish I had had this opportunity when I was younger. Part of what draws me to DH is the opportunity to learn to use technology in a way that marries it to my first love, literature.

I bring up coding because it, much like databases, is incredibly complicated. In taking a few lessons on basic coding, I learned a fraction of a fraction of what goes into making one single webpage. I get the same feeling from this reading on databases: they can be quite simple, but so much goes into making a truly responsive database.

To delve more into that point we have our first article, "Databases," by Stephen Ramsay. Databases have existed, in one form or another, for a long time and serve as a way to categorize and store data for easy retrieval. Computerized databases add another element to the mix, a need for systems that "facilitate interaction with multiple end users, provide platform-independent representations of data, and allow for dynamic insertion and deletion of information." Databases play a large role in the DH, as the compilation of data can aid in charting relationships and themes throughout a number of books, or fields of data. Although this, as previously discussed, may seem daunting to the humanist, it is actually quite an exciting addition to the field. As Ramsay notes:
The most exciting database work in humanities computing necessarily launches upon less certain territory. Where the business professional might seek to capture airline ticket sales or employee data, the humanist scholar seeks to capture historical events, meetings between characters, examples of dialectical formations, or editions of novels; where the accountant might express relations in terms like "has insurance" or "is the supervisor of", the humanist interposes the suggestive uncertainties of "was influenced by", "is simultaneous with", "resembles", "is derived from."
The first model of database discussed in this article is the relational model, which studies relationships between databanks-- or sets of data. The man who first proposed this model reasoned that a database "could be thought of as a set of propositions…and hence that all of the apparatus of formal logic could be directly applied to the problem of database access and related problems." Sounds logical to me!

Databases are quite complicated entities, and I'm going to try to keep my secondhand explanation as simple as possible. Ramsay delves into the finer points of what goes into a database and, at its most simple, a database includes a system which can store and query data, as well as one that can answer simple questions by linking together stored data. Simple as these systems are, databases can get quite large, and that's when the algorithms start getting more complicated.

Ramsay goes into the different categorizations of data in his example of a database that stores information about current editions of American novels. By showing the problems that can arise from the most simplistic categorizations, he explains that there are other ways in which data can be categorized that are more complicated, but yield better results. Further, there are different ways in which data can be compared, which complicates things even more. In his American novels example, he talks about comparing one author to many works (1:M), or comparing many publishers to many works (M:M), and how the system would logically go about making the calculations needed to give a result.
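
To check my own understanding of 1:M and M:M, I tried a rough sketch using Python's built-in sqlite3 module. This is not the article's schema. The table names, the "editions" linking table, the placeholder publisher names, and the function names are all my assumptions. The idea is that a one-to-many link can live as a single column on the "many" side, while a many-to-many link needs a separate table that pairs the two sides up.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a 1:M and an M:M relationship."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (author_id INTEGER PRIMARY KEY, last_name TEXT);
        CREATE TABLE works (
            work_id   INTEGER PRIMARY KEY,
            title     TEXT,
            author_id INTEGER REFERENCES authors(author_id)  -- 1:M link
        );
        CREATE TABLE publishers (pub_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE editions (                               -- M:M link
            work_id INTEGER REFERENCES works(work_id),
            pub_id  INTEGER REFERENCES publishers(pub_id)
        );
        -- Sample rows; the publisher names are placeholders, not real data.
        INSERT INTO authors VALUES (1, 'Melville');
        INSERT INTO works VALUES (1, 'Moby-Dick', 1), (2, 'Typee', 1);
        INSERT INTO publishers VALUES (1, 'Publisher A'), (2, 'Publisher B');
        INSERT INTO editions VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


def titles_by_author(conn, last_name):
    """One author to many works: follow the author_id column on works."""
    rows = conn.execute(
        "SELECT w.title FROM works w "
        "JOIN authors a ON a.author_id = w.author_id "
        "WHERE a.last_name = ? ORDER BY w.title",
        (last_name,),
    )
    return [row[0] for row in rows]


def publishers_of(conn, title):
    """Many publishers to many works: go through the editions table."""
    rows = conn.execute(
        "SELECT p.name FROM publishers p "
        "JOIN editions e ON e.pub_id = p.pub_id "
        "JOIN works w ON w.work_id = e.work_id "
        "WHERE w.title = ? ORDER BY p.name",
        (title,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo()
    print(titles_by_author(db, "Melville"))
    print(publishers_of(db, "Moby-Dick"))
```

Each fact is stored once (one row per author, one row per publisher), and the links between them are just pairs of ID numbers, which is how I understand the redundancy problem to be avoided.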

Are you still with me? It's going to get much more technical. The next subject to be discussed is schema design. This is where we get into programming: the database schema is created using Structured Query Language, or SQL. The best, least daunting way I can think to describe this is by saying that in using SQL the user is telling the machine what to do. The humanist tells the computer what they want it to do with the data they will be using. Even though it includes a lot of code, it's somehow less daunting if you think of it as giving direction. Below is an example of SQL, a basic structure that will later be filled in with data.
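
The article's own listing isn't reproduced in this post, so what follows is my reconstruction of what such a structure might look like, not the original. The field names (last_name, first_name, year_of_birth, year_of_death, title, pub_year, name, city) come from the passage quoted further down. The table names, the ID columns, and the use of Python's sqlite3 module to run the SQL are my assumptions.

```python
import sqlite3

# Hypothetical schema: field names follow the quoted passage, everything else
# (table names, *_id columns, how the tables link) is assumed for illustration.
SCHEMA = """
CREATE TABLE authors (
    author_id     INTEGER PRIMARY KEY,
    last_name     VARCHAR(80),
    first_name    VARCHAR(80),
    year_of_birth INTEGER,
    year_of_death INTEGER
);
CREATE TABLE publishers (
    pub_id INTEGER PRIMARY KEY,
    name   VARCHAR(80),
    city   VARCHAR(80)
);
CREATE TABLE works (
    work_id   INTEGER PRIMARY KEY,
    title     VARCHAR(80),
    pub_year  INTEGER,
    author_id INTEGER REFERENCES authors(author_id),
    pub_id    INTEGER REFERENCES publishers(pub_id)
);
"""


def build_schema():
    """Create the empty tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """List the tables that now exist, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(table_names(build_schema()))
```

Nothing is stored yet at this stage. The CREATE TABLE statements only describe the shape that the data will have to fit once it is entered.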

Ramsay explains this next part well, so I'm going to direct you to him for this bit:
Like most programming languages, SQL includes the notion of a datatype. Datatype declarations help the machine to use space more efficiently and also provide a layer of verification for when the actual data is entered (so that, for example, a user cannot enter character data into a date field). In this example, we have specified that the last_name, first_name, title, city, and name fields will contain character data of varying length (not to exceed 80 characters), and that the year_of_birth, year_of_death, and pub_year fields will contain integer data. Other possible datatypes include DATE (for day, month, and year data), TEXT (for large text blocks of undetermined length), and BOOLEAN (for true/false values). Most of these can be further specified to account for varying date formats, number bases, and so forth. PostgreSQL, in particular, supports a wide range of datatypes, including types for geometric shapes, Internet addresses, and binary strings.
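The "layer of verification" idea can be sketched as follows, with one big caveat: I'm using SQLite through Python's sqlite3 module because it needs no setup, and SQLite is far more lenient about types than the systems the article discusses. To imitate the stricter behavior I've added an explicit CHECK constraint, which is my own stand-in and not something from the article. The table and function names are likewise hypothetical.

```python
import sqlite3


def make_table():
    """Create a works table whose pub_year column only accepts integers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE works (
            title    VARCHAR(80),
            -- Assumption: the CHECK stands in for the strict type checking
            -- that a stricter database system would apply by itself.
            pub_year INTEGER CHECK (typeof(pub_year) = 'integer')
        )
    """)
    return conn


def try_insert(conn, title, pub_year):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        conn.execute("INSERT INTO works VALUES (?, ?)", (title, pub_year))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_table()
    print(try_insert(db, "Moby-Dick", 1851))
    print(try_insert(db, "Moby-Dick", "eighteen fifty-one"))
```

The second insert is refused because the value is character data in a column declared to hold integers, which is the kind of mistake the datatype declaration is there to catch.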
There's a lot more technical discussion in this article that I struggle to explain in my own words, so I direct you to the text if you're interested in learning more about the programming side of SQL. What I've come to see is that it's incredibly interesting and incredibly precise. Much the same as coding, there is a fine art to speaking the language of the computer and communicating effectively. I'd be excited to take a lesson in this and have hands-on instruction. I'm hoping for a THATCamp in my area, as I think this would be the best opportunity to learn from others.

Ramsay broaches the discussion of data management in the last few paragraphs of his article. With great power comes great responsibility, so to speak, and anyone who has ever played around with HTML can tell you that the smallest error can throw off a large amount of work. The same is true with database management, and Ramsay suggests giving full access to very few people, for the sake of data and code security. After all, not many people need full access to an entire system. The less room for error, the better.

As we can see from this brief introduction to databases, there's a lot that goes into database programming, and there's certainly a learning curve that is not easily overcome. Luckily, there are a ton of resources to help the aspiring learner. Ramsay cites three of the most commonly used SQL tools, MySQL, mSQL, and PostgreSQL, as helpful options for those interested in using this methodology.

Because the readings this week are so dense, I'm going to split up the Great Wall of Text and direct you here for part two of this week's blog post!

Monday, February 13, 2017

Why Digital Humanities?

At last, I begin my first week of reading the articles I have compiled in my reading list-- a very exciting time.

Two of the readings this week will come from Blackwell's Companion to Digital Humanities, and the third is an article that I believe will be helpful in integrating DH into the English program curriculum. Without further ado, on to the first reading!

"The Digital Humanities and Humanities Computing"

Our first selection is written by Susan Schreibman, Ray Siemens, and John Unsworth, and it's a helpful introduction to the Blackwell companion, explaining some of the "hows" and "whys" behind the DH field. Although the field is immense, there is an overarching goal of using the technologies offered by the Digital Age to help researchers in their quest for knowledge.

The "Digital Age" (or "Information Age") has taken the world by storm, and scholars have decided that it's time to integrate new technologies into fields that have become tired and worn out after years of ceaseless analysis. However, this storm has been met with resistance by many because if there's one thing scholars like, it's time-honored tradition. The editors reflect on this in the following selection:
Thomas documents the intense methodological debates sparked by the introduction of computing in history, debates which computing ultimately lost (in the United States, at least), after which it took a generation for historians to reconsider the usefulness of the computer to their discipline. The rhetoric of revolution proved more predictive in other disciplines, though – for example, in philosophy and religion. Today, one hears less and less of it, perhaps because (as Ess notes) the revolution has succeeded: in almost all disciplines, the power of computers, and even their potential, no longer seem revolutionary at all. 
Luckily for us all, the DH field has thrived and opened up countless new opportunities for study. The next obstacle to overcome is learning to use the various tools, and this is no small challenge. However, daunting as these tools might initially seem, the editors are quick to point out that the purpose of the DH is to weave the methodologies together with practical application. Much like traditional methodologies, the purpose is to use the tools at hand to discover new things about the field being explored. Although the tools and methodologies are important, the results are just as crucial.
The growing field of knowledge representation, which draws on the field of artificial intelligence and seeks to "produce models of human understanding that are tractable to computation" (Unsworth 2001), provides a lens through which we might understand such implications.
The editors and their cited sources conclude that the computational techniques and resulting data structures can have a great deal of impact on the way we interpret "human information." They conclude their introduction by dwelling on the powerful nature of the DH, positioning it next to other time-honored forms of inquiry, and suggesting that, due to the power of the analytics at hand, it may prove itself to be more powerful than any we have seen.

"Literary Studies"

With such a grandiose introduction, I'm expecting greatness from this book, and this field in general. Now, I'll move on to the next chapter, "Literary Studies," by Thomas Rommel, which will hopefully narrow down this broad field.

In the course of the article, Rommel discusses the wide range of opportunity that has been granted to the humanities by the introduction of technology into the realm of criticism. He details how, since its birth in the 1960s and '70s, electronic media has changed the nature of the classroom. Once upon a time, scholars and students alike were limited to a set number of options for textual data, relegated to however many texts they could feasibly access and read in order to draw conclusions. This is not to deny the great critical works that were born in this time, but the rise of the internet has granted us the birth of a new age:

The "many details", the complete sets of textual data of some few works of literature, suddenly became available to every scholar. It was no longer acceptable, as John Burrows pointed out, to ignore the potential of electronic media and to continue with textual criticism based on small sets of examples only, as was common usage in traditional literary criticism: "It is a truth not generally acknowledged that, in most discussions of works of English fiction, we proceed as if a third, two-fifths, a half of our material were not really there" (Burrows 1987).
Another interesting fact that I came across is that, with the rise of technology, we have begun to move away from close reading, a methodology now firmly linked to traditional criticism. Close reading dictates that a scholar read a work (or works) thoroughly, in order to pull key ideas from the texts. On the other hand, DH methodologies allow for interpretation of the text (or texts) as a whole, by way of surveying the entire corpus, regardless of length or number of works.
Comparative approaches spanning large literary corpora have become possible, and the proliferation of primary texts in electronic form has contributed significantly to the corpus of available digital texts. In order to be successful, literary computing needs to use techniques and procedures commonly associated with the natural sciences and fuse them with humanities research, thereby bringing into contact the Two Cultures: "What we need is a principal use of technology and criticism to form a new kind of literary study absolutely comfortable with scientific methods yet completely suffused with the values of the humanities" (Potter 1989).
As explained by Potter, cited in the above pull quote, the use of scientific (or technological) methods does not take away from the importance of the data. Much like a trip in the car does not take away from the experience of the vacation, the use of a computer does not negate the importance of the knowledge gathered. Further:
If a literary text carries meaning that can be detected by a method of close reading, then computer-assisted studies have to be seen as a practical extension of the theories of text that assume that "a" meaning, trapped in certain words and images and only waiting to be elicited by the informed reader, exists in literature. By focusing primarily on empirical textual data, computer studies of literature tend to treat text in a way that some literary critics see as a reapplication of dated theoretical models.
Within the text, Rommel cites another critic who brings the argument even closer to home by directly citing popular forms of literary criticism and likening them to DH methodologies: "One might argue that the computer is simply amplifying the critic's powers of perception and recall in concert with conventional perspectives. This is true, and some applications of the concept can be viewed as a lateral extension of Formalism, New Criticism, Structuralism, and so forth" (Smith 1989).
One major way to view literary texts through this lens is to examine repeated structures, and analyze the meaning of the results. Repeated structures could be characters, words, or phrases that are chosen from a work, or series of works. For example, one could examine the appearance of the word "home" in relation to the appearance of female characters in a collection of texts of the Victorian era, and analyze what a correlation could mean.
Chapters, characters, locations, thematic units, etc., may thus be connected, parallels can be established, and a systematic study of textual properties, such as echoes, contributes substantially to the understanding of the intricate setup of (literary) texts.
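The simplest version of this kind of counting can be sketched in a few lines of Python. This is only my toy illustration, not a tool Rommel describes. The two "texts" are sentences I made up, and the function names are hypothetical. A real study would load full digitized novels and would need far more careful handling of spelling, hyphenation, and word forms.

```python
import re
from collections import Counter

# Made-up sample sentences standing in for full digitized texts.
TOY_CORPUS = {
    "sample_a": "She longed for home. Home was far away, and her home was cold.",
    "sample_b": "He walked to the station and never looked back.",
}


def word_counts(text):
    """Lowercase the text and count every run of letters or apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def count_term(corpus, term):
    """Return how often the term appears in each text of the corpus."""
    return {title: word_counts(text)[term.lower()] for title, text in corpus.items()}


if __name__ == "__main__":
    print(count_term(TOY_CORPUS, "home"))
```

The computer only reports where and how often the word turns up. Working out what a pattern like that means is the part that still belongs to the reader.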
All praise for technological analysis standing, Rommel notes that it's still important not to get so caught up in the tools that the results become less important. The tools can only provide the information that is already in existence; the rest of the work needs to be done by the human mind. The humanities do, after all, bear a lot of weight in the Digital Humanities. One might say that the way the words are arranged tells the story of the field: Digital, the up-front technological work that goes into an effort; Humanities, the analysis that must be done in order to validate the technological side.

Once upon a time, scholars had excuses for not using technologies: they were widely inaccessible. Prior to TEI (the Text Encoding Initiative, one such DH methodology), the programs offered were complex and required much study to be understood. The endeavors were expensive, and required more than one person in order to be successful. Nowadays, we no longer have excuses for not using the technologies at hand, and yet these technologies are still only marginally discussed. The results of using technology as a tool for literary criticism are notable, so the remaining excuse is aversion to change. People are fearful of new technologies, choosing instead to stick to the path most traveled. Ironically enough, isn't that everything we're warned against in the world of literary criticism? Freud, Derrida, Fish, Bakhtin-- these people did not influence schools of thought by sitting around, saying and doing the things that people wanted to hear. They stirred things up, and I think it's time we do that with the technological advantages we have at our disposal.

"What Is Digital Humanities and What’s It Doing in English Departments?"

My final reading for this week is an article called "What Is Digital Humanities and What's It Doing in English Departments?" by Matthew G. Kirschenbaum. When I found this article, it jumped out at me because it's exactly the question that many ask about the field of DH-- what is this strange idea doing in my English classroom? In fact, as we've seen thus far, because of this very question, DH is not in many English classrooms.

The popular alarm of the English classroom sounds a little something like, "Computers? Never! Not in this class!" and book enthusiasts are likely to chime in, "E-book? Never. It's just not like holding a real book."

While these concerns can be valid personal preferences, they reflect a scarily real amount of opposition to technological advancement that can be harmful to students of language and literature. Let's get candid for a moment here: we all know the field is saturated. We all know the starry-eyed Austen- or Hemingway-crazed undergrad who pursues his or her dream through grad school, only to graduate and have their bubble burst, left unemployed and desolate in the job search. Does that sound real? Perhaps I know one too many people who fit the bill.

There are always going to be positions opening up, people retiring after long and illustrious careers but, even so, if you know that the pool is large and the opportunities are few, wouldn't it be beneficial to differentiate in some way? Wouldn't it also be wise to see the world changing around us, and realize that this could open opportunities to jobs that traditional scholars can't fill? I give you, reader, the digital humanities. 

Kirschenbaum argues that computers are not the enemy of the English department; in fact, they're one of its biggest opportunities. He discusses text analysis tools such as those we discussed above, and praises the networked connections that are birthed from a field that values interaction and working together to learn and grow. Because the DH is a newer field, people have more of a tendency to rely on one another to grow and teach, rather than the tired "sit and listen as I tell you everything you need to know" way of the past.

In more recent years, beyond the 2004 publication of the Blackwell Companion, associations and alliances have been formed which support the DH, along with the Digital Humanities Initiative, which created an official support system for the field, elevating it to a higher and more recognized standard. Additionally, Kirschenbaum included the following segment in his article:
Digital humanities was also (you may have heard) big news at the 2009 MLA Annual Convention in Philadelphia. On 28 December, midway through the convention, William Pannapacker, one of the Chronicle of Higher Education’s officially appointed bloggers, wrote the following for the online “Brainstorm” section: “Amid all the doom and gloom of the 2009 MLA Convention, one field seems to be alive and well: the digital humanities. More than that: Among all the contending subfields, the digital humanities seem like the first ‘next big thing’ in a long time.” 
In the same way that they can be scary to those wary of change, the digital humanities are exciting to people who have been looking for a new way of channeling their love of literature in up and coming ways. The DH field brings an air of vitality to a beloved, but somewhat tired world, and the more people who support it, the better!

To conclude his article, Kirschenbaum answers the question he poses in its title in six ways, which I will summarize here:

  • The DH gives us new ways to process and analyze texts, the backbone of English departments.
  • There's a powerful association between computers and composition which should be used to its fullest extent.
  • We've been looking for a meeting point between technology and conventional editorial theory and methods, and here it is.
  • Electronic literature (E-Lit) is an up and coming field that is bright, interesting, and diverse.
  • English departments have long been open to new cultural developments and movements. Why should this be an exception?
  • The explosion of interest in e-readers and text digitization has supported the development of text mining tools and programs that are able to work with digital data.

In short, you ask, why DH? I ask you, why not?

Monday, February 6, 2017

Reading List Pt. 2

This week's blog is dedicated to part two of my reading list. This list is going to be updated throughout the week, as I need to find more resources to add but, first and foremost, if you're reading this and you're involved in the Digital Humanities, please feel free to contact me with resources and advice. In compiling this list, I've come to have a decent understanding of the theory behind the DH, and my next step is practical application. I'm looking for resources that explain the how behind the methodologies. I'm working on it, but I would love to talk with anyone in the field. Anything you can share with me is much appreciated!

That being said, on to the links.
The first few links are, once again, from Blackwell's A Companion to Digital Humanities, compiled by editors Susan Schreibman, Ray Siemens, and John Unsworth. These articles delve further into the technology involved in the DH field.
"Designing Sustainable Projects and Publications" by Daniel V. Pitti
"Conversion of Primary Sources" by Marilyn Deegan and Simon Tanner
"Text Tools" by John Bradley
I've found the following articles to provide helpful supplementary information to the text, particularly the above chapters:

"Text and Data Mining and Fair Use in the United States"

"Text and Data Mining" by Maurizio Borghi

From what I've seen of A Companion to Digital Humanities, I like this book. It seems to have a good combination of introductory texts, helpful entry points into the field. My next step in this process is going to be unpacking the methodologies, which has proved to be a challenge and is the part of this process that I'm going to work on throughout the week.

Last week I mentioned Blackwell's A Companion to Digital Literary Studies, compiled by editors Susan Schreibman and Ray Siemens. I'm going to include a few articles from this compilation in my syllabus, that I've found to be personally interesting, as my interests in the DH are linked to literary analysis, and I'm planning on using the DH in my MA thesis. However, I'm not going to explore these links until the latter part of the semester, as my primary focus is an intro to the field, in general.
"Knowledge will be multiplied": Digital Literary Studies and Early Modern Literature by Matthew Steggle
"The Virtual Library" G. Sayeed Choudhury and David Seaman
(To Be Continued)  
Here is the link to my working syllabus, which includes the readings I will be doing for each week. I've assigned myself two readings per week and I've tried to pair them based on subject matter. The syllabus is open for comments and feedback is welcome. This is an interactive field, and I would love to meet others who are also interested in exploring the DH.

Next week's post is going to include a summary and analysis of the first two articles that I have chosen. I'm excited to dive in!