The Issue

How can historians use research data (e.g., photos, oral histories, text) in new ways?

I think a lot about data these days. As a CLIR Fellow in Digital Scholarship at the University at Buffalo, I am concerned with the role libraries can and should play in making research data accessible, usable, and reusable. But I cannot help approaching that issue from my experience as a graduate student and social historian. Over the years, I have gathered spreadsheet after spreadsheet of data, tons of images, and many oral history interviews in order to understand the individual and collective experiences of people from the past, and to tell stories.

As I collected data, I knew I wanted to use it in creative ways beyond the printed word. I was especially interested in historical mapping and visualizing migration. But how would I get from the primary source to a completed project? When and where does a history graduate student learn to gather and document data in a way that facilitates creative reuse?

My hope is that history departments will work with libraries to incorporate digital methodologies into graduate curricula and to help students plan for flexibility in how they interpret and disseminate their research. While the monograph is not going away in the immediate future, it will likely change form (see, for example, the Fulcrum platform at the University of Michigan). And there are so many dynamic ways to analyze and share research: websites and online exhibits, text mining, podcasts, audio documentaries, blogs, topic modelling, historical maps, published datasets, digital collections, social media, and more. But when we do our research, we’re not always thinking about how it might be repurposed and communicated in the digital realm. This is especially true for new graduate students.

Quality Matters

Learning about the possibilities early on can save time and money later. It is increasingly important to think not just about the content, but also about the form and structure of our research. For example, when I record an oral history interview, I might focus on capturing voices and pay less attention to background noise like a TV. After all, I will transcribe the interviews and incorporate the information into my dissertation or book manuscript, so the quality of the audio may not seem as important as the spoken words. Still, spoken words are powerful, and they carry tone and inflection that text cannot adequately convey.

What if I decide a few years later that I want to create an audio documentary or podcast episode using those interviews? Or incorporate media into my dissertation? Then quality matters, a lot. But I can’t go back and re-record these interviews.

When I’m transcribing data from a property deed or court record into a spreadsheet, I will save time and stress if I do so in a format that lends itself to computer analysis or visualization, or that is, at the very least, consistent. Here are two examples of how my understanding of spreadsheet data has changed over time. One is a spreadsheet of biographical community data I started in 2007 to try to understand social and kinship networks (some of the columns are hidden for display clarity). The other is a more recent spreadsheet of WWI draft card data (started years ago, but “cleaned up” in recent years).

Spreadsheet of Messy Data
Spreadsheet of Organized Data
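As a small, hypothetical illustration of what “consistent” can mean in practice, the Python sketch below takes dates typed in several different styles and normalizes them to one ISO 8601 form (YYYY-MM-DD), while keeping the date as originally written in its own column. The sample dates, the column names `date_as_written` and `date_iso`, and the list of accepted formats are all invented assumptions, not drawn from the draft card spreadsheet. Note too that reading `6/5/1917` as June 5 assumes US month/day order, a choice worth documenting in any real project.

```python
from datetime import datetime

# Invented sample values (not real archival data) showing the same date
# typed four different ways during transcription.
MESSY_DATES = ["June 5, 1917", "6/5/1917", "5 Jun 1917", "1917-06-05"]

# Formats assumed to occur in the transcriptions. "%m/%d/%Y" assumes
# US month/day order, so 6/5/1917 is read as June 5, not May 6.
KNOWN_FORMATS = ["%B %d, %Y", "%m/%d/%Y", "%d %b %Y", "%Y-%m-%d"]


def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 form (YYYY-MM-DD) or raise ValueError."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")


def to_tidy_rows(raw_dates: list[str]) -> list[dict[str, str]]:
    """Keep each date as written next to its normalized form, one row per record."""
    return [{"date_as_written": d, "date_iso": normalize_date(d)} for d in raw_dates]


if __name__ == "__main__":
    for row in to_tidy_rows(MESSY_DATES):
        print(row)
```

Keeping one value per cell in a single format lets a spreadsheet program or script sort, filter, and map records reliably. Preserving the original wording in a separate column means the transcription stays faithful to the source even after cleanup.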

When I was a graduate student, I was fortunate to learn something about structuring data because I took Introduction to Quantitative Historical Methods at the ICPSR Summer Program in 2008. That course resulted in a poster on Rural African American Migration within the South (presented at the 2008 Social Science History Association meeting).

Poster showing statistics on African American rural migration within the South

That poster drew on aggregate census data, but also on World War I draft registration card data I had transcribed. In that class, I learned a bit about putting data into spreadsheets, but the emphasis was on basic statistical analysis rather than on the larger world of digital projects, data visualization, and analysis. Of course, that was 2008. Now that the digital humanities has caught fire, more librarians are teaching these skills, as are perhaps a handful of history faculty. But it’s still uncommon. I have been fortunate to receive scholarships to attend other institutes, but not everyone can travel, whether for logistical or financial reasons. I want to give credit to Paige Morgan and Yvonne Lam for expanding my skills in the Digital Humanities Summer Institute course, Making Choices About Your Data.

Plan for the Future

Being aware of the digital possibilities for historical research can open doors down the road: new publication forms, collaborative digital projects, new forms of community engagement. Libraries are a great place to learn about tools and methodologies, but these tools and methods also need to be integrated into departmental curricula.

Of course, we still have to reckon with the fact that we are humans pressed for time. Technology won’t solve that, and in some ways it complicates the choices we make. At times I have chosen expediency over flexibility and content over form, and later regretted it. I felt prompted to write this post because I’m interested in analyzing court testimony from a federal peonage case from 1906. Perhaps I will encode the testimony with TEI or, at the very least, make searchable text available online. But I photographed the documents with an old, low-resolution camera when I could have used a scanner at the archives. Taking pictures of 1,000 pages was faster. As I think about the possibilities of text analysis, I regret that I did not choose the option that would have given me more flexibility down the road (the poor image quality makes OCR a mess, although I still might try some image manipulation). I am a fast typist, so maybe... and in fact I have been transcribing (for example, the testimony of Emory Nichols in United States vs. Charles M. Smith, Sr., et al.).

The pressure to maximize time in the archives will always be present. So will the impulse to enter data into spreadsheets in ways that are quick and easy for us to absorb visually but are not machine readable. But I hope that history departments will start to incorporate digital research methodologies into their curricula and help scholars think beyond the text. What are the possibilities, and how might we plan for them?

