Working With Raw Data in a Digital World

This week I am exploring some new ideas for my project. I originally planned to data mine two sets of handwritten letters from soldiers who lived in the northern United States and compare them with letters that were published in book form, presenting the results as a series of word clouds for analysis. Since then I have restructured the project. I have chosen the letters of a farm boy from rural Berlin, Mass., and of a young corporal from Liberty, S.C., and will compare them with the letters published in book form. I am also using a storyboard in Social Explorer to help with some of the analysis; Social Explorer can be used much like a Prezi storyboard. I am building a map storyboard of the areas each young man came from, to see whether any noticeable differences in the word clouds correlate with the backgrounds of the places they lived.

I think the most difficult part of this project is transcribing the letters. Currently there is no technology that can recognize handwriting well enough to help convert the data, so transcription is a time-consuming process. With the collection I have (I just got another 100 or so WWI letters in the mail), transcribing will be a constant job. The handwriting itself is another difficulty. Some words are totally illegible, some pages have been eaten by rodents and are missing parts of passages, and some words are just a scribble, so working out what was written becomes a guessing game. I can't imagine a computer handling these problems better than a human reader can. Even if handwriting recognition becomes a viable technology in the future, its output would still need to be reviewed for accuracy to catch these same problems.
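A word cloud is essentially a picture of word frequencies, so the counting step behind comparing two sets of letters can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool used for this project: it assumes each transcribed letter is a plain-text string, that illegible or rodent-damaged passages are marked in square brackets (for example `[illegible]`) so they can be left out of the counts, and it uses a small hypothetical stop-word list. The function names `word_frequencies` and `compare_top_words` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption; a real project would likely use a fuller one)
STOP_WORDS = {"the", "and", "a", "to", "of", "i", "in", "is", "it", "you",
              "that", "for", "was", "my", "we", "on", "with", "as", "at", "be"}

def word_frequencies(letters):
    """Count words across a list of transcribed letters, skipping stop words.

    Assumes gaps in the transcription are marked in square brackets,
    e.g. "[illegible]" or "[page missing]", and drops those markers.
    """
    counts = Counter()
    for text in letters:
        # Remove bracketed transcription notes so they are not counted as words
        text = re.sub(r"\[[^\]]*\]", " ", text)
        # Lowercase and treat runs of letters/apostrophes as words
        for word in re.findall(r"[a-z']+", text.lower()):
            word = word.strip("'")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return counts

def compare_top_words(set_a, set_b, n=10):
    """Return the n most common words in each set and the words both lists share."""
    top_a = [w for w, _ in word_frequencies(set_a).most_common(n)]
    top_b = [w for w, _ in word_frequencies(set_b).most_common(n)]
    shared = sorted(set(top_a) & set(top_b))
    return top_a, top_b, shared
```

The resulting counts could be passed to whatever word-cloud tool is used for display, and comparing the top words of the farm boy's letters with the corporal's (or with the published collection) gives a rough first check on whether their vocabulary differs in ways that might line up with where they came from.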


