Jonathan Hope from Strathclyde University gave a presentation to the Renaissance Graduate Seminar at the English Faculty last night about DocuScope, a major collaborative project based at Carnegie Mellon University for the ‘computer-aided rhetorical analysis’ of texts. The team behind DocuScope includes computer scientists, linguistics specialists, and literary scholars, and the idea is based on a system (or a ‘text analysis environment’, as the project website puts it) that was originally created for the analysis of students’ creative writing. In essence, a teacher could run writing samples through a computer program and use its statistical analysis of rhetorical features as the basis for further discussion with students – getting them to think, for example, about why one writing sample shows a much higher frequency of a certain linguistic feature than the others.
In its current shape DocuScope is far more complex, mind-bogglingly so. Hope illustrated this by showing us what its analysis of the whole known corpus of early modern drama looks like. I won’t try to explain how the system actually works (you can read expert accounts elsewhere), but it was interesting to see the forms of output that can be generated, such as dendrograms, which arrange the works according to how similar they are to each other and depict strong and weak connections between texts based on their linguistic features. The entire corpus of Shakespeare’s plays has also been filtered through this mysterious machine, and Hope showed us some of the colourful visual representations of these results. According to DocuScope’s categories of rhetorical analysis, A Midsummer Night’s Dream and The Merry Wives of Windsor are the least similar to the other plays, and The Merchant of Venice is the least dissimilar.
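For readers who have not met a dendrogram before, the general idea can be sketched in a few lines of Python. This is only a toy illustration of hierarchical clustering, not DocuScope's own method, which was not described in detail at the seminar. Everything in it is an assumption made for the example: the text names, the three feature rates, the use of straight-line (Euclidean) distance, and the choice of average linkage to compare groups.

```python
from itertools import combinations
from math import dist

# Invented rates (per 1,000 words) for three made-up feature categories.
# These are placeholders, not DocuScope categories or real measurements.
TOY_PROFILES = {
    "text_a": (12.0, 3.0, 7.0),
    "text_b": (11.0, 4.0, 7.0),
    "text_c": (2.0, 9.0, 1.0),
    "text_d": (3.0, 8.0, 2.0),
    "text_e": (7.0, 6.0, 20.0),
}


def cluster_distance(left, right, profiles):
    """Average linkage: mean distance over every cross-cluster pair of texts."""
    pairs = [(a, b) for a in left for b in right]
    return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def build_merges(profiles):
    """Repeatedly join the two closest clusters and record each join."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], profiles),
        )
        height = cluster_distance(left, right, profiles)
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((left, right, round(height, 2)))
    return merges


if __name__ == "__main__":
    # Each printed row is one join; the number is the height of that branch.
    for left, right, height in build_merges(TOY_PROFILES):
        print(f"{height:6.2f}  {' + '.join(left)}  |  {' + '.join(right)}")
```

Each recorded join corresponds to one branch point in a dendrogram: texts that join early, at a low height, are the most alike, while a text that only joins at the very end is the outlier. In this toy data the invented text_e plays that role because its third rate is far higher than anyone else's.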
Some of the limitations and possible pitfalls of this tool are obvious, others less so. But the point of DocuScope is clearly to raise questions, not provide answers – indeed, Hope referred to it as ‘a problem factory’, which serves to provoke further debates. DocuScope has much potential; in the future it could, for example, provide another slant in investigations of authorship or dating of texts. One more general point that Hope’s paper raised was the growing necessity for scholars in the arts and humanities to develop their skills of statistical interpretation. An understanding of how statistics may be used and abused will increasingly become essential for teachers and researchers working with digital tools and resources.