The Digital Cavendish Project. Shawn Moore, Jacob Tootalian, et al.
The Digital Cavendish Project (DCP) is the fruit of an exciting and very hybrid collaboration of university-affiliated and independent scholars, all intent on making Margaret Cavendish’s wide and varied corpus available in its finest digital form while also highlighting adventurous new research, digital methods and pedagogical practices. As with most hybrid fruits (think plumcots, tomtatoes and limequats), the initial reaction to the DCP may simply be: “how do I consume this thing?” The website offers a startling variety of resources under three deceptively simple navigational categories – “Resources and Publications”, “Research and Scholarly Resources” and “Electronic Texts and Editions”. Under those labels lie such diverse offerings as links to a near-comprehensive Cavendish bibliography and to a Cavendish-inspired graphic novel; invitations to partake in crowdsourced editing of digitised versions of Cavendish’s works; a foray into advanced principal component analysis of Cavendish’s use of genre; a comprehensive index of Cavendish’s printers and editors; screenshots of archival materials and of Gephi-generated social networks of the Cavendish circle; and much, much more. The DCP’s most obvious contribution rests simply in aggregating all of these projects under one tent; the more promising contribution, however – and it is a prospective one – will lie in organising these projects coherently and making sense of the vast web of interests that the Cavendish revival of the last forty years has generated. Few early modern figures have garnered so interdisciplinary a coterie of scholars; few have posed such a challenge to categorisation. The promise of the DCP, over the long term, lies in its flexible hospitality to the myriad projects – scholarly, public and pedagogical – that Cavendish’s polyvalence will continue to inspire.
In the meantime, students of the Duchess should make full use of the DCP’s remarkable and elegantly searchable Cavendish_Corpus, the closest thing we have to a Complete Works, and arguably much more practical.
The DCP is unabashedly collaborative. Its founder and director, Dr. Shawn Moore (Florida Southwestern State College), has drawn on academic connections throughout the southeastern peninsula, bringing in Jacob Tootalian (University of South Florida), Stewart Duncan (University of Florida) and Barry Hughes Shelton (University of Georgia), while also reaching out to the West Coast, United Kingdom and Spain for contributors including Martine van Elk, Erin McCarthy, Jose Saiz Molina, Cameron Kroetsch, Brandie Siegfried and Lisa Walters. The DCP houses various projects and contributions without necessarily seeking to unify or reconcile their visions of Cavendish or of literary scholarship. The book-historical interests of Erin McCarthy stand alongside Shawn Moore’s social network graphs without mutual reference, as though in silos. The advantage of this looser framework is that it leaves room to grow and include more projects, more contributions. That comes at the cost (if indeed it is a cost) of clarity of purpose and unity of voice or message. This much remains clear: the DCP presents itself not as a monolithic authority on all things Cavendish but as a kind of semi-curated variorum. Pick your plumcots at your peril.
The core feature of the DCP may well be its searchable corpus of Cavendish’s works, the Cavendish_Corpus. You will find it, somewhat confusingly, not under “Resources and Publications”, but under “Research and Scholarly Resources” (already “resources” begins to prove a strained word). Scholars questing after key words and key topics are invited to toggle among four search options: concordance, key-word in context (KWIC), collocation and time series. I, for instance, was interested in “pudding”, which appears seven times in Cavendish’s oeuvre. As I consult the concordance results, I can opt for the KWIC line-by-line view of all the known occurrences of “pudding” as they appear in their original sentences, which facilitates cross-comparison across different works. Alternatively, I can choose a more contextually-rich (i.e. metadata-rich) presentation that neatly distinguishes instances of “pudding” according to their chronological appearance in the corpus and presents each instance of the word within a longer excerpt.
Thus far, the Cavendish_Corpus may seem analogous to the EEBO-TCP search engine that scholars are more familiar with, or its helpful extensions like the text-mining initiative Early Modern Print, directed by Joseph Loewenstein and Anupam Basu, which introduced many early modernists to KWIC searching. But let’s say I am more indulgently browsing and speculating for connections and associations between key words, connections which I have yet to conceive. The collocation feature of the DCP allows me to enter a single search-term and then consult the top 100 collocated terms, modulating my inquiry to search for terms either within the same sentence or within a certain word-range. The results are presented both in list-form and in a handy word-cloud, highlighting the most frequent collocates. The list-form remains conveniently on the side of the screen as I inquire deeper into any given collocate. As I click over to the Cavendish_Corpus’s fourth major search feature, the time-series, I am presented with a histogram of my search-term’s yearly frequency within the corpus from 1653 to 1678; the bar chart invites me to ponder why pudding was three times more on Cavendish’s mind in 1664 than in 1663 or 1671. Even with a seemingly trivial search-term such as “pudding”, these collocation and time-series features promote associative thinking and foster intuitions, as the best discovery-based tools ought to do.
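For readers curious about what a KWIC view and a windowed collocation count involve under the hood, the following is a minimal sketch in Python. It is not the DCP's own code, whose implementation the site does not document; the function names, the sample sentence (invented for illustration, not a quotation from Cavendish) and the tiny stop-word list are all assumptions of mine.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2, stop_words=frozenset()):
    """Count words found within `window` tokens either side of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in neighbours if w not in stop_words)
    return counts


# Invented sample text and a hypothetical stop-word list, for illustration only.
sample = "The cook made a pudding of plums. A pudding of rice pleased the cook."
stops = frozenset({"the", "a", "of"})

tokens = tokenize(sample)
for left, word, right in kwic(tokens, "pudding"):
    print(f"{left:>20} [{word}] {right}")
print(collocates(tokens, "pudding", window=2, stop_words=stops).most_common(5))
```

Note that the stop-word list is passed in explicitly: change it and the ranking of collocates changes with it, which is one reason a search interface ought to say which words it leaves out.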
A warning, obvious but worth repeating, for those who would use the Cavendish_Corpus in a more advanced way: word-clouds and histograms lie. For instance, it is not clear which stop-words (e.g. the, and, but, nor, etc.) have been excluded from the word-cloud feature, let alone from the concordance as a whole. Those scholars interested in examining Cavendish’s deictics, stylistics or grammar may find themselves more hamstrung than helped by these and other inexplicit aspects of the search interface. Similarly, a bar chart like the one used in the time-series feature forces scholars to think of years (1653, 1654, etc.) as time-units significant in themselves (as opposed to months or seasons) when they might not have been nearly as significant for Cavendish herself.
Those shortcomings notwithstanding, the Cavendish_Corpus begins to fill an important scholarly gap, namely the lack of a complete works. It was already on Elizabeth Hageman’s mind thirty years ago, as she wrote ELR’s first “Recent Studies” on early modern women writers, voicing the absence of printed critical editions of Cavendish. Six years ago, Wendy Weise noted the persisting lack in her follow-up piece to Hageman’s review, but with a cautionary spin: “[A]s Cavendish scholarship continues to develop in interdisciplinary complexity, the need for critical editions of all her writing becomes increasingly important.” Today, a selection of Cavendish’s most popular works is available with needed critical apparatus. Yet the dawning realisation that Cavendish’s plays, fancies, poems, science-fiction, philosophical and scientific treatises, biographical fictions and letters constitute a seventeenth-century omnium-gatherum, requiring expert multi-disciplinary attention, faces the steeper-than-ever challenge of seducing academic presses. As we await a press and a team of scholars bold enough to take up the enterprise of a complete works, the DCP’s free, online, fully searchable and meticulously curated Cavendish_Corpus offers a noteworthy work-around.
The DCP also offers digital reading editions of Cavendish’s works – large fonts, wide spacing and clean formatting make these reading editions lighter on the eyes than many existing print editions widely employed for teaching. The selection of works is limited at present (some are cut short, others are missing entirely) yet the hope remains that the DCP team will fill out the shelf soon, making Cavendish’s texts accessible free of charge for increasingly digitally-native classrooms. In addition to these reading editions, the DCP serves as a pointer for one of Shawn Moore’s pet projects, the Dawn of the Unread, a series of graphic novels that bring to life neglected historical figures for a broad, non-academic public. The link to the Margaret Cavendish-themed issue lies under the catch-all miscellany category “Resources and Publications”: that classification is perhaps the clearest sign that no one knows quite how to make good use of this “resource” and “publication”. A similar concern could be voiced for the super-high-quality but unorganised snapshots of manuscript marginalia in the Chawton House Library edition of the 1668 Playes Never Before Printed. It is in instances like these that the DCP begins to seem a messy grab-bag, more of a curiosity cabinet for Cavendish aficionados than a repository of essential tools with which to teach, study and learn about her work. It is more of these kinds of sorting and organisational decisions that the DCP will have to make going forward if it wishes to benefit the public humanities while continuing to host and address the academic community.
In its efforts to make Cavendish’s writings more widely available, the DCP has followed an old farming-and-parenting adage: “if they grow it, they’ll eat it”. Its “Crowdsourced Cavendish” editing project invites newcomers and specialists alike to partake in curating Cavendish’s corpus. Using an intuitive TypeWright interface provided by 18thConnect, students can amend and update the existing transcriptions of Cavendish’s works generated by the Early Modern OCR Project (EMOP), pruning errors made by the optical character recognition software that was used to rapidly transcribe the tricky-to-read early modern originals. The need for such manual emendations, combined with the pedagogical benefits of inviting students into the editorial craft, has made these kinds of projects a runaway success. We are left wondering only to what extent hosting this kind of project exclusively for rising stars like Cavendish takes attention away from other large-scale collaborative curation projects like Martin Mueller’s ambition to sift the entire EEBO-TCP corpus.
The DCP not only advertises itself as a repository for primary sources and images; it also serves as an exhibit space for experimental digital projects, focusing thus far especially on social network analysis, genre analysis and corpus visualisation. Partly because the website’s infrastructure remains inadequately developed to showcase these projects’ complicated visualisations, partly because the projects themselves are quite complex and only in their initial stages, this aspect of the DCP is the least convincing of all. Two short and intriguing blog-posts on Cavendish’s social network are plagued by low-resolution screenshots and an uncharacteristic sloppiness in presentation. The more cynical will likely conclude that the colourful network graphs provide few real insights that good old-fashioned thinking couldn’t offer before: of course Cavendish’s social networks need to be tracked across time and geographical movement; of course the shape of her intellectual network has been distorted by those, like Samuel Pepys and Dorothy Osborne, who claimed she was mad! The more optimistic, on the other hand, may reply that different trails, even if they lead to the same destination, offer different vistas: the network-theoretic approach may yet surprise us. Meanwhile, those invested in higher-order statistics and principal component analysis may find hidden meaning in the cluttered DocuScope visualisations of Cavendish’s philosophical works by Jacob Tootalian; I found, disappointingly, a rather inflexible notion of genres and a too flexible notion of discursive patterns. As for the “Visualizing Margaret Cavendish’s Systematic Treatises” project – an embedded YouTube video that purports to present the changes, cuts, and indexical recombinations made across four of Cavendish’s systematic philosophical treatises, but which actually just pans a static, illegible graphic for thirty seconds – there’s work to be done.
In some ways, this review comes too early in the maturation process of the DCP: the hybrid fruit is still unripe, even if it strikes many savoury notes. Thankfully, its cultivators seem aware of that. Wisely, they remain conscious that the DCP’s maturation will require both more time and more labourers: hence the crowdsourcing and the forthright invitations to double-check their data and their math. The most usable aspects remain the fully searchable corpus and the more traditionally scholarly elements such as the Bibliography Initiative, the list of printers and editors, etc. It would be markedly un-Cavendishian, however, to give up on the more unorthodox and avant-garde attractions of the DCP, and I hope scholars will take up the contributors’ offers to share their data in view of new projects. Theirs is a stand-out effort to discern what a digitally-native Cavendish variorum could and should include, and the curatorial decisions to come will likely only get harder as the Digital Cavendish Project gets better.
University of Notre Dame
What is Principal Component Analysis? PCA, as it is used in stylometry and linguistics, is a statistical procedure used to reconfigure a set of observations about language-use in a text or corpus whose variables may be correlated (i.e. the scarcity of one word or language-action type may be responsible for the scarcity of another) into a set of values whose variables are uncorrelated (the scarcity of one is not directly influencing the scarcity of the other). These sets of values, or principal components, measure certain traits about the text or corpus and are ranked according to their degree of variance, i.e. how much the text or corpus differs with regard to this particular trait. Usually, the two highest-variance principal components are used as the coordinates (think x and y axes) for a new vector graph that serves to compare two uncorrelated traits. This graph allows one to measure distance and relatedness within a big population of observations, with a certain degree of security that the distribution of data along one coordinate is not directly influencing the distribution of the data on the other axis, and therefore any correlations or clusterings you observe are not merely tautological but potentially meaningful. And what is Gephi? Gephi is a free, open-source network-visualisation package, or as its advertisers call it, “Photoshop™ for graphs”. Its high-quality rendering of images and integrated tools for measuring network-specific metrics (centrality, degree, etc.) have made it a leading tool across academic disciplines such as sociology, criminology and biology.
The Early English Books Online Text Creation Partnership (EEBO-TCP) is a public-private collaboration between ProQuest and 150 research libraries, providing searchable, SGML/XML-encoded texts corresponding to more than 125,000 early modern English books. The TCP expanded the original EEBO’s search possibilities not only by adding features such as proximity and boolean searching but also by manually keying in its transcriptions, rather than using unreliable optical-character-recognition technology. The Early Modern Print project further refined the search possibilities on the EEBO corpus, with text-mining and visualisation software that reveal words per year and text counts as well as enabling KWIC searching and n-gram browsing.
 Elizabeth Hageman, ELR 18:1 (Winter, 1988), pp. 138-167.
 Wendy Weise, ELR 42:1 (Winter, 2012), p. 76.
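The note on principal component analysis above can be made concrete with a small sketch in Python. It is a plain-vanilla PCA computed by power iteration on a covariance matrix, not a reconstruction of the DocuScope analysis hosted on the DCP; the six "texts" and their three feature rates are invented numbers, and every function name here is my own assumption.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature averages zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: repeatedly apply the matrix and renormalise."""
    vec = [1.0 / (j + 1) for j in range(len(matrix))]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return value, vec


def pca(rows, k=2):
    """Return the k highest-variance components and each row's scores on them."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = dominant_eigenpair(cov)
        components.append((value, vec))
        # Deflation: remove the component just found before seeking the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * w for x, w in zip(row, vec)) for _, vec in components]
              for row in centered]
    return components, scores


# Invented data: six hypothetical texts, three hypothetical feature rates each.
texts = [
    [12.0, 3.0, 7.0],
    [10.0, 4.0, 6.5],
    [3.0, 11.0, 6.0],
    [4.0, 12.0, 7.5],
    [11.0, 2.5, 5.0],
    [2.5, 10.0, 8.0],
]

components, scores = pca(texts)
for (variance, _), label in zip(components, ("PC1", "PC2")):
    print(label, "variance:", round(variance, 3))
for x, y in scores:
    print(round(x, 3), round(y, 3))  # coordinates for a two-axis scatter plot
```

In the invented data the first two features rise and fall against each other, so most of the variance collapses onto a single component; the two columns of scores are the x and y coordinates the note describes, and they are uncorrelated by construction.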