- Uriel Abulof, LISD
- Christiane D. Fellbaum, Computer Science
- Ben Johnson, Digital Humanities Center
- RSVP Required
- Faculty/Student Only
The “text as data” approach promises new perspectives on empirically based research in the humanities and the social sciences. Corpus linguistics and Natural Language Processing techniques enable the sophisticated analysis of texts on a large scale and allow researchers to tap into modern discourse in novel ways. The DiGCor project is at the cutting edge of this promise. Its mission is to construct a unique, very large corpus of multilingual discourse, covering a variety of texts from sources around the world over the last century. DiGCor does not seek to reinvent the wheel; rather, it intends to refine and integrate four existing wheels so that together they carry forward a new vehicle for social knowledge. Our aim is to go beyond mere word statistics and reach a deeper, conceptual level of meaning, using a four-pronged approach:
- Analysis of collocational properties of words;
- Dynamic topic modeling: probabilistic patterns of word use over time;
- Sentiment and reason analysis: identifying the writers’ emotions and judgments buried in texts;
- Diffusion analysis: the tempo-spatial spread and contraction of keywords.
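As an illustration of the first prong only, collocational strength is often scored with pointwise mutual information (PMI) over adjacent word pairs. The sketch below is a minimal example under that assumption; it is not DiGCor's actual pipeline, and the function name `bigram_pmi`, the `min_count` threshold, and the toy sentence are all hypothetical.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: pairs seen fewer than min_count times are skipped,
    because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy input, invented for this example.
tokens = "human rights are human rights and civil rights".split()
scores = bigram_pmi(tokens)
```

On the toy sentence, only the pair ("human", "rights") passes the frequency threshold, and its positive score reflects that the two words appear together more often than their individual frequencies would suggest.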
DiGCor currently has rich data resources but welcomes the contribution of additional digital corpora to extend its analyses and share its findings. Interested scholars and students are encouraged to attend this introductory session.