CANCELLED: Text as Data: Introduction to Diachronic Global Corpus

Event Date: Wednesday, February 5, 2014
Event Time: 4:30 p.m.
Location: 012 Bendheim Hall

Due to the University closing for inclement weather, this evening's event is cancelled. It will be rescheduled.

The Liechtenstein Institute on Self-Determination (LISD) will host an information session, "Text as Data: Introduction to Diachronic Global Corpus," on Wednesday, February 5, 2014, at 4:30 p.m. in 012 Bendheim Hall. The session will introduce the new digital humanities project Diachronic Global Corpus (DiGCor), co-directed by LISD Visiting Scholar Uriel Abulof; Christiane D. Fellbaum, Senior Research Scholar in Computer Science; and Ben Johnston, Manager of the Princeton University Humanities Resource Center. To attend, RSVP to Angella Matheney by February 4. Light refreshments will be served.

The “text as data” approach promises new perspectives on empirically based research in the humanities and the social sciences. Corpus linguistics and Natural Language Processing techniques enable sophisticated analysis of texts on a large scale and allow researchers to tap into modern discourse in novel ways. The DiGCor project is at the cutting edge of this promise. Its mission is to construct a unique, very large corpus of multilingual discourse covering a variety of texts from sources around the world over the last century. DiGCor does not seek to reinvent the wheel; rather, it intends to refine and integrate four existing wheels so that together they support and propel a new vehicle for social knowledge. Our aim is to go beyond mere word statistics and access a deeper, conceptual level of meaning, using a four-pronged approach:

  1. Analysis of collocational properties of words;
  2. Dynamic topic modeling: probabilistic patterns of word use over time;
  3. Sentiment and reason analysis: identify the writers’ emotions and judgments buried in texts;
  4. Diffusion analysis: the spatiotemporal spread and contraction of keywords.

DiGCor currently has rich data resources but welcomes contributions of additional digital corpora to extend its analyses and share its findings. Interested scholars and students are encouraged to attend this introductory session.