eHumanities Desktop

The eHumanities Desktop has been developed as an online system to support research in the growing field of digital humanities. It provides an intuitive desktop environment, very much like the Windows Desktop, that supports uploading and organizing resources as well as sharing them with other users. Building on this basic resource management, a growing number of applications are offered to process, analyze, and explore documents online. The applications range from linguistic preprocessing of text into TEI P5 [1] and lexical resource management to text classification and image databases.

Introduction

Figure 1: the architecture of the eHumanities Desktop

The eHumanities Desktop supports the modelling, management, retrieval, and editing of multimedia text and image data, as well as the efficient allocation of resources, even for large volumes of data. The eHumanities Desktop [2, 3], which is under development at the Text Technology Lab at the Goethe University Frankfurt, offers a service-oriented architecture (SOA) for functionality in the digital humanities. It integrates the Linguistic Networks system [4, 5] for managing, searching, and visualizing linguistic networks. At present, the eHumanities Desktop integrates five application modules and makes them accessible via the web as well as via an API. These are:

  1. the Neo4J-based [6] foundation data model eHuBase [2]
  2. the integrated document and lexicon module TEILex [7], also based on a Neo4J database
  3. the module Neo4Wikipedia for managing wiki-based corpora of present-day language
  4. the OWL-based annotation module OWLnotator [8] for annotating any number of multimedia resources of the eHumanities Desktop
  5. the Linguistic Networks module [4], which likewise rests on Neo4J

For displaying, managing, and searching lexica, the eHumanities Desktop offers the eLexicon Browser as its interface. All these modules were developed using the architectural model displayed in Figure 1. For an illustration of the SOA of the eHumanities Desktop, see Figure 2.

Figure 2: the SOA of the eHumanities Desktop.

eHuBase

Figure 3: eHuBase.

The core of the eHumanities Desktop is its fundamental data management system, eHuBase. Here we can view and manage information about users, group membership, and rights (read, write, delete, etc.), about resources, and about base application functionality. eHuBase handles the core data of all the resources that are processed in the eHumanities Desktop. This includes documents, repositories, discrete program functionality, and annotations. The eHumanities Desktop currently has 312 registered users who, organized in 27 project groups, have access to approximately 800,000 documents. The eHumanities Desktop handles documents in all popular formats, so that users can upload, share, manage, or process text, images, videos, or sound files. The architecture of the eHumanities Desktop allows for a level of abstraction in handling documents, which concerns the way documents are stored. Binary data (images or video) are saved redundantly in a clustered file system. For this, we use the Apache Hadoop Framework [9].
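
The interplay of users, groups, and rights described above might be modelled roughly as follows. This is a minimal Python sketch, not the actual eHuBase schema: the names Right, Resource, User, rights_for, and can are hypothetical, and the idea that rights are granted to groups and inherited by their members is an illustrative assumption.

```python
from dataclasses import dataclass, field
from enum import Flag


class Right(Flag):
    """Rights named in the text; encoding them as bit flags is an assumption."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 4


@dataclass
class Resource:
    name: str
    # Hypothetical mapping: group name -> rights granted to that group.
    group_rights: dict[str, Right] = field(default_factory=dict)


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)


def rights_for(user: User, resource: Resource) -> Right:
    """Union of the rights granted to any group the user belongs to."""
    granted = Right.NONE
    for group in user.groups:
        granted |= resource.group_rights.get(group, Right.NONE)
    return granted


def can(user: User, resource: Resource, needed: Right) -> bool:
    """True if the user holds every right contained in `needed`."""
    return (rights_for(user, resource) & needed) == needed


# Illustrative data only; none of these names come from the real system.
corpus = Resource("latin_corpus.xml", {"historians": Right.READ | Right.WRITE})
alice = User("alice", {"historians"})
bob = User("bob", {"guests"})

print(can(alice, corpus, Right.WRITE))   # alice's group may write
print(can(alice, corpus, Right.DELETE))  # but nobody was granted delete
print(can(bob, corpus, Right.READ))      # bob's group has no rights here
```

Keeping rights on the resource and resolving them through group membership means that adding a user to a project group is enough to give that user access to all of the group's documents.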

TEILex and eLexicon Browser

Figure 4: the eLexicon Browser.

A central feature of the eHumanities Desktop is the integration of lexica and corpora, which can be annotated using the lexica either automatically or manually [10, 7]. Additions, corrections, and revisions to lexica can be made automatically using a linked TEI document as a source, without requiring changes to annotations or a reindexing of the affected corpora. In this way, annotators should be able to edit a lemmatisation without errors or gaps, so that the corrections appear immediately when browsing or searching the corpus. The TEILex module was developed to supply this functionality (see Figure 5). TEILex integrates the data model for TEI P5-conforming documents and for lexica using the same graph database. The Lexical Markup Framework (LMF) [11] serves as an alternative format for TEILex. The innovation of TEILex, however, lies in the integration of documents and lexica, which are used together in processing. Every occurrence of a word in a text is linked logically to the corresponding syntactic word in the lexicon. When the lexicon changes, no subsequent data synchronisation is needed. This makes it much easier to make corrections and additions directly to the lexica using the linked document. Documents annotated in this way can be downloaded at any time as TEI P5 documents. The functionality of TEILex is available within the eLexicon Browser as well.
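
Why a lexicon correction needs no synchronisation step can be illustrated with a small Python sketch. It assumes, as a simplification, that each token stores only a reference to a lexicon entry rather than a copy of its lemma. The classes LexiconEntry, Token, and LinkedCorpus are hypothetical and use plain dictionaries instead of the graph database TEILex is built on.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    entry_id: int
    lemma: str
    pos: str


@dataclass
class Token:
    surface: str
    entry_id: int  # link into the lexicon, not a copy of the lemma


class LinkedCorpus:
    """Toy stand-in for documents and lexicon sharing one store."""

    def __init__(self) -> None:
        self.lexicon: dict[int, LexiconEntry] = {}
        self.tokens: list[Token] = []

    def add_entry(self, entry_id: int, lemma: str, pos: str) -> None:
        self.lexicon[entry_id] = LexiconEntry(entry_id, lemma, pos)

    def add_token(self, surface: str, entry_id: int) -> None:
        self.tokens.append(Token(surface, entry_id))

    def lemmatised(self) -> list[tuple[str, str]]:
        # The lemma is resolved at read time through the link.
        return [(t.surface, self.lexicon[t.entry_id].lemma) for t in self.tokens]

    def search_lemma(self, lemma: str) -> list[int]:
        """Positions of all tokens whose linked entry has this lemma."""
        ids = {e.entry_id for e in self.lexicon.values() if e.lemma == lemma}
        return [i for i, t in enumerate(self.tokens) if t.entry_id in ids]


corpus = LinkedCorpus()
corpus.add_entry(1, lemma="rexx", pos="NOUN")  # deliberately misspelled lemma
corpus.add_token("regis", 1)
corpus.add_token("regem", 1)

before = corpus.search_lemma("rex")  # finds nothing yet
corpus.lexicon[1].lemma = "rex"      # a single correction in the lexicon
after = corpus.search_lemma("rex")   # both tokens, no reindexing of the text
print(before, after)
```

Because the tokens are never rewritten, the correction is visible to browsing and searching as soon as the lexicon entry changes.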

Figure 5: The workflow of document annotation without (above) and with (below) TEILex.

ImageDB

The ImageDB module was developed to support the creation of multimedia corpora. The ImageDB allows users to segment images recursively and to link images and image segments to each other. Segments can take the form of rectangles, circles, ellipses, or any sort of user-defined polygonal shape. Segmenting an image creates an intra-aggregate relation between the parent and child image. The ImageDB makes it possible to create, manage, and search these inter- and intra-aggregate relations [12]. Consequently, the eHumanities Desktop can display the entire range of text-text, image-image, and text-image relationships. The OWLnotator module provides the means for this linking.
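
Recursive segmentation and linking can be pictured as a tree of segments with additional cross-links. The following Python sketch is illustrative only: the Segment class, its method names, and the rect helper are hypothetical, and every shape is reduced to a polygon for brevity, whereas the ImageDB also supports circles and ellipses.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

Point = tuple[float, float]


def rect(x: float, y: float, w: float, h: float) -> list[Point]:
    """Corner points of an axis-aligned rectangle."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


@dataclass(eq=False)  # compare segments by identity, not by field values
class Segment:
    label: str
    polygon: list[Point]
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: list["Segment"] = field(default_factory=list, repr=False)
    links: list["Segment"] = field(default_factory=list, repr=False)

    def segment(self, label: str, polygon: list[Point]) -> "Segment":
        """Cut out a child segment (an intra-aggregate relation)."""
        child = Segment(label, polygon, parent=self)
        self.children.append(child)
        return child

    def link(self, other: "Segment") -> None:
        """Relate this segment to another image or segment (inter-aggregate)."""
        self.links.append(other)
        other.links.append(self)

    def descendants(self) -> Iterator["Segment"]:
        """All nested segments, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()

    def depth(self) -> int:
        """Number of segmentation steps between this segment and its root image."""
        return 0 if self.parent is None else 1 + self.parent.depth()


# Illustrative data only.
page = Segment("faust_illustration", rect(0, 0, 100, 80))
figure = page.segment("mephisto", rect(10, 10, 40, 60))
face = figure.segment("face", [(20, 15), (30, 15), (32, 25), (18, 25)])
title_page = Segment("title_page", rect(0, 0, 50, 70))
face.link(title_page)

print([s.label for s in page.descendants()])
```

Storing the parent on each child makes the intra-aggregate relation searchable in both directions, while the symmetric links list carries the relations between separate images.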


[1] Unknown bibtex entry with key [teiConsortium2010]
[2] [pdf] A. Mehler, S. Schwandt, R. Gleim, and B. Jussen, “Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 26, iss. 1, pp. 97-117, 2011.
[BibTeX]
@ARTICLE{Mehler:Schwandt:Gleim:Jussen:2011,
    journal={Journal for Language Technology and Computational Linguistics (JLCL)},
    pdf={http://media.dwds.de/jlcl/2011_Heft1/8.pdf},
    pages={97-117},
    number={1},
    author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Jussen, Bernhard},
    volume={26},
    year={2011},
    title={Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien},
    abstract={The digital humanities, or computational humanities, are developing into independent disciplines at the interface of the humanities and computer science. This development increasingly affects teaching in humanities computing as well. In this article we discuss the eHumanities Desktop as a tool for this area of teaching. More precisely, the aim is to build a bridge between history and computer science: using historical semantics as an example, we present three teaching scenarios in which the eHumanities Desktop is used in the teaching of history. The article concludes with a requirements analysis for future developments in this area.}}
[3] [pdf] R. Gleim and A. Mehler, “Computational Linguistics for Mere Mortals – Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities,” in Proceedings of LREC 2010, Malta, 2010.
[BibTeX]
@INPROCEEDINGS{Gleim:Mehler:2010:b,
    publisher={ELDA},
    pdf={https://hucompute.org/wp-content/uploads/2015/08/gleim_mehler_2010.pdf},
    booktitle={Proceedings of LREC 2010},
    author={Gleim, Rüdiger and Mehler, Alexander},
    year={2010},
    title={Computational Linguistics for Mere Mortals – Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities},
    address={Malta},
    abstract={Delivering linguistic resources and easy-to-use methods to a broad public in the humanities is a challenging task. On the one hand users rightly demand easy to use interfaces but on the other hand want to have access to the full flexibility and power of the functions being offered. Even though a growing number of excellent systems exist which offer convenient means to use linguistic resources and methods, they usually focus on a specific domain, as for example corpus exploration or text categorization. Architectures which address a broad scope of applications are still rare. This article introduces the eHumanities Desktop, an online system for corpus management, processing and analysis which aims at bridging the gap between powerful command line tools and intuitive user interfaces. }
    }
[4] A. Mehler, S. Schwandt, R. Gleim, and A. Ernst, “Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics,” in Proceedings of the Conference on New Methods in Historical Corpora, P. Bennett, M. Durrell, S. Scheible, and R. J. Whitt, Eds., Tübingen: Narr, 2012, vol. 3, pp. 257-274.
[BibTeX]
@INCOLLECTION{Mehler:Schwandt:Gleim:Ernst:2012,
    publisher={Narr},
    booktitle={Proceedings of the Conference on New Methods in Historical Corpora},
    pages={257--274},
    author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Ernst, Alexandra},
    series={Corpus linguistics and Interdisciplinary perspectives on language (CLIP)},
    editor={Paul Bennett and Martin Durrell and Silke Scheible and Richard J. Whitt},
    year={2012},
    volume={3},
    title={Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics},
    address={Tübingen}}
[5] M. Lux, J. Laußmann, A. Mehler, and C. Menßen, “An Online Platform for Visualizing Time Series in Linguistic Networks,” in Proceedings of the Demonstrations Session of the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 22 – 27 August 2011, Lyon, France, 2011.
[BibTeX]
@INPROCEEDINGS{Lux:Laussmann:Mehler:Menssen:2011,
    booktitle={Proceedings of the Demonstrations Session of the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 22 - 27 August 2011, Lyon, France},
    website={http://dl.acm.org/citation.cfm?id=2052396},
    author={Lux, Markus and Lau{\ss}mann, Jan and Mehler, Alexander and Men{\ss}en, Christian},
    year={2011},
    poster={https://hucompute.org/wp-content/uploads/2015/08/wi-iat-poster-2011.pdf},
    title={An Online Platform for Visualizing Time Series in Linguistic Networks}}
[6] Unknown bibtex entry with key [Robinson:2013]
[7] [pdf] R. Gleim, A. Mehler, and A. Ernst, “SOA implementation of the eHumanities Desktop,” in Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany, 2012.
[BibTeX]
@INPROCEEDINGS{Gleim:Mehler:Ernst:2012,
    booktitle={Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany},
    author={Gleim, Rüdiger and Mehler, Alexander and Ernst, Alexandra},
    year={2012},
    title={SOA implementation of the eHumanities Desktop},
    abstract={The eHumanities Desktop is a system which allows users to upload, organize and share resources using a web interface. Furthermore resources can be processed, annotated and analyzed in various ways. Registered users can organize themselves in groups and collaboratively work on their data. The eHumanities Desktop is platform independent and runs in a web browser. This paper presents the system focusing on its service orientation and process management.},
    pdf={https://hucompute.org/wp-content/uploads/2015/08/dhc2012.pdf}}
[8] [doi] G. Abrami, A. Mehler, and D. Pravida, “Fusing Text and Image Data with the Help of the OWLnotator,” in Human Interface and the Management of Information. Information and Knowledge Design, S. Yamamoto, Ed., Springer International Publishing, 2015, vol. 9172, pp. 261-272.
[BibTeX]
@INCOLLECTION{Abrami:Mehler:Pravida:2015:b,
    booktitle={Human Interface and the Management of Information. Information and Knowledge Design},
    publisher={Springer International Publishing},
    editor={Yamamoto, Sakae},
    pages={261-272},
    series={Lecture Notes in Computer Science},
    Volume={9172},
    Doi={10.1007/978-3-319-20612-7_25},
    ISBN={978-3-319-20611-0},
    language={English},
    website={http://dx.doi.org/10.1007/978-3-319-20612-7_25},
    author={Abrami, Giuseppe and Mehler, Alexander and Pravida, Dietmar},
    year={2015},
    title={Fusing Text and Image Data with the Help of the OWLnotator}}
[9] Unknown bibtex entry with key [White:2009]
[10] [pdf] R. Gleim, A. Hoenen, N. Diewald, A. Mehler, and A. Ernst, “Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin,” in Corpus Linguistics 2011, 20-22 July, Birmingham, 2011.
[BibTeX]
@INPROCEEDINGS{Gleim:Hoenen:Diewald:Mehler:Ernst:2011,
    booktitle={Corpus Linguistics 2011, 20-22 July, Birmingham},
    author={Gleim, Rüdiger and Hoenen, Armin and Diewald, Nils and Mehler, Alexander and Ernst, Alexandra},
    year={2011},
    title={Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin},
    pdf={https://hucompute.org/wp-content/uploads/2015/08/Paper-48.pdf}}
[11] Unknown bibtex entry with key [Francopoulo:2006]
[12] [pdf] G. Abrami, M. Freiberg, and P. Warner, “Managing and Annotating Historical Multimodal Corpora with the eHumanities Desktop – An outline of the current state of the LOEWE project Illustrations of Goethe's Faust,” in Historical Corpora, 2015, pp. 353-363.
[BibTeX]
@INPROCEEDINGS{Abrami:Freiberg:Warner:2015,
    pdf={https://hucompute.org/wp-content/uploads/2015/08/AbramiFreibergWarner_HC_2012.pdf},
    website={http://www.narr-shop.de/historical-corpora.html},
    booktitle={Historical Corpora},
    pages={353 - 363},
    author={Abrami, Giuseppe and Freiberg, Michael and Warner, Paul},
    year={2015},
    title={Managing and Annotating Historical Multimodal Corpora with the eHumanities Desktop - An outline of the current state of the LOEWE project Illustrations of Goethe's Faust},
    abstract={Text corpora are structured sets of text segments that can be annotated or interrelated. Expanding on this, we can define a database of images as an iconographic multimodal corpus with annotated images and the relations between images as well as between images and texts. The Goethe-Museum in Frankfurt holds a significant collection of art work and texts relating to Goethe’s Faust from the early 19th century until the present. In this project we create a database containing digitized items from this collection, and extend a tool, the ImageDB in the eHumanities Desktop, to annotate and provide relations between resources. This article gives an overview of the project and provides some technical details. Furthermore we show newly implemented features, explain the challenge of creating an ontology on multimodal corpora and give a forecast for future work.}}