Rüdiger Gleim

Scientific Assistant

Robert Mayer Str. 10
Tel: +49 69-798-28926
Room 402
Office hours: Thursday, 3-4 PM


Linguistic Databases

Almost any study in corpus linguistics boils down to constructing, annotating, representing and analyzing linguistic data. The requirements on a proper database are often contradictory:

  • It should scale well with ever-growing corpora such as Wikipedia, while remaining flexible for annotation and editing.
  • It should serve a broad spectrum of analyses by minimizing the need to transform data for any specific kind of analysis, while still being space-efficient.
  • The data model should be able to mediate between standard formats without becoming over-generic and difficult to handle.
  • ….

Designing and developing linguistic databases has become a major topic for me. Realizing that there is no such thing as the ultimate solution, I am interested in all kinds of database management systems and paradigms, including relational, graph, distributed and NoSQL databases, as well as APIs for persistent storage.
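To make the last trade-off above concrete, the following is a minimal sketch (in Python, over SQLite) of the kind of deliberately generic node/edge model these requirements tend to push towards. It is purely illustrative: the schema and all names (AnnotationStore, add_node, and so on) are hypothetical and not taken from any system referenced on this page.

    # Illustrative sketch: a deliberately generic annotation-graph store.
    # Every linguistic unit (text, sentence, token, lemma) is a node;
    # every relation (dominance, lemma assignment, linkage) is a typed edge.
    import sqlite3

    class AnnotationStore:
        def __init__(self, path=":memory:"):
            self.db = sqlite3.connect(path)
            self.db.executescript("""
                CREATE TABLE IF NOT EXISTS node (
                    id    INTEGER PRIMARY KEY,
                    kind  TEXT NOT NULL,   -- e.g. 'token', 'sentence', 'lemma'
                    value TEXT             -- surface form or label
                );
                CREATE TABLE IF NOT EXISTS edge (
                    src  INTEGER REFERENCES node(id),
                    dst  INTEGER REFERENCES node(id),
                    kind TEXT NOT NULL     -- e.g. 'dominates', 'lemma-of'
                );
                CREATE INDEX IF NOT EXISTS edge_src ON edge(src, kind);
            """)

        def add_node(self, kind, value=None):
            cur = self.db.execute(
                "INSERT INTO node (kind, value) VALUES (?, ?)", (kind, value))
            return cur.lastrowid

        def add_edge(self, src, dst, kind):
            self.db.execute(
                "INSERT INTO edge (src, dst, kind) VALUES (?, ?, ?)",
                (src, dst, kind))

        def children(self, node_id, kind="dominates"):
            # Even reading the tokens of a sentence needs a join -- the
            # price of a schema generic enough to absorb most formats.
            return self.db.execute(
                "SELECT n.id, n.kind, n.value FROM edge e "
                "JOIN node n ON n.id = e.dst "
                "WHERE e.src = ? AND e.kind = ?",
                (node_id, kind)).fetchall()

    # Usage: a sentence dominating two tokens, plus one lemma annotation.
    store = AnnotationStore()
    s = store.add_node("sentence")
    t1 = store.add_node("token", "Dogs")
    t2 = store.add_node("token", "bark")
    store.add_edge(s, t1, "dominates")
    store.add_edge(s, t2, "dominates")
    store.add_edge(t1, store.add_node("lemma", "dog"), "lemma-of")
    print(store.children(s))  # -> [(2, 'token', 'Dogs'), (3, 'token', 'bark')]

The same genericity that lets such a schema absorb almost any annotation format is exactly what makes it verbose to query and hard to optimize; choosing the point on this spectrum is, in my experience, the core design decision of any linguistic database.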

Total: 24

2018 (1)

  • A. Mehler, W. Hemati, R. Gleim, and D. Baumartz, “Auf dem Weg zu einer Infrastruktur für die verteilte interaktive evolutionäre Verarbeitung natürlicher Sprache,” in Forschungsinfrastrukturen und digitale Informationssysteme in der germanistischen Sprachwissenschaft, H. Lobin, R. Schneider, and A. Witt, Eds., Berlin: De Gruyter, 2018, vol. 6.
    [BibTeX]

    @InCollection{Mehler:Hemati:Gleim:Baumartz:2018,
        Title                    = {{Auf dem Weg zu einer Infrastruktur für die verteilte interaktive evolutionäre Verarbeitung natürlicher Sprache}},
        Author                   = {Alexander Mehler and Wahed Hemati and Rüdiger Gleim and Daniel Baumartz},
        Booktitle                = {Forschungsinfrastrukturen und digitale Informationssysteme in der germanistischen Sprachwissenschaft},
        Volume                   = {6},
        Year                     = {2018},
        Address                  = {Berlin},
        Editor                   = {Henning Lobin and Roman Schneider and Andreas Witt},
        Publisher                = {De Gruyter}
    }

2017 (1)

  • A. Mehler, R. Gleim, W. Hemati, and T. Uslu, “Skalenfreie online soziale Lexika am Beispiel von Wiktionary,” in Proceedings of 53rd Annual Conference of the Institut für Deutsche Sprache (IDS), March 14-16, Mannheim, Germany, Berlin, 2017.
    [BibTeX]

    @InProceedings{Mehler:Gleim:Hemati:Uslu:2017,
        Title                    = {{Skalenfreie online soziale Lexika am Beispiel von Wiktionary}},
        Author                   = {Alexander Mehler and Rüdiger Gleim and Wahed Hemati and Tolga Uslu},
        Booktitle                = {Proceedings of 53rd Annual Conference of the Institut für Deutsche Sprache (IDS), March 14-16, Mannheim, Germany},
        Year                     = {2017},
        Address                  = {Berlin},
        Editor                   = {Stefan Engelberg and Henning Lobin and Kathrin Steyer and Sascha Wolfer},
        Publisher                = {De Gruyter}
    }

2016 (3)

  • [URL] A. Mehler, B. Wagner, and R. Gleim, “Wikidition: Towards A Multi-layer Network Model of Intertextuality,” in Proceedings of DH 2016, 12-16 July, 2016.
    [Abstract] [BibTeX]

    The paper presents Wikidition, a novel text mining tool for generating online editions of text corpora. It explores lexical, sentential and textual relations to span multi-layer networks (linkification) that allow for browsing syntagmatic and paradigmatic relations among the constituents of its input texts. In this way, relations of text reuse can be explored together with lexical relations within the same literary memory information system. Beyond that, Wikidition contains a module for automatic lexiconisation to extract author specific vocabularies. Based on linkification and lexiconisation, Wikidition does not only allow for traversing input corpora on different (lexical, sentential and textual) levels. Rather, its readers can also study the vocabulary of authors on several levels of resolution including superlemmas, lemmas, syntactic words and wordforms. We exemplify Wikidition by a range of literary texts and evaluate it by means of the apparatus of quantitative network analysis.
    @InProceedings{Mehler:Wagner:Gleim:2016,
      Title =     {Wikidition: Towards A Multi-layer Network Model of Intertextuality},
      Author =     {Mehler, Alexander and Wagner, Benno and Gleim, R\"{u}diger},
      Booktitle =     {Proceedings of DH 2016, 12-16 July},
      Year =     2016,
      location =     {Kraków},
      series =     {DH 2016},
      url = {http://dh2016.adho.org/abstracts/250},
      abstract = {The paper presents Wikidition, a novel text mining tool for generating online editions of text corpora. It explores lexical, sentential and textual relations to span multi-layer networks (linkification) that allow for browsing syntagmatic and paradigmatic relations among the constituents of its input texts. In this way, relations of text reuse can be explored together with lexical relations within the same literary memory information system. Beyond that, Wikidition contains a module for automatic lexiconisation to extract author specific vocabularies. Based on linkification and lexiconisation, Wikidition does not only allow for traversing input corpora on different (lexical, sentential and textual) levels. Rather, its readers can also study the vocabulary of authors on several levels of resolution including superlemmas, lemmas, syntactic words and wordforms. We exemplify Wikidition by a range of literary texts and evaluate it by means of the apparatus of quantitative network analysis.}
    }
  • [PDF] S. Eger, R. Gleim, and A. Mehler, “Lemmatization and Morphological Tagging in German and Latin: A comparison and a survey of the state-of-the-art,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
    [BibTeX]

    @InProceedings{Eger:Mehler:Gleim:2016,
      Title =     {Lemmatization and Morphological Tagging in {German}
                      and {Latin}: A comparison and a survey of the
                      state-of-the-art},
      Author =     {Eger, Steffen and Gleim, R\"{u}diger and Mehler, Alexander},
      Booktitle =     {Proceedings of the 10th International Conference on
                      Language Resources and Evaluation},
      Year =     2016,
      location =     {Portoro\v{z} (Slovenia)},
      series =     {LREC 2016},
      pdf = {http://hucompute.org/wp-content/uploads/2016/04/lrec_eger_gleim_mehler.pdf}
    }
  • [DOI] A. Mehler, R. Gleim, T. vor der Brück, W. Hemati, T. Uslu, and S. Eger, “Wikidition: Automatic Lexiconization and Linkification of Text Corpora,” Information Technology, pp. 70-79, 2016.
    [Abstract] [BibTeX]

    We introduce a new text technology, called Wikidition, which automatically generates large scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin texts.
    @Article{Mehler:et:al:2016,
      Title                    = {Wikidition: Automatic Lexiconization and Linkification of Text Corpora},
      Author                   = {Alexander Mehler and Rüdiger Gleim and Tim vor der Brück and Wahed Hemati and Tolga Uslu and Steffen Eger},
      Journal                  = {Information Technology},
      Year                     = {2016},
      pages                    = {70-79},
      doi                      = {10.1515/itit-2015-0035},
      abstract       = {We introduce a new text technology, called Wikidition, which automatically generates large
    scale editions of corpora of natural language texts. Wikidition combines a wide range of
    text mining tools for automatically linking lexical, sentential and textual units. This
    includes the extraction of corpus-specific lexica down to the level of syntactic words and
    their grammatical categories. To this end, we introduce a novel measure of text reuse and
    exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin
    texts.}
    }

2015 (3)

  • A. Mehler and R. Gleim, “Linguistic Networks — An Online Platform for Deriving Collocation Networks from Natural Language Texts,” in Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, A. Mehler, A. Lücking, S. Banisch, P. Blanchard, and B. Frank-Job, Eds., Springer, 2015.
    [BibTeX]

    @INCOLLECTION{Mehler:Gleim:2015:a,
        publisher={Springer},
        editor={Mehler, Alexander and Lücking, Andy and Banisch, Sven and Blanchard, Philippe and Frank-Job, Barbara},
        year={2015},
        booktitle={Towards a Theoretical Framework for Analyzing Complex Linguistic Networks},
        title={Linguistic Networks -- An Online Platform for Deriving Collocation Networks from Natural Language Texts},
        series={Understanding Complex Systems},
        author={Mehler, Alexander and Gleim, Rüdiger}}
  • A. Mehler, T. vor der Brück, R. Gleim, and T. Geelhaar, “Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts using the TTLab Latin Tagger,” in Text Mining: From Ontology Learning to Automated text Processing Applications, C. Biemann and A. Mehler, Eds., Berlin/New York: Springer, 2015, pp. 87-112.
    [Abstract] [BibTeX]

    The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start with presenting the so-called TTLab Latin Tagger (TLT) that preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and mediately of our Latin preprocessor.
    @INCOLLECTION{Mehler:Brueck:Gleim:Geelhaar:2015,
        publisher={Springer},
        series={Theory and Applications of Natural Language Processing},
        booktitle={Text Mining: From Ontology Learning to Automated text Processing Applications},
        pages={87-112},
        editor={Chris Biemann and Alexander Mehler},
        author={Mehler, Alexander and vor der Brück, Tim and Gleim, Rüdiger and Geelhaar, Tim},
        address={Berlin/New York},
        year={2015},
        title={Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts using the TTLab Latin Tagger},
        abstract={The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start with presenting the so-called TTLab Latin Tagger (TLT) that preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and mediately of our Latin preprocessor.},
        website={http://link.springer.com/chapter/10.1007/978-3-319-12655-5_5}}
  • [PDF] R. Gleim and A. Mehler, “TTLab Preprocessor – Eine generische Web-Anwendung für die Vorverarbeitung von Texten und deren Evaluation,” in Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum, 2015.
    [BibTeX]

    @INPROCEEDINGS{Gleim:Mehler:2015,
        booktitle={Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum},
        author={Gleim, Rüdiger and Mehler, Alexander},
        year={2015},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/Gleim_Mehler_PrePro_DHGraz2015.pdf},
        title={TTLab Preprocessor – Eine generische Web-Anwendung für die Vorverarbeitung von Texten und deren Evaluation}}

2013 (1)

  • A. Mehler, C. Stegbauer, and R. Gleim, “Zur Struktur und Dynamik der kollaborativen Plagiatsdokumentation am Beispiel des GuttenPlag Wiki: eine Vorstudie,” in Die Dynamik sozialer und sprachlicher Netzwerke. Konzepte, Methoden und empirische Untersuchungen am Beispiel des WWW, B. Frank-Job, A. Mehler, and T. Sutter, Eds., Wiesbaden: VS Verlag, 2013.
    [BibTeX]

    @INCOLLECTION{Mehler:Stegbauer:Gleim:2013,
        publisher={VS Verlag},
        booktitle={Die Dynamik sozialer und sprachlicher Netzwerke. Konzepte, Methoden und empirische Untersuchungen am Beispiel des WWW},
        author={Mehler, Alexander and Stegbauer, Christian and Gleim, Rüdiger},
        editor={Frank-Job, Barbara and Mehler, Alexander and Sutter, Tilman},
        year={2013},
        title={Zur Struktur und Dynamik der kollaborativen Plagiatsdokumentation am Beispiel des GuttenPlag Wiki: eine Vorstudie},
        address={Wiesbaden}}

2012 (3)

  • [PDF] A. Mehler, C. Stegbauer, and R. Gleim, “Latent Barriers in Wiki-based Collaborative Writing,” in Proceedings of the Wikipedia Academy: Research and Free Knowledge. June 29 – July 1 2012, Berlin, 2012.
    [BibTeX]

    @INPROCEEDINGS{Mehler:Stegbauer:Gleim:2012:b,
        pdf={https://hucompute.org/wp-content/uploads/2015/08/12_Paper_Alexander_Mehler_Christian_Stegbauer_Ruediger_Gleim.pdf},
        booktitle={Proceedings of the Wikipedia Academy: Research and Free Knowledge. June 29 - July 1 2012},
        author={Mehler, Alexander and Stegbauer, Christian and Gleim, Rüdiger},
        month={July},
        year={2012},
        title={Latent Barriers in Wiki-based Collaborative Writing},
        address={Berlin}}
  • [PDF] R. Gleim, A. Mehler, and A. Ernst, “SOA implementation of the eHumanities Desktop,” in Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany, 2012.
    [Abstract] [BibTeX]

    The eHumanities Desktop is a system which allows users to upload, organize and share resources using a web interface. Furthermore resources can be processed, annotated and analyzed in various ways. Registered users can organize themselves in groups and collaboratively work on their data. The eHumanities Desktop is platform independent and runs in a web browser. This paper presents the system focusing on its service orientation and process management.
    @INPROCEEDINGS{Gleim:Mehler:Ernst:2012,
        booktitle={Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany},
        author={Gleim, Rüdiger and Mehler, Alexander and Ernst, Alexandra},
        year={2012},
        title={SOA implementation of the eHumanities Desktop},
        abstract={The eHumanities Desktop is a system which allows users to upload, organize and share resources using a web interface. Furthermore resources can be processed, annotated and analyzed in various ways. Registered users can organize themselves in groups and collaboratively work on their data. The eHumanities Desktop is platform independent and runs in a web browser. This paper presents the system focusing on its service orientation and process management.},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/dhc2012.pdf}}
  • A. Mehler, S. Schwandt, R. Gleim, and A. Ernst, “Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics,” in Proceedings of the Conference on New Methods in Historical Corpora, P. Bennett, M. Durrell, S. Scheible, and R. J. Whitt, Eds., Tübingen: Narr, 2012, vol. 3, pp. 257-274.
    [BibTeX]

    @INCOLLECTION{Mehler:Schwandt:Gleim:Ernst:2012,
        publisher={Narr},
        booktitle={Proceedings of the Conference on New Methods in Historical Corpora},
        pages={257--274},
        author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Ernst, Alexandra},
        series={Corpus linguistics and Interdisciplinary perspectives on language (CLIP)},
        editor={Paul Bennett and Martin Durrell and Silke Scheible and Richard J. Whitt},
        year={2012},
        volume={3},
        title={Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics},
        address={Tübingen}}

2011 (3)

  • [PDF] A. Mehler, S. Schwandt, R. Gleim, and B. Jussen, “Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 26, iss. 1, pp. 97-117, 2011.
    [Abstract] [BibTeX]

    Die Digital Humanities bzw. die Computational Humanities entwickeln sich zu eigenständigen Disziplinen an der Nahtstelle von Geisteswissenschaft und Informatik. Diese Entwicklung betrifft zunehmend auch die Lehre im Bereich der geisteswissenschaftlichen Fachinformatik. In diesem Beitrag thematisieren wir den eHumanities Desktop als ein Werkzeug für diesen Bereich der Lehre. Dabei geht es genauer um einen Brückenschlag zwischen Geschichtswissenschaft und Informatik: Am Beispiel der historischen Semantik stellen wir drei Lehrszenarien vor, in denen der eHumanities Desktop in der geschichtswissenschaftlichen Lehre zum Einsatz kommt. Der Beitrag schliesst mit einer Anforderungsanalyse an zukünftige Entwicklungen in diesem Bereich.
    @ARTICLE{Mehler:Schwandt:Gleim:Jussen:2011,
        journal={Journal for Language Technology and Computational Linguistics (JLCL)},
        pdf={http://media.dwds.de/jlcl/2011_Heft1/8.pdf},
        pages={97-117},
        number={1},
        author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Jussen, Bernhard},
        volume={26},
        year={2011},
        title={Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien},
        abstract={Die Digital Humanities bzw. die Computational Humanities entwickeln sich zu eigenst{\"a}ndigen Disziplinen an der Nahtstelle von Geisteswissenschaft und Informatik. Diese Entwicklung betrifft zunehmend auch die Lehre im Bereich der geisteswissenschaftlichen Fachinformatik. In diesem Beitrag thematisieren wir den eHumanities Desktop als ein Werkzeug für diesen Bereich der Lehre. Dabei geht es genauer um einen Brückenschlag zwischen Geschichtswissenschaft und Informatik: Am Beispiel der historischen Semantik stellen wir drei Lehrszenarien vor, in denen der eHumanities Desktop in der geschichtswissenschaftlichen Lehre zum Einsatz kommt. Der Beitrag schliesst mit einer Anforderungsanalyse an zukünftige Entwicklungen in diesem Bereich.}}
  • A. Mehler, N. Diewald, U. Waltinger, R. Gleim, D. Esch, B. Job, T. Küchelmann, O. Abramov, and P. Blanchard, “Evolution of Romance Language in Written Communication: Network Analysis of Late Latin and Early Romance Corpora,” Leonardo, vol. 44, iss. 3, 2011.
    [Abstract] [BibTeX]

    In this paper, the authors induce linguistic networks as a prerequisite for detecting language change by means of the Patrologia Latina, a corpus of Latin texts from the 4th to the 13th century.
    @ARTICLE{Mehler:Diewald:Waltinger:et:al:2010,
        publisher={MIT Press},
        journal={Leonardo},
        number={3},
        author={Mehler, Alexander and Diewald, Nils and Waltinger, Ulli and Gleim, Rüdiger and Esch, Dietmar and Job, Barbara and Küchelmann, Thomas and Abramov, Olga and Blanchard, Philippe},
        volume={44},
        year={2011},
        title={Evolution of Romance Language in Written Communication: Network Analysis of Late Latin and Early Romance Corpora},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_diewald_waltinger_gleim_esch_job_kuechelmann_pustylnikov_blanchard_2010.pdf},
        website={http://www.mitpressjournals.org/doi/abs/10.1162/LEON_a_00175#.VLzsoivF_Cc},
        abstract={In this paper, the authors induce linguistic networks as a prerequisite for detecting language change by means of the Patrologia Latina, a corpus of Latin texts from the 4th to the 13th century.}}
  • [PDF] R. Gleim, A. Hoenen, N. Diewald, A. Mehler, and A. Ernst, “Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin,” in Corpus Linguistics 2011, 20-22 July, Birmingham, 2011.
    [BibTeX]

    @INPROCEEDINGS{Gleim:Hoenen:Diewald:Mehler:Ernst:2011,
        booktitle={Corpus Linguistics 2011, 20-22 July, Birmingham},
        author={Gleim, Rüdiger and Hoenen, Armin and Diewald, Nils and Mehler, Alexander and Ernst, Alexandra},
        year={2011},
        title={Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/Paper-48.pdf}}

2010 (2)

  • [PDF] A. Mehler, R. Gleim, U. Waltinger, and N. Diewald, “Time Series of Linguistic Networks by Example of the Patrologia Latina,” in Proceedings of INFORMATIK 2010: Service Science, September 27 – October 01, 2010, Leipzig, 2010, pp. 609-616.
    [BibTeX]

    @INPROCEEDINGS{Mehler:Gleim:Waltinger:Diewald:2010,
        publisher={GI},
        booktitle={Proceedings of INFORMATIK 2010: Service Science, September 27 - October 01, 2010, Leipzig},
        author={Mehler, Alexander and Gleim, Rüdiger and Waltinger, Ulli and Diewald, Nils},
        editor={F{\"a}hnrich, Klaus-Peter and Franczyk, Bogdan},
        year={2010},
        volume={2},
        pages={609-616},
        title={Time Series of Linguistic Networks by Example of the Patrologia Latina},
        series={Lecture Notes in Informatics},
        pdf={http://subs.emis.de/LNI/Proceedings/Proceedings176/586.pdf}}
  • [PDF] R. Gleim, P. Warner, and A. Mehler, “eHumanities Desktop – An Architecture for Flexible Annotation in Iconographic Research,” in Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST ’10), April 7-10, 2010, Valencia, 2010.
    [BibTeX]

    @INPROCEEDINGS{Gleim:Warner:Mehler:2010,
        pdf={https://hucompute.org/wp-content/uploads/2015/08/gleim_warner_mehler_2010.pdf},
        booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST '10), April 7-10, 2010, Valencia},
        author={Gleim, Rüdiger and Warner, Paul and Mehler, Alexander},
        year={2010},
        title={eHumanities Desktop - An Architecture for Flexible Annotation in Iconographic Research},
        website={https://www.researchgate.net/publication/220724277_eHumanities_Desktop_-_An_Architecture_for_Flexible_Annotation_in_Iconographic_Research}}

2009 (2)

  • [PDF] R. Gleim, U. Waltinger, A. Ernst, A. Mehler, D. Esch, and T. Feith, “The eHumanities Desktop – An Online System for Corpus Management and Analysis in Support of Computing in the Humanities,” in Proceedings of the Demonstrations Session of the 12th Conference of the European Chapter of the Association for Computational Linguistics EACL 2009, 30 March – 3 April, Athens, 2009.
    [BibTeX]

    @INPROCEEDINGS{Gleim:Waltinger:Ernst:Mehler:Esch:Feith:2009,
        pdf={https://hucompute.org/wp-content/uploads/2015/08/gleim_waltinger_ernst_mehler_esch_feith_2009.pdf},
        booktitle={Proceedings of the Demonstrations Session of the 12th Conference of the European Chapter of the Association for Computational Linguistics EACL 2009, 30 March – 3 April, Athens},
        author={Gleim, Rüdiger and Waltinger, Ulli and Ernst, Alexandra and Mehler, Alexander and Esch, Dietmar and Feith, Tobias},
        year={2009},
        title={The eHumanities Desktop – An Online System for Corpus Management and Analysis in Support of Computing in the Humanities}}
  • [PDF] U. Waltinger, A. Mehler, and R. Gleim, “Social Semantics And Its Evaluation By Means of Closed Topic Models: An SVM-Classification Approach Using Semantic Feature Replacement By Topic Generalization,” in Proceedings of the Biennial GSCL Conference 2009, September 30 – October 2, Universität Potsdam, 2009.
    [BibTeX]

    @INPROCEEDINGS{Waltinger:Mehler:Gleim:2009:a,
        booktitle={Proceedings of the Biennial GSCL Conference 2009, September 30 – October 2, Universit{\"a}t Potsdam},
        author={Waltinger, Ulli and Mehler, Alexander and Gleim, Rüdiger},
        year={2009},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/GSCL_2009_WaltingerMehlerGleim_camera_ready.pdf},
        title={Social Semantics And Its Evaluation By Means of Closed Topic Models: An SVM-Classification Approach Using Semantic Feature Replacement By Topic Generalization}}

2008 (1)

  • [PDF] A. Mehler, R. Gleim, A. Ernst, and U. Waltinger, “WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases,” Sprache und Datenverarbeitung. International Journal for Language Data Processing, vol. 32, iss. 1, pp. 47-70, 2008.
    [Abstract] [BibTeX]

    This article describes an API for exploring the logical document and the logical network structure of wikis. It introduces an algorithm for the semantic preprocessing, filtering and typing of these building blocks. Further, this article models the process of wiki generation based on a unified format of syntactic, semantic and pragmatic representations. This three-level approach to make accessible syntactic, semantic and pragmatic aspects of wiki-based structure formation is complemented by a corresponding database model – called WikiDB – and an API operating thereon. Finally, the article provides an empirical study of using the three-fold representation format in conjunction with WikiDB.
    @ARTICLE{Mehler:Gleim:Ernst:Waltinger:2008,
        pdf={http://www.ulliwaltinger.de/pdf/Konvens_2008_WikiDB_Building_Semantic_Databases_MehlerGleimErnstWaltinger.pdf},
        journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
        pages={47-70},
        number={1},
        author={Mehler, Alexander and Gleim, Rüdiger and Ernst, Alexandra and Waltinger, Ulli},
        volume={32},
        year={2008},
        title={WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases},
        abstract={This article describes an API for exploring the logical document and the logical network structure of wikis. It introduces an algorithm for the semantic preprocessing, filtering and typing of these building blocks. Further, this article models the process of wiki generation based on a unified format of syntactic, semantic and pragmatic representations. This three-level approach to make accessible syntactic, semantic and pragmatic aspects of wiki-based structure formation is complemented by a corresponding database model – called WikiDB – and an API operating thereon. Finally, the article provides an empirical study of using the three-fold representation format in conjunction with WikiDB.}}

2007 (1)

  • [PDF] R. Gleim, A. Mehler, H. Eikmeyer, and H. Rieser, “Ein Ansatz zur Repräsentation und Verarbeitung großer Korpora multimodaler Daten,” in Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, 11.–13. April, Universität Tübingen, Tübingen, 2007, pp. 275-284.
    [BibTeX]

    @INPROCEEDINGS{Gleim:Mehler:Eikmeyer:Rieser:2007,
        publisher={Narr},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/gleim_mehler_eikmeyer_rieser_2007.pdf},
        booktitle={Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, 11.–13. April, Universit{\"a}t Tübingen},
        pages={275-284},
        author={Gleim, Rüdiger and Mehler, Alexander and Eikmeyer, Hans-Jürgen and Rieser, Hannes},
        editor={Rehm, Georg and Witt, Andreas and Lemnitzer, Lothar},
        year={2007},
        title={Ein Ansatz zur Repr{\"a}sentation und Verarbeitung gro{\ss}er Korpora multimodaler Daten},
        address={Tübingen}}

2006 (2)

  • [PDF] R. Gleim, “HyGraph – Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertextstrukturen,” in Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005, Universität Bonn, Frankfurt a. M., 2006, pp. 42-53.
    [BibTeX]

    @INPROCEEDINGS{Gleim:2006,
        publisher={Lang},
        booktitle={Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beitr{\"a}ge zur GLDV-Tagung 2005, Universit{\"a}t Bonn},
        pages={42-53},
        author={Gleim, Rüdiger},
        editor={Fisseni, Bernhard and Schmitz, Hans-Christian and Schröder, Bernhard and Wagner, Petra},
        year={2006},
        title={HyGraph - Ein Framework zur Extraktion, Repr{\"a}sentation und Analyse webbasierter Hypertextstrukturen},
        website={https://www.researchgate.net/publication/268294000_HyGraph__Ein_Framework_zur_Extraktion_Reprsentation_und_Analyse_webbasierter_Hypertextstrukturen},
        pdf = {https://hucompute.org/wp-content/uploads/2016/10/GLDV2005-HyGraph-Framework.pdf},
        address={Frankfurt a. M.}}
  • A. Mehler, M. Dehmer, and R. Gleim, “Towards Logical Hypertext Structure – A Graph-Theoretic Perspective,” in Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS ’04), Berlin/New York, 2006, pp. 136-150.
    [Abstract] [BibTeX]

    Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph-theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.
    @INPROCEEDINGS{Mehler:Dehmer:Gleim:2006,
        publisher={Springer},
        booktitle={Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS '04)},
        website={http://rd.springer.com/chapter/10.1007/11553762_14},
        pages={136-150},
        author={Mehler, Alexander and Dehmer, Matthias and Gleim, Rüdiger},
        series={Lecture Notes in Computer Science 3473},
        editor={Böhme, Thomas and Heyer, Gerhard},
        year={2006},
        title={Towards Logical Hypertext Structure - A Graph-Theoretic Perspective},
        address={Berlin/New York},
        abstract={Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph-theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.}}

2004 (1)

  • [PDF] M. Dehmer, A. Mehler, and R. Gleim, “Aspekte der Kategorisierung von Webseiten,” in INFORMATIK 2004 – Informatik verbindet, Band 2, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Workshop Multimedia-Informationssysteme, 2004, pp. 39-43.
    [Abstract] [BibTeX]

    Im Zuge der Web-basierten Kommunikation tritt die Frage auf, inwiefern Webpages zum Zwecke ihrer inhaltsorientierten Filterung kategorisiert werden können. Diese Studie untersucht zwei Phänomene, welche die Bedingung der Möglichkeit einer solchen Kategorisierung betreffen (siehe [6]): Mit dem Begriff der funktionalen Aquivalenz beziehen wir uns auf das Phänomen, dass dieselbe Funktions- oder Inhaltskategorie durch völlig verschiedene Bausteine Web-basierter Dokumente manifestiert werden kann. Mit dem Begriff des Polymorphie beziehen wir uns auf das Phänomen, dass dasselbe Dokument zugleich mehrere Funktions- oder Inhaltskategorien manifestieren kann. Die zentrale Hypothese lautet, dass beide Phänomene für Web-basierte Hypertextstrukturen charakteristisch sind. Ist dies der Fall, so kann die automatische Kategorisierung von Hypertexten [2, 10] nicht mehr als eindeutige Zuordnung verstanden werden, bei der einem Dokument genau eine Kategorie zugeordnet wird. In diesem Sinne thematisiert das Papier die Frage nach der adäquaten Modellierung multimedialer Dokumente.
    @INPROCEEDINGS{Dehmer:Mehler:Gleim:2004,
        publisher={GI},
        booktitle={INFORMATIK 2004 – Informatik verbindet, Band 2, Beitr{\"a}ge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Workshop Multimedia-Informationssysteme},
        pages={39-43},
        author={Dehmer, Matthias and Mehler, Alexander and Gleim, Rüdiger},
        series={Lecture Notes in Informatics},
        volume={51},
        editor={Dadam, Peter and Reichert, Manfred},
        year={2004},
        title={Aspekte der Kategorisierung von Webseiten},
        pdf={http://subs.emis.de/LNI/Proceedings/Proceedings51/GI-Proceedings.51-11.pdf},
        website={https://www.researchgate.net/publication/221385316_Aspekte_der_Kategorisierung_von_Webseiten},
        abstract={Im Zuge der Web-basierten Kommunikation tritt die Frage auf, inwiefern Webpages zum Zwecke ihrer inhaltsorientierten Filterung kategorisiert werden können. Diese Studie untersucht zwei Ph{\"a}nomene, welche die Bedingung der Möglichkeit einer solchen Kategorisierung betreffen (siehe [6]): Mit dem Begriff der funktionalen Aquivalenz beziehen wir uns auf das Ph{\"a}nomen, dass dieselbe Funktions- oder Inhaltskategorie durch völlig verschiedene Bausteine Web-basierter Dokumente manifestiert werden kann. Mit dem Begriff des Polymorphie beziehen wir uns auf das Ph{\"a}nomen, dass dasselbe Dokument zugleich mehrere Funktions- oder Inhaltskategorien manifestieren kann. Die zentrale Hypothese lautet, dass beide Ph{\"a}nomene für Web-basierte Hypertextstrukturen charakteristisch sind. Ist dies der Fall, so kann die automatische Kategorisierung von Hypertexten [2, 10] nicht mehr als eindeutige Zuordnung verstanden werden, bei der einem Dokument genau eine Kategorie zugeordnet wird. In diesem Sinne thematisiert das Papier die Frage nach der ad{\"a}quaten Modellierung multimedialer Dokumente.}}