# Publications

## Total: 246

### 2018 (1)

• A. Mehler, W. Hemati, R. Gleim, and D. Baumartz, “Auf dem Weg zu einer Infrastruktur für die verteilte interaktive evolutionäre Verarbeitung natürlicher Sprache,” in Forschungsinfrastrukturen und digitale Informationssysteme in der germanistischen Sprachwissenschaft, H. Lobin, R. Schneider, and A. Witt, Eds., Berlin: De Gruyter, 2018, vol. 6.
[BibTeX]

@InCollection{Mehler:Hemati:Gleim:Baumartz:2018,
Title                    = {{Auf dem Weg zu einer Infrastruktur für die verteilte interaktive evolutionäre Verarbeitung natürlicher Sprache}},
Author                   = {Alexander Mehler and Wahed Hemati and Rüdiger Gleim and Daniel Baumartz},
Booktitle                = {Forschungsinfrastrukturen und digitale Informationssysteme in der germanistischen Sprachwissenschaft},
volume = {6},
Year                     = {2018},
Editor                   = {Henning Lobin and Roman Schneider and Andreas Witt},
Publisher                = {De Gruyter}
}

### 2017 (11)

• A. Lücking, “Indexicals as Weak Descriptors,” in Proceedings of the 12th International Conference on Computational Semantics, Montpellier (France), 2017.
[BibTeX]

@InProceedings{Luecking:2017:c,
author={L\"{u}cking, Andy},
title={Indexicals as Weak Descriptors},
booktitle={Proceedings of the 12th International Conference on Computational Semantics},
year={2017},
series={IWCS 2017},
}
• A. Mehler, R. Gleim, W. Hemati, and T. Uslu, “Skalenfreie online soziale Lexika am Beispiel von Wiktionary,” in Proceedings of the 53rd Annual Conference of the Institut für Deutsche Sprache (IDS), March 14-16, Mannheim, Germany, Berlin, 2017.
[BibTeX]

@InProceedings{Mehler:Gleim:Hemati:Uslu:2017,
Title                    = {{Skalenfreie online soziale Lexika am Beispiel von Wiktionary}},
Author                   = {Alexander Mehler and Rüdiger Gleim and Wahed Hemati and Tolga Uslu},
Booktitle                = {Proceedings of the 53rd Annual Conference of the Institut für Deutsche Sprache (IDS), March 14-16, Mannheim, Germany},
Year                     = {2017},
Editor                   = {Stefan Engelberg and Henning Lobin and Kathrin Steyer and Sascha Wolfer},
Publisher                = {De Gruyter}
}
• A. Mehler, O. Zlatkin-Troitschanskaia, W. Hemati, D. Molerov, A. Lücking, and S. Schmidt, “Integrating Computational Linguistic Analysis of Multilingual Learning Data and Educational Measurement Approaches to Explore Student Learning in Higher Education,” in Positive Learning in the Age of Information (PLATO) — A blessing or a curse?, O. Zlatkin-Troitschanskaia, G. Wittum, and A. Dengel, Eds., Wiesbaden: Springer, 2017. in press
[Abstract] [BibTeX]

This chapter develops a computational linguistic model for analyzing and comparing multilingual data as well as its application to a large body of standardized assessment data from higher education. The approach employs both an automatic and a manual annotation of the data on several linguistic layers (including parts of speech, text structure and content). Quantitative features of the textual data are explored that are related to both the students’ (domain-specific knowledge) test results and their level of academic experience. The respective analysis involves statistics of distance correlation, text categorization with respect to text types (questions and distractors) as well as languages (English and German), and network analysis as a means to assess dependencies between features. The results indicate a correlation between correct test results of students and linguistic features of the verbal presentations of tests indicating a language influence on higher education test performance. It is also found that this influence relates to special language. Thus, this integrative modeling approach contributes a test basis for a large-scale analysis of learning data and points to a number of subsequent, more detailed research questions.
@InCollection{Mehler:Zlatkin-Troitschanskaia:Hemati:Molerov:Luecking:Schmidt:2017,
Title                    = {Integrating Computational Linguistic Analysis of Multilingual Learning Data and Educational Measurement Approaches to Explore Student Learning in Higher Education},
Author                   = {Alexander Mehler and Olga Zlatkin-Troitschanskaia and Wahed Hemati and Dimitri Molerov and Andy Lücking and Susanne Schmidt},
Booktitle                = {Positive Learning in the Age of Information ({PLATO}) -- A blessing or a curse?},
Publisher                = {Springer},
Note = {in press},
Abstract = {This chapter develops a computational linguistic model for analyzing and comparing multilingual data as well as its application to a large body of standardized assessment data from higher education. The approach employs both an automatic and a manual annotation of the data on several linguistic layers (including parts of speech, text structure and content). Quantitative features of the textual data are explored that are related to both the students’ (domain-specific knowledge) test results and their level of academic experience. The respective analysis involves statistics of distance correlation, text categorization with respect to text types (questions and distractors) as well as languages (English and German), and network analysis as a means to assess dependencies between features. The results indicate a correlation between correct test results of students and linguistic features of the verbal presentations of tests indicating a language influence on higher education test performance. It is also found that this influence relates to special language. Thus, this integrative modeling approach contributes a test basis for a large-scale analysis of learning data and points to a number of subsequent, more detailed research questions.},
Year                     = {2017},
Editor                   = {Zlatkin-Troitschanskaia, Olga and Wittum, Gabriel and Dengel, Andreas},
}
• A. Hoenen, S. Eger, and R. Gehrke, “How Many Stemmata with Root Degree k?,” in Proceedings of the 15th Meeting on the Mathematics of Language, 2017, pp. 11-21.
[BibTeX]

@inproceedings{hoenen2017c,
author = {Hoenen, Armin and Eger, Steffen and Gehrke, Ralf},
title = {{How Many Stemmata with Root Degree k?}},
booktitle = {Proceedings of the 15th Meeting on the Mathematics of Language},
year = {2017},
publisher = {Association for Computational Linguistics},
pages = {11--21},
location = {London, UK},
url = {http://aclweb.org/anthology/W17-3402}
}
• A. Hoenen, “Using Word Embeddings for Computing Distances Between Texts and for Authorship Attribution,” in International Conference on Applications of Natural Language to Information Systems, 2017, pp. 274-277.
[BibTeX]

@inproceedings{hoenen2017b,
title={{Using Word Embeddings for Computing Distances Between Texts and for Authorship Attribution}},
author={Hoenen, Armin},
booktitle={International Conference on Applications of Natural Language to Information Systems},
pages={274--277},
year={2017},
organization={Springer},
}
• T. Uslu, W. Hemati, A. Mehler, and D. Baumartz, “TextImager as a Generic Interface to R,” in Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 2017. Accepted
[BibTeX]

@inproceedings{Uslu:Hemati:Mehler:Baumartz:2017,
author={Tolga Uslu and Wahed Hemati and Alexander Mehler and Daniel Baumartz},
title={{TextImager} as a Generic Interface to {R}},
booktitle={Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)},
year={2017},
location={Valencia, Spain},
note={Accepted}
}
• A. Hoenen, “Beyond the tree – a theoretical model of contamination and a software to generate multilingual stemmata,” in Book of Abstracts of the annual conference of the AIUCD 2017, Sapienza, Rome, AIUCD, 2017.
[BibTeX]

@INCOLLECTION{Hoenen:2017aiucd,
author={Hoenen, Armin},
title={{Beyond the tree – a theoretical model of contamination and
a software to generate multilingual stemmata}},
booktitle={{Book of Abstracts of the annual conference of the AIUCD
2017, Sapienza, Rome}},
year={2017},
publisher={AIUCD},
url={http://aiucd2017.aiucd.it/wp-content/uploads/2017/01/book-of-abstract-AIUCD-2017.pdf}}
• A. Mehler and A. Lücking, “Modelle sozialer Netzwerke und Natural Language Processing: eine methodologische Randnotiz,” Soziologie, vol. 46, iss. 1, pp. 43-47, 2017.
[BibTeX]

@Article{Mehler:Luecking:2017,
author =     {Alexander Mehler and Andy Lücking},
title =     {Modelle sozialer Netzwerke und Natural Language Processing: eine methodologische Randnotiz},
journal =     {Soziologie},
year =     2017,
volume =     46,
number =     1,
pages =     {43-47}
}
• W. Hemati, A. Mehler, and T. Uslu, “CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools,” in BioCreative V.5. Proceedings, 2017. accepted
[BibTeX]

@InProceedings{Hemati:Mehler:Uslu:2017,
author =   {Wahed Hemati and Alexander Mehler and Tolga Uslu},
title =   {{CRFVoter}: Chemical Entity Mention, Gene and
Protein Related Object recognition using a
conglomerate of CRF based tools},
booktitle =   {BioCreative V.5. Proceedings},
note =   {accepted},
year =         {2017}
}
• W. Hemati, T. Uslu, and A. Mehler, “TextImager as an interface to BeCalm,” in BioCreative V.5. Proceedings, 2017. accepted
[BibTeX]

@InProceedings{Hemati:Uslu:Mehler:2017,
author =   {Wahed Hemati and Tolga Uslu and Alexander Mehler},
title =   {{TextImager} as an interface to {BeCalm}},
booktitle =   {BioCreative V.5. Proceedings},
year =    {2017},
note =   {accepted}
}
• A. Mehler, G. Abrami, S. Bruendel, L. Felder, T. Ostertag, and C. Spiekermann, “Stolperwege: An App for a Digital Public History of the Holocaust,” in Proceedings of the 28th ACM Conference on Hypertext and Social Media, New York, NY, USA, 2017, pp. 319-320.
[Abstract] [Poster] [BibTeX]

We present the Stolperwege app, a web-based framework for ubiquitous modeling of historical processes. Starting from the art project *Stolpersteine* of Gunter Demnig, it allows for virtually connecting these stumbling blocks with information about the biographies of victims of Nazism. According to the practice of *public history*, the aim of Stolperwege is to deepen public knowledge of the Holocaust in the context of our everyday environment. Stolperwege uses an information model that allows for modeling social networks of agents starting from information about portions of their life. The paper exemplifies how Stolperwege is informationally enriched by means of historical maps and 3D animations of (historical) buildings.
@InProceedings{Mehler:et:al:2017:a,
author        = {Alexander Mehler and Giuseppe Abrami and Steffen Bruendel and Lisa Felder and Thomas Ostertag and Christian Spiekermann},
title         = {{Stolperwege:} An App for a Digital Public History of the {Holocaust}},
booktitle = {Proceedings of the 28th ACM Conference on Hypertext and Social Media},
series = {HT '17},
year = {2017},
isbn = {978-1-4503-4708-2},
location = {Prague, Czech Republic},
pages = {319--320},
numpages = {2},
url = {http://doi.acm.org/10.1145/3078714.3078748},
doi = {10.1145/3078714.3078748},
acmid = {3078748},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {3d, geocaching, geotagging, historical maps, historical processes, public history of the holocaust, ubiquitous computing},
abstract      = {We present the Stolperwege app, a web-based framework for ubiquitous modeling of historical processes. Starting from the art project \textit{Stolpersteine} of Gunter Demnig, it allows for virtually connecting these stumbling blocks with information about the biographies of victims of Nazism. According to the practice of \textit{public history}, the aim of Stolperwege is to deepen public knowledge of the Holocaust in the context of our everyday environment. Stolperwege uses an information model that allows for modeling social networks of agents starting from information about portions of their life. The paper exemplifies how Stolperwege is informationally enriched by means of historical maps and 3D animations of (historical) buildings.}
}

### 2016 (20)

• A. Hoenen, A. Mehler, and J. Gippert, Eds., “Corpora and Resources for (Historical) Low Resource Languages,” JLCL, vol. 31, iss. 2, 2016.
[BibTeX]

@collection{GSCL:JLCL:2016:2,
editor={Armin Hoenen and Alexander Mehler and Jost Gippert},
title={{Corpora and Resources for (Historical) Low Resource Languages}},
publisher={JLCL},
volume={31},
number={2},
year={2016},
issn={2190-6858},
bibsource={GSCL, http://www.gscl.info/},
pdf = {http://www.jlcl.org/2016_Heft2/Heft2-2016.pdf}
}
• A. Hoenen, A. Mehler, and J. Gippert, “Editorial,” JLCL, vol. 31, iss. 2, pp. iii–iv, 2016.
[BibTeX]

@article{Hoenen:Mehler:Gippert:2016,
AUTHOR = {Armin Hoenen and Alexander Mehler and Jost Gippert},
TITLE = {{Editorial}},
JOURNAL = {JLCL},
YEAR = {2016},
VOLUME = {31},
NUMBER = {2},
PAGES = {iii--iv},
pdf = {http://www.jlcl.org/2016_Heft2/Heft2-2016.pdf}
}
• A. Hoenen and L. Samushia, “Gepi: An Epigraphic Corpus for Old Georgian and a Tool Sketch for Aiding Reconstruction,” JLCL, vol. 31, iss. 2, pp. 25-38, 2016.
[BibTeX]

@ARTICLE{Hoenen:Samushia:2016,
AUTHOR = {Armin Hoenen and Lela Samushia},
TITLE = {{Gepi: An Epigraphic Corpus for Old Georgian and a Tool Sketch for Aiding Reconstruction}},
JOURNAL = {JLCL},
YEAR = {2016},
VOLUME = {31},
NUMBER = {2},
PAGES = {25--38}
}
• S. Eger, A. Hoenen, and A. Mehler, “Language classification from bilingual word embedding graphs,” in Proceedings of COLING 2016, 2016.
[BibTeX]

@InProceedings{Eger:Hoenen:Mehler:2016,
author =     {Steffen Eger and Armin Hoenen and Alexander Mehler},
title =     {Language classification from bilingual word embedding graphs},
booktitle =     {Proceedings of COLING 2016},
year =     2016,
location =     {Osaka},
publisher = {ACL},
}
• W. Hemati, T. Uslu, and A. Mehler, “TextImager: a Distributed UIMA-based System for NLP,” in Proceedings of the COLING 2016 System Demonstrations, 2016.
[BibTeX]

@inproceedings{Hemati:Uslu:Mehler:2016,
author={Wahed Hemati and Tolga Uslu and Alexander Mehler},
title={TextImager: a Distributed UIMA-based System for NLP},
booktitle={Proceedings of the COLING 2016 System Demonstrations},
year={2016},
location={Osaka, Japan},
organization={Federated Conference on Computer Science and Information Systems}
}
• A. Lücking, “Modeling Co-Verbal Gesture Perception in Type Theory with Records,” in Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, 2016, pp. 383-392. Best Paper Award
[BibTeX]

@inproceedings{Luecking:2016:b,
author={L\"{u}cking, Andy},
title={Modeling Co-Verbal Gesture Perception in Type Theory with Records},
booktitle={Proceedings of the 2016 Federated Conference on Computer Science and Information Systems},
year={2016},
pages={383-392},
series={Annals of Computer Science and Information Systems},
volume={8},
editor = {M. Ganzha and L. Maciaszek and M. Paprzycki},
location = {Gdansk, Poland},
publisher={IEEE},
note={Best Paper Award},
url={http://annals-csis.org/Volume_8/drp/83.html},
doi={10.15439/2016F83},
pdf={http://annals-csis.org/Volume_8/pliks/83.pdf}
}
• A. Mehler, T. Uslu, and W. Hemati, “Text2voronoi: An Image-driven Approach to Differential Diagnosis,” in Proceedings of the 5th Workshop on Vision and Language (VL’16) hosted by the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, 2016.
[BibTeX]

@inproceedings{Mehler:Uslu:Hemati:2016,
title={Text2voronoi: An Image-driven Approach to Differential Diagnosis},
author={Alexander Mehler and Tolga Uslu and Wahed Hemati},
booktitle={Proceedings of the 5th Workshop on Vision and Language (VL'16) hosted by the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin},
pdf = {https://aclweb.org/anthology/W/W16/W16-3212.pdf},
year={2016}
}
• S. Eger and A. Mehler, “On the linearity of semantic change: Investigating meaning variation via dynamic graph models,” in Proceedings of ACL 2016, 2016.
[BibTeX]

@InProceedings{Eger:Mehler:2016,
author =     {Steffen Eger and Alexander Mehler},
title =     {On the linearity of semantic change: {I}nvestigating
meaning variation via dynamic graph models},
booktitle =     {Proceedings of ACL 2016},
year =     2016,
pdf = {https://www.aclweb.org/anthology/P/P16/P16-2009.pdf},
location =     {Berlin}
}
• S. Eger, T. vor der Brück, and A. Mehler, “A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction,” The Prague Bulletin of Mathematical Linguistics, vol. 105, pp. 77-99, 2016.
[BibTeX]

@Article{Eger:vorDerBrueck:Mehler:2016,
Title                    = {A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction},
Author                   = {Eger, Steffen and vor der Brück, Tim and Mehler, Alexander},
Journal                  = {The Prague Bulletin of Mathematical Linguistics},
Year                     = {2016},
Pages                    = {77-99},
pdf = {https://ufal.mff.cuni.cz/pbml/105/art-eger-vor-der-brueck.pdf},
doi = {10.1515/pralin-2016-0004},
Volume                   = {105}
}
• A. Hoenen, “Silva Portentosissima – Computer-Assisted Reflections on Bifurcativity in Stemmas,” in Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, 2016, pp. 557-560.
[Abstract] [BibTeX]

In 1928, the philologist Joseph Bédier explored contemporary stemmas and found them to contain a suspiciously large number of bifurcations. In this paper, a computer simulation is used to assess the argument that, given a large number of lost manuscripts, the number of bifurcations in the true stemmas would naturally be high because the probability for siblings to survive becomes very low.
@InProceedings{Hoenen:2016DH,
Title =     {{Silva Portentosissima – Computer-Assisted Reflections on Bifurcativity in Stemmas}},
Author =     {Hoenen, Armin},
Booktitle =     {Digital Humanities 2016: Conference Abstracts. Jagiellonian University \& Pedagogical University},
Year =     2016,
location =     {Kraków},
pages =     {557-560},
series =     {DH 2016},
abstract = {In 1928, the philologist Joseph Bédier explored contemporary stemmas and found them to contain a suspiciously large number of bifurcations. In this paper, a computer simulation is used to assess the argument that, given a large number of lost manuscripts, the number of bifurcations in the true stemmas would naturally be high because the probability for siblings to survive becomes very low.}
}
• A. Mehler, B. Wagner, and R. Gleim, “Wikidition: Towards A Multi-layer Network Model of Intertextuality,” in Proceedings of DH 2016, 12-16 July, 2016.
[Abstract] [BibTeX]

The paper presents Wikidition, a novel text mining tool for generating online editions of text corpora. It explores lexical, sentential and textual relations to span multi-layer networks (linkification) that allow for browsing syntagmatic and paradigmatic relations among the constituents of its input texts. In this way, relations of text reuse can be explored together with lexical relations within the same literary memory information system. Beyond that, Wikidition contains a module for automatic lexiconisation to extract author specific vocabularies. Based on linkification and lexiconisation, Wikidition does not only allow for traversing input corpora on different (lexical, sentential and textual) levels. Rather, its readers can also study the vocabulary of authors on several levels of resolution including superlemmas, lemmas, syntactic words and wordforms. We exemplify Wikidition by a range of literary texts and evaluate it by means of the apparatus of quantitative network analysis.
@InProceedings{Mehler:Wagner:Gleim:2016,
Title =     {Wikidition: Towards A Multi-layer Network Model of Intertextuality},
Author =     {Mehler, Alexander and Wagner, Benno and Gleim, R\"{u}diger},
Booktitle =     {Proceedings of DH 2016, 12-16 July},
Year =     2016,
location =     {Kraków},
series =     {DH 2016},
abstract = {The paper presents Wikidition, a novel text mining tool for generating online editions of text corpora. It explores lexical, sentential and textual relations to span multi-layer networks (linkification) that allow for browsing syntagmatic and paradigmatic relations among the constituents of its input texts. In this way, relations of text reuse can be explored together with lexical relations within the same literary memory information system. Beyond that, Wikidition contains a module for automatic lexiconisation to extract author specific vocabularies. Based on linkification and lexiconisation, Wikidition does not only allow for traversing input corpora on different (lexical, sentential and textual) levels. Rather, its readers can also study the vocabulary of authors on several levels of resolution including superlemmas, lemmas, syntactic words and wordforms. We exemplify Wikidition by a range of literary texts and evaluate it by means of the apparatus of quantitative network analysis.}
}
• T. vor der Brück and A. Mehler, “TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
[BibTeX]

@InProceedings{vorderBrueck:Mehler:2016,
Title =     {{TLT-CRF}: A Lexicon-supported Morphological Tagger
for {Latin} Based on Conditional Random Fields},
Author =     {vor der Br\"{u}ck, Tim and Mehler, Alexander},
Booktitle =     {Proceedings of the 10th International Conference on
Language Resources and Evaluation},
Year =     2016,
location =     {{Portoro\v{z} (Slovenia)}},
series =     {LREC 2016},
}
• S. Eger, R. Gleim, and A. Mehler, “Lemmatization and Morphological Tagging in German and Latin: A comparison and a survey of the state-of-the-art,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
[BibTeX]

@InProceedings{Eger:Mehler:Gleim:2016,
Title =     {Lemmatization and Morphological Tagging in {German}
and {Latin}: A comparison and a survey of the
state-of-the-art},
Author =     {Eger, Steffen and Gleim, R\"{u}diger and Mehler, Alexander},
Booktitle =     {Proceedings of the 10th International Conference on
Language Resources and Evaluation},
Year =     2016,
location =     {Portoro\v{z} (Slovenia)},
series =     {LREC 2016},
}
• A. Lücking, A. Mehler, D. Walther, M. Mauri, and D. Kurfürst, “Finding Recurrent Features of Image Schema Gestures: the FIGURE corpus,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
[BibTeX]

@InProceedings{Luecking:Mehler:Walther:Mauri:Kurfuerst:2016,
author =     {L\"{u}cking, Andy and Mehler, Alexander and Walther,
D\'{e}sir\'{e}e and Mauri, Marcel and Kurf\"{u}rst,
Dennis},
title =     {Finding Recurrent Features of Image Schema Gestures:
the {FIGURE} corpus},
booktitle =     {Proceedings of the 10th International Conference on
Language Resources and Evaluation},
year =     2016,
series =     {LREC 2016},
location =     {Portoro\v{z} (Slovenia)}
}
• A. Lücking, A. Hoenen, and A. Mehler, “TGermaCorp — A (Digital) Humanities Resource for (Computational) Linguistics,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
[BibTeX]

@InProceedings{Luecking:Hoenen:Mehler:2016,
author =     {L\"{u}cking, Andy and Hoenen, Armin and Mehler,
Alexander},
title =     {{TGermaCorp} -- A (Digital) Humanities Resource for
(Computational) Linguistics},
booktitle =     {Proceedings of the 10th International Conference on
Language Resources and Evaluation},
year =     2016,
series =     {LREC 2016},
islrn={536-382-801-278-5},
location =     {Portoro\v{z} (Slovenia)}
}
• B. Wagner, A. Mehler, and H. Biber, “Transbiblionome Daten in der Literaturwissenschaft. Texttechnologische Erschließung und digitale Visualisierung intertextueller Beziehungen digitaler Korpora,” in DHd 2016, 2016.
[BibTeX]

@InProceedings{Wagner:Mehler:Biber:2016,
Title                    = {{Transbiblionome Daten in der Literaturwissenschaft. Texttechnologische Erschließung und digitale Visualisierung intertextueller Beziehungen digitaler Korpora}},
Author                   = {Wagner, Benno and Mehler, Alexander and Biber, Hanno},
Booktitle                = {DHd 2016},
Year                     = {2016},
url         = {http://www.dhd2016.de/abstracts/sektionen-005.html#index.xml-body.1_div.4}
}
• A. Mehler, R. Gleim, T. vor der Brück, W. Hemati, T. Uslu, and S. Eger, “Wikidition: Automatic Lexiconization and Linkification of Text Corpora,” Information Technology, pp. 70-79, 2016.
[Abstract] [BibTeX]

We introduce a new text technology, called Wikidition, which automatically generates large scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin texts.
@Article{Mehler:et:al:2016,
Title                    = {Wikidition: Automatic Lexiconization and Linkification of Text Corpora},
Author                   = {Alexander Mehler and Rüdiger Gleim and Tim vor der Brück and Wahed Hemati and Tolga Uslu and Steffen Eger},
Journal                  = {Information Technology},
Year                     = {2016},
pages                    = {70-79},
doi                      = {10.1515/itit-2015-0035},
abstract       = {We introduce a new text technology, called Wikidition, which automatically generates large
scale editions of corpora of natural language texts. Wikidition combines a wide range of
text mining tools for automatically linking lexical, sentential and textual units. This
includes the extraction of corpus-specific lexica down to the level of syntactic words and
their grammatical categories. To this end, we introduce a novel measure of text reuse and
exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin
texts.}
}
• A. Hoenen, “Repetition Analyses Function,” in Proceedings of the 2015 Herrenhäuser Symposium Visual Linguistics, 2016.
[Abstract] [BibTeX]

The ReAF is a dynamic heat map developed to represent exact and bag-of-words based repetitions in digitisations of verse bound text. Verse itself is a repetition in linguistic patterning, text itself is a visualisation of speech. In this sense, line breaks are a visualisation technique based on the repetition of linguistic patterning, which the ReAF maintains. Verse bound text existed prior to the invention of script; the first written literary produce of cultures is usually in verse. In their seminal work, Lord (1960) and Parry (1971) attempted to explain the peculiarities of one such text, the Odyssey, by investigating a living oral tradition in Yugoslavia. They invented the Oral Formulaic Theory and showed how bardic composition in performance works. No single author exists but formula and story lines are passed on from generation to generation; the actual performance is always a unique text and no two performances of the same epic are the same. Their conclusion is that one original text of the Odyssey does not exist, has never existed and cannot even exist. Lord and Parry developed tests for the orality of a given text, where they used underlining of repeated passages or formula. To compile this visualisation in the print age required a lot of manual labour, so they largely limited themselves to shorter passages such as the beginning of the Odyssey. This limitation was criticised later on, for instance by Finnegan (1992), who misses a complete statistical analysis. The ReAF is a holistic extension of that late print age visualisation of repetition in verse bound text. It uses HTML and JavaScript in order to generate a very simple preprocessing, platform and browser independent interactive visualisation, where the user can navigate the text to verify or falsify his/her assumptions on text genesis and text category. References: Lord, A. B. (1960). The Singer of Tales. Harvard University Press. Parry, M. (1971). The making of Homeric verse: the collected papers of Milman Parry. Clarendon Press. Finnegan, R. (1992). Oral Poetry. Indiana University Press.
@INPROCEEDINGS{Hoenen:2016forth,
author={Hoenen, Armin},
title={Repetition Analyses Function},
booktitle={Proceedings of the 2015 Herrenh{\"a}user Symposium Visual Linguistics},
year={2016},
publisher={IDS Mannheim},
abstract = {The ReAF is a dynamic heat map developed to represent exact and bag-of-words based repetitions in digitisations of verse bound text. Verse itself is a repetition in linguistic patterning, text itself is a visualisation of speech. In this sense, line breaks are a visualisation technique based on the repetition of linguistic patterning, which the ReAF maintains. Verse bound text existed prior to the invention of script; the first written literary produce of cultures is usually in verse. In their seminal work, Lord (1960) and Parry (1971) attempted to explain the peculiarities of one such text, the Odyssey, by investigating a living oral tradition in Yugoslavia. They invented the Oral Formulaic Theory and showed how bardic composition in performance works. No single author exists but formula and story lines are passed on from generation to generation; the actual performance is always a unique text and no two performances of the same epic are the same. Their conclusion is that one original text of the Odyssey does not exist, has never existed and cannot even exist. Lord and Parry developed tests for the orality of a given text, where they used underlining of repeated passages or formula. To compile this visualisation in the print age required a lot of manual labour, so they largely limited themselves to shorter passages such as the beginning of the Odyssey. This limitation was criticised later on, for instance by Finnegan (1992), who misses a complete statistical analysis. The ReAF is a holistic extension of that late print age visualisation of repetition in verse bound text. It uses HTML and JavaScript in order to generate a very simple preprocessing, platform and browser independent interactive visualisation, where the user can navigate the text to verify or falsify his/her assumptions on text genesis and text category. References: Lord, A. B. (1960). The Singer of Tales. Harvard University Press. Parry, M. (1971). The making of Homeric verse: the collected papers of Milman Parry. Clarendon Press. Finnegan, R. (1992). Oral Poetry. Indiana University Press.}}
• A. Hoenen, “Wikipedia Titles As Noun Tag Predictors,” in Proceedings of the 10th International Conference on Language Resources and Evaluation, 2016.
[BibTeX]

@InProceedings{Hoenen:2016x,
author =     {Hoenen, Armin},
title =     {{Wikipedia Titles As Noun Tag Predictors}},
booktitle =     {Proceedings of the 10th International Conference on Language Resources and Evaluation},
year =     2016,
series =     {LREC 2016},
pdf =     {http://www.lrec-conf.org/proceedings/lrec2016/pdf/18_Paper.pdf},
location =     {Portoro\v{z} (Slovenia)}
}
• A. Hoenen, “Das erste dynamische Stemma, Pionier des digitalen Zeitalters?,” in Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum, 2016. Accepted
[BibTeX]

@INPROCEEDINGS{Hoenen:2016y,
booktitle={Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum},
note={accepted},
author={Hoenen, Armin},
year={2016},
title={Das erste dynamische Stemma, Pionier des digitalen Zeitalters?},
url = {http://www.dhd2016.de/abstracts/posters-060.html}
}

### 2015 (24)

• N. Dundua, A. Hoenen, and L. Samushia, “A Parallel Corpus of the Old Georgian Gospel Manuscripts and their Stemmatology,” The Georgian Journal for Language Logic Computation, vol. IV, pp. 176-185, 2015.
[BibTeX]

@ARTICLE{Dundua:Hoenen:Samushia:2015,
author={Dundua, Natia and Hoenen, Armin and Samushia, Lela},
title={{A Parallel Corpus of the Old Georgian Gospel Manuscripts and their Stemmatology}},
journal={The Georgian Journal for Language Logic Computation},
year={2015},
volume={IV},
pages={176-185},
publisher={CLLS, Tbilisi State University and Kurt G{\"o}del Society}}
• T. vor der Brück, S. Eger, and A. Mehler, “Complex Decomposition of the Negative Distance Kernel,” in IEEE International Conference on Machine Learning and Applications, 2015.
[BibTeX]

@INPROCEEDINGS{vor:der:Bruck:Eger:Mehler:2015,
author={vor der Br{\"u}ck, Tim and Eger, Steffen and Mehler, Alexander},
title={Complex Decomposition of the Negative Distance Kernel},
booktitle={IEEE International Conference on Machine Learning and Applications},
location={Miami, Florida, USA},
year={2015}}
• S. Eger, “Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P,” in Proceedings of EMNLP, 2015.
[BibTeX]

@INPROCEEDINGS{Eger:2015_EMNLP,
author={Eger, Steffen},
title={Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P},
booktitle={Proceedings of EMNLP},
year={2015},
pre-pub={accepted}}
• T. vor der Brück and S. Eger, “Deriving a primal form for the quadratic power kernel,” in Proceedings of the 38th German Conference on Artificial Intelligence (KI), 2015.
[BibTeX]

@INPROCEEDINGS{vorDerBrueck:Eger:2015,
author={vor der Brück, Tim and Eger, Steffen},
title={Deriving a primal form for the quadratic power kernel},
booktitle={Proceedings of the 38th German Conference on Artificial Intelligence ({KI})},
year={2015},
pre-pub={accepted}}
• S. Eger, “Improving G2P from Wiktionary and other (web) resources,” in Proceedings of Interspeech, 2015.
[BibTeX]

@INPROCEEDINGS{Eger:2015_Interspeech,
author={Eger, Steffen},
title={Improving G2P from Wiktionary and other (web) resources},
booktitle={Proceedings of Interspeech},
year={2015},
pre-pub={accepted}}
• S. Eger, T. vor der Brück, and A. Mehler, “Lexicon-assisted tagging and lemmatization in Latin: A comparison of six taggers and two lemmatization methods,” in Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2015), Beijing, China, 2015.
[BibTeX]

@INPROCEEDINGS{Eger:vor:der:Brueck:Mehler:2015,
booktitle={Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities ({LaTeCH 2015})},
year={2015},
title={Lexicon-assisted tagging and lemmatization in {Latin}: A comparison of six taggers and two lemmatization methods},
author={Eger, Steffen and vor der Brück, Tim and Mehler, Alexander},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Lexicon-assisted_tagging.pdf}}
• Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, A. Mehler, A. Lücking, S. Banisch, P. Blanchard, and B. Frank-Job, Eds., Springer, 2015.
[BibTeX]

@BOOK{Mehler:Luecking:Banisch:Blanchard:Frank-Job:2015,
editor={Mehler, Alexander and Lücking, Andy and Banisch, Sven and Blanchard, Philippe and Frank-Job, Barbara},
year={2015},
ISBN={978-3-662-47237-8},
publisher={Springer},
title={Towards a Theoretical Framework for Analyzing Complex Linguistic Networks},
series={Understanding Complex Systems}}
• A. Mehler and R. Gleim, “Linguistic Networks — An Online Platform for Deriving Collocation Networks from Natural Language Texts,” in Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, A. Mehler, A. Lücking, S. Banisch, P. Blanchard, and B. Frank-Job, Eds., Springer, 2015.
[BibTeX]

@INCOLLECTION{Mehler:Gleim:2015:a,
publisher={Springer},
editor={Mehler, Alexander and Lücking, Andy and Banisch, Sven and Blanchard, Philippe and Frank-Job, Barbara},
year={2015},
booktitle={Towards a Theoretical Framework for Analyzing Complex Linguistic Networks},
title={Linguistic Networks -- An Online Platform for Deriving Collocation Networks from Natural Language Texts},
series={Understanding Complex Systems},
author={Mehler, Alexander and Gleim, Rüdiger}}
• S. Eger, “Multiple Many-To-Many Sequence Alignment For Combining String-Valued Variables: A G2P Experiment,” in ACL, 2015.
[BibTeX]

@INPROCEEDINGS{Eger:2015_ACL,
author={Eger, Steffen},
title={Multiple Many-To-Many Sequence Alignment For Combining String-Valued Variables: A G2P Experiment},
booktitle={ACL},
year={2015},
publisher={Association for Computational Linguistics}}
• S. Eger, “Designing and comparing G2P-type lemmatizers for a morphology-rich language,” in Fourth International Workshop on Systems and Frameworks for Computational Morphology, 2015.
[BibTeX]

@INPROCEEDINGS{Eger:2015_SFCM,
author={Eger, Steffen},
title={Designing and comparing G2P-type lemmatizers for a morphology-rich language},
year={2015},
booktitle={Fourth International Workshop on Systems and Frameworks for Computational Morphology}}
• S. Eger, N. Schenk, and A. Mehler, “Towards Semantic Language Classification: Inducing and Clustering Semantic Association Networks from Europarl,” in Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, 2015, pp. 127-136.
[BibTeX]

@INPROCEEDINGS{Eger:Schenk:Mehler:2015,
author={Eger, Steffen and Schenk, Niko and Mehler, Alexander},
title={Towards Semantic Language Classification: Inducing and Clustering Semantic Association Networks from Europarl},
booktitle={Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics},
month={June},
year={2015},
publisher={Association for Computational Linguistics},
pages={127--136},
url={http://www.aclweb.org/anthology/S15-1014},
pdf={https://hucompute.org/wp-content/uploads/2015/08/starsem2015-corrected-version.pdf}}
• S. Eger, “Identities for Partial Bell Polynomials Derived from Identities for Weighted Integer Compositions.,” Aequationes Mathematicae, 2015.
[BibTeX]

@ARTICLE{Eger:2015b,
author={Eger, Steffen},
journal={Aequationes Mathematicae},
title={Identities for Partial Bell Polynomials Derived from Identities for Weighted Integer Compositions.},
year={2015},
doi={10.1007/s00010-015-0338-2}}
• S. Eger, “Some Elementary Congruences for the Number of Weighted Integer Compositions.,” Journal of Integer Sequences (electronic only), vol. 18, iss. 4, 2015.
[BibTeX]

@ARTICLE{Eger:2015a,
author={Eger, Steffen},
journal={Journal of Integer Sequences (electronic only)},
title={Some Elementary Congruences for the Number of Weighted Integer Compositions.},
year={2015},
volume={18},
number={4},
publisher={School of Computer Science, University of Waterloo, Waterloo, ON},
pdf={https://cs.uwaterloo.ca/journals/JIS/VOL18/Eger/eger11.pdf}}
• A. Lücking, T. Pfeiffer, and H. Rieser, “Pointing and Reference Reconsidered,” Journal of Pragmatics, vol. 77, pp. 56-79, 2015.
[Abstract] [BibTeX]

Current semantic theory on indexical expressions claims that demonstratively used indexicals such as this lack a referent-determining meaning but instead rely on an accompanying demonstration act like a pointing gesture. While this view allows to set up a sound logic of demonstratives, the direct-referential role assigned to pointing gestures has never been scrutinized thoroughly in semantics or pragmatics. We investigate the semantics and pragmatics of co-verbal pointing from a foundational perspective combining experiments, statistical investigation, computer simulation and theoretical modeling techniques in a novel manner. We evaluate various referential hypotheses with a corpus of object identification games set up in experiments in which body movement tracking techniques have been extensively used to generate precise pointing  measurements. Statistical investigation and computer simulations show that especially distal areas in the pointing domain falsify the semantic direct-referential hypotheses concerning pointing gestures. As an alternative, we propose that reference involving pointing rests on a default inference which we specify using the empirical data. These results raise numerous problems for classical semantics–pragmatics interfaces: we argue for pre-semantic pragmatics in order to account for inferential reference in addition to classical post-semantic Gricean pragmatics.
@ARTICLE{Luecking:Pfeiffer:Rieser:2015,
year={2015},
title={Pointing and Reference Reconsidered},
author={Lücking, Andy and Pfeiffer, Thies and Rieser, Hannes},
journal={Journal of Pragmatics},
volume={77},
pages={56-79},
doi={10.1016/j.pragma.2014.12.013},
website={http://www.sciencedirect.com/science/article/pii/S037821661500003X},
abstract={Current semantic theory on indexical expressions claims that demonstratively used indexicals such as this lack a referent-determining meaning but instead rely on an accompanying demonstration act like a pointing gesture. While this view allows to set up a sound logic of demonstratives, the direct-referential role assigned to pointing gestures has never been scrutinized thoroughly in semantics or pragmatics. We investigate the semantics and pragmatics of co-verbal pointing from a foundational perspective combining experiments, statistical investigation, computer simulation and theoretical modeling techniques in a novel manner. We evaluate various referential hypotheses with a corpus of object identification games set up in experiments in which body movement tracking techniques have been extensively used to generate precise pointing  measurements. Statistical investigation and computer simulations show that especially distal areas in the pointing domain falsify the semantic direct-referential hypotheses concerning pointing gestures. As an alternative, we propose that reference involving pointing rests on a default inference which we specify using the empirical data. These results raise numerous problems for classical semantics–pragmatics interfaces: we argue for pre-semantic pragmatics in order to account for inferential reference in addition to classical post-semantic Gricean pragmatics.}}
• Text Mining: From Ontology Learning to Automated Text Processing Applications. Festschrift in Honor of Gerhard Heyer, C. Biemann and A. Mehler, Eds., Heidelberg: Springer, 2015.
[BibTeX]

@BOOK{Biemann:Mehler:2015,
publisher={Springer},
editor={Biemann, Chris and Mehler, Alexander},
year={2015},
title={Text Mining: From Ontology Learning to Automated Text Processing Applications. Festschrift in Honor of Gerhard Heyer},
series={Theory and Applications of Natural Language Processing},
address={Heidelberg}}
• A. Mehler, T. vor der Brück, R. Gleim, and T. Geelhaar, “Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts using the TTLab Latin Tagger,” in Text Mining: From Ontology Learning to Automated Text Processing Applications, C. Biemann and A. Mehler, Eds., Berlin/New York: Springer, 2015, pp. 87-112.
[Abstract] [BibTeX]

The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start with presenting the so-called TTLab Latin Tagger (TLT) that preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and mediately of our Latin preprocessor.
@INCOLLECTION{Mehler:Brueck:Gleim:Geelhaar:2015,
publisher={Springer},
series={Theory and Applications of Natural Language Processing},
booktitle={Text Mining: From Ontology Learning to Automated Text Processing Applications},
pages={87-112},
editor={Chris Biemann and Alexander Mehler},
author={Mehler, Alexander and vor der Brück, Tim and Gleim, Rüdiger and Geelhaar, Tim},
year={2015},
title={Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts using the TTLab Latin Tagger},
abstract={The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start with presenting the so-called TTLab Latin Tagger (TLT) that preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and mediately of our Latin preprocessor.},
website={http://link.springer.com/chapter/10.1007/978-3-319-12655-5_5}}
• A. Hoenen, “Das artifizielle Manuskriptkorpus TASCFE,” in Accepted in the Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum, 2015.
[BibTeX]

@INPROCEEDINGS{Hoenen:2015,
booktitle={Accepted in the Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum},
author={Hoenen, Armin},
year={2015},
title={Das artifizielle Manuskriptkorpus TASCFE},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Hoenen_tascfeDH2015.pdf}}
• R. Gleim and A. Mehler, “TTLab Preprocessor – Eine generische Web-Anwendung für die Vorverarbeitung von Texten und deren Evaluation,” in Accepted in the Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum, 2015.
[BibTeX]

@INPROCEEDINGS{Gleim:Mehler:2015,
booktitle={Accepted in the Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum},
author={Gleim, Rüdiger and Mehler, Alexander},
year={2015},
title={TTLab Preprocessor – Eine generische Web-Anwendung für die Vorverarbeitung von Texten und deren Evaluation}}
• G. Abrami, A. Mehler, and S. Zeunert, “Ontologiegestütze geisteswissenschaftliche Annotationen mit dem OWLnotator,” in Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum, 2015.
[BibTeX]

@INPROCEEDINGS{Abrami:Mehler:Zeunert:2015:a,
booktitle={Proceedings of the Jahrestagung der Digital Humanities im deutschsprachigen Raum},
author={Abrami, Giuseppe and Mehler, Alexander and Zeunert, Susanne},
year={2015},
title={Ontologiegestütze geisteswissenschaftliche Annotationen mit dem OWLnotator}}
• G. Abrami, A. Mehler, and D. Pravida, “Fusing Text and Image Data with the Help of the OWLnotator,” in Human Interface and the Management of Information. Information and Knowledge Design, S. Yamamoto, Ed., Springer International Publishing, 2015, vol. 9172, pp. 261-272.
[BibTeX]

@INCOLLECTION{Abrami:Mehler:Pravida:2015:b,
booktitle={Human Interface and the Management of Information. Information and Knowledge Design},
publisher={Springer International Publishing},
editor={Yamamoto, Sakae},
pages={261-272},
series={Lecture Notes in Computer Science},
Volume={9172},
Doi={10.1007/978-3-319-20612-7_25},
ISBN={978-3-319-20611-0},
language={English},
website={http://dx.doi.org/10.1007/978-3-319-20612-7_25},
author={Abrami, Giuseppe and Mehler, Alexander and Pravida, Dietmar},
year={2015},
title={Fusing Text and Image Data with the Help of the OWLnotator}}
• A. Hoenen and F. Mader, “A New LMF Schema Application by Example of an Austrian Lexicon Applied to the Historical Corpus of the Writer Hugo von Hofmannsthal,” in Historical Corpora, Frankfurt am Main, Germany, 2015.
[Abstract] [BibTeX]

In this paper, which goes along with the release of an Austrian lemma list for NLP applications, the creation and representation of a digital dialect lemma list from existing internet sources and books is presented. The creation procedure can serve as a role model for similar projects on other dialects and points to a new cost-saving way to produce NLP resources by use of the internet, in a similar way to human-based computation. Dialect lexica can facilitate NLP and improve POS-tagging for German language resources in general. The representation standard used is LMF. It will be demonstrated how this lemma list can be used as a tool in literature science, linguistics and computational linguistics. Especially the critical edition of Hugo von Hofmannsthal is a well-suited corpus for the aforementioned research fields and the inspiration to build this tool.
@INPROCEEDINGS{Hoenen:Mader:2015,
author={Hoenen, Armin and Mader, F.},
website={http://www.narr-shop.de/historical-corpora.html},
booktitle={Historical Corpora},
year={2015},
title={A New LMF Schema Application by Example of an Austrian Lexicon Applied to the Historical Corpus of the Writer Hugo von Hofmannsthal},
abstract={In this paper, which goes along with the release of an Austrian lemma list for NLP applications, the creation and representation of a digital dialect lemma list from existing internet sources and books is presented. The creation procedure can serve as a role model for similar projects on other dialects and points to a new cost-saving way to produce NLP resources by use of the internet, in a similar way to human-based computation. Dialect lexica can facilitate NLP and improve POS-tagging for German language resources in general. The representation standard used is LMF. It will be demonstrated how this lemma list can be used as a tool in literature science, linguistics and computational linguistics. Especially the critical edition of Hugo von Hofmannsthal is a well-suited corpus for the aforementioned research fields and the inspiration to build this tool.}}
• A. Hoenen, “Lachmannian Archetype Reconstruction for Ancient Manuscript Corpora,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2015. Citation: Trovato is published in 2014 not in 2009.
[Abstract] [BibTeX]

Two goals are targeted by computer philology for ancient manuscript corpora: firstly, making an edition, that is roughly speaking one text version representing the whole corpus, which contains variety induced through copy errors and other processes and secondly, producing a stemma. A stemma is a graph-based visualization of the copy history with manuscripts as nodes and copy events as edges. Its root, the so-called archetype is the supposed original text or urtext from which all subsequent copies are made. Our main contribution is to present one of the first computational approaches to automatic archetype reconstruction and to introduce the first text-based evaluation for automatically produced archetypes. We compare a philologically generated archetype with one generated by bio-informatic software.
@INPROCEEDINGS{Hoenen:2015a,
booktitle={Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT)},
author={Hoenen, Armin},
year={2015},
note={Citation: Trovato is published in 2014 not in 2009.},
website={http://www.aclweb.org/anthology/N15-1127},
title={Lachmannian Archetype Reconstruction for Ancient Manuscript Corpora},
abstract={Two goals are targeted by computer philology for ancient manuscript corpora: firstly, making an edition, that is roughly speaking one text version representing the whole corpus, which contains variety induced through copy errors and other processes and secondly, producing a stemma. A stemma is a graph-based visualization of the copy history with manuscripts as nodes and copy events as edges. Its root, the so-called archetype is the supposed original text or urtext from which all subsequent copies are made. Our main contribution is to present one of the first computational approaches to automatic archetype reconstruction and to introduce the first text-based evaluation for automatically produced archetypes. We compare a philologically generated archetype with one generated by bio-informatic software.}}
• A. Hoenen, “Simulating Misreading,” in Proceedings of the 20th International Conference on Applications of Natural Language to Information Systems (NLDB), 2015.
[Abstract] [BibTeX]

Physical misreading (as opposed to interpretational misreading) is an unnoticed substitution in silent reading. Especially for legally important documents or instruction manuals, this can lead to serious consequences. We present a prototype of an automatic highlighter targeting words which can most easily be misread in a given text using a dynamic orthographic neighbour concept. We propose measures of fit of a misread token based on Natural Language Processing and detect a list of short most easily misread tokens in the English language. We design a highlighting scheme for avoidance of misreading.
@INPROCEEDINGS{Hoenen:2015b,
booktitle={Proceedings of the 20th International Conference on Applications of Natural Language to Information Systems (NLDB)},
author={Hoenen, Armin},
year={2015},
abstract={Physical misreading (as opposed to interpretational misreading) is an unnoticed substitution in silent reading. Especially for legally important documents or instruction manuals, this can lead to serious consequences. We present a prototype of an automatic highlighter targeting words which can most easily be misread in a given text using a dynamic orthographic neighbour concept. We propose measures of fit of a misread token based on Natural Language Processing and detect a list of short most easily misread tokens in the English language. We design a highlighting scheme for avoidance of misreading.}}
• G. Abrami, M. Freiberg, and P. Warner, “Managing and Annotating Historical Multimodal Corpora with the eHumanities Desktop – An outline of the current state of the LOEWE project Illustrations of Goethe's Faust,” in Historical Corpora, 2015, pp. 353-363.
[Abstract] [BibTeX]

Text corpora are structured sets of text segments that can be annotated or interrelated. Expanding on this, we can define a database of images as an iconographic multimodal corpus with annotated images and the relations between images as well as between images and texts. The Goethe-Museum in Frankfurt holds a significant collection of art work and texts relating to Goethe’s Faust from the early 19th century until the present. In this project we create a database containing digitized items from this collection, and extend a tool, the ImageDB in the eHumanities Desktop, to annotate and provide relations between resources. This article gives an overview of the project and provides some technical details. Furthermore we show newly implemented features, explain the challenge of creating an ontology on multimodal corpora and give a forecast for future work.
@INPROCEEDINGS{Abrami:Freiberg:Warner:2015,
website={http://www.narr-shop.de/historical-corpora.html},
booktitle={Historical Corpora},
pages={353-363},
author={Abrami, Giuseppe and Freiberg, Michael and Warner, Paul},
year={2015},
title={Managing and Annotating Historical Multimodal Corpora with the eHumanities Desktop - An outline of the current state of the LOEWE project Illustrations of Goethe's Faust},
abstract={Text corpora are structured sets of text segments that can be annotated or interrelated. Expanding on this, we can define a database of images as an iconographic multimodal corpus with annotated images and the relations between images as well as between images and texts. The Goethe-Museum in Frankfurt holds a significant collection of art work and texts relating to Goethe’s Faust from the early 19th century until the present. In this project we create a database containing digitized items from this collection, and extend a tool, the ImageDB in the eHumanities Desktop, to annotate and provide relations between resources. This article gives an overview of the project and provides some technical details. Furthermore we show newly implemented features, explain the challenge of creating an ontology on multimodal corpora and give a forecast for future work.}}

### 2014 (13)

• A. Hoenen, “Stemmatology, an interdisciplinary endeavour,” in Book of Abstracts zum DHd Workshop Informatik und die Digital Humanities, DHd, 2014.
[BibTeX]

@INCOLLECTION{Hoenen:2014plz,
author={Hoenen, Armin},
title={{Stemmatology, an interdisciplinary endeavour}},
booktitle={{Book of Abstracts zum DHd Workshop Informatik und die Digital Humanities}},
year={2014},
publisher={DHd},
url={http://dhd-wp.hab.de/files/book_of_abstracts.pdf}}
• X. Chen, “Language as a whole — A new framework for linguistic knowledge integration: Comment on "Approaching human language with complex networks" by Cong and Liu,” Physics of Life Reviews, vol. 11, iss. 4, pp. 628-629, 2014.
[BibTeX]

@ARTICLE{Chen:2014:a,
title={Language as a whole -- A new framework for linguistic knowledge integration: Comment on "Approaching human language with complex networks" by {Cong} and {Liu}},
journal={Physics of Life Reviews},
volume={11},
number={4},
pages={628-629},
year={2014},
doi={10.1016/j.plrev.2014.07.011},
url={http://www.sciencedirect.com/science/article/pii/S1571064514001249},
author={Chen, Xinying},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Language-as-a-whole-Chen.pdf}}
• T. Gong, Y. W. Lam, X. Chen, and M. Zhang, “Review: Evolutionary Linguistics in the Past Two Decades — EVOLANG10: the 10th International Conference on Language Evolution,” Journal of Chinese Linguistics, vol. 42, iss. 2, pp. 499-530, 2014.
[BibTeX]

@ARTICLE{Gong:Lam:Chen:Zhang:2014,
author={Gong, Tao and Lam, Yau Wai and Chen, Xinying and Zhang, Menghan},
title={Review: Evolutionary Linguistics in the Past Two Decades -- EVOLANG10: the 10th International Conference on Language Evolution},
journal={Journal of Chinese Linguistics},
volume={42},
number={2},
pages={499-530},
year={2014},
pdf={https://hucompute.org/wp-content/uploads/2015/08/JCL-EvolangReview.pdf}}
• G. Abrami, A. Mehler, D. Pravida, and S. Zeunert, “Rubrik: Neues aus dem Netz,” Kunstchronik, vol. 12, p. 623, 2014.
[BibTeX]

@ARTICLE{Abrami:Mehler:Pravida:Zeunert:2014,
journal={Kunstchronik},
month={12},
author={Abrami, Giuseppe and Mehler, Alexander and Pravida, Dietmar and Zeunert, Susanne},
pages={623},
volume={12},
publisher={Zentralinstitut für Kunstgeschichte},
year={2014},
title={Rubrik: Neues aus dem Netz},
website={http://www.zikg.eu/publikationen/laufende-publikationen/kunstchronik}}
• S. Eger, “A proof of the Mann-Shanks primality criterion conjecture for extended binomial coefficients,” Integers: The Electronic Journal of Combinatorial Number Theory, vol. 14, 2014.
[Abstract] [BibTeX]

We show that the Mann-Shanks primality criterion holds for weighted extended binomial coefficients (which count the number of weighted integer compositions), not only for the ordinary binomial coefficients.
@ARTICLE{Eger:2014:a,
author={Eger, Steffen},
journal={Integers: The Electronic Journal of Combinatorial Number Theory},
title={A proof of the Mann-Shanks primality criterion conjecture for extended binomial coefficients},
year={2014},
abstract={We show that the Mann-Shanks primality criterion holds for weighted extended binomial coefficients (which count the number of weighted integer compositions), not only for the ordinary binomial coefficients.},
volume={14},
pdf={http://www.emis.de/journals/INTEGERS/papers/o60/o60.pdf},
website={http://www.emis.de/journals/INTEGERS/vol14.html}}
• S. Eger, “Stirling’s approximation for central extended binomial coefficients.,” The American Mathematical Monthly, vol. 121, iss. 4, pp. 344-349, 2014.
[Abstract] [BibTeX]

We derive asymptotic formulas for central extended binomial coefficients, which are generalizations of binomial coefficients, using the distribution of the sum of independent discrete uniform random variables with the Central Limit Theorem and a local limit variant.
@ARTICLE{Eger:2014:b,
author={Eger, Steffen},
journal={The American Mathematical Monthly},
title={Stirling's approximation for central extended binomial coefficients.},
year={2014},
volume={121},
number={4},
pages={344-349},
abstract={We derive asymptotic formulas for central extended binomial coefficients, which are generalizations of binomial coefficients, using the distribution of the sum of independent discrete uniform random variables with the Central Limit Theorem and a local limit variant.},
website={http://www.jstor.org/stable/10.4169/amer.math.monthly.121.04.344}}
• A. Mehler, “On the Expressiveness, Validity and Reproducibility of Models of Language Evolution. Comment on ‘Modelling language evolution: Examples and predictions’ by Tao Gong, Shuai Lan, and Menghan Zhang,” Physics of Life Reviews, 2014.
[BibTeX]

@ARTICLE{Mehler:2014,
journal={Physics of Life Reviews},
author={Mehler, Alexander},
year={2014},
title={On the Expressiveness, Validity and Reproducibility of Models of Language Evolution. Comment on 'Modelling language evolution: Examples and predictions' by Tao Gong, Shuai Lan, and Menghan Zhang},
website={https://www.researchgate.net/publication/261290946_On_the_expressiveness_validity_and_reproducibility_of_models_of_language_evolution_Comment_on_Modelling_language_evolution_Examples_and_predictions_by_Tao_Gong_Shuai_Lan_and_Menghan_Zhang},
pdf={http://www.sciencedirect.com/science/article/pii/S1571064514000529/pdfft?md5=6a2cbbfc083d7bc3adfd26d431cc55d8&pid=1-s2.0-S1571064514000529-main.pdf}}
• C. Biemann, G. R. Crane, C. D. Fellbaum, and A. Mehler, Eds., “Computational Humanities – bridging the gap between Computer Science and Digital Humanities (Dagstuhl Seminar 14301),” Dagstuhl Reports, vol. 4, iss. 7, pp. 80-111, 2014.
[Abstract] [BibTeX]

Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities – bridging the gap between Computer Science and Digital Humanities”
@ARTICLE{Biemann:Crane:Fellbaum:Mehler:2014,
journal={Dagstuhl Reports},
pages={80-111},
issn={2192-5283},
author={Chris Biemann and Gregory R. Crane and Christiane D. Fellbaum and Alexander Mehler},
publisher={Schloss Dagstuhl--Leibniz-Zentrum für Informatik},
year={2014},
volume={4},
number={7},
title={Computational Humanities - bridging the gap between Computer Science and Digital Humanities (Dagstuhl Seminar 14301)},
abstract={Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities – bridging the gap between Computer Science and Digital Humanities”},
pdf={https://hucompute.org/wp-content/uploads/2015/08/dagrep_v004_i007_p080_s14301.pdf}}
• M. Z. Islam, M. R. Rahman, and A. Mehler, “Readability Classification of Bangla Texts,” in 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Kathmandu, Nepal, 2014.
[Abstract] [BibTeX]

Readability classification is an important application of Natural Language Processing. It aims at judging the quality of documents and at assisting writers in identifying possible problems. This paper presents a readability classifier for Bangla textbooks using information-theoretic and lexical features. Altogether, 18 features are explored to achieve an F-score of 86.46
@INPROCEEDINGS{Islam:Rahman:Mehler:2014,
booktitle={15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Kathmandu, Nepal},
author={Islam, Md. Zahurul and Rahman, Md. Rashedur and Mehler, Alexander},
year={2014},
abstract={Readability classification is an important application of Natural Language Processing. It aims at judging the quality of documents and at assisting writers in identifying possible problems. This paper presents a readability classifier for Bangla textbooks using information-theoretic and lexical features. Altogether, 18 features are explored to achieve an F-score of 86.46}}
• A. Mehler, T. vor der Brück, and A. Lücking, “Comparing Hand Gesture Vocabularies for HCI,” in Proceedings of HCI International 2014, 22 – 27 June 2014, Heraklion, Greece, Berlin/New York: Springer, 2014.
[Abstract] [BibTeX]

HCI systems are often equipped with gestural interfaces drawing on a predefined set of admitted gestures. We provide an assessment of the fitness of such gesture vocabularies in terms of their learnability and naturalness. This is done by example of rivaling gesture vocabularies of the museum information system WikiNect. In this way, we do not only provide a procedure for evaluating gesture vocabularies, but additionally contribute to design criteria to be followed by the gestures.
@INCOLLECTION{Mehler:vor:der:Brueck:Luecking:2014,
publisher={Springer},
booktitle={Proceedings of HCI International 2014, 22 - 27 June 2014, Heraklion, Greece},
author={Mehler, Alexander and vor der Brück, Tim and Lücking, Andy},
year={2014},
title={Comparing Hand Gesture Vocabularies for HCI},
abstract={HCI systems are often equipped with gestural interfaces drawing on a predefined set of admitted gestures. We provide an assessment of the fitness of such gesture vocabularies in terms of their learnability and naturalness. This is done by example of rivaling gesture vocabularies of the museum information system WikiNect. In this way, we do not only provide a procedure for evaluating gesture vocabularies, but additionally contribute to design criteria to be followed by the gestures.},
keywords={wikinect}}
• A. Mehler, A. Lücking, and G. Abrami, “WikiNect: Image Schemata as a Basis of Gestural Writing for Kinetic Museum Wikis,” Universal Access in the Information Society, pp. 1-17, 2014.
[Abstract] [BibTeX]

This paper provides a theoretical assessment of gestures in the context of authoring image-related hypertexts by example of the museum information system WikiNect. To this end, a first implementation of gestural writing based on image schemata is provided (Lakoff in Women, fire, and dangerous things: what categories reveal about the mind. University of Chicago Press, Chicago, 1987). Gestural writing is defined as a sort of coding in which propositions are only expressed by means of gestures. In this respect, it is shown that image schemata allow for bridging between natural language predicates and gestural manifestations. Further, it is demonstrated that gestural writing primarily focuses on the perceptual level of image descriptions (Hollink et al. in Int J Hum Comput Stud 61(5):601–626, 2004). By exploring the metaphorical potential of image schemata, it is finally illustrated how to extend the expressiveness of gestural writing in order to reach the conceptual level of image descriptions. In this context, the paper paves the way for implementing museum information systems like WikiNect as systems of kinetic hypertext authoring based on full-fledged gestural writing.
@ARTICLE{Mehler:Luecking:Abrami:2014,
journal={Universal Access in the Information Society},
issn={1615-5289},
doi={10.1007/s10209-014-0386-8},
author={Mehler, Alexander and Lücking, Andy and Abrami, Giuseppe},
pages={1-17},
year={2014},
title={{WikiNect}: Image Schemata as a Basis of Gestural Writing for Kinetic Museum Wikis},
website={http://dx.doi.org/10.1007/s10209-014-0386-8},
abstract={This paper provides a theoretical assessment of gestures in the context of authoring image-related hypertexts by example of the museum information system WikiNect. To this end, a first implementation of gestural writing based on image schemata is provided (Lakoff in Women, fire, and dangerous things: what categories reveal about the mind. University of Chicago Press, Chicago, 1987). Gestural writing is defined as a sort of coding in which propositions are only expressed by means of gestures. In this respect, it is shown that image schemata allow for bridging between natural language predicates and gestural manifestations. Further, it is demonstrated that gestural writing primarily focuses on the perceptual level of image descriptions (Hollink et al. in Int J Hum Comput Stud 61(5):601–626, 2004). By exploring the metaphorical potential of image schemata, it is finally illustrated how to extend the expressiveness of gestural writing in order to reach the conceptual level of image descriptions. In this context, the paper paves the way for implementing museum information systems like WikiNect as systems of kinetic hypertext authoring based on full-fledged gestural writing.},
keywords={wikinect}}
• T. vor der Brück, A. Mehler, and M. Z. Islam, “ColLex.EN: Automatically Generating and Evaluating a Full-form Lexicon for English,” in Proceedings of LREC 2014, Reykjavik, Iceland, 2014.
[Abstract] [BibTeX]

Currently, a large number of different lexica are available for English. However, substantial and freely available full-form lexica with a high number of named entities are rather rare even in the case of this lingua franca. Existing lexica are often limited in several respects, as explained in Section 2. What is missing so far is a freely available, substantial, machine-readable lexical resource of English that contains a high number of word forms and a large collection of named entities. In this paper, we describe a procedure to generate such a resource by example of English. This lexicon, henceforth called ColLex.EN (for Collecting Lexica for English), will be made freely available to the public. In this paper, we describe how ColLex.EN was collected from existing lexical resources and specify the statistical procedures that we developed to extend and adjust it. No manual modifications were done on the generated word forms and lemmas. Our fully automatic procedure has the advantage that whenever new versions of the source lexica are available, a new version of ColLex.EN can be automatically generated with low effort.
@INPROCEEDINGS{vor:der:Brueck:Mehler:Islam:2014,
booktitle={Proceedings of LREC 2014},
author={vor der Brück, Tim and Mehler, Alexander and Islam, Md. Zahurul},
year={2014},
title={ColLex.EN: Automatically Generating and Evaluating a Full-form Lexicon for English},
abstract={Currently, a large number of different lexica are available for English. However, substantial and freely available full-form lexica with a high number of named entities are rather rare even in the case of this lingua franca. Existing lexica are often limited in several respects, as explained in Section 2. What is missing so far is a freely available, substantial, machine-readable lexical resource of English that contains a high number of word forms and a large collection of named entities. In this paper, we describe a procedure to generate such a resource by example of English. This lexicon, henceforth called ColLex.EN (for Collecting Lexica for English), will be made freely available to the public. In this paper, we describe how ColLex.EN was collected from existing lexical resources and specify the statistical procedures that we developed to extend and adjust it. No manual modifications were done on the generated word forms and lemmas. Our fully automatic procedure has the advantage that whenever new versions of the source lexica are available, a new version of ColLex.EN can be automatically generated with low effort.},
website={http://aclanthology.info/papers/collex-en-automatically-generating-and-evaluating-a-full-form-lexicon-for-english}}
• A. Hoenen, “Simulation of Scribal Letter Substitution,” in Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, 2014.
[BibTeX]

@INPROCEEDINGS{Hoenen:2014,
booktitle={Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches},
author={Hoenen, Armin},
editor={T. L. Andrews and C. Macé},
year={2014},
title={Simulation of Scribal Letter Substitution},
website={http://www.brepols.net/Pages/ShowProduct.aspx?prod_id=IS-9782503552682-1}}

### 2013 (19)

• I. Sejane and S. Eger, “Semantic typologies by means of network analysis of bilingual dictionaries,” in Approaches to Measuring Linguistic Differences, L. Borin and A. Saxena, Eds., De Gruyter, 2013, pp. 447-474.
[BibTeX]

@INCOLLECTION{Sejane:Eger:2013,
author={Sejane, Ineta and Eger, Steffen},
booktitle={Approaches to Measuring Linguistic Differences},
editor={Borin, Lars and Saxena, Anju},
pages={447-474},
publisher={De Gruyter},
title={Semantic typologies by means of network analysis of bilingual dictionaries},
url={http://www.degruyter.com/view/books/9783110305258/9783110305258.447/9783110305258.447.xml},
year={2013},
doi={10.1515/9783110305258.447}}
• S. Eger, “Sequence Segmentation by Enumeration: An Exploration,” The Prague Bulletin of Mathematical Linguistics, vol. 100, pp. 113-131, 2013.
[Abstract] [BibTeX]

We investigate exhaustive enumeration and subsequent language model evaluation (E&E approach) as an alternative to solving the sequence segmentation problem. We show that, under certain conditions (on string lengths and regarding a possibility to accurately estimate the number of segments), which are satisfied for important NLP applications, such as phonological segmentation, syllabification, and morphological segmentation, the E&E approach is feasible and promises superior results than the standard sequence labeling approach to sequence segmentation.
@ARTICLE{Eger:2013:a,
author={Eger, Steffen},
journal={The Prague Bulletin of Mathematical Linguistics},
title={Sequence Segmentation by Enumeration: An Exploration},
year={2013},
volume={100},
pages={113-131},
abstract={We investigate exhaustive enumeration and subsequent language model evaluation (E\&E approach) as an alternative to solving the sequence segmentation problem. We show that, under certain conditions (on string lengths and regarding a possibility to accurately estimate the number of segments), which are satisfied for important NLP applications, such as phonological segmentation, syllabification, and morphological segmentation, the E\&E approach is feasible and promises superior results than the standard sequence labeling approach to sequence segmentation.},
pdf={http://ufal.mff.cuni.cz/pbml/100/art-eger.pdf}}
• S. Eger, “A Contribution to the Theory of Word Length Distribution Based on a Stochastic Word Length Distribution Model,” Journal of Quantitative Linguistics, vol. 20, iss. 3, pp. 252-265, 2013.
[Abstract] [BibTeX]

We derive a stochastic word length distribution model based on the concept of compound distributions and show its relationships with and implications for Wimmer et al.’s (1994) synergetic word length distribution model.
@ARTICLE{Eger:2013:b,
author={Eger, Steffen},
journal={Journal of Quantitative Linguistics},
title={A Contribution to the Theory of Word Length Distribution Based on a Stochastic Word Length Distribution Model},
year={2013},
volume={20},
number={3},
pages={252-265},
abstract={We derive a stochastic word length distribution model based on the concept of compound distributions and show its relationships with and implications for Wimmer et al.’s (1994) synergetic word length distribution model.}}
• S. Eger, “Sequence alignment with arbitrary steps and further generalizations, with applications to alignments in linguistics,” Information Sciences, vol. 237, pp. 287-304, 2013.
[Abstract] [BibTeX]

We provide simple generalizations of the classical Needleman–Wunsch algorithm for aligning two sequences. First, we let both sequences be defined over arbitrary, potentially different alphabets. Secondly, we consider similarity functions between elements of both sequences with ranges in a semiring. Thirdly, instead of considering only ‘match’, ‘mismatch’ and ‘skip’ operations, we allow arbitrary non-negative alignment ‘steps’ S. Next, we present novel combinatorial formulas for the number of monotone alignments between two sequences for selected steps S. Finally, we illustrate sample applications in natural language processing that require larger steps than available in the original Needleman–Wunsch sequence alignment procedure such that our generalizations can be fruitfully adopted.
@ARTICLE{Eger:2013:c,
author={Eger, Steffen},
journal={Information Sciences},
title={Sequence alignment with arbitrary steps and further generalizations, with applications to alignments in linguistics},
year={2013},
volume={237},
pages={287-304},
abstract={We provide simple generalizations of the classical Needleman–Wunsch algorithm for aligning two sequences. First, we let both sequences be defined over arbitrary, potentially different alphabets. Secondly, we consider similarity functions between elements of both sequences with ranges in a semiring. Thirdly, instead of considering only ‘match’, ‘mismatch’ and ‘skip’ operations, we allow arbitrary non-negative alignment ‘steps’ S. Next, we present novel combinatorial formulas for the number of monotone alignments between two sequences for selected steps S. Finally, we illustrate sample applications in natural language processing that require larger steps than available in the original Needleman–Wunsch sequence alignment procedure such that our generalizations can be fruitfully adopted.},
website={http://www.sciencedirect.com/science/article/pii/S0020025513001485}}
• S. Eger, “Restricted weighted integer compositions and extended binomial coefficients,” Journal of Integer Sequences, vol. 16, iss. 1, 2013.
[Abstract] [BibTeX]

We prove a simple relationship between extended binomial coefficients — natural extensions of the well-known binomial coefficients — and weighted restricted integer compositions. Moreover, we give a very useful interpretation of extended binomial coefficients as representing distributions of sums of independent discrete random variables. We apply our results, e.g., to determine the distribution of the sum of k logarithmically distributed random variables, and to determine the distribution, specifying all moments, of the random variable whose values are part-products of random restricted integer compositions. Based on our findings and using the central limit theorem, we also give generalized Stirling formulae for central extended binomial coefficients. We enlarge the list of known properties of extended binomial coefficients.
@ARTICLE{Eger:2013:d,
author={Eger, Steffen},
journal={Journal of Integer Sequences},
title={Restricted weighted integer compositions and extended binomial coefficients},
year={2013},
volume={16},
number={1},
issn={1530-7638},
abstract={We prove a simple relationship between extended binomial coefficients — natural extensions of the well-known binomial coefficients — and weighted restricted integer compositions. Moreover, we give a very useful interpretation of extended binomial coefficients as representing distributions of sums of independent discrete random variables. We apply our results, e.g., to determine the distribution of the sum of k logarithmically distributed random variables, and to determine the distribution, specifying all moments, of the random variable whose values are part-products of random restricted integer compositions. Based on our findings and using the central limit theorem, we also give generalized Stirling formulae for central extended binomial coefficients. We enlarge the list of known properties of extended binomial coefficients.},
publisher={School of Computer Science, University of Waterloo, Waterloo, ON},
pdf={https://cs.uwaterloo.ca/journals/JIS/VOL16/Eger/eger6.pdf},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.397.3745}}
• Webkorpora in Computerlinguistik und Sprachforschung, R. Schneider, A. Storrer, and A. Mehler, Eds., JLCL, 2013, vol. 28.
[BibTeX]

@BOOK{Schneider:Storrer:Mehler:2013,
publisher={JLCL},
pdf={http://www.jlcl.org/2013_Heft2/H2013-2.pdf},
number={2},
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={28},
editor={Roman Schneider and Angelika Storrer and Alexander Mehler},
pagetotal={107},
year={2013},
title={Webkorpora in Computerlinguistik und Sprachforschung},
issn={2190-6858}}
• A. Mehler, A. Lücking, T. vor der Brück, and G. Abrami, WikiNect – A Kinetic Artwork Wiki for Exhibition Visitors, 2013.
[Poster][BibTeX]

@MISC{Mehler:Luecking:vor:der:Brueck:2013:a,
url={http://scch2013.wordpress.com/},
author={Mehler, Alexander and Lücking, Andy and vor der Brück, Tim and Abrami, Giuseppe},
month={11},
year={2013},
howpublished={Poster Presentation at the Scientific Computing and Cultural Heritage 2013 Conference, Heidelberg},
title={WikiNect - A Kinetic Artwork Wiki for Exhibition Visitors},
keywords={wikinect}}
• A. Lücking, Theoretische Bausteine für einen semiotischen Ansatz zum Einsatz von Gestik in der Aphasietherapie, 2013.
[BibTeX]

@MISC{Luecking:2013:c,
url={http://www.bkl-ev.de/bkl_workshop/archiv/workshop13_programm.php},
author={Lücking, Andy},
month={05},
year={2013},
howpublished={Talk at the BKL workshop 2013, Bochum},
title={Theoretische Bausteine für einen semiotischen Ansatz zum Einsatz von Gestik in der Aphasietherapie}}
• A. Lücking, Eclectic Semantics for Non-Verbal Signs, 2013.
[BibTeX]

@MISC{Luecking:2013:d,
url={http://www.ruhr-uni-bochum.de/phil-lang/investigating/index.html},
author={Lücking, Andy},
month={10},
year={2013},
howpublished={Talk at the Conference on Investigating semantics: Empirical and philosophical approaches, Bochum},
title={Eclectic Semantics for Non-Verbal Signs}}
• A. Lücking, “Multimodal Propositions? From Semiotic to Semantic Considerations in the Case of Gestural Deictics,” in Poster Abstracts of the Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, Amsterdam, 2013, pp. 221-223.
[Poster][BibTeX]

@INPROCEEDINGS{Luecking:2013:e,
booktitle={Poster Abstracts of the Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue},
pages={221-223},
author={Lücking, Andy},
series={SemDial 2013},
editor={Fernandez, Raquel and Isard, Amy},
month={12},
year={2013},
title={Multimodal Propositions? From Semiotic to Semantic Considerations in the Case of Gestural Deictics},
poster={https://hucompute.org/wp-content/uploads/2015/08/dialdam2013.pdf}}
• M. Z. Islam and A. Hoenen, “Source and Translation Classification using Most Frequent Words,” in Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), 2013.
[Abstract] [BibTeX]

Recently, translation scholars have made some general claims about translation properties. Some of these are source language independent while others are not. Koppel and Ordan (2011) performed empirical studies to validate both types of properties using English source texts and other texts translated into English. Obviously, corpora of this sort, which focus on a single language, are not adequate for claiming universality of translation properties. In this paper, we are validating both types of translation properties using original and translated texts from six European languages.
@INPROCEEDINGS{Islam:Hoenen:2013,
booktitle={Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP)},
author={Islam, Md. Zahurul and Hoenen, Armin},
year={2013},
title={Source and Translation Classification using Most Frequent Words},
pdf={http://www.aclweb.org/anthology/I/I13/I13-1185.pdf},
website={http://aclanthology.info/papers/source-and-translation-classification-using-most-frequent-words},
abstract={Recently, translation scholars have made some general claims about translation properties. Some of these are source language independent while others are not. Koppel and Ordan (2011) performed empirical studies to validate both types of properties using English source texts and other texts translated into English. Obviously, corpora of this sort, which focus on a single language, are not adequate for claiming universality of translation properties. In this paper, we are validating both types of translation properties using original and translated texts from six European languages.}}
• A. Lücking and A. Mehler, “On Three Notions of Grounding of Artificial Dialog Companions,” Science, Technology & Innovation Studies, vol. 10, iss. 1, pp. 31-36, 2013.
[Abstract] [BibTeX]

We provide a new, theoretically motivated evaluation grid for assessing the conversational achievements of Artificial Dialog Companions (ADCs). The grid is spanned along three grounding problems. Firstly, it is argued that symbol grounding in general has to be intrinsic. Current approaches in this context, however, are limited to a certain kind of expression that can be grounded in this way. Secondly, we identify three requirements for conversational grounding, the process leading to mutual understanding. Finally, we sketch a test case for symbol grounding in the form of the philosophical grounding problem that involves the use of modal language. Together, the three grounding problems provide a grid that allows us to assess ADCs’ dialogical performances and to pinpoint future developments on these grounds.
@ARTICLE{Luecking:Mehler:2013:a,
journal={Science, Technology \& Innovation Studies},
author={Lücking, Andy and Mehler, Alexander},
year={2013},
title={On Three Notions of Grounding of Artificial Dialog Companions},
website={http://www.sti-studies.de/ojs/index.php/sti/article/view/143},
abstract={We provide a new, theoretically motivated evaluation grid for assessing the conversational achievements of Artificial Dialog Companions (ADCs). The grid is spanned along three grounding problems. Firstly, it is argued that symbol grounding in general has to be intrinsic. Current approaches in this context, however, are limited to a certain kind of expression that can be grounded in this way. Secondly, we identify three requirements for conversational grounding, the process leading to mutual understanding. Finally, we sketch a test case for symbol grounding in the form of the philosophical grounding problem that involves the use of modal language. Together, the three grounding problems provide a grid that allows us to assess ADCs’ dialogical performances and to pinpoint future developments on these grounds.},
pages={31-36},
volume={10},
number={1}}
• Die Dynamik sozialer und sprachlicher Netzwerke: Konzepte, Methoden und empirische Untersuchungen an Beispielen des WWW, B. Frank-Job, A. Mehler, and T. Sutter, Eds., Wiesbaden: Springer VS, 2013.
[Abstract] [BibTeX]

In diesem Band präsentieren Medien- und Informationswissenschaftler, Netzwerkforscher aus Informatik, Texttechnologie und Physik, Soziologen und Linguisten interdisziplinär Aspekte der Erforschung komplexer Mehrebenen-Netzwerke. Im Zentrum ihres Interesses stehen Untersuchungen zum Zusammenhang zwischen sozialen und sprachlichen Netzwerken und ihrer Dynamiken, aufgezeigt an empirischen Beispielen aus dem Bereich des Web 2.0, aber auch an historischen Dokumentenkorpora sowie an Rezeptions-Netzwerken aus Kunst- und Literaturwissenschaft.
@BOOK{FrankJob:Mehler:Sutter:2013,
publisher={Springer VS},
editor={Barbara Frank-Job and Alexander Mehler and Tilmann Sutter},
pagetotal={240},
year={2013},
title={Die Dynamik sozialer und sprachlicher Netzwerke: Konzepte, Methoden und empirische Untersuchungen an Beispielen des WWW},
abstract={In diesem Band pr{\"a}sentieren Medien- und Informationswissenschaftler, Netzwerkforscher aus Informatik, Texttechnologie und Physik, Soziologen und Linguisten interdisziplin{\"a}r Aspekte der Erforschung komplexer Mehrebenen-Netzwerke. Im Zentrum ihres Interesses stehen Untersuchungen zum Zusammenhang zwischen sozialen und sprachlichen Netzwerken und ihrer Dynamiken, aufgezeigt an empirischen Beispielen aus dem Bereich des Web 2.0, aber auch an historischen Dokumentenkorpora sowie an Rezeptions-Netzwerken aus Kunst- und Literaturwissenschaft.}}
• A. Lücking, “Interfacing Speech and Co-Verbal Gesture: Exemplification,” in Proceedings of the 35th Annual Conference of the German Linguistic Society, Potsdam, Germany, 2013, pp. 284-286.
[BibTeX]

@INPROCEEDINGS{Luecking:2013:b,
booktitle={Proceedings of the 35th Annual Conference of the German Linguistic Society},
pages={284-286},
author={Lücking, Andy},
series={DGfS 2013},
year={2013},
title={Interfacing Speech and Co-Verbal Gesture: Exemplification},
address={Potsdam, Germany}}
• A. Lücking, Ikonische Gesten. Grundzüge einer linguistischen Theorie, Berlin and Boston: De Gruyter, 2013. Zugl. Diss. Univ. Bielefeld (2011)
[Abstract] [BibTeX]

Nicht-verbale Zeichen, insbesondere sprachbegleitende Gesten, spielen eine herausragende Rolle in der menschlichen Kommunikation. Um eine Analyse von Gestik innerhalb derjenigen Disziplinen, die sich mit der Erforschung und Modellierung von Dialogen beschäftigen, zu ermöglichen, bedarf es einer entsprechenden linguistischen Rahmentheorie. „Ikonische Gesten“ bietet einen ersten zeichen- und wahrnehmungstheoretisch motivierten Rahmen an, in dem eine grammatische Analyse der Integration von Sprache und Gestik möglich ist. Ausgehend von einem Abriss semiotischer Zugänge zu ikonischen Zeichen wird der vorherrschende Ähnlichkeitsansatz unter Rückgriff auf Wahrnehmungstheorien zugunsten eines Exemplifikationsansatzes verworfen. Exemplifikation wird im Rahmen einer unifikationsbasierten Grammatik umgesetzt. Dort werden u.a. multimodale Wohlgeformtheit, Synchronie und multimodale Subkategorisierung als neue Gegenstände linguistischer Forschung eingeführt und im Rahmen einer integrativen Analyse von Sprache und Gestik modelliert.
@BOOK{Luecking:2013,
publisher={De Gruyter},
author={Lücking, Andy},
note={Zugl. Diss. Univ. Bielefeld (2011)},
year={2013},
title={Ikonische Gesten. Grundzüge einer linguistischen Theorie},
abstract={Nicht-verbale Zeichen, insbesondere sprachbegleitende Gesten, spielen eine herausragende Rolle in der menschlichen Kommunikation. Um eine Analyse von Gestik innerhalb derjenigen Disziplinen, die sich mit der Erforschung und Modellierung von Dialogen besch{\"a}ftigen, zu ermöglichen, bedarf es einer entsprechenden linguistischen Rahmentheorie. „Ikonische Gesten“ bietet einen ersten zeichen- und wahrnehmungstheoretisch motivierten Rahmen an, in dem eine grammatische Analyse der Integration von Sprache und Gestik möglich ist. Ausgehend von einem Abriss semiotischer Zug{\"a}nge zu ikonischen Zeichen wird der vorherrschende {\"A}hnlichkeitsansatz unter Rückgriff auf Wahrnehmungstheorien zugunsten eines Exemplifikationsansatzes verworfen. Exemplifikation wird im Rahmen einer unifikationsbasierten Grammatik umgesetzt. Dort werden u.a. multimodale Wohlgeformtheit, Synchronie und multimodale Subkategorisierung als neue Gegenst{\"a}nde linguistischer Forschung eingeführt und im Rahmen einer integrativen Analyse von Sprache und Gestik modelliert.}}
• M. Z. Islam and A. Mehler, “Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features,” in 14th International Conference on Intelligent Text Processing and Computational Linguistics, 2013.
[Abstract] [BibTeX]

This paper presents a classifier of text readability based on information-theoretic features. The classifier was developed based on a linguistic approach to readability that explores lexical, syntactic and semantic features. For this evaluation we extracted a corpus of 645 articles from Wikipedia together with their quality judgments. We show that information-theoretic features perform as well as their linguistic counterparts even if we explore several linguistic levels at once.
@INPROCEEDINGS{Islam:Mehler:2013:a,
booktitle={14th International Conference on Intelligent Text Processing and Computational Linguistics},
author={Islam, Md. Zahurul and Mehler, Alexander},
year={2013},
title={Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features},
website={http://www.redalyc.org/articulo.oa?id=61527437002},
abstract={This paper presents a classifier of text readability based on information-theoretic features. The classifier was developed based on a linguistic approach to readability that explores lexical, syntactic and semantic features. For this evaluation we extracted a corpus of 645 articles from Wikipedia together with their quality judgments. We show that information-theoretic features perform as well as their linguistic counterparts even if we explore several linguistic levels at once.}}
• M. Z. Islam and R. Rahman, “English to Bangla Name Transliteration System (Abstract),” in The 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013), 2013.
[Abstract] [BibTeX]

Machine translation systems always struggle to transliterate names and unknown words during the translation process. It becomes more problematic when the source and the target language use different scripts for writing. To handle this problem, transliteration systems are becoming popular as additional modules of MT systems. In this abstract, we are presenting an English to Bangla name transliteration system that outperforms Google’s transliteration system. The transliteration system is the same as a phrase-based statistical machine translation system, but it works on the character level rather than on the phrase level. The performance of a statistical system is directly correlated with the size of the training corpus. In this work, 2200 names are extracted from Wikipedia cross-lingual links and from Geonames. In addition, 3694 names are manually transliterated and added to the data. 4716 names are used for training, 590 for tuning and 588 names are used for testing. If we consider only the candidate transliterations, the system gives 64.28% accuracy. The performance increases to more than 90% if we consider only the top 5 transliterations. To compare with Google’s English to Bangla transliteration system, a list of 100 names is randomly selected from the test data and transliterated by both systems. Our system gives 63% accuracy, whereas Google’s transliteration system does not transliterate a single name correctly. We have found significant improvement in terms of BLEU and TER scores when we add the transliteration module to an English to Bangla machine translation system.
@INPROCEEDINGS{Islam:Rahman:2013,
booktitle={The 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013)},
author={Islam, Md. Zahurul and Rahman, Rashedur},
year={2013},
title={English to Bangla Name Transliteration System (Abstract)},
abstract={Machine translation systems always struggle to transliterate names and unknown words during the translation process. The problem becomes more severe when the source and the target language use different scripts for writing. To handle this problem, transliteration systems are becoming popular as additional modules of MT systems. In this abstract, we present an English to Bangla name transliteration system that outperforms Google’s transliteration system. The transliteration system is the same as a phrase-based statistical machine translation system, but it works on the character level rather than on the phrase level.
The performance of a statistical system is directly correlated with the size of the training corpus. In this work, 2200 names are extracted from Wikipedia cross-lingual links and from Geonames. In addition, 3694 names are manually transliterated and added to the data. 4716 names are used for training, 590 for tuning and 588 for testing.
If we consider only the candidate transliterations, the system gives 64.28% accuracy. The performance increases to more than 90% if we consider the top 5 transliterations. To compare with Google’s English to Bangla transliteration system, a list of 100 names is randomly selected from the test data and transliterated by both systems. Our system gives 63% accuracy, whereas Google’s transliteration system does not transliterate a single name correctly. We found significant improvement in terms of BLEU and TER scores when adding the transliteration module to an English to Bangla machine translation system.}}
• A. Mehler, C. Stegbauer, and R. Gleim, “Zur Struktur und Dynamik der kollaborativen Plagiatsdokumentation am Beispiel des GuttenPlag Wiki: eine Vorstudie,” in Die Dynamik sozialer und sprachlicher Netzwerke. Konzepte, Methoden und empirische Untersuchungen am Beispiel des WWW, B. Frank-Job, A. Mehler, and T. Sutter, Eds., Wiesbaden: VS Verlag, 2013.
[BibTeX]

@INCOLLECTION{Mehler:Stegbauer:Gleim:2013,
publisher={VS Verlag},
booktitle={Die Dynamik sozialer und sprachlicher Netzwerke. Konzepte, Methoden und empirische Untersuchungen am Beispiel des WWW},
author={Mehler, Alexander and Stegbauer, Christian and Gleim, Rüdiger},
editor={Frank-Job, Barbara and Mehler, Alexander and Sutter, Tilman},
year={2013},
title={Zur Struktur und Dynamik der kollaborativen Plagiatsdokumentation am Beispiel des GuttenPlag Wiki: eine Vorstudie},
address={Wiesbaden}}
• A. Lücking, K. Bergmann, F. Hahn, S. Kopp, and H. Rieser, “Data-based Analysis of Speech and Gesture: The Bielefeld Speech and Gesture Alignment Corpus (SaGA) and its Applications,” Journal of Multimodal User Interfaces, vol. 7, iss. 1-2, pp. 5-18, 2013.
[Abstract] [BibTeX]

Communicating face-to-face, interlocutors frequently produce multimodal meaning packages consisting of speech and accompanying gestures. We discuss a systematically annotated speech and gesture corpus consisting of 25 route-and-landmark-description dialogues, the Bielefeld Speech and Gesture Alignment corpus (SaGA), collected in experimental face-to-face settings. We first describe the primary and secondary data of the corpus and its reliability assessment. Then we go into some of the projects carried out using SaGA demonstrating the wide range of its usability: on the empirical side, there is work on gesture typology, individual and contextual parameters influencing gesture production and gestures’ functions for dialogue structure. Speech-gesture interfaces have been established extending unification-based grammars. In addition, the development of a computational model of speech-gesture alignment and its implementation constitutes a research line we focus on.
@ARTICLE{Luecking:Bergmann:Hahn:Kopp:Rieser:2012,
journal={Journal of Multimodal User Interfaces},
author={Lücking, Andy and Bergmann, Kirsten and Hahn, Florian and Kopp, Stefan and Rieser, Hannes},
doi={10.1007/s12193-012-0106-8},
year={2013},
volume={7},
number={1-2},
pages={5-18},
title={Data-based Analysis of Speech and Gesture: The Bielefeld Speech and Gesture Alignment Corpus (SaGA) and its Applications},
abstract={Communicating face-to-face, interlocutors frequently produce multimodal meaning packages consisting of speech and accompanying gestures. We discuss a systematically annotated speech and gesture corpus consisting of 25 route-and-landmark-description dialogues, the Bielefeld Speech and Gesture Alignment corpus (SaGA), collected in experimental face-to-face settings. We first describe the primary and secondary data of the corpus and its reliability assessment. Then we go into some of the projects carried out using SaGA demonstrating the wide range of its usability: on the empirical side, there is work on gesture typology, individual and contextual parameters influencing gesture production and gestures’ functions for dialogue structure. Speech-gesture interfaces have been established extending unification-based grammars. In addition, the development of a computational model of speech-gesture alignment and its implementation constitutes a research line we focus on.},
pdf={https://hucompute.org/wp-content/uploads/2015/08/MMUI-SaGA-revision2.pdf}}

### 2012 (26)

• O. Abramov, “Network theory applied to linguistics: new advances in language classification and typology,” PhD Thesis, 2012.
[Abstract] [BibTeX]

This thesis bridges two scientific fields -- linguistics and computer science -- in terms of Linguistic Networks. From the linguistic point of view we examine whether languages can be distinguished by looking at the network topology of different linguistic networks. We deal with up to 17 languages and ask how far the methods of network theory reveal the peculiarities of single languages. We present and apply network models from different levels of linguistic representation: syntactic, phonological and morphological. The network models presented here allow various linguistic features to be integrated at once, which enables a more abstract, holistic view of the particular language. From the point of view of computer science we elaborate the instrumentarium of network theory by applying it to a new field. We study the expressiveness of different network features and their ability to characterize language structure. We evaluate the interplay of these features and their effectiveness in the task of classifying languages genealogically. Among others, we compare network features related to average degree, average geodesic distance, clustering, entropy-based indices, assortativity, centrality, compactness, etc. We also propose some new indices that can serve as additional characteristics of networks. The results obtained show that network models succeed in classifying related languages and allow language structure to be studied in general. The mathematical analysis of the particular network indices brings new insights into the nature of these indices and their potential when applied to different networks.
@PHDTHESIS{Abramov:2012,
year={2012},
school={Bielefeld University, Germany},
author={Abramov, Olga},
title={Network theory applied to linguistics: new advances in language classification and typology},
website={http://pub.uni-bielefeld.de/publication/2538828},
abstract={This thesis bridges two scientific fields -- linguistics and computer science -- in terms of Linguistic Networks. From the linguistic point of view we examine whether languages can be distinguished by looking at the network topology of different linguistic networks. We deal with up to 17 languages and ask how far the methods of network theory reveal the peculiarities of single languages. We present and apply network models from different levels of linguistic representation: syntactic, phonological and morphological. The network models presented here allow various linguistic features to be integrated at once, which enables a more abstract, holistic view of the particular language. From the point of view of computer science we elaborate the instrumentarium of network theory by applying it to a new field. We study the expressiveness of different network features and their ability to characterize language structure. We evaluate the interplay of these features and their effectiveness in the task of classifying languages genealogically. Among others, we compare network features related to average degree, average geodesic distance, clustering, entropy-based indices, assortativity, centrality, compactness, etc. We also propose some new indices that can serve as additional characteristics of networks. The results obtained show that network models succeed in classifying related languages and allow language structure to be studied in general. The mathematical analysis of the particular network indices brings new insights into the nature of these indices and their potential when applied to different networks.}}
• A. Hoenen, “Measuring Repetitiveness in Texts, a Preliminary Investigation,” Sprache und Datenverarbeitung. International Journal for Language Data Processing, vol. 36, iss. 2, pp. 93-104, 2012.
[Abstract] [BibTeX]

In this paper, a model is presented for automatic measurement that can systematically describe the usage and function of the phenomenon of repetition in written text. The motivating hypothesis for this study is that the more repetitive a text is, the easier it is to memorize. Therefore, an automated measurement index can provide feedback to writers and to those who design texts that are often memorized, including songs, holy texts, theatrical plays, and advertising slogans. The potential benefits of this kind of systematic feedback are numerous, the main one being that content creators would be able to employ a standard threshold of memorizability. This study explores multiple ways of implementing and calculating repetitiveness across levels of analysis (such as paragraph level or sub-word level), genres (such as songs, holy texts, and other genres), and languages, integrating these into a model for the automatic measurement of repetitiveness. The Avestan language and some of its idiosyncratic features are explored in order to illuminate how the proposed index is applied in the ranking of texts according to their repetitiveness.
@ARTICLE{Hoenen:2012:a,
journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
pages={93-104},
number={2},
author={Hoenen, Armin},
volume={36},
year={2012},
title={Measuring Repetitiveness in Texts, a Preliminary Investigation},
abstract={In this paper, a model is presented for automatic measurement that can systematically describe the usage and function of the phenomenon of repetition in written text. The motivating hypothesis for this study is that the more repetitive a text is, the easier it is to memorize. Therefore, an automated measurement index can provide feedback to writers and to those who design texts that are often memorized, including songs, holy texts, theatrical plays, and advertising slogans. The potential benefits of this kind of systematic feedback are numerous, the main one being that content creators would be able to employ a standard threshold of memorizability. This study explores multiple ways of implementing and calculating repetitiveness across levels of analysis (such as paragraph level or sub-word level), genres (such as songs, holy texts, and other genres), and languages, integrating these into a model for the automatic measurement of repetitiveness. The Avestan language and some of its idiosyncratic features are explored in order to illuminate how the proposed index is applied in the ranking of texts according to their repetitiveness.},
website={http://www.linse.uni-due.de/jahrgang-36-2012/articles/measuring-repetitiveness-in-texts-a-preliminary-investigation.html}}
• S. Eger, “The Combinatorics of String Alignments: Reconsidering the Problem.,” Journal of Quantitative Linguistics, vol. 19, iss. 1, pp. 32-53, 2012.
[Abstract] [BibTeX]

In recent work, Covington discusses the number of alignments of two strings. There, Covington defines an alignment as “a way of pairing up elements of two strings, optionally skipping some but preserving the order”. This definition has drawbacks as it excludes many relevant situations. In this work, we specify the notion of an alignment so that many linguistically interesting situations are covered. To this end, we define an alignment in an abstract manner as a set of pairs and then define three properties on such sets. Secondly, we specify the number of possibilities of aligning two strings in each case.
@ARTICLE{Eger:2012:a,
author={Eger, Steffen},
journal={Journal of Quantitative Linguistics},
title={The Combinatorics of String Alignments: Reconsidering the Problem.},
year={2012},
volume={19},
number={1},
pages={32-53},
abstract={In recent work, Covington discusses the number of alignments of two strings. There, Covington defines an alignment as “a way of pairing up elements of two strings, optionally skipping some but preserving the order”. This definition has drawbacks as it excludes many relevant situations. In this work, we specify the notion of an alignment so that many linguistically interesting situations are covered. To this end, we define an alignment in an abstract manner as a set of pairs and then define three properties on such sets. Secondly, we specify the number of possibilities of aligning two strings in each case.},
website={http://www.tandfonline.com/doi/full/10.1080/09296174.2011.638792#tabModule}}
• S. Eger, “S-Restricted Monotone Alignments: Algorithm, Search Space, and Applications,” in Proceedings of COLING 2012, Mumbai, India, 2012, pp. 781-798.
[Abstract] [BibTeX]

We present a simple and straightforward alignment algorithm for monotone many-to-many alignments in grapheme-to-phoneme conversion and related fields such as morphology, and discuss a few noteworthy extensions. Moreover, we specify combinatorial formulas for monotone many-to-many alignments and decoding in G2P which indicate that exhaustive enumeration is generally possible, so that some limitations of our approach can easily be overcome. Finally, we present a decoding scheme, within the monotone many-to-many alignment paradigm, that relates the decoding problem to restricted integer compositions and that is, putatively, superior to alternatives suggested in the literature.
@INPROCEEDINGS{Eger:2012:b,
booktitle={Proceedings of COLING 2012},
pages={781-798},
author={Eger, Steffen},
year={2012},
title={S-Restricted Monotone Alignments: Algorithm, Search Space, and Applications},
publisher={The COLING 2012 Organizing Committee},
abstract={We present a simple and straightforward alignment algorithm for monotone many-to-many alignments in grapheme-to-phoneme conversion and related fields such as morphology, and discuss a few noteworthy extensions. Moreover, we specify combinatorial formulas for monotone many-to-many alignments and decoding in G2P which indicate that exhaustive enumeration is generally possible, so that some limitations of our approach can easily be overcome. Finally, we present a decoding scheme, within the monotone many-to-many alignment paradigm, that relates the decoding problem to restricted integer compositions and that is, putatively, superior to alternatives suggested in the literature.},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.370.5941},
pdf={http://aclweb.org/anthology/C/C12/C12-1048.pdf}}
• S. Eger, “Lexical semantic typologies from bilingual corpora – A framework,” in SEM 2012: The First Joint Conference on Lexical and Computational Semantics — Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), Montreal, Canada, 2012, pp. 90-94.
[Abstract] [BibTeX]

We present a framework, based on Sejane and Eger (2012), for inducing lexical semantic typologies for groups of languages. Our framework rests on lexical semantic association networks derived from encoding, via bilingual corpora, each language in a common reference language, the tertium comparationis, so that distances between languages can easily be determined.
@INPROCEEDINGS{Eger:2012:c,
booktitle={SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)},
pages={90-94},
author={Eger, Steffen},
year={2012},
title={Lexical semantic typologies from bilingual corpora - A framework},
publisher={Association for Computational Linguistics},
abstract={We present a framework, based on Sejane and Eger (2012), for inducing lexical semantic typologies for groups of languages. Our framework rests on lexical semantic association networks derived from encoding, via bilingual corpora, each language in a common reference language, the tertium comparationis, so that distances between languages can easily be determined.},
website={http://dl.acm.org/citation.cfm?id=2387653},
pdf={http://www.aclweb.org/anthology/S12-1015}}
• A. Mehler, C. Stegbauer, and R. Gleim, “Latent Barriers in Wiki-based Collaborative Writing,” in Proceedings of the Wikipedia Academy: Research and Free Knowledge. June 29 – July 1 2012, Berlin, 2012.
[BibTeX]

@INPROCEEDINGS{Mehler:Stegbauer:Gleim:2012:b,
booktitle={Proceedings of the Wikipedia Academy: Research and Free Knowledge. June 29 - July 1 2012},
author={Mehler, Alexander and Stegbauer, Christian and Gleim, Rüdiger},
month={July},
year={2012},
title={Latent Barriers in Wiki-based Collaborative Writing},
address={Berlin}}
• A. Hoenen and T. Jügel, Altüberlieferte Sprachen als Gegenstand der Texttechnologie — Ancient Languages as the Object of Text Technology, A. Hoenen and T. Jügel, Eds., JLCL, 2012, vol. 27.
[Abstract] [BibTeX]

‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. [1] It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion which was allegedly founded by Zarathushtra, also known as Zoroaster, by about the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building that were undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). [2] Today, the complete Avestan corpus is available, together with elaborate search functions [3] and an extended version of the subcorpus of the so-called ‘Yasna’, which covers a great deal of the attestation of variant readings. [4] Right from the beginning of their computational work concerning the Avesta, the compilers [5] had to cope with the fact that the texts contained in it have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions used until today. [6] It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used. As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here.
@BOOK{Hoenen:Jügel:2012,
publisher={JLCL},
author={Hoenen, Armin and Jügel, Thomas},
number={2},
volume={27},
editor={Armin Hoenen and Thomas Jügel},
pdf={http://www.jlcl.org/2012_Heft2/H2012-2.pdf},
year={2012},
title={Altüberlieferte Sprachen als Gegenstand der Texttechnologie -- Ancient Languages as the Object of Text Technology},
abstract={‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. [1] It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion which was allegedly founded by Zarathushtra, also known as Zoroaster, by about the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building that were undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). [2] Today, the complete Avestan corpus is available, together with elaborate search functions [3] and an extended version of the subcorpus of the so-called ‘Yasna’, which covers a great deal of the attestation of variant readings. [4] Right from the beginning of their computational work concerning the Avesta, the compilers [5] had to cope with the fact that the texts contained in it have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions used until today. [6] It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used. As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here.},
issn={2190-6858}}
• T. vor der Brück, Wissensakquisition mithilfe maschineller Lernverfahren auf tiefen semantischen Repräsentationen, Heidelberg, Germany: Springer, 2012.
[Abstract] [BibTeX]

A large knowledge base is a prerequisite for a variety of applications in the field of automatic language processing, such as question answering or information retrieval systems. A human acquires the knowledge required to search for information or to answer questions over the course of a lifetime. A computer must be given this knowledge explicitly. Tim vor der Brück describes an approach by which a computer can acquire this knowledge much like a human does, namely by reading texts. To this end, methods from logic and machine learning are employed.
@BOOK{vor:der:Brueck:2012:a,
publisher={Springer},
school={FernUniversit{\"a}t in Hagen},
author={vor der Brück, Tim},
year={2012},
title={Wissensakquisition mithilfe maschineller Lernverfahren auf tiefen semantischen Repr{\"a}sentationen},
abstract={Eine gro{\ss}e Wissensbasis ist eine Voraussetzung für eine Vielzahl von Anwendungen im Bereich der automatischen Sprachverarbeitung, wie Frage-Antwort- oder Information-Retrieval-Systeme. Ein Mensch hat sich das erforderliche Wissen, um Informationen zu suchen oder Fragen zu beantworten, im Laufe seines Lebens angeeignet. Einem Computer muss dieses Wissen explizit mitgeteilt werden. Tim vor der Brück beschreibt einen Ansatz, wie ein Computer dieses Wissen {\"a}hnlich wie ein Mensch durch die Lektüre von Texten erwerben kann. Dabei kommen Methoden der Logik und des maschinellen Lernens zum Einsatz.}}
• T. vor der Brück and Y. Wang, “Synonymy Extraction from Semantic Networks Using String and Graph Kernel Methods,” in Proceedings of the 20th European Conference on Artificial Intelligence (ECAI), Montpellier, France, 2012, pp. 822-827.
[Abstract] [BibTeX]

Synonyms are a highly relevant information source for natural language processing. Automatic synonym extraction methods have in common that they are either applied on the surface representation of the text or on a syntactical structure derived from it. In this paper, however, we present a semantic synonym extraction approach that operates directly on semantic networks (SNs), which were derived from text by a deep syntactico-semantic analysis. Synonymy hypotheses are extracted from the SNs by graph matching. These hypotheses are then validated by a support vector machine (SVM) employing a combined graph and string kernel. Our method was compared to several other approaches and the evaluation has shown that our results are considerably superior.
@INPROCEEDINGS{vor:der:Brueck:Wang:2012,
booktitle={Proceedings of the 20th European Conference on Artificial Intelligence (ECAI)},
pages={822--827},
author={vor der Brück, Tim and Wang, Yu-Fang},
year={2012},
title={Synonymy Extraction from Semantic Networks Using String and Graph Kernel Methods},
abstract={Synonyms are a highly relevant information source for natural language processing. Automatic synonym extraction methods have in common that they are either applied on the surface representation of the text or on a syntactical structure derived from it. In this paper, however, we present a semantic synonym extraction approach that operates directly on semantic networks (SNs), which were derived from text by a deep syntactico-semantic analysis. Synonymy hypotheses are extracted from the SNs by graph matching. These hypotheses are then validated by a support vector machine (SVM) employing a combined graph and string kernel. Our method was compared to several other approaches and the evaluation has shown that our results are considerably superior.},
website={http://ebooks.iospress.nl/publication/7076},
pdf={http://www.vdb1.de/papers/ECAI_535.pdf}}
• T. vor der Brück, “Hyponym Extraction Employing a Weighted Graph Kernel,” in Statistical and Machine Learning Approaches for Network Analysis, M. Dehmer and S. C. Basak, Eds., Hoboken, New Jersey: Wiley, 2012.
[BibTeX]

@INCOLLECTION{vor:der:Brueck:2012:b,
publisher={Wiley},
booktitle={Statistical and Machine Learning Approaches for Network Analysis},
author={vor der Brück, Tim},
editor={Matthias Dehmer and Subhash C. Basak},
year={2012},
title={Hyponym Extraction Employing a Weighted Graph Kernel},
address={Hoboken, New Jersey}}
• M. Z. Islam, A. Mehler, and R. Rahman, “Text Readability Classification of Textbooks of a Low-Resource Language,” in Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (PACLIC 26), 2012.
[Abstract] [BibTeX]

There are many languages considered to be low-density languages, either because the population speaking the language is not very large, or because insufficient digitized text material is available in the language even though millions of people speak the language. Bangla is one of the latter ones. Readability classification is an important Natural Language Processing (NLP) application that can be used to judge the quality of documents and assist writers to locate possible problems. This paper presents a readability classifier of Bangla textbook documents based on information-theoretic and lexical features. The features proposed in this paper result in an F-score that is 50% higher than that for traditional readability formulas.
@INPROCEEDINGS{Islam:Mehler:Rahman:2012,
owner={zahurul},
booktitle={Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (PACLIC 26)},
author={Islam, Md. Zahurul and Mehler, Alexander and Rahman, Rashedur},
timestamp={2012.08.14},
year={2012},
title={Text Readability Classification of Textbooks of a Low-Resource Language},
abstract={There are many languages considered to be low-density languages, either because the population speaking the language is not very large, or because insufficient digitized text material is available in the language even though millions of people speak the language. Bangla is one of the latter ones. Readability classification is an important Natural Language Processing (NLP) application that can be used to judge the quality of documents and assist writers to locate possible problems. This paper presents a readability classifier of Bangla textbook documents based on information-theoretic and lexical features. The features proposed in this paper result in an F-score that is 50% higher than that for traditional readability formulas.},
pdf={http://www.aclweb.org/anthology/Y12-1059},
website={http://www.researchgate.net/publication/256648250_Text_Readability_Classification_of_Textbooks_of_a_Low-Resource_Language}}
• A. Mehler, L. Romary, and D. Gibbon, “Introduction: Framing Technical Communication,” in Handbook of Technical Communication, A. Mehler, L. Romary, and D. Gibbon, Eds., Berlin and Boston: De Gruyter Mouton, 2012, vol. 8, pp. 1-26.
[BibTeX]

@INCOLLECTION{Mehler:Romary:Gibbon:2012,
publisher={De Gruyter Mouton},
booktitle={Handbook of Technical Communication},
pages={1-26},
author={Mehler, Alexander and Romary, Laurent and Gibbon, Dafydd},
series={Handbooks of Applied Linguistics},
volume={8},
editor={Alexander Mehler and Laurent Romary and Dafydd Gibbon},
year={2012},
title={Introduction: Framing Technical Communication}}
• A. Mehler and A. Lücking, “Pathways of Alignment between Gesture and Speech: Assessing Information Transmission in Multimodal Ensembles,” in Proceedings of the International Workshop on Formal and Computational Approaches to Multimodal Communication under the auspices of ESSLLI 2012, Opole, Poland, 6-10 August, 2012.
[Abstract] [BibTeX]

We present an empirical account of multimodal ensembles based on Hjelmslev’s notion of selection. This is done to get measurable evidence for the existence of speech-and-gesture ensembles. Utilizing information theory, we show that there is an information transmission that makes a gesture’s representation technique predictable when merely knowing its lexical affiliate – in line with the notion of the primacy of language. Thus, there is evidence for a one-way coupling – going from words to gestures – that leads to speech-and-gesture alignment and underlies the constitution of multimodal ensembles.
@INPROCEEDINGS{Mehler:Luecking:2012:d,
booktitle={Proceedings of the International Workshop on Formal and Computational Approaches to Multimodal Communication under the auspices of ESSLLI 2012, Opole, Poland, 6-10 August},
author={Mehler, Alexander and Lücking, Andy},
editor={Gianluca Giorgolo and Katya Alahverdzhieva},
year={2012},
title={Pathways of Alignment between Gesture and Speech: Assessing Information Transmission in Multimodal Ensembles},
abstract={We present an empirical account of multimodal ensembles based on Hjelmslev’s notion of selection. This is done to get measurable evidence for the existence of speech-and-gesture ensembles. Utilizing information theory, we show that there is an information transmission that makes a gesture’s representation technique predictable when merely knowing its lexical affiliate – in line with the notion of the primacy of language. Thus, there is evidence for a one-way coupling – going from words to gestures – that leads to speech-and-gesture alignment and underlies the constitution of multimodal ensembles.},
website={http://www.researchgate.net/publication/268368670_Pathways_of_Alignment_between_Gesture_and_Speech_Assessing_Information_Transmission_in_Multimodal_Ensembles},
keywords={wikinect}}
• A. Lücking, “Towards a Conceptual, Unification-based Speech-Gesture Interface,” in Proceedings of the International Workshop on Formal and Computational Approaches to Multimodal Communication under the auspices of ESSLLI 2012, Opole, Poland, 6-10 August, 2012.
[Abstract] [BibTeX]

A framework for grounding the semantics of co-verbal iconic gestures is presented. A resemblance account to iconicity is discarded in favor of an exemplification approach. It is sketched how exemplification can be captured within a unification-based grammar that provides a conceptual interface. Gestures modeled as vector sequences are the exemplificational base. Some hypotheses that follow from the general account are pointed at and remaining challenges are discussed.
@INPROCEEDINGS{Luecking:2012,
booktitle={Proceedings of the International Workshop on Formal and Computational Approaches to Multimodal Communication under the auspices of ESSLLI 2012, Opole, Poland, 6-10 August},
author={Lücking, Andy},
editor={Gianluca Giorgolo and Katya Alahverdzhieva},
year={2012},
title={Towards a Conceptual, Unification-based Speech-Gesture Interface},
abstract={A framework for grounding the semantics of co-verbal iconic gestures is presented. A resemblance account to iconicity is discarded in favor of an exemplification approach. It is sketched how exemplification can be captured within a unification-based grammar that provides a conceptual interface. Gestures modeled as vector sequences are the exemplificational base. Some hypotheses that follow from the general account are pointed at and remaining challenges are discussed.},
pdf={https://hucompute.org/wp-content/uploads/2015/08/FoCoMoC2012-1.pdf}}
• A. Mehler and A. Lücking, “WikiNect: Towards a Gestural Writing System for Kinetic Museum Wikis,” in Proceedings of the International Workshop On User Experience in e-Learning and Augmented Technologies in Education (UXeLATE 2012) in Conjunction with ACM Multimedia 2012, 29 October - 2 November, Nara, Japan, 2012, pp. 7-12.
[Abstract] [BibTeX]

We introduce WikiNect as a kinetic museum information system that allows museum visitors to give on-site feedback about exhibitions. To this end, WikiNect integrates three approaches to Human-Computer Interaction (HCI): games with a purpose, wiki-based collaborative writing and kinetic text-technologies. Our aim is to develop kinetic technologies as a new paradigm of HCI. They dispense with classical interfaces (e.g., keyboards) in that they build on non-contact modes of communication like gestures or facial expressions as input displays. In this paper, we introduce the notion of gestural writing as a kinetic text-technology that underlies WikiNect to enable museum visitors to communicate their feedback. The basic idea is to explore sequences of gestures that share the semantic expressivity of verbally manifested speech acts. Our task is to identify such gestures that are learnable on-site in the usage scenario of WikiNect. This is done by referring to so-called transient gestures as part of multimodal ensembles, which are candidate gestures of the desired functionality. 
@INPROCEEDINGS{Mehler:Luecking:2012:c,
booktitle={Proceedings of the International Workshop On User Experience in e-Learning and Augmented Technologies in Education (UXeLATE 2012) in Conjunction with ACM Multimedia 2012, 29 October - 2 November, Nara, Japan},
author={Mehler, Alexander and Lücking, Andy},
pages={7-12},
year={2012},
title={WikiNect: Towards a Gestural Writing System for Kinetic Museum Wikis},
abstract={We introduce WikiNect as a kinetic museum information system that allows museum visitors to give on-site feedback about exhibitions. To this end, WikiNect integrates three approaches to Human-Computer Interaction (HCI): games with a purpose, wiki-based collaborative writing and kinetic text-technologies. Our aim is to develop kinetic technologies as a new paradigm of HCI. They dispense with classical interfaces (e.g., keyboards) in that they build on non-contact modes of communication like gestures or facial expressions as input displays. In this paper, we introduce the notion of gestural writing as a kinetic text-technology that underlies WikiNect to enable museum visitors to communicate their feedback. The basic idea is to explore sequences of gestures that share the semantic expressivity of verbally manifested speech acts. Our task is to identify such gestures that are learnable on-site in the usage scenario of WikiNect. This is done by referring to so-called transient gestures as part of multimodal ensembles, which are candidate gestures of the desired functionality. },
website={http://www.researchgate.net/publication/262319200_WikiNect_towards_a_gestural_writing_system_for_kinetic_museum_wikis},
keywords={wikinect}}
• R. Gleim, A. Mehler, and A. Ernst, “SOA implementation of the eHumanities Desktop,” in Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany, 2012.
[Abstract] [BibTeX]

The eHumanities Desktop is a system which allows users to upload, organize and share resources using a web interface. Furthermore, resources can be processed, annotated and analyzed in various ways. Registered users can organize themselves in groups and collaboratively work on their data. The eHumanities Desktop is platform independent and runs in a web browser. This paper presents the system focusing on its service orientation and process management.
@INPROCEEDINGS{Gleim:Mehler:Ernst:2012,
booktitle={Proceedings of the Workshop on Service-oriented Architectures (SOAs) for the Humanities: Solutions and Impacts, Digital Humanities 2012, Hamburg, Germany},
author={Gleim, Rüdiger and Mehler, Alexander and Ernst, Alexandra},
year={2012},
title={SOA implementation of the eHumanities Desktop},
abstract={The eHumanities Desktop is a system which allows users to upload, organize and share resources using a web interface. Furthermore, resources can be processed, annotated and analyzed in various ways. Registered users can organize themselves in groups and collaboratively work on their data. The eHumanities Desktop is platform independent and runs in a web browser. This paper presents the system focusing on its service orientation and process management.},
pdf={https://hucompute.org/wp-content/uploads/2015/08/dhc2012.pdf}}
• A. Mehler and C. Stegbauer, “On the Self-similarity of Intertextual Structures in Wikipedia,” in Proceedings of the HotSocial ’12: The First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, Beijing, China, 2012, pp. 65-68.
[BibTeX]

@INPROCEEDINGS{Mehler:Stegbauer:2012,
booktitle={Proceedings of the HotSocial '12: The First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research},
pages={65-68},
author={Mehler, Alexander and Stegbauer, Christian},
editor={Xiaoming Fu and Peter Gloor and Jie Tang},
year={2012},
website={http://dl.acm.org/citation.cfm?id=2392633&bnc=1},
title={On the Self-similarity of Intertextual Structures in Wikipedia},
pdf={http://wan.poly.edu/KDD2012/forms/workshop/HotSocial12/doc/p64_mehler.pdf}}
• A. Mehler, S. Schwandt, R. Gleim, and A. Ernst, “Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics,” in Proceedings of the Conference on New Methods in Historical Corpora, P. Bennett, M. Durrell, S. Scheible, and R. J. Whitt, Eds., Tübingen: Narr, 2012, vol. 3, pp. 257-274.
[BibTeX]

@INCOLLECTION{Mehler:Schwandt:Gleim:Ernst:2012,
publisher={Narr},
booktitle={Proceedings of the Conference on New Methods in Historical Corpora},
pages={257--274},
author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Ernst, Alexandra},
series={Corpus linguistics and Interdisciplinary perspectives on language (CLIP)},
editor={Paul Bennett and Martin Durrell and Silke Scheible and Richard J. Whitt},
year={2012},
volume={3},
title={Inducing Linguistic Networks from Historical Corpora: Towards a New Method in Historical Semantics},
address={Tübingen}}
• A. Lücking, S. Ptock, and K. Bergmann, “Assessing Agreement on Segmentations by Means of Staccato, the Segmentation Agreement Calculator according to Thomann,” in Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, E. Efthimiou, G. Kouroupetroglou, and S. Fotinea, Eds., Berlin and Heidelberg: Springer, 2012, vol. 7206, pp. 129-138.
[Abstract] [BibTeX]

Staccato, the Segmentation Agreement Calculator According to Thomann, is a software tool for assessing the degree of agreement of multiple segmentations of some time-related data (e.g., gesture phases or sign language constituents). The software implements an assessment procedure developed by Bruno Thomann and will be made publicly available. The article discusses the rationale of the agreement assessment procedure and points at future extensions of Staccato.
@INCOLLECTION{Luecking:Ptock:Bergmann:2012,
publisher={Springer},
booktitle={Gesture and Sign Language in Human-Computer Interaction and Embodied Communication},
booksubtitle={9th International Gesture Workshop, GW 2011, Athens, Greece, May 2011, Revised Selected Papers},
pages={129-138},
author={Lücking, Andy and Ptock, Sebastian and Bergmann, Kirsten},
series={Lecture Notes in Artificial Intelligence},
volume={7206},
editor={Eleni Efthimiou and Georgios Kouroupetroglou and Stavroula-Evita Fotinea},
year={2012},
title={Assessing Agreement on Segmentations by Means of Staccato, the Segmentation Agreement Calculator according to Thomann},
abstract={Staccato, the Segmentation Agreement Calculator According to Thomann, is a software tool for assessing the degree of agreement of multiple segmentations of some time-related data (e.g., gesture phases or sign language constituents). The software implements an assessment procedure developed by Bruno Thomann and will be made publicly available. The article discusses the rationale of the agreement assessment procedure and points at future extensions of Staccato.},
}
• A. Mehler, A. Lücking, and P. Menke, “Assessing Cognitive Alignment in Different Types of Dialog by means of a Network Model,” Neural Networks, vol. 32, pp. 159-164, 2012.
[Abstract] [BibTeX]

We present a network model of dialog lexica, called TiTAN (Two-layer Time-Aligned Network) series. TiTAN series capture the formation and structure of dialog lexica in terms of serialized graph representations. The dynamic update of TiTAN series is driven by the dialog-inherent timing of turn-taking. The model provides a link between neural, connectionist underpinnings of dialog lexica on the one hand and observable symbolic behavior on the other. On the neural side, priming and spreading activation are modeled in terms of TiTAN networking. On the symbolic side, TiTAN series account for cognitive alignment in terms of the structural coupling of the linguistic representations of dialog partners. This structural stance allows us to apply TiTAN in machine learning of data of dialogical alignment. In previous studies, it has been shown that aligned dialogs can be distinguished from non-aligned ones by means of TiTAN-based modeling. Now, we simultaneously apply this model to two types of dialog: task-oriented, experimentally controlled dialogs on the one hand and more spontaneous, direction giving dialogs on the other. We ask whether it is possible to separate aligned dialogs from non-aligned ones in a type-crossing way. Starting from a recent experiment (Mehler, Lücking, & Menke, 2011a), we show that such a type-crossing classification is indeed possible. This hints at a structural fingerprint left by alignment in networks of linguistic items that are routinely co-activated during conversation.
@ARTICLE{Mehler:Luecking:Menke:2012,
journal={Neural Networks},
author={Mehler, Alexander and Lücking, Andy and Menke, Peter},
doi={10.1016/j.neunet.2012.02.013},
volume={32},
pages={159-164},
year={2012},
title={Assessing Cognitive Alignment in Different Types of Dialog by means of a Network Model},
website={http://www.sciencedirect.com/science/article/pii/S0893608012000421},
abstract={We present a network model of dialog lexica, called TiTAN (Two-layer Time-Aligned Network) series. TiTAN series capture the formation and structure of dialog lexica in terms of serialized graph representations. The dynamic update of TiTAN series is driven by the dialog-inherent timing of turn-taking. The model provides a link between neural, connectionist underpinnings of dialog lexica on the one hand and observable symbolic behavior on the other. On the neural side, priming and spreading activation are modeled in terms of TiTAN networking. On the symbolic side, TiTAN series account for cognitive alignment in terms of the structural coupling of the linguistic representations of dialog partners. This structural stance allows us to apply TiTAN in machine learning of data of dialogical alignment. In previous studies, it has been shown that aligned dialogs can be distinguished from non-aligned ones by means of TiTAN-based modeling. Now, we simultaneously apply this model to two types of dialog: task-oriented, experimentally controlled dialogs on the one hand and more spontaneous, direction giving dialogs on the other. We ask whether it is possible to separate aligned dialogs from non-aligned ones in a type-crossing way. Starting from a recent experiment (Mehler, Lücking, \& Menke, 2011a), we show that such a type-crossing classification is indeed possible. This hints at a structural fingerprint left by alignment in networks of linguistic items that are routinely co-activated during conversation.}}
• M. Z. Islam and A. Mehler, “Customization of the Europarl Corpus for Translation Studies,” in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), 2012.
[Abstract] [BibTeX]

Currently, the area of translation studies lacks corpora by which translation scholars can validate their theoretical claims, for example, regarding the scope of the characteristics of the translation relation. In this paper, we describe a customized resource in the area of translation studies that mainly addresses research on the properties of the translation relation. Our experimental results show that the Type-Token-Ratio (TTR) is not a universally valid indicator of the simplification of translation.
@INPROCEEDINGS{Islam:Mehler:2012:a,
owner={zahurul},
booktitle={Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC)},
author={Islam, Md. Zahurul and Mehler, Alexander},
timestamp={2012.02.02},
year={2012},
title={Customization of the Europarl Corpus for Translation Studies},
pdf={http://www.lrec-conf.org/proceedings/lrec2012/pdf/729_Paper.pdf},
abstract={Currently, the area of translation studies lacks corpora by which translation scholars can validate their theoretical claims, for example, regarding the scope of the characteristics of the translation relation. In this paper, we describe a customized resource in the area of translation studies that mainly addresses research on the properties of the translation relation. Our experimental results show that the Type-Token-Ratio (TTR) is not a universally valid indicator of the simplification of translation.}}
• A. Lücking and T. Pfeiffer, “Framing Multimodal Technical Communication. With Focal Points in Speech-Gesture-Integration and Gaze Recognition,” in Handbook of Technical Communication, A. Mehler, L. Romary, and D. Gibbon, Eds., De Gruyter Mouton, 2012, vol. 8, pp. 591-644.
[BibTeX]

@INCOLLECTION{Luecking:Pfeiffer:2012,
publisher={De Gruyter Mouton},
chapter={18},
booktitle={Handbook of Technical Communication},
pages={591-644},
author={Lücking, Andy and Pfeiffer, Thies},
series={Handbooks of Applied Linguistics},
volume={8},
editor={Alexander Mehler and Laurent Romary and Dafydd Gibbon},
year={2012},
title={Framing Multimodal Technical Communication. With Focal Points in Speech-Gesture-Integration and Gaze Recognition},
website={http://www.degruyter.com/view/books/9783110224948/9783110224948.591/9783110224948.591.xml}}
• P. Kubina, O. Abramov, and A. Lücking, “Barrier-free Communication,” in Handbook of Technical Communication, A. Mehler and L. Romary, Eds., Berlin and Boston: De Gruyter Mouton, 2012, vol. 8, pp. 645-706.
[BibTeX]

@INCOLLECTION{Kubina:Abramov:Luecking:2012,
author={Kubina, Petra and Abramov, Olga and Lücking, Andy},
editor={Alexander Mehler and Laurent Romary},
title={Barrier-free Communication},
series={Handbooks of Applied Linguistics},
pages={645-706},
year={2012},
chapter={19},
booktitle={Handbook of Technical Communication},
publisher={De Gruyter Mouton},
editoratype={collaborator},
volume={8},
editora={Dafydd Gibbon},
website={http://www.degruyter.com/view/books/9783110224948/9783110224948.645/9783110224948.645.xml}}
• A. Lücking and A. Mehler, “What’s the Scope of the Naming Game? Constraints on Semantic Categorization,” in Proceedings of the 9th International Conference on the Evolution of Language, Kyoto, Japan, 2012, pp. 196-203.
[Abstract] [BibTeX]

The Naming Game (NG) has become a vivid research paradigm for simulation studies on language evolution and the establishment of naming conventions. Recently, NGs were used for reconstructing the creation of linguistic categories, most notably for color terms. We recap the functional principle of NGs and the later Categorization Games (CGs) and evaluate them in the light of semantic data of linguistic categorization outside the domain of colors. This comparison reveals two specifics of the CG paradigm: Firstly, the emerging categories draw basically on the predefined topology of the learning domain. Secondly, the kind of categories that can be learnt in CGs is bound to context-independent intersective categories. This suggests that the NG and the CG focus on a special aspect of natural language categorization, which disregards context-sensitive categories used in a non-compositional manner.
@INPROCEEDINGS{Luecking:Mehler:2012,
url={http://kyoto.evolang.org/},
website={https://www.researchgate.net/publication/267858061_WHAT'S_THE_SCOPE_OF_THE_NAMING_GAME_CONSTRAINTS_ON_SEMANTIC_CATEGORIZATION},
booktitle={Proceedings of the 9th International Conference on the Evolution of Language},
pages={196-203},
author={Lücking, Andy and Mehler, Alexander},
year={2012},
title={What's the Scope of the Naming Game? Constraints on Semantic Categorization},
abstract={The Naming Game (NG) has become a vivid research paradigm for simulation studies on language evolution and the establishment of naming conventions. Recently, NGs were used for reconstructing the creation of linguistic categories, most notably for color terms. We recap the functional principle of NGs and the later Categorization Games (CGs) and evaluate them in the light of semantic data of linguistic categorization outside the domain of colors. This comparison reveals two specifics of the CG paradigm: Firstly, the emerging categories draw basically on the predefined topology of the learning domain. Secondly, the kind of categories that can be learnt in CGs is bound to context-independent intersective categories. This suggests that the NG and the CG focus on a special aspect of natural language categorization, which disregards context-sensitive categories used in a non-compositional manner.},
address={Kyoto, Japan}}
• M. Sukhareva, M. Z. Islam, A. Hoenen, and A. Mehler, “A Three-step Model of Language Detection in Multilingual Ancient Texts,” in Proceedings of Workshop on Annotation of Corpora for Research in the Humanities, Heidelberg, Germany, 2012.
[Abstract] [BibTeX]

Ancient corpora contain various multilingual patterns. This imposes numerous problems on their manual annotation and automatic processing. We introduce a lexicon building system, called Lexicon Expander, that has an integrated language detection module, Language Detection (LD) Toolkit. The Lexicon Expander post-processes the output of the LD Toolkit which leads to the improvement of f-score and accuracy values. Furthermore, the functionality of the Lexicon Expander also includes manual editing of lexical entries and automatic morphological expansion by means of a morphological grammar.
@INPROCEEDINGS{Sukhareva:Islam:Hoenen:Mehler:2012,
booktitle={Proceedings of Workshop on Annotation of Corpora for Research in the Humanities},
author={Sukhareva, Maria and Islam, Md. Zahurul and Hoenen, Armin and Mehler, Alexander},
year={2012},
title={A Three-step Model of Language Detection in Multilingual Ancient Texts},
abstract={Ancient corpora contain various multilingual patterns. This imposes numerous problems on their manual annotation and automatic processing. We introduce a lexicon building system, called Lexicon Expander, that has an integrated language detection module, Language Detection (LD) Toolkit. The Lexicon Expander post-processes the output of the LD Toolkit which leads to the improvement of f-score and accuracy values. Furthermore, the functionality of the Lexicon Expander also includes manual editing of lexical entries and automatic morphological expansion by means of a morphological grammar.},
website={https://www.academia.edu/2236625/A_Three-step_Model_of_Language_Detection_in_Multilingual_Ancient_Texts}}
• A. Mehler and L. Romary, Handbook of Technical Communication, Berlin: De Gruyter Mouton, 2012.
[BibTeX]

@BOOK{Mehler:Romary:2012,
publisher={De Gruyter Mouton},
author={Mehler, Alexander and Romary, Laurent},
year={2012},
pagetotal={839},
title={Handbook of Technical Communication},
address={Berlin}}

### 2011 (24)

• U. Waltinger, On Social Semantics in Information Retrieval, Saarbrücken: Südwestdeutscher Verlag für Hochschulschriften, 2011. Zugl. Diss Univ. Bielefeld (2010)
[Abstract] [BibTeX]

In this thesis we analyze the performance of social semantics in textual information retrieval. By means of collaboratively constructed knowledge derived from web-based social networks, inducing both common-sense and domain-specific knowledge as constructed by a multitude of users, we will establish an improvement in performance of selected tasks within different areas of information retrieval. This work connects the concepts and the methods of social networks and the semantic web to support the analysis of a social semantic web that combines human intelligence with machine learning and natural language processing. In this context, social networks, as instances of the social web, are capable of delivering social network data and document collections on a tremendous scale, inducing thematic dynamics that cannot be achieved by traditional expert resources. The question of an automatic conversion, annotation and processing, however, is central to the debate of the benefits of the social semantic web: which kinds of technologies and methods are available and adequate to process this rapidly rising flood of information, while at the same time being capable of using the wealth of information in this large, but more importantly decentralized, internet? The present work researches the performance of social semantic-induced categorization by means of different document models. We will shed light on the question to which level social networks and social ontologies contribute to selected areas within the information retrieval area, such as automatically determining term and text associations, identifying topics, text and web genre categorization, and also the domain of sentiment analysis.
We will show in extensive evaluations, comparing the classical apparatus of text categorization -- Vector Space Model, Latent Semantic Analysis and Support Vector Machine -- that significant improvements can be obtained by considering the collaborative knowledge derived from the social web.
@BOOK{Waltinger:2011,
year={2011},
author={Waltinger, Ulli},
title={On Social Semantics in Information Retrieval},
website={http://www.ulliwaltinger.de/on-social-semantics-in-information-retrieval/},
publisher={Südwestdeutscher Verlag für Hochschulschriften},
note={Zugl. Diss Univ. Bielefeld (2010)},
abstract={In this thesis we analyze the performance of social semantics in textual information retrieval. By means of collaboratively constructed knowledge derived from web-based social networks, inducing both common-sense and domain-specific knowledge as constructed by a multitude of users, we will establish an improvement in performance of selected tasks within different areas of information retrieval. This work connects the concepts and the methods of social networks and the semantic web to support the analysis of a social semantic web that combines human intelligence with machine learning and natural language processing. In this context, social networks, as instances of the social web, are capable of delivering social network data and document collections on a tremendous scale, inducing thematic dynamics that cannot be achieved by traditional expert resources. The question of an automatic conversion, annotation and processing, however, is central to the debate of the benefits of the social semantic web: which kinds of technologies and methods are available and adequate to process this rapidly rising flood of information, while at the same time being capable of using the wealth of information in this large, but more importantly decentralized, internet? The present work researches the performance of social semantic-induced categorization by means of different document models. We will shed light on the question to which level social networks and social ontologies contribute to selected areas within the information retrieval area, such as automatically determining term and text associations, identifying topics, text and web genre categorization, and also the domain of sentiment analysis.
We will show in extensive evaluations, comparing the classical apparatus of text categorization -- Vector Space Model, Latent Semantic Analysis and Support Vector Machine -- that significant improvements can be obtained by considering the collaborative knowledge derived from the social web.}}
• G. Doeben-Henisch, G. Abrami, M. Pfaff, and M. Struwe, “Conscious learning semiotics systems to assist human persons (CLS2H),” in AFRICON, 2011, 2011, pp. 1-7.
[Abstract] [BibTeX]

Challenged by the growing societal demand for Ambient Assistive Living (AAL) technologies, we are dedicated to develop intelligent technical devices which are able to communicate with human persons in a truly human-like manner. The core of the project is a simulation environment which enables the development of conscious learning semiotic agents which will be able to assist human persons in their daily life. We are reporting first results and future perspectives.
@INPROCEEDINGS{Doebenhenisch:Abrami:Pfaff:Struwe:2011,
booktitle={AFRICON, 2011},
pages={1-7},
author={Doeben-Henisch, Gerd and Abrami, Giuseppe and Pfaff, Marcus and Struwe, Marvin},
keywords={ambient assistive living;conscious learning semiotic agents;conscious learning semiotics systems;human persons;intelligent technical devices;simulation environment;learning (artificial intelligence);multi-agent systems;},
doi={10.1109/AFRCON.2011.6072043},
month={sept.},
year={2011},
title={Conscious learning semiotics systems to assist human persons (CLS2H)},
issn={2153-0025},
abstract={Challenged by the growing societal demand for Ambient Assistive Living (AAL) technologies, we are dedicated to develop intelligent technical devices which are able to communicate with human persons in a truly human-like manner. The core of the project is a simulation environment which enables the development of conscious learning semiotic agents which will be able to assist human persons in their daily life. We are reporting first results and future perspectives.},
website={http://www.researchgate.net/publication/261451874_Conscious_Learning_Semiotics_Systems_to_Assist_Human_Persons_(CLS(2)H)},
pdf={http://www.doeben-henisch.de/gdhnp/csg/africon2011.pdf}}
• V. Ries and A. Lücking, “The SoSaBiEC Corpus: Social Structure and Bilinguality in Everyday Conversation,” in Multilingual Resources and Multilingual Applications: Proceedings of the German Society for Computational Linguistics 2011, 2011, pp. 207-210.
[Abstract] [BibTeX]

The SoSaBiEC corpus comprises audio recordings of everyday interactions between familiar subjects. Thus, the material the corpus is based on is not gained in task-oriented dialogue under strict experimental control; rather, it is made up of spontaneous conversations. We describe the raw data and the annotations that constitute the corpus. Speech is transcribed at the level of words. Dialogue act oriented codings constitute a functional, qualitative annotation level. The corpus so far provides an empirical basis for studying social aspects of unrestricted language use in a familiar context.
@INPROCEEDINGS{Ries:Luecking:2011,
booktitle={Multilingual Resources and Multilingual Applications: Proceedings of the German Society for Computational Linguistics 2011},
pages={207--210},
author={Ries, Veronika and Lücking, Andy},
series={GSCL 2011},
editor={Hanna Hedeland and Thomas Schmidt and Kai Wörner},
year={2011},
title={The SoSaBiEC Corpus: Social Structure and Bilinguality in Everyday Conversation},
abstract={The SoSaBiEC corpus comprises audio recordings of everyday interactions between familiar subjects. Thus, the material the corpus is based on is not gained in task-oriented dialogue under strict experimental control; rather, it is made up of spontaneous conversations. We describe the raw data and the annotations that constitute the corpus. Speech is transcribed at the level of words. Dialogue act oriented codings constitute a functional, qualitative annotation level. The corpus so far provides an empirical basis for studying social aspects of unrestricted language use in a familiar context.},
location={Hamburg}}
• U. Waltinger, A. Mehler, M. Lösch, and W. Horstmann, “Hierarchical Classification of OAI Metadata Using the DDC Taxonomy,” in Advanced Language Technologies for Digital Libraries (ALT4DL), R. Bernardi, S. Chambers, B. Gottfried, F. Segond, and I. Zaihrayeu, Eds., Berlin: Springer, 2011, pp. 29-40.
[Abstract] [BibTeX]

In the area of digital library services, the access to subject-specific metadata of scholarly publications is of utmost interest. One of the most prevalent approaches for metadata exchange is the XML-based Open Archive Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH). However, due to its loose requirements regarding metadata content there is no strict standard for consistent subject indexing specified, which is furthermore needed in the digital library domain. This contribution addresses the problem of automatic enhancement of OAI metadata by means of the most widely used universal classification schemes in libraries—the Dewey Decimal Classification (DDC). To be more specific, we automatically classify scientific documents according to the DDC taxonomy within three levels using a machine learning-based classifier that relies solely on OAI metadata records as the document representation. The results show an asymmetric distribution of documents across the hierarchical structure of the DDC taxonomy and issues of data sparseness. However, the performance of the classifier shows promising results on all three levels of the DDC.
@INCOLLECTION{Waltinger:Mehler:Loesch:Horstmann:2011,
publisher={Springer},
booktitle={Advanced Language Technologies for Digital Libraries (ALT4DL)},
pages={29-40},
author={Waltinger, Ulli and Mehler, Alexander and Lösch, Mathias and Horstmann, Wolfram},
series={LNCS},
editor={Raffaella Bernardi and Sally Chambers and Bjoern Gottfried and Frederique Segond and Ilya Zaihrayeu},
year={2011},
title={Hierarchical Classification of OAI Metadata Using the DDC Taxonomy},
abstract={In the area of digital library services, the access to subject-specific metadata of scholarly publications is of utmost interest. One of the most prevalent approaches for metadata exchange is the XML-based Open Archive Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH). However, due to its loose requirements regarding metadata content there is no strict standard for consistent subject indexing specified, which is furthermore needed in the digital library domain. This contribution addresses the problem of automatic enhancement of OAI metadata by means of the most widely used universal classification schemes in libraries—the Dewey Decimal Classification (DDC). To be more specific, we automatically classify scientific documents according to the DDC taxonomy within three levels using a machine learning-based classifier that relies solely on OAI metadata records as the document representation. The results show an asymmetric distribution of documents across the hierarchical structure of the DDC taxonomy and issues of data sparseness. However, the performance of the classifier shows promising results on all three levels of the DDC.},
address={Berlin}}
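Read as pseudocode, the level-wise DDC classification described in the entry above amounts to training one classifier per taxonomy level and, at each deeper level, restricting the label set to the children of the class predicted one level up. The sketch below uses a nearest-centroid bag-of-words classifier and invented toy metadata strings and labels; the actual system relies on machine learning over full OAI records, so everything here is illustrative only:

```python
from collections import Counter
import math

def tokenize(text):
    # crude whitespace tokenizer over lowercased metadata text
    return [t for t in text.lower().split() if t.isalpha()]

def train_centroids(records):
    """Build one bag-of-words centroid per class label."""
    centroids = {}
    for text, label in records:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids

def cosine(doc, centroid):
    dot = sum(doc[t] * centroid[t] for t in doc)
    n1 = math.sqrt(sum(v * v for v in doc.values()))
    n2 = math.sqrt(sum(v * v for v in centroid.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def classify(text, centroids):
    doc = Counter(tokenize(text))
    return max(sorted(centroids), key=lambda lab: cosine(doc, centroids[lab]))

# Toy OAI-style title strings with top-level DDC labels (hypothetical data);
# for the second and third DDC level, the same routine would be repeated with
# centroids restricted to the children of the class predicted one level up.
train = [
    ("introduction to set theory and mathematical logic", "500"),
    ("algebraic topology lecture notes", "500"),
    ("medieval latin charters and their language", "400"),
    ("a grammar of old saxon", "400"),
]
centroids = train_centroids(train)
print(classify("a study of latin grammar", centroids))  # 400
```

The per-level routing also makes the data-sparseness issue mentioned in the abstract concrete: the deeper the level, the fewer training records per class remain.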
• A. Mehler, S. Schwandt, R. Gleim, and B. Jussen, “Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 26, iss. 1, pp. 97-117, 2011.
[Abstract] [BibTeX]

Die Digital Humanities bzw. die Computational Humanities entwickeln sich zu eigenständigen Disziplinen an der Nahtstelle von Geisteswissenschaft und Informatik. Diese Entwicklung betrifft zunehmend auch die Lehre im Bereich der geisteswissenschaftlichen Fachinformatik. In diesem Beitrag thematisieren wir den eHumanities Desktop als ein Werkzeug für diesen Bereich der Lehre. Dabei geht es genauer um einen Brückenschlag zwischen Geschichtswissenschaft und Informatik: Am Beispiel der historischen Semantik stellen wir drei Lehrszenarien vor, in denen der eHumanities Desktop in der geschichtswissenschaftlichen Lehre zum Einsatz kommt. Der Beitrag schliesst mit einer Anforderungsanalyse an zukünftige Entwicklungen in diesem Bereich.
@ARTICLE{Mehler:Schwandt:Gleim:Jussen:2011,
journal={Journal for Language Technology and Computational Linguistics (JLCL)},
pdf={http://media.dwds.de/jlcl/2011_Heft1/8.pdf},
pages={97-117},
number={1},
author={Mehler, Alexander and Schwandt, Silke and Gleim, Rüdiger and Jussen, Bernhard},
volume={26},
year={2011},
title={Der eHumanities Desktop als Werkzeug in der historischen Semantik: Funktionsspektrum und Einsatzszenarien},
abstract={Die Digital Humanities bzw. die Computational Humanities entwickeln sich zu eigenständigen Disziplinen an der Nahtstelle von Geisteswissenschaft und Informatik. Diese Entwicklung betrifft zunehmend auch die Lehre im Bereich der geisteswissenschaftlichen Fachinformatik. In diesem Beitrag thematisieren wir den eHumanities Desktop als ein Werkzeug für diesen Bereich der Lehre. Dabei geht es genauer um einen Brückenschlag zwischen Geschichtswissenschaft und Informatik: Am Beispiel der historischen Semantik stellen wir drei Lehrszenarien vor, in denen der eHumanities Desktop in der geschichtswissenschaftlichen Lehre zum Einsatz kommt. Der Beitrag schliesst mit einer Anforderungsanalyse an zukünftige Entwicklungen in diesem Bereich.}}
• T. Dong and T. vor der Brück, “Qualitative Spatial Knowledge Acquisition Based on the Connection Relation,” in Proceedings of the 3rd International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE), Rome, Italy, 2011, pp. 70-75.
[Abstract] [BibTeX]

Research in cognitive psychology shows that the connection relation is the primitive spatial relation. This paper proposes a novel spatial knowledge representation of indoor environments based on the connection relation, and demonstrates how deictic orientation relations can be acquired from a map, which is constructed purely on connection relations between extended objects. Without loss of generality, we restrict indoor environments to be constructed by a set of rectangles, each representing either a room or a corridor. The term fiat cell is coined to represent a subjective partition along a corridor. Spatial knowledge includes rectangles, sides information of rectangles, connection relations among rectangles, and fiat cells of rectangles. Efficient algorithms are given for identifying one shortest path between two locations, transforming paths into fiat paths, and acquiring deictic orientations.
@INPROCEEDINGS{Dong:vor:der:Brueck:2011,
booktitle={Proceedings of the 3rd International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE)},
pages={70--75},
author={Dong, Tiansi and vor der Brück, Tim},
editor={Terry Bossomaier and Pascal Lorenz},
year={2011},
title={Qualitative Spatial Knowledge Acquisition Based on the Connection Relation},
abstract={Research in cognitive psychology shows that the connection relation is the primitive spatial relation. This paper proposes a novel spatial knowledge representation of indoor environments based on the connection relation, and demonstrates how deictic orientation relations can be acquired from a map, which is constructed purely on connection relations between extended objects. Without loss of generality, we restrict indoor environments to be constructed by a set of rectangles, each representing either a room or a corridor. The term fiat cell is coined to represent a subjective partition along a corridor. Spatial knowledge includes rectangles, sides information of rectangles, connection relations among rectangles, and fiat cells of rectangles. Efficient algorithms are given for identifying one shortest path between two locations, transforming paths into fiat paths, and acquiring deictic orientations.},
website={http://www.thinkmind.org/index.php?view=article&articleid=cognitive_2011_3_40_40123},
pdf={http://www.thinkmind.org/download.php?articleid=cognitive_2011_3_40_40123}}
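The shortest-path step mentioned in the abstract above can be illustrated with a plain breadth-first search over a connection graph of rooms and corridor cells. The floor plan below is invented for illustration and is not taken from the paper:

```python
from collections import deque

def shortest_path(connections, start, goal):
    """BFS over an undirected connection graph of rooms and corridor cells."""
    graph = {}
    for a, b in connections:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between start and goal

# Hypothetical floor plan: rooms R1..R3 hang off corridor cells C1..C2.
edges = [("R1", "C1"), ("C1", "C2"), ("C2", "R2"), ("C2", "R3")]
print(shortest_path(edges, "R1", "R2"))  # ['R1', 'C1', 'C2', 'R2']
```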
• M. Z. Islam, R. Mittmann, and A. Mehler, “Multilingualism in Ancient Texts: Language Detection by Example of Old High German and Old Saxon,” in GSCL conference on Multilingual Resources and Multilingual Applications (GSCL 2011), 28-30 September, Hamburg, Germany, 2011.
[Abstract] [BibTeX]

In this paper, we present an approach to language detection in streams of multilingual ancient texts. We introduce a supervised classifier that detects, amongst others, Old High German (OHG) and Old Saxon (OS). We evaluate our model by means of three experiments that show that language detection is possible even for dead languages. Finally, we present an experiment in unsupervised language detection as a tertium comparationis for our supervised classifier.
@INPROCEEDINGS{Zahurul:Mittmann:Mehler:2011,
booktitle={GSCL conference on Multilingual Resources and Multilingual Applications (GSCL 2011), 28-30 September, Hamburg, Germany},
author={Islam, Md. Zahurul and Mittmann, Roland and Mehler, Alexander},
timestamp={2011.08.25},
year={2011},
title={Multilingualism in Ancient Texts: Language Detection by Example of Old High German and Old Saxon},
abstract={In this paper, we present an approach to language detection in streams of multilingual ancient texts. We introduce a supervised classifier that detects, amongst others, Old High German (OHG) and Old Saxon (OS). We evaluate our model by means of three experiments that show that language detection is possible even for dead languages. Finally, we present an experiment in unsupervised language detection as a tertium comparationis for our supervised classifier.}}
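A common baseline for this kind of supervised language detection is a character n-gram profile classifier. The sketch below uses invented pseudo-Old-High-German and pseudo-Old-Saxon snippets purely for illustration; the paper's actual classifier and training corpora differ:

```python
from collections import Counter

def profile(text, n=3):
    """Character trigram profile with padded word boundaries."""
    text = " " + text.lower() + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def detect(snippet, profiles):
    """Pick the training language whose n-gram profile overlaps most."""
    p = profile(snippet)
    def overlap(q):
        return sum(min(p[g], q[g]) for g in p)
    return max(sorted(profiles), key=lambda lang: overlap(profiles[lang]))

# Tiny illustrative training snippets (invented, not real corpus data).
profiles = {
    "OHG": profile("uuas thar ein gomo ther uuas in themo lante"),
    "OS":  profile("that uuas that helitho cunni endi that land"),
}
print(detect("ther gomo uuas in themo lante", profiles))  # OHG
```

With real corpora one would train on many documents per language and normalize profile sizes; the overlap score above is the simplest workable choice.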
• A. Lücking, S. Ptock, and K. Bergmann, “Staccato: Segmentation Agreement Calculator,” in Gesture in Embodied Communication and Human-Computer Interaction. Proceedings of the 9th International Gesture Workshop, Athens, Greece, 2011, pp. 50-53.
[BibTeX]

@INPROCEEDINGS{Luecking:Ptock:Bergmann:2011,
publisher={National and Kapodistrian University of Athens},
booktitle={Gesture in Embodied Communication and Human-Computer Interaction. Proceedings of the 9th International Gesture Workshop},
pages={50--53},
author={Lücking, Andy and Ptock, Sebastian and Bergmann, Kirsten},
series={GW 2011},
editor={Eleni Efthimiou and Georgios Kouroupetroglou},
month={5},
year={2011},
title={Staccato: Segmentation Agreement Calculator},
address={Athens, Greece}}
• A. Mehler and A. Lücking, “A Graph Model of Alignment in Multilog,” in Proceedings of IEEE Africon 2011, Zambia, 2011.
[BibTeX]

@INPROCEEDINGS{Mehler:Luecking:2011,
organization={IEEE},
booktitle={Proceedings of IEEE Africon 2011},
author={Mehler, Alexander and Lücking, Andy},
series={IEEE Africon},
month={9},
year={2011},
title={A Graph Model of Alignment in Multilog},
website={https://www.researchgate.net/publication/267941012_A_Graph_Model_of_Alignment_in_Multilog}}
• C. Stegbauer and A. Mehler, “Positionssensitive Dekomposition von Potenzgesetzen am Beispiel von Wikipedia-basierten Kollaborationsnetzwerken,” in Proceedings of the 4th Workshop Digital Social Networks at INFORMATIK 2011: Informatik schafft Communities, Oct 4-7, 2011, Berlin, 2011.
[BibTeX]

@INPROCEEDINGS{Stegbauer:Mehler:2011,
booktitle={Proceedings of the 4th Workshop Digital Social Networks at INFORMATIK 2011: Informatik schafft Communities, Oct 4-7, 2011, Berlin},
pdf={http://www.user.tu-berlin.de/komm/CD/paper/090423.pdf},
author={Stegbauer, Christian and Mehler, Alexander},
year={2011},
title={Positionssensitive Dekomposition von Potenzgesetzen am Beispiel von Wikipedia-basierten Kollaborationsnetzwerken},
specialnote={Best Paper Award},
specialnotewebsite={http://www.digitale-soziale-netze.de/gi-workshop/index.php?site=review2011}}
• M. Lösch, U. Waltinger, W. Horstmann, and A. Mehler, “Building a DDC-annotated Corpus from OAI Metadata,” Journal of Digital Information, vol. 12, iss. 2, 2011.
[Abstract] [BibTeX]

Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score.
@ARTICLE{Loesch:Waltinger:Horstmann:Mehler:2011,
journal={Journal of Digital Information},
number={2},
author={Lösch, Mathias and Waltinger, Ulli and Horstmann, Wolfram and Mehler, Alexander},
volume={12},
abstract={Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score.},
website={http://journals.tdl.org/jodi/article/view/1765},
bibsource={DBLP, http://dblp.uni-trier.de},
year={2011},
title={Building a DDC-annotated Corpus from OAI Metadata},
pdf={https://journals.tdl.org/jodi/index.php/jodi/article/download/1765/1767}}
• M. Lux, J. Laußmann, A. Mehler, and C. Menßen, “An Online Platform for Visualizing Time Series in Linguistic Networks,” in Proceedings of the Demonstrations Session of the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 22 – 27 August 2011, Lyon, France, 2011.
[Poster] [BibTeX]

@INPROCEEDINGS{Lux:Laussmann:Mehler:Menssen:2011,
booktitle={Proceedings of the Demonstrations Session of the 2011 IEEE / WIC / ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 22 - 27 August 2011, Lyon, France},
website={http://dl.acm.org/citation.cfm?id=2052396},
author={Lux, Markus and Lau{\ss}mann, Jan and Mehler, Alexander and Men{\ss}en, Christian},
year={2011},
title={An Online Platform for Visualizing Time Series in Linguistic Networks}}
• A. Mehler, N. Diewald, U. Waltinger, R. Gleim, D. Esch, B. Job, T. Küchelmann, O. Abramov, and P. Blanchard, “Evolution of Romance Language in Written Communication: Network Analysis of Late Latin and Early Romance Corpora,” Leonardo, vol. 44, iss. 3, 2011.
[Abstract] [BibTeX]

In this paper, the authors induce linguistic networks as a prerequisite for detecting language change by means of the Patrologia Latina, a corpus of Latin texts from the 4th to the 13th century.
@ARTICLE{Mehler:Diewald:Waltinger:et:al:2010,
publisher={MIT Press},
journal={Leonardo},
number={3},
author={Mehler, Alexander and Diewald, Nils and Waltinger, Ulli and Gleim, Rüdiger and Esch, Dietmar and Job, Barbara and Küchelmann, Thomas and Abramov, Olga and Blanchard, Philippe},
volume={44},
year={2011},
title={Evolution of Romance Language in Written Communication: Network Analysis of Late Latin and Early Romance Corpora},
website={http://www.mitpressjournals.org/doi/abs/10.1162/LEON_a_00175#.VLzsoivF_Cc},
abstract={In this paper, the authors induce linguistic networks as a prerequisite for detecting language change by means of the Patrologia Latina, a corpus of Latin texts from the 4th to the 13th century.}}
• A. Mehler, A. Lücking, and P. Menke, “From Neural Activation to Symbolic Alignment: A Network-Based Approach to the Formation of Dialogue Lexica,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2011), San Jose, California, July 31 — August 5, 2011.
[BibTeX]

@INPROCEEDINGS{Mehler:Luecking:Menke:2011,
booktitle={Proceedings of the International Joint Conference on Neural Networks (IJCNN 2011), San Jose, California, July 31 -- August 5},
author={Mehler, Alexander and Lücking, Andy and Menke, Peter},
year={2011},
title={From Neural Activation to Symbolic Alignment: A Network-Based Approach to the Formation of Dialogue Lexica},
website={http://dx.doi.org/10.1109/IJCNN.2011.6033266}
}
• A. Lücking, O. Abramov, A. Mehler, and P. Menke, “The Bielefeld Jigsaw Map Game (JMG) Corpus,” in Abstracts of the Corpus Linguistics Conference 2011, Birmingham, 2011.
[BibTeX]

@INPROCEEDINGS{Luecking:Abramov:Mehler:Menke:2011,
booktitle={Abstracts of the Corpus Linguistics Conference 2011},
author={Lücking, Andy and Abramov, Olga and Mehler, Alexander and Menke, Peter},
series={CL2011},
year={2011},
title={The Bielefeld Jigsaw Map Game (JMG) Corpus},
website={http://www.birmingham.ac.uk/research/activity/corpus/publications/conference-archives/2011-birmingham.aspx},
pdf={http://www.birmingham.ac.uk/documents/college-artslaw/corpus/conference-archives/2011/Paper-137.pdf},
address={Birmingham}}
• R. Gleim, A. Hoenen, N. Diewald, A. Mehler, and A. Ernst, “Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin,” in Corpus Linguistics 2011, 20-22 July, Birmingham, 2011.
[BibTeX]

@INPROCEEDINGS{Gleim:Hoenen:Diewald:Mehler:Ernst:2011,
booktitle={Corpus Linguistics 2011, 20-22 July, Birmingham},
author={Gleim, Rüdiger and Hoenen, Armin and Diewald, Nils and Mehler, Alexander and Ernst, Alexandra},
year={2011},
title={Modeling, Building and Maintaining Lexica for Corpus Linguistic Studies by Example of Late Latin},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Paper-48.pdf}}
• P. Menke and A. Mehler, “From experiments to corpora: The Ariadne Corpus Management System,” in Corpus Linguistics 2011, 20-22 July, Birmingham, 2011.
[BibTeX]

@INPROCEEDINGS{Menke:Mehler:2011,
booktitle={Corpus Linguistics 2011, 20-22 July, Birmingham},
author={Menke, Peter and Mehler, Alexander},
year={2011},
title={From experiments to corpora: The Ariadne Corpus Management System},
website={https://www.researchgate.net/publication/260186214_From_Experiments_to_Corpora_The_Ariadne_Corpus_Management_System}}
• Towards an Information Theory of Complex Networks: Statistical Methods and Applications, M. Dehmer, F. Emmert-Streib, and A. Mehler, Eds., Boston/Basel: Birkhäuser, 2011.
[BibTeX]

@BOOK{Dehmer:EmmertStreib:Mehler:2009:a,
publisher={Birkh{\"a}user},
editor={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander},
year={2011},
title={Towards an Information Theory of Complex Networks: Statistical Methods and Applications},
pagetotal={395},
address={Boston/Basel}}
• A. Mehler, A. Lücking, and P. Menke, “Assessing Lexical Alignment in Spontaneous Direction Dialogue Data by Means of a Lexicon Network Model,” in Proceedings of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), February 20–26, Tokyo, Berlin/New York, 2011, pp. 368-379.
[BibTeX]

@INPROCEEDINGS{Mehler:Luecking:Menke:2011:a,
publisher={Springer},
booktitle={Proceedings of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), February 20--26, Tokyo},
author={Mehler, Alexander and Lücking, Andy and Menke, Peter},
series={CICLing'11},
pages={368-379},
year={2011},
title={Assessing Lexical Alignment in Spontaneous Direction Dialogue Data by Means of a Lexicon Network Model},
address={Berlin/New York}}
• P. Geibel, A. Mehler, and K. Kühnberger, “Learning Methods for Graph Models of Document Structure,” in Modeling, Learning and Processing of Text Technological Data Structures, A. Mehler, K. Kühnberger, H. Lobin, H. Lüngen, A. Storrer, and A. Witt, Eds., Berlin/New York: Springer, 2011.
[BibTeX]

@INCOLLECTION{Geibel:Mehler:Kuehnberger:2011:a,
publisher={Springer},
booktitle={Modeling, Learning and Processing of Text Technological Data Structures},
author={Geibel, Peter and Mehler, Alexander and Kühnberger, Kai-Uwe},
series={Studies in Computational Intelligence},
editor={Mehler, Alexander and Kühnberger, Kai-Uwe and Lobin, Henning and Lüngen, Harald and Storrer, Angelika and Witt, Andreas},
year={2011},
title={Learning Methods for Graph Models of Document Structure},
address={Berlin/New York}}
• A. Mehler and U. Waltinger, “Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding,” in Modeling, Learning and Processing of Text Technological Data Structures, A. Mehler, K. Kühnberger, H. Lobin, H. Lüngen, A. Storrer, and A. Witt, Eds., Berlin/New York: Springer, 2011.
[BibTeX]

@INCOLLECTION{Mehler:Waltinger:2011:a,
publisher={Springer},
booktitle={Modeling, Learning and Processing of Text Technological Data Structures},
website={http://rd.springer.com/chapter/10.1007/978-3-642-22613-7_15},
author={Mehler, Alexander and Waltinger, Ulli},
series={Studies in Computational Intelligence},
editor={Mehler, Alexander and Kühnberger, Kai-Uwe and Lobin, Henning and Lüngen, Harald and Storrer, Angelika and Witt, Andreas},
year={2011},
title={Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding},
address={Berlin/New York}}
• O. Abramov and A. Mehler, “Automatic Language Classification by Means of Syntactic Dependency Networks,” Journal of Quantitative Linguistics, vol. 18, iss. 4, pp. 291-336, 2011.
[Abstract] [BibTeX]

This article presents an approach to automatic language classification by means of linguistic networks. Networks of 11 languages were constructed from dependency treebanks, and the topology of these networks serves as input to the classification algorithm. The results match the genealogical similarities of these languages. In addition, we test two alternative approaches to automatic language classification – one based on n-grams and the other on quantitative typological indices. All three methods show good results in identifying genealogical groups. Beyond genetic similarities, network features (and feature combinations) offer a new source of typological information about languages. This information can contribute to a better understanding of the interplay of single linguistic phenomena observed in language.
@ARTICLE{Abramov:Mehler:2011:a,
journal={Journal of Quantitative Linguistics},
pages={291-336},
number={4},
author={Abramov, Olga and Mehler, Alexander},
volume={18},
year={2011},
title={Automatic Language Classification by Means of Syntactic Dependency Networks},
website={http://www.researchgate.net/publication/220469321_Automatic_Language_Classification_by_means_of_Syntactic_Dependency_Networks},
abstract={This article presents an approach to automatic language classification by means of linguistic networks. Networks of 11 languages were constructed from dependency treebanks, and the topology of these networks serves as input to the classification algorithm. The results match the genealogical similarities of these languages. In addition, we test two alternative approaches to automatic language classification – one based on n-grams and the other on quantitative typological indices. All three methods show good results in identifying genealogical groups. Beyond genetic similarities, network features (and feature combinations) offer a new source of typological information about languages. This information can contribute to a better understanding of the interplay of single linguistic phenomena observed in language.}}
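The network side of the approach in the entry above can be illustrated by computing simple topological indicators from aggregated dependency pairs. The features and toy data below are illustrative stand-ins, not the feature set actually used in the article:

```python
from collections import defaultdict

def network_features(edges):
    """Node/edge counts, average degree, and density of an undirected word network."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:          # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    n = len(adj)
    m = sum(len(v) for v in adj.values()) // 2  # each edge counted twice
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
    }

# Hypothetical (head, dependent) pairs aggregated over a toy corpus.
pairs = [("see", "dog"), ("see", "I"), ("dog", "the"), ("chase", "dog"),
         ("chase", "cat"), ("cat", "the")]
print(network_features(pairs))
# {'nodes': 6, 'edges': 6, 'avg_degree': 2.0, 'density': 0.4}
```

A classifier would then take such feature vectors, one per language network, as its input.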
• Modeling, Learning and Processing of Text Technological Data Structures, A. Mehler, K. Kühnberger, H. Lobin, H. Lüngen, A. Storrer, and A. Witt, Eds., Berlin/New York: Springer, 2011.
[BibTeX]

@BOOK{Mehler:Kuehnberger:Lobin:Luengen:Storrer:Witt:2011,
publisher={Springer},
series={Studies in Computational Intelligence},
website={/books/texttechnologybook/},
editor={Mehler, Alexander and Kühnberger, Kai-Uwe and Lobin, Henning and Lüngen, Harald and Storrer, Angelika and Witt, Andreas},
year={2011},
pagetotal={400},
title={Modeling, Learning and Processing of Text Technological Data Structures},
address={Berlin/New York}}
• A. Mehler, “Social Ontologies as Generalized Nearly Acyclic Directed Graphs: A Quantitative Graph Model of Social Ontologies by Example of Wikipedia,” in Towards an Information Theory of Complex Networks: Statistical Methods and Applications, M. Dehmer, F. Emmert-Streib, and A. Mehler, Eds., Boston/Basel: Birkhäuser, 2011, pp. 259-319.
[BibTeX]

@INCOLLECTION{Mehler:2011:c,
publisher={Birkh{\"a}user},
booktitle={Towards an Information Theory of Complex Networks: Statistical Methods and Applications},
pages={259-319},
author={Mehler, Alexander},
editor={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander},
year={2011},
title={Social Ontologies as Generalized Nearly Acyclic Directed Graphs: A Quantitative Graph Model of Social Ontologies by Example of Wikipedia},
address={Boston/Basel}}

### 2010 (21)

• S. Eger and I. Sejane, “Computing Semantic Similarity from Bilingual Dictionaries,” in Proceedings of the 10th International Conference on the Statistical Analysis of Textual Data (JADT-2010), Rome, Italy, 2010, pp. 1217-1225.
[BibTeX]

@INPROCEEDINGS{Eger:Sejane:2010,
booktitle={Proceedings of the 10th International Conference on the Statistical Analysis of Textual Data (JADT-2010)},
pages={1217-1225},
author={Eger, Steffen and Sejane, Ineta},
year={2010},
title={Computing Semantic Similarity from Bilingual Dictionaries},
publisher={JADT-2010}}
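One simple way to derive a similarity score from a bilingual dictionary, in the spirit of the entry above, is to compare the translation sets of two words. The Jaccard measure and the dictionary fragment below are illustrative assumptions, not the method of the paper:

```python
def dict_similarity(w1, w2, bidict):
    """Jaccard overlap of translation sets as a crude semantic similarity."""
    t1, t2 = set(bidict.get(w1, ())), set(bidict.get(w2, ()))
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

# Hypothetical German-to-English dictionary fragment.
bidict = {
    "Hund":  ["dog", "hound"],
    "Köter": ["dog", "cur", "mutt"],
    "Katze": ["cat"],
}
print(dict_similarity("Hund", "Köter", bidict))  # 0.25
print(dict_similarity("Hund", "Katze", bidict))  # 0.0
```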
• T. vor der Brück and H. Helbig, “Validating Meronymy Hypotheses with Support Vector Machines and Graph Kernels,” in Proceedings of the Ninth International Conference on Machine Learning and Applications (ICMLA), Washington, D.C., 2010, pp. 243-250.
[Abstract] [BibTeX]

There is a substantial body of work on the extraction of relations from texts, most of which is based on pattern matching or on applying tree kernel functions to syntactic structures. Whereas pattern application is usually more efficient, tree kernels can be superior when assessed by the F-measure. In this paper, we introduce a hybrid approach to extracting meronymy relations, which is based on both patterns and kernel functions. In a first step, meronymy relation hypotheses are extracted from a text corpus by applying patterns. In a second step these relation hypotheses are validated by using several shallow features and a graph kernel approach. In contrast to other meronymy extraction and validation methods which are based on surface or syntactic representations we use a purely semantic approach based on semantic networks. This involves analyzing each sentence of the Wikipedia corpus by a deep syntactico-semantic parser and converting it into a semantic network. Meronymy relation hypotheses are extracted from the semantic networks by means of an automated theorem prover, which employs a set of logical axioms and patterns in the form of semantic networks. The meronymy candidates are then validated by means of a graph kernel approach based on common walks. The evaluation shows that this method achieves considerably higher accuracy, recall, and F-measure than a method using purely shallow validation.
@INPROCEEDINGS{vor:der:Brueck:Helbig:2010:a,
publisher={IEEE Press},
booktitle={Proceedings of the Ninth International Conference on Machine Learning and Applications (ICMLA)},
pages={243--250},
author={vor der Brück, Tim and Helbig, Hermann},
year={2010},
title={Validating Meronymy Hypotheses with Support Vector Machines and Graph Kernels},
abstract={There is a substantial body of work on the extraction of relations from texts, most of which is based on pattern matching or on applying tree kernel functions to syntactic structures. Whereas pattern application is usually more efficient, tree kernels can be superior when assessed by the F-measure. In this paper, we introduce a hybrid approach to extracting meronymy relations, which is based on both patterns and kernel functions. In a first step, meronymy relation hypotheses are extracted from a text corpus by applying patterns. In a second step these relation hypotheses are validated by using several shallow features and a graph kernel approach. In contrast to other meronymy extraction and validation methods which are based on surface or syntactic representations we use a purely semantic approach based on semantic networks. This involves analyzing each sentence of the Wikipedia corpus by a deep syntactico-semantic parser and converting it into a semantic network. Meronymy relation hypotheses are extracted from the semantic networks by means of an automated theorem prover, which employs a set of logical axioms and patterns in the form of semantic networks. The meronymy candidates are then validated by means of a graph kernel approach based on common walks. The evaluation shows that this method achieves considerably higher accuracy, recall, and F-measure than a method using purely shallow validation.},
website={http://www.computer.org/csdl/proceedings/icmla/2010/4300/00/4300a243-abs.html}}
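The common-walks idea behind the graph kernel in the entry above can be sketched by counting label-matching walks in the direct product of two small labeled graphs. This is a deliberately simplified toy version, and the graphs below are invented stand-ins for the paper's semantic networks:

```python
def walk_kernel(g1, g2, k=1):
    """Count walks of length k in the label-matched direct product graph."""
    labels1, edges1 = g1
    labels2, edges2 = g2

    def adj(edges):
        a = {}
        for u, v in edges:  # undirected adjacency lists
            a.setdefault(u, []).append(v)
            a.setdefault(v, []).append(u)
        return a

    a1, a2 = adj(edges1), adj(edges2)
    # product-graph nodes: node pairs carrying the same label
    counts = {(u, v): 1 for u in labels1 for v in labels2
              if labels1[u] == labels2[v]}
    for _ in range(k):  # one step extends every walk by one matched edge
        nxt = {}
        for (u, v), c in counts.items():
            for u2 in a1.get(u, []):
                for v2 in a2.get(v, []):
                    if labels1[u2] == labels2[v2]:
                        nxt[(u2, v2)] = nxt.get((u2, v2), 0) + c
        counts = nxt
    return sum(counts.values())

# Two toy labeled graphs (hypothetical stand-ins for semantic networks).
g1 = ({1: "part", 2: "whole", 3: "mod"}, [(1, 2), (2, 3)])
g2 = ({"a": "part", "b": "whole"}, [("a", "b")])
print(walk_kernel(g1, g2, k=1))  # 2
```

In a real kernel one would sum over several walk lengths with decaying weights and feed the resulting Gram matrix to an SVM.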
• T. vor der Brück and H. Stenzhorn, “Logical Ontology Validation Using an Automatic Theorem Prover,” in Proceedings of the 19th European Conference on Artificial Intelligence (ECAI), Lisbon, Portugal, 2010, pp. 491-496.
[Abstract] [BibTeX]

Ontologies are utilized for a wide range of tasks, like information retrieval/extraction or text generation, and in a multitude of domains, such as biology, medicine or business and commerce. To be actually usable in such real-world scenarios, ontologies usually have to encompass a large number of factual statements. However, with increasing size, it becomes very difficult to ensure their complete correctness. This is particularly true in the case when an ontology is not hand-crafted but constructed (semi)automatically through text mining, for example. As a consequence, when inference mechanisms are applied on these ontologies, even minimal inconsistencies oftentimes lead to serious errors and are hard to trace back and find. This paper addresses this issue and describes a method to validate ontologies using an automatic theorem prover and MultiNet axioms. This logic-based approach allows to detect many inconsistencies, which are difficult or even impossible to identify through statistical methods or by manual investigation in reasonable time. To make this approach accessible for ontology developers, a graphical user interface is provided that highlights erroneous axioms directly in the ontology for quicker fixing.
@INPROCEEDINGS{vor:der:Brueck:Stenzhorn:2010,
booktitle={Proceedings of the 19th European Conference on Artificial Intelligence (ECAI)},
pages={491--496},
author={vor der Brück, Tim and Stenzhorn, Holger},
year={2010},
title={Logical Ontology Validation Using an Automatic Theorem Prover},
abstract={Ontologies are utilized for a wide range of tasks, like information retrieval/extraction or text generation, and in a multitude of domains, such as biology, medicine or business and commerce. To be actually usable in such real-world scenarios, ontologies usually have to encompass a large number of factual statements. However, with increasing size, it becomes very difficult to ensure their complete correctness. This is particularly true in the case when an ontology is not hand-crafted but constructed (semi)automatically through text mining, for example. As a consequence, when inference mechanisms are applied on these ontologies, even minimal inconsistencies oftentimes lead to serious errors and are hard to trace back and find. This paper addresses this issue and describes a method to validate ontologies using an automatic theorem prover and MultiNet axioms. This logic-based approach allows to detect many inconsistencies, which are difficult or even impossible to identify through statistical methods or by manual investigation in reasonable time. To make this approach accessible for ontology developers, a graphical user interface is provided that highlights erroneous axioms directly in the ontology for quicker fixing.},
address={Lisbon, Portugal}}
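A tiny analogue of such logical validation is to transitively close a set of part-of assertions and flag violations of asymmetry. The axiom choice and the data below are illustrative only; the paper uses a full automatic theorem prover over MultiNet axioms:

```python
def inconsistent_pairs(meronym_pairs):
    """Transitively close part-of assertions and flag asymmetry violations."""
    closure = set(meronym_pairs)
    changed = True
    while changed:  # naive fixpoint computation of the transitive closure
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # a pair present in both directions contradicts asymmetry of part-of
    return sorted(p for p in closure if (p[1], p[0]) in closure)

# Hypothetical part-of assertions; a cycle signals an inconsistent ontology.
ok = [("wheel", "car"), ("car", "vehicle")]
bad = ok + [("vehicle", "wheel")]
print(inconsistent_pairs(ok))   # []
print(inconsistent_pairs(bad))
```

Real validation covers many more axiom schemata, but the pattern is the same: derive consequences, then search for contradictions.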
• T. vor der Brück, “Hypernymy Extraction Using a Semantic Network Representation,” International Journal of Computational Linguistics and Applications, vol. 1, iss. 1, pp. 105-119, 2010.
[Abstract] [BibTeX]

There are several approaches to detect hypernymy relations from texts by text mining. Usually these approaches are based on supervised learning and in a first step are extracting several patterns. These patterns are then applied to previously unseen texts and used to recognize hypernym/hyponym pairs. Normally these approaches are only based on a surface representation or a syntactical tree structure, i.e., constituency or dependency trees derived by a syntactical parser. In this work, however, we present an approach that operates directly on a semantic network (SN), which is generated by a deep syntactico-semantic analysis. Hyponym/hypernym pairs are then extracted by the application of graph matching. This algorithm is combined with a shallow approach enriched with semantic information.
@ARTICLE{vor:der:Brueck:2010,
journal={International Journal of Computational Linguistics and Applications},
pages={105--119},
number={1},
author={vor der Brück, Tim},
volume={1},
year={2010},
title={Hypernymy Extraction Using a Semantic Network Representation},
pdf={http://www.gelbukh.com/ijcla/2010-1-2/Hypernymy Extraction Using.pdf},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.358.533},
abstract={There are several approaches to detect hypernymy relations from texts by text mining. Usually these approaches are based on supervised learning and in a first step are extracting several patterns. These patterns are then applied to previously unseen texts and used to recognize hypernym/hyponym pairs. Normally these approaches are only based on a surface representation or a syntactical tree structure, i.e., constituency or dependency trees derived by a syntactical parser. In this work, however, we present an approach that operates directly on a semantic network (SN), which is generated by a deep syntactico-semantic analysis. Hyponym/hypernym pairs are then extracted by the application of graph matching. This algorithm is combined with a shallow approach enriched with semantic information.}}
• T. vor der Brück, “Learning Deep Semantic Patterns for Hypernymy Extraction Following the Minimum Description Length Principle,” in Proceedings of the 29th International Conference on Lexis and Grammar (LGC), Belgrade, Serbia, 2010, pp. 39-49.
[Abstract] [BibTeX]

Current approaches of hypernymy acquisition are mostly based on syntactic or surface representations and extract hypernymy relations between surface word forms and not word readings. In this paper we present a purely semantic approach for hypernymy extraction based on semantic networks (SNs). This approach employs a set of patterns sub0 (a1,a2) <-- premise where the premise part of a pattern is given by a SN. Furthermore this paper describes how the patterns can be derived by relational statistical learning following the Minimum Description Length principle (MDL). The evaluation demonstrates the usefulness of the learned patterns and also of the entire hypernymy extraction system.
@INPROCEEDINGS{vor:der:Brueck:2010:a,
booktitle={Proceedings of the 29th International Conference on Lexis and Grammar (LGC)},
pages={39--49},
author={vor der Brück, Tim},
year={2010},
address={Belgrade, Serbia},
title={Learning Deep Semantic Patterns for Hypernymy Extraction Following the Minimum Description Length Principle},
abstract={Current approaches of hypernymy acquisition are mostly based on syntactic or surface representations and extract hypernymy relations between surface word forms and not word readings. In this paper we present a purely semantic approach for hypernymy extraction based on semantic networks (SNs). This approach employs a set of patterns sub0 (a1,a2) <-- premise where the premise part of a pattern is given by a SN. Furthermore this paper describes how the patterns can be derived by relational statistical learning following the Minimum Description Length principle (MDL). The evaluation demonstrates the usefulness of the learned patterns and also of the entire hypernymy extraction system.}}
• T. vor der Brück, “Learning Semantic Network Patterns for Hypernymy Extraction,” in Proceedings of the 6th Workshop on Ontologies and Lexical Resources (OntoLex), Beijing, China, 2010, pp. 38-47.
[BibTeX]

@INPROCEEDINGS{vor:der:Brueck:2010:b,
booktitle={Proceedings of the 6th Workshop on Ontologies and Lexical Resources (OntoLex)},
pages={38--47},
author={vor der Brück, Tim},
year={2010},
title={Learning Semantic Network Patterns for Hypernymy Extraction},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.358.533},
address={Beijing, China}}
• S. Hartrumpf, T. vor der Brück, and C. Eichhorn, “Detecting Duplicates with Shallow and Parser-based Methods,” in Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE), Beijing, China, 2010, pp. 142-149.
[Abstract] [BibTeX]

Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.
@INPROCEEDINGS{vor:der:Brueck:Hartrumpf:Eichhorn:2010:a,
booktitle={Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE)},
pages={142--149},
author={Hartrumpf, Sven and vor der Brück, Tim and Eichhorn, Christian},
year={2010},
title={Detecting Duplicates with Shallow and Parser-based Methods},
abstract={Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.},
website={http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5587838&abstractAccess=no&userType=inst}}
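The "shallow duplicate recognizers" combined with the deep method can be sketched as token n-gram (shingle) overlap. This is an illustrative baseline under assumed choices (word trigrams, an arbitrary 0.5 threshold); the shallow components actually used in the system are not specified here.

```python
def shingles(tokens, n=3):
    """Set of token n-grams of a text (surface representation only)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def is_duplicate(text_a, text_b, n=3, threshold=0.5):
    """Flag near-duplicates by n-gram overlap; sees each text as a token list."""
    return jaccard(shingles(text_a.lower().split(), n),
                   shingles(text_b.lower().split(), n)) >= threshold

print(is_duplicate("the cat sat on the mat", "the cat sat on the mat today"))
```

A recognizer of this kind needs no parser, which is why combining it with the deep method preserves recall on texts that are not fully parsable.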
• S. Hartrumpf, T. vor der Brück, and C. Eichhorn, “Semantic Duplicate Identification with Parsing and Machine Learning,” in Proceedings of the 13th International Conference on Text, Speech and Dialogue (TSD 2010), Brno, Czech Republic, 2010, pp. 84-92.
[Abstract] [BibTeX]

Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized index. In order to detect many kinds of paraphrases the semantic networks of a candidate text are varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Important phenomena occurring in difficult duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora is explained briefly. The deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee a high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably in comparison to traditional shallow methods.
@INPROCEEDINGS{vor:der:Brueck:Hartrumpf:Eichhorn:2010:b,
booktitle={Proceedings of the 13th International Conference on Text, Speech and Dialogue (TSD 2010)},
pages={84--92},
author={Hartrumpf, Sven and vor der Brück, Tim and Eichhorn, Christian},
series={Lecture Notes in Artificial Intelligence},
volume={6231},
editor={Petr Sojka and Aleš Horák and Ivan Kopeček and Karel Pala},
month={September},
year={2010},
address={Brno, Czech Republic},
publisher={Springer},
title={Semantic Duplicate Identification with Parsing and Machine Learning},
abstract={Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized index. In order to detect many kinds of paraphrases the semantic networks of a candidate text are varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Important phenomena occurring in difficult duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora is explained briefly. The deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee a high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably in comparison to traditional shallow methods.},
website={http://link.springer.com/chapter/10.1007/978-3-642-15760-8_12}}
• T. vor der Brück and H. Helbig, “Retrieving Meronyms from Texts Using An Automated Theorem Prover,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 25, iss. 1, pp. 57-81, 2010.
[Abstract] [BibTeX]

In this paper we present a truly semantic-oriented approach for meronymy relation extraction. It directly operates, instead of syntactic trees or surface representations, on semantic networks (SNs). These SNs are derived from texts (in our case, the German Wikipedia) by a deep linguistic syntactico-semantic analysis. The extraction of meronym/holonym pairs is carried out by using, among other components, an automated theorem prover, whose work is based on a set of logical axioms. The corresponding algorithm is combined with a shallow approach enriched with semantic information. Through the employment of logical methods, the recall and precision of the semantic patterns pertinent to the extracted relations can be increased considerably.
@ARTICLE{vor:der:Brueck:Helbig:2010:b,
pdf={http://www.jlcl.org/2010_Heft1/tim_vorderbrueck.pdf},
journal={Journal for Language Technology and Computational Linguistics (JLCL)},
pages={57--81},
number={1},
author={vor der Brück, Tim and Helbig, Hermann},
volume={25},
year={2010},
title={Retrieving Meronyms from Texts Using An Automated Theorem Prover},
abstract={In this paper we present a truly semantic-oriented approach for meronymy relation extraction. It directly operates, instead of syntactic trees or surface representations, on semantic networks (SNs). These SNs are derived from texts (in our case, the German Wikipedia) by a deep linguistic syntactico-semantic analysis. The extraction of meronym/holonym pairs is carried out by using, among other components, an automated theorem prover, whose work is based on a set of logical axioms. The corresponding algorithm is combined with a shallow approach enriched with semantic information. Through the employment of logical methods, the recall and precision of the semantic patterns pertinent to the extracted relations can be increased considerably.}}
• A. Lücking and K. Bergmann, “Introducing the Bielefeld SaGA Corpus,” talk given at Gesture: Evolution, Brain, and Linguistic Structures, 4th Conference of the International Society for Gesture Studies (ISGS), Europa Universität Viadrina Frankfurt/Oder, 2010.
[Abstract] [BibTeX]

People communicate multimodally. Most prominently, they co-produce speech and gesture. How do they do that? Studying the interplay of both modalities has to be informed by empirically observed communication behavior. We present a corpus built of speech and gesture data gained in a controlled study. We describe 1) the setting underlying the data; 2) annotation of the data; 3) reliability evaluation methods and results; and 4) applications of the corpus in the research domain of speech and gesture alignment.
@Misc{Luecking:Bergmann:2010,
author =     {Andy L\"{u}cking and Kirsten Bergmann},
title =     {Introducing the {B}ielefeld {SaGA} Corpus},
howpublished = {Talk given at \textit{Gesture: Evolution, Brain, and Linguistic
Structures.} 4th Conference of the International
Society for Gesture Studies (ISGS). Europa Universit\"{a}t Viadrina, Frankfurt/Oder},
abstract={People communicate multimodally. Most prominently, they co-produce speech and gesture. How do they do that? Studying the interplay of both modalities has to be informed by empirically observed communication behavior. We present a corpus built of speech and gesture data gained in a controlled study. We describe 1) the setting underlying the data; 2) annotation of the data; 3) reliability evaluation methods and results; and 4) applications of the corpus in the research domain of speech and gesture alignment.},
year = {2010},
month = {07},
day = {28},
date =     {2010-07-28}
}
• A. Lücking, “A Semantic Account for Iconic Gestures,” in Gesture: Evolution, Brain, and Linguistic Structures, Europa Universität Viadrina Frankfurt/Oder, 2010, p. 210.
[BibTeX]

@INPROCEEDINGS{Luecking:2010,
organization={4th Conference of the International Society for Gesture Studies (ISGS)},
booktitle={Gesture: Evolution, Brain, and Linguistic Structures},
pages={210},
author={Lücking, Andy},
keywords={own},
month={7},
year={2010},
title={A Semantic Account for Iconic Gestures},
website={http://pub.uni-bielefeld.de/publication/2318565}}
• A. Lücking, K. Bergmann, F. Hahn, S. Kopp, and H. Rieser, “The Bielefeld Speech and Gesture Alignment Corpus (SaGA),” in Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, 2010, pp. 92-98.
[Abstract] [BibTeX]

People communicate multimodally. Most prominently, they co-produce speech and gesture. How do they do that? Studying the interplay of both modalities has to be informed by empirically observed communication behavior. We present a corpus built of speech and gesture data gained in a controlled study. We describe 1) the setting underlying the data; 2) annotation of the data; 3) reliability evaluation methods and results; and 4) applications of the corpus in the research domain of speech and gesture alignment.
@INPROCEEDINGS{Luecking:et:al:2010,
organization={7th International Conference for Language Resources and Evaluation (LREC 2010)},
booktitle={Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality},
pages={92--98},
author={Lücking, Andy and Bergmann, Kirsten and Hahn, Florian and Kopp, Stefan and Rieser, Hannes},
keywords={own},
month={5},
year={2010},
title={The Bielefeld Speech and Gesture Alignment Corpus (SaGA)},
abstract={People communicate multimodally. Most prominently, they co-produce speech and gesture. How do they do that? Studying the interplay of both modalities has to be informed by empirically observed communication behavior. We present a corpus built of speech and gesture data gained in a controlled study. We describe 1) the setting underlying the data; 2) annotation of the data; 3) reliability evaluation methods and results; and 4) applications of the corpus in the research domain of speech and gesture alignment.},
website={http://pub.uni-bielefeld.de/publication/2001935}}
• M. Z. Islam, J. Tiedemann, and A. Eisele, “English to Bangla Phrase-Based Machine Translation,” in The 14th Annual Conference of The European Association for Machine Translation. Saint-Raphaël, France, 27-28 May, 2010.
[BibTeX]

@INPROCEEDINGS{Zahurul:Tiedemann:Eisele:2010,
owner={zahurul},
booktitle={The 14th Annual Conference of The European Association for Machine Translation. Saint-Raphaël, France, 27-28 May},
author={Islam, Md. Zahurul and Tiedemann, Jörg and Eisele, Andreas},
timestamp={2011.08.02},
year={2010},
title={English to Bangla Phrase-Based Machine Translation},
pdf={https://hucompute.org/wp-content/uploads/2015/08/English_to_Bangla_Phrase–Based_Machine_Translation.pdf}}
• U. Waltinger, “GermanPolarityClues: A Lexical Resource for German Sentiment Analysis,” in Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC ’10), Valletta, Malta, 2010.
[BibTeX]

@INPROCEEDINGS{Waltinger:2010:a,
publisher={European Language Resources Association (ELRA)},
booktitle={Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC '10)},
author={Waltinger, Ulli},
language={english},
editor={Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odjik and Stelios Piperidis and Mike Rosner and Daniel Tapias},
month={may},
year={2010},
isbn={2-9517408-6-7},
title={GermanPolarityClues: A Lexical Resource for German Sentiment Analysis},
date_0={2010-05},
pdf={http://www.ulliwaltinger.de/pdf/91_Paper.pdf},
website={http://www.ulliwaltinger.de/sentiment/}}
• A. Mehler, P. Weiß, P. Menke, and A. Lücking, “Towards a Simulation Model of Dialogical Alignment,” in Proceedings of the 8th International Conference on the Evolution of Language (Evolang8), 14-17 April 2010, Utrecht, 2010, pp. 238-245.
[BibTeX]

@INPROCEEDINGS{Mehler:Weiss:Menke:Luecking:2010,
booktitle={Proceedings of the 8th International Conference on the Evolution of Language (Evolang8), 14-17 April 2010, Utrecht},
website={http://www.let.uu.nl/evolang2010.nl/},
author={Mehler, Alexander and Wei{\ss}, Petra and Menke, Peter and Lücking, Andy},
year={2010},
title={Towards a Simulation Model of Dialogical Alignment},
pages={238--245}}
• F. Foscarini, Y. Kim, C. A. Lee, A. Mehler, G. Oliver, and S. Ross, “On the Notion of Genre in Digital Preservation,” in Automation in Digital Preservation, Dagstuhl, Germany, 2010.
[BibTeX]

@INPROCEEDINGS{Foscarini:Kim:Lee:Mehler:Oliver:Ross:2010,
publisher={Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
website={http://drops.dagstuhl.de/opus/volltexte/2010/2763},
pdf={http://drops.dagstuhl.de/opus/volltexte/2010/2763/pdf/10291.MehlerAlexander.Paper.2763.pdf},
booktitle={Automation in Digital Preservation},
number={10291},
author={Foscarini, Fiorella and Kim, Yunhyong and Lee, Christopher A. and Mehler, Alexander and Oliver, Gillian and Ross, Seamus},
series={Dagstuhl Seminar Proceedings},
editor={Chanod, Jean-Pierre and Dobreva, Milena and Rauber, Andreas and Ross, Seamus},
year={2010},
title={On the Notion of Genre in Digital Preservation},
issn={1862-4405},
annote={Keywords: Digital preservation, genre analysis, context modeling, diplomatics, information retrieval}}
• A. Mehler, R. Gleim, U. Waltinger, and N. Diewald, “Time Series of Linguistic Networks by Example of the Patrologia Latina,” in Proceedings of INFORMATIK 2010: Service Science, September 27 – October 01, 2010, Leipzig, 2010, pp. 609-616.
[BibTeX]

@INPROCEEDINGS{Mehler:Gleim:Waltinger:Diewald:2010,
publisher={GI},
booktitle={Proceedings of INFORMATIK 2010: Service Science, September 27 - October 01, 2010, Leipzig},
author={Mehler, Alexander and Gleim, Rüdiger and Waltinger, Ulli and Diewald, Nils},
editor={F{\"a}hnrich, Klaus-Peter and Franczyk, Bogdan},
year={2010},
volume={2},
pages={609--616},
title={Time Series of Linguistic Networks by Example of the Patrologia Latina},
series={Lecture Notes in Informatics},
pdf={http://subs.emis.de/LNI/Proceedings/Proceedings176/586.pdf}}
• R. Gleim, P. Warner, and A. Mehler, “eHumanities Desktop – An Architecture for Flexible Annotation in Iconographic Research,” in Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST ’10), April 7-10, 2010, Valencia, 2010.
[BibTeX]

@INPROCEEDINGS{Gleim:Warner:Mehler:2010,
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST '10), April 7-10, 2010, Valencia},
author={Gleim, Rüdiger and Warner, Paul and Mehler, Alexander},
year={2010},
title={eHumanities Desktop - An Architecture for Flexible Annotation in Iconographic Research},
website={https://www.researchgate.net/publication/220724277_eHumanities_Desktop_-_An_Architecture_for_Flexible_Annotation_in_Iconographic_Research}}
• P. Menke and A. Mehler, “The Ariadne System: A flexible and extensible framework for the modeling and storage of experimental data in the humanities,” in Proceedings of LREC 2010, Malta, 2010.
[Abstract] [BibTeX]

This paper introduces the Ariadne Corpus Management System. First, the underlying data model is presented which enables users to represent and process heterogeneous data sets within a single, consistent framework. Secondly, a set of automatized procedures is described that offers assistance to researchers in various data-related use cases. Finally, an approach to easy yet powerful data retrieval is introduced in form of a specialised querying language for multimodal data.
@INPROCEEDINGS{Menke:Mehler:2010,
publisher={ELDA},
booktitle={Proceedings of LREC 2010},
author={Menke, Peter and Mehler, Alexander},
year={2010},
title={The Ariadne System: A flexible and extensible framework for the modeling and storage of experimental data in the humanities},
abstract={This paper introduces the Ariadne Corpus Management System. First, the underlying data model is presented which enables users to represent and process heterogeneous data sets within a single, consistent framework. Secondly, a set of automatized procedures is described that offers assistance to researchers in various data-related use cases. Finally, an approach to easy yet powerful data retrieval is introduced in form of a specialised querying language for multimodal data.},
website={http://arnetminer.org/publication/the-ariadne-system-a-flexible-and-extensible-framework-for-the-modeling-and-storage-of-experimental-data-in-the-humanities-2839925.html}}
• T. Sutter and A. Mehler, “Einleitung: Der aktuelle Medienwandel im Blick einer interdisziplinären Medienwissenschaft,” in Medienwandel als Wandel von Interaktionsformen, T. Sutter and A. Mehler, Eds., Wiesbaden: VS Verlag für Sozialwissenschaften, 2010, pp. 7-16.
[Abstract] [BibTeX]

Die Herausforderung, die der Wandel von Kommunikationsmedien für die Medienwissenschaft darstellt, resultiert nicht nur aus der ungeheuren Beschleunigung des Medienwandels. Die Herausforderung stellt sich auch mit der Frage, welches die neuen Formen und Strukturen sind, die aus dem Wandel der Medien hervorgehen. Rückt man diese Frage in den Fokus der Überlegungen, kommen erstens Entwicklungen im Wechsel von Massenmedien zu neuen, „interaktiven“ Medien in den Blick. Dies betrifft den Wandel von den alten Medien in Form von Einwegkommunikation zu den neuen Medien in Form von Netzkommunikation. Dieser Wandel wurde in zahlreichen Analysen als eine Revolution beschrieben: Im Unterschied zur einseitigen, rückkopplungsarmen Kommunikationsform der Massenmedien sollen neue, computergestützte Formen der Medienkommunikation „interaktiv“ sein, d.h. gesteigerte Rückkopplungs- und Eingriffsmöglichkeiten für die Adressaten und Nutzer bieten. Sozialwissenschaftlich bedeutsam ist dabei die Einschätzung der Qualität und des Umfangs dieser neuen Möglichkeiten und Leistungen. Denn bislang bedeutete Medienwandel im Kern eine zunehmende Ausdifferenzierung alter und neuer Medien mit je spezifischen Leistungen, d.h. neue Medien ersetzen die älteren nicht, sondern sie ergänzen und erweitern sie. Allerdings wird im Zuge des aktuellen Medienwandels immer deutlicher, dass die neuen Medien durchaus imstande sind, die Leistungen massenmedialer Verbreitung von Kommunikation zu übernehmen. Stehen wir also, wie das schon seit längerem kühn vorhergesagt wird, vor der Etablierung eines Universalmediums, das in der Lage ist, die Formen und Funktionen anderer Medien zu übernehmen?
@Inbook{Sutter2010,
author={Sutter, Tilmann and Mehler, Alexander},
editor={Sutter, Tilmann and Mehler, Alexander},
title={Einleitung: Der aktuelle Medienwandel im Blick einer interdisziplin{\"a}ren Medienwissenschaft},
bookTitle={Medienwandel als Wandel von Interaktionsformen},
year={2010},
publisher={VS Verlag f{\"u}r Sozialwissenschaften},
pages={7--16},
abstract={Die Herausforderung, die der Wandel von Kommunikationsmedien f{\"u}r die Medienwissenschaft darstellt, resultiert nicht nur aus der ungeheuren Beschleunigung des Medienwandels. Die Herausforderung stellt sich auch mit der Frage, welches die neuen Formen und Strukturen sind, die aus dem Wandel der Medien hervorgehen. R{\"u}ckt man diese Frage in den Fokus der {\"U}berlegungen, kommen erstens Entwicklungen im Wechsel von Massenmedien zu neuen, „interaktiven“ Medien in den Blick. Dies betrifft den Wandel von den alten Medien in Form von Einwegkommunikation zu den neuen Medien in Form von Netzkommunikation. Dieser Wandel wurde in zahlreichen Analysen als eine Revolution beschrieben: Im Unterschied zur einseitigen, r{\"u}ckkopplungsarmen Kommunikationsform der Massenmedien sollen neue, computergest{\"u}tzte Formen der Medienkommunikation „interaktiv“ sein, d.h. gesteigerte R{\"u}ckkopplungs- und Eingriffsm{\"o}glichkeiten f{\"u}r die Adressaten und Nutzer bieten. Sozialwissenschaftlich bedeutsam ist dabei die Einsch{\"a}tzung der Qualit{\"a}t und des Umfangs dieser neuen M{\"o}glichkeiten und Leistungen. Denn bislang bedeutete Medienwandel im Kern eine zunehmende Ausdifferenzierung alter und neuer Medien mit je spezifischen Leistungen, d.h. neue Medien ersetzen die {\"a}lteren nicht, sondern sie erg{\"a}nzen und erweitern sie. Allerdings wird im Zuge des aktuellen Medienwandels immer deutlicher, dass die neuen Medien durchaus imstande sind, die Leistungen massenmedialer Verbreitung von Kommunikation zu {\"u}bernehmen. Stehen wir also, wie das schon seit l{\"a}ngerem k{\"u}hn vorhergesagt wird, vor der Etablierung eines Universalmediums, das in der Lage ist, die Formen und Funktionen anderer Medien zu {\"u}bernehmen?},
isbn={978-3-531-92292-8},
doi={10.1007/978-3-531-92292-8_1},
url={https://doi.org/10.1007/978-3-531-92292-8_1}
}

### 2009 (14)

• T. vor der Brück, “Approximation of the Parameters of a Readability Formula by Robust Regression,” in Machine Learning and Data Mining in Pattern recognition: Poster Proceedings of the International Conference on Machine Learning and Data Mining (MLDM), Leipzig, Germany, 2009, pp. 115-125.
[Abstract] [BibTeX]

Most readability formulas calculate a global readability score by combining several indicator values in a linear combination. Typical indicators are average sentence length, average number of syllables per word, etc. Usually the parameters of the linear combination are determined by linear OLS (ordinary least squares estimation), minimizing the sum of the squared residuals in comparison with human ratings for a given set of texts. The use of OLS leads to several drawbacks. First, the parameters are not constrained in any way and are therefore not intuitive and difficult to interpret. Second, if the number of parameters becomes large, the effect of overfitting easily occurs. Finally, OLS is quite sensitive to outliers. Therefore, an alternative method is presented which avoids these drawbacks and is based on robust regression.
@INPROCEEDINGS{vor:der:Brueck:2009,
booktitle={Machine Learning and Data Mining in Pattern recognition: Poster Proceedings of the International Conference on Machine Learning and Data Mining (MLDM)},
pages={115--125},
author={vor der Brück, Tim},
year={2009},
title={Approximation of the Parameters of a Readability Formula by Robust Regression},
abstract={Most readability formulas calculate a global readability score by combining several indicator values in a linear combination. Typical indicators are average sentence length, average number of syllables per word, etc. Usually the parameters of the linear combination are determined by linear OLS (ordinary least squares estimation), minimizing the sum of the squared residuals in comparison with human ratings for a given set of texts. The use of OLS leads to several drawbacks. First, the parameters are not constrained in any way and are therefore not intuitive and difficult to interpret. Second, if the number of parameters becomes large, the effect of overfitting easily occurs. Finally, OLS is quite sensitive to outliers. Therefore, an alternative method is presented which avoids these drawbacks and is based on robust regression.}}
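The OLS-vs-robust contrast drawn in the abstract can be sketched with a Huber-weighted iteratively reweighted least squares (IRLS) fit of a one-indicator readability formula. Everything here is illustrative: the single indicator, the toy ratings, the outlier, and the delta threshold are invented, and the paper's actual robust estimator is not reproduced.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y ~ a + b*x (closed form)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = num / den
    return my - b * mx, b

def huber_irls(x, y, delta=1.0, iters=20):
    """IRLS with Huber weights: residuals beyond delta get weight delta/|r|."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iters):
        a, b = weighted_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in resid]
    return a, b

# Hypothetical data: ratings follow 2 + 0.1 * avg_sentence_length,
# except for one outlier rating at the end.
x = [10, 12, 14, 16, 18, 20, 22, 24]
y = [2 + 0.1 * xi for xi in x]
y[-1] = 40.0  # outlier
a_robust, b_robust = huber_irls(x, y)
a_ols, b_ols = weighted_fit(x, y, [1.0] * len(x))  # plain OLS for comparison
```

With the outlier present, the OLS slope is pulled far from the true value of 0.1, while the Huber-weighted fit stays close to it, which is the sensitivity argument the abstract makes.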
• T. vor der Brück and S. Hartrumpf, “A Readability Checker Based on Deep Semantic Indicators,” in Human Language Technology. Challenges of the Information Society, Z. Vetulani and H. Uszkoreit, Eds., Berlin, Germany: Springer, 2009, vol. 5603, pp. 232-244.
[Abstract] [BibTeX]

One major reason that readability checkers are still far away from judging the understandability of texts consists in the fact that no semantic information is used. Syntactic, lexical, or morphological information can only give limited access for estimating the cognitive difficulties for a human being to comprehend a text. In this paper however, we present a readability checker which uses semantic information in addition. This information is represented as semantic networks and is derived by a deep syntactico-semantic analysis. We investigate in which situations a semantic readability indicator can lead to superior results in comparison with ordinary surface indicators like sentence length. Finally, we compute the weights of our semantic indicators in the readability function based on the user ratings collected in an online evaluation.
@INCOLLECTION{vor:der:Brueck:Hartrumpf:2009,
publisher={Springer},
booktitle={Human Language Technology. Challenges of the Information Society},
pages={232--244},
author={vor der Brück, Tim and Hartrumpf, Sven},
series={Lecture Notes in Computer Science (LNCS)},
volume={5603},
editor={Zygmunt Vetulani and Hans Uszkoreit},
year={2009},
website={http://rd.springer.com/chapter/10.1007/978-3-642-04235-5_20},
title={A Readability Checker Based on Deep Semantic Indicators},
abstract={One major reason that readability checkers are still far away from judging the understandability of texts consists in the fact that no semantic information is used. Syntactic, lexical, or morphological information can only give limited access for estimating the cognitive difficulties for a human being to comprehend a text. In this paper however, we present a readability checker which uses semantic information in addition. This information is represented as semantic networks and is derived by a deep syntactico-semantic analysis. We investigate in which situations a semantic readability indicator can lead to superior results in comparison with ordinary surface indicators like sentence length. Finally, we compute the weights of our semantic indicators in the readability function based on the user ratings collected in an online evaluation.},
address={Berlin, Germany}}
• T. vor der Brück, “Hypernymy Extraction Based on Shallow and Deep Patterns,” in From Form To Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009, Potsdam, Germany, 2009, pp. 41-52.
[Abstract] [BibTeX]

There exist various approaches to construct taxonomies by text mining. Usually these approaches are based on supervised learning and extract in a first step several patterns. These patterns are then applied to previously unseen texts and used to recognize hypernym/hyponym pairs. Normally these approaches are only based on a surface representation or a syntactic tree structure, i.e., a constituency or dependency tree derived by a syntactical parser. In this work we present an approach which, additionally to shallow patterns, directly operates on semantic networks which are derived by a deep linguistic syntactico-semantic analysis. Furthermore, the shallow approach heavily depends on semantic information, too. It is shown that recall and precision can be improved considerably compared to relying on shallow patterns alone.
@INPROCEEDINGS{vor:der:Brueck:2009:b,
booktitle={From Form To Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009},
pages={41--52},
author={vor der Brück, Tim},
editor={Christian Chiarcos and Richard Eckart de Castilho},
year={2009},
title={Hypernymy Extraction Based on Shallow and Deep Patterns},
abstract={There exist various approaches to construct taxonomies by text mining. Usually these approaches are based on supervised learning and extract in a first step several patterns. These patterns are then applied to previously unseen texts and used to recognize hypernym/hyponym pairs. Normally these approaches are only based on a surface representation or a syntactic tree structure, i.e., a constituency or dependency tree derived by a syntactical parser. In this work we present an approach which, additionally to shallow patterns, directly operates on semantic networks which are derived by a deep linguistic syntactico-semantic analysis. Furthermore, the shallow approach heavily depends on semantic information, too. It is shown that recall and precision can be improved considerably compared to relying on shallow patterns alone.},
address={Potsdam, Germany}}
• G. Bouma, S. Duarte, and M. Z. Islam, “Cross-lingual Alignment and Completion of Wikipedia Templates,” in Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3), Boulder, Colorado, USA, June 4, 2009.
[Abstract] [BibTeX]

For many languages, the size of Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.
@INPROCEEDINGS{Bouma:Duarte:Zahurul:2009,
owner={zahurul},
booktitle={Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3), Boulder, Colorado, USA, June 4},
author={Bouma, Gosse and Duarte, Sergio and Islam, Md. Zahurul},
timestamp={2011.08.02},
year={2009},
title={Cross-lingual Alignment and Completion of Wikipedia Templates},
abstract={For many languages, the size of Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.1418}}
• U. Waltinger, “Polarity Reinforcement: Sentiment Polarity Identification By Means Of Social Semantics,” in Proceedings of the IEEE Africon 2009, September 23-25, Nairobi, Kenya, 2009.
[BibTeX]

@INPROCEEDINGS{Waltinger:2009:a,
booktitle={Proceedings of the IEEE Africon 2009, September 23-25, Nairobi, Kenya},
author={Waltinger, Ulli},
year={2009},
title={Polarity Reinforcement: Sentiment Polarity Identification By Means Of Social Semantics},
date_0={2009},
website={http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5308104},
pdf={http://www.ulliwaltinger.de/pdf/AfriconIEEE_2009_SentimentPolarity_Waltinger.pdf}}
• U. Waltinger, I. Cramer, and T. Wandmacher, “From Social Networks To Distributional Properties: A Comparative Study On Computing Semantic Relatedness,” in Proceedings of the 31st Annual Conference of the Cognitive Science Society, Austin, TX, 2009, pp. 3016-3021.
[BibTeX]

@INPROCEEDINGS{Waltinger:Cramer:Wandmacher:2009:a,
publisher={Cognitive Science Society},
booktitle={Proceedings of the 31st Annual Conference of the Cognitive Science Society},
pages={3016-3021},
author={Waltinger, Ulli and Cramer, Irene and Wandmacher, Tonio},
editor={Taatgen, N.A. and van Rijn, H.},
year={2009},
title={From Social Networks To Distributional Properties: A Comparative Study On Computing Semantic Relatedness},
date_0={2009},
pdf={http://csjarchive.cogsci.rpi.edu/proceedings/2009/papers/661/paper661.pdf}}
• A. Mehler and U. Waltinger, “Enhancing Document Modeling by Means of Open Topic Models: Crossing the Frontier of Classification Schemes in Digital Libraries by Example of the DDC,” Library Hi Tech, vol. 27, iss. 4, pp. 520-539, 2009.
[Abstract] [BibTeX]

Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The reason is to reduce the effort of document processing in digital libraries. Further, we perform feature selection and extension by means of social ontologies and related web-based lexical resources. This is done to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, we evaluate our model by means of two language-specific corpora. This paper bridges digital libraries on the one hand and computational linguistics on the other. The aim is to make accessible computational linguistic methods to provide thematic classifications in digital libraries based on closed topic models as the DDC. Design/methodology/approach: text classification, text-technology, computational linguistics, computational semantics, social semantics. Findings: We show that SVM-based classifiers perform best by exploring certain selections of OAI document metadata. Research limitations/implications: The findings show that it is necessary to further develop SVM-based DDC-classifiers by using larger training sets possibly for more than two languages in order to get better F-measure values. Practical implications: We can show that DDC-classifications come into reach which primarily explore OAI metadata. Originality/value: We provide algorithmic and formal-mathematical information how to build DDC-classifiers for digital libraries.
@ARTICLE{Mehler:Waltinger:2009:b,
journal={Library Hi Tech},
number={4},
author={Mehler, Alexander and Waltinger, Ulli},
volume={27},
year={2009},
pages={520-539},
title={Enhancing Document Modeling by Means of Open Topic Models: Crossing the Frontier of Classification Schemes in Digital Libraries by Example of the DDC},
website={http://biecoll.ub.uni-bielefeld.de/frontdoor.php?source_opus=5001&la=de},
abstract={Purpose: We present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The reason is to reduce the effort of document processing in digital libraries. Further, we perform feature selection and extension by means of social ontologies and related web-based lexical resources. This is done to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, we evaluate our model by means of two language-specific corpora. This paper bridges digital libraries on the one hand and computational linguistics on the other. The aim is to make accessible computational linguistic methods to provide thematic classifications in digital libraries based on closed topic models as the DDC. Design/methodology/approach: text classification, text-technology, computational linguistics, computational semantics, social semantics. Findings: We show that SVM-based classifiers perform best by exploring certain selections of OAI document metadata. Research limitations/implications: The findings show that it is necessary to further develop SVM-based DDC-classifiers by using larger training sets possibly for more than two languages in order to get better F-measure values. Practical implications: We can show that DDC-classifications come into reach which primarily explore OAI metadata. Originality/value: We provide algorithmic and formal-mathematical information how to build DDC-classifiers for digital libraries.}}
• R. Gleim, U. Waltinger, A. Ernst, A. Mehler, D. Esch, and T. Feith, “The eHumanities Desktop – An Online System for Corpus Management and Analysis in Support of Computing in the Humanities,” in Proceedings of the Demonstrations Session of the 12th Conference of the European Chapter of the Association for Computational Linguistics EACL 2009, 30 March – 3 April, Athens, 2009.
[BibTeX]

@INPROCEEDINGS{Gleim:Waltinger:Ernst:Mehler:Esch:Feith:2009,
booktitle={Proceedings of the Demonstrations Session of the 12th Conference of the European Chapter of the Association for Computational Linguistics EACL 2009, 30 March – 3 April, Athens},
author={Gleim, Rüdiger and Waltinger, Ulli and Ernst, Alexandra and Mehler, Alexander and Esch, Dietmar and Feith, Tobias},
year={2009},
title={The eHumanities Desktop – An Online System for Corpus Management and Analysis in Support of Computing in the Humanities}}
• A. Mehler, “Artifizielle Interaktivität. Eine semiotische Betrachtung,” in Medienwandel als Wandel von Interaktionsformen – von frühen Medienkulturen zum Web 2.0, T. Sutter and A. Mehler, Eds., Wiesbaden: VS, 2009.
[BibTeX]

@INCOLLECTION{Mehler:2009:d,
publisher={VS},
booktitle={Medienwandel als Wandel von Interaktionsformen – von frühen Medienkulturen zum Web 2.0},
author={Mehler, Alexander},
editor={Sutter, Tilmann and Mehler, Alexander},
year={2009},
title={Artifizielle Interaktivit{\"a}t. Eine semiotische Betrachtung},
address={Wiesbaden}}
• U. Waltinger and A. Mehler, “The Feature Difference Coefficient: Classification by Means of Feature Distributions,” in Proceedings of the Conference on Text Mining Services (TMS 2009), Leipzig, 2009, pp. 159–168.
[BibTeX]

@INPROCEEDINGS{Waltinger:Mehler:2009:a,
publisher={Leipzig University},
booktitle={Proceedings of the Conference on Text Mining Services (TMS 2009)},
pages={159–168},
author={Waltinger, Ulli and Mehler, Alexander},
series={Leipziger Beitr{\"a}ge zur Informatik: Band XIV},
year={2009},
title={The Feature Difference Coefficient: Classification by Means of Feature Distributions},
address={Leipzig}}
• M. Santini, G. Rehm, S. Sharoff, and A. Mehler, Automatic Genre Identification: Issues and Prospects, M. Santini, G. Rehm, S. Sharoff, and A. Mehler, Eds., GSCL, 2009, vol. 24(1).
[BibTeX]

@BOOK{Santini:Rehm:Sharoff:Mehler:2009,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={24(1)},
editor={Santini, Marina and Rehm, Georg and Sharoff, Serge and Mehler, Alexander},
author={Santini, Marina and Rehm, Georg and Sharoff, Serge and Mehler, Alexander},
publisher={GSCL},
year={2009},
pagetotal={148},
title={Automatic Genre Identification: Issues and Prospects},
pdf={http://www.jlcl.org/2009_Heft1/JLCL24(1).pdf}}
• U. Waltinger, A. Mehler, and R. Gleim, “Social Semantics And Its Evaluation By Means of Closed Topic Models: An SVM-Classification Approach Using Semantic Feature Replacement By Topic Generalization,” in Proceedings of the Biennial GSCL Conference 2009, September 30 – October 2, Universität Potsdam, 2009.
[BibTeX]

@INPROCEEDINGS{Waltinger:Mehler:Gleim:2009:a,
booktitle={Proceedings of the Biennial GSCL Conference 2009, September 30 – October 2, Universit{\"a}t Potsdam},
author={Waltinger, Ulli and Mehler, Alexander and Gleim, Rüdiger},
year={2009},
title={Social Semantics And Its Evaluation By Means of Closed Topic Models: An SVM-Classification Approach Using Semantic Feature Replacement By Topic Generalization}}

### 2008 (13)

• T. vor der Brück and H. Stenzhorn, “A Dynamic Approach for Automatic Error Detection in Generation Grammars,” in Proceedings of the 18th European Conference on Artificial Intelligence (ECAI), Patras, Greece, 2008.
[Abstract] [BibTeX]

In any real world application scenario, natural language generation (NLG) systems have to employ grammars consisting of tremendous amounts of rules. Detecting and fixing errors in such grammars is therefore a highly tedious task. In this work we present a data mining algorithm which deduces incorrect grammar rules by abductive reasoning out of positive and negative training examples. More specifically, the constituency trees belonging to successful generation processes and the incomplete trees of failed ones are analyzed. From this a quality score is derived for each grammar rule by analyzing the occurrences of the rules in the trees and by spotting the exact error locations in the incomplete trees. In prior work on automatic error detection, v.d.Brück et al. [5] proposed a static error detection algorithm for generation grammars. The approach of Cussens et al. creates missing grammar rules for parsing using abduction [1]. Zeller introduced a dynamic approach in the related area of detecting errors in computer programs [6].
@INPROCEEDINGS{vor:der:Brueck:Stenzhorn:2008,
booktitle={Proceedings of the 18th European Conference on Artificial Intelligence (ECAI)},
author={vor der Brück, Tim and Stenzhorn, Holger},
month={July},
year={2008},
isbn={978-1-58603-891-5},
title={A Dynamic Approach for Automatic Error Detection in Generation Grammars},
abstract={In any real world application scenario, natural language generation (NLG) systems have to employ grammars consisting of tremendous amounts of rules. Detecting and fixing errors in such grammars is therefore a highly tedious task. In this work we present a data mining algorithm which deduces incorrect grammar rules by abductive reasoning out of positive and negative training examples. More specifically, the constituency trees belonging to successful generation processes and the incomplete trees of failed ones are analyzed. From this a quality score is derived for each grammar rule by analyzing the occurrences of the rules in the trees and by spotting the exact error locations in the incomplete trees. In prior work on automatic error detection, v.d.Brück et al. [5] proposed a static error detection algorithm for generation grammars. The approach of Cussens et al. creates missing grammar rules for parsing using abduction [1]. Zeller introduced a dynamic approach in the related area of detecting errors in computer programs [6].},
address={Patras, Greece}}
• T. vor der Brück, S. Hartrumpf, and H. Helbig, “A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators,” in Proceedings of the 11th International Multiconference: Information Society – IS 2008 – Language Technologies, Ljubljana, Slovenia, 2008, pp. 92-97.
[Abstract] [BibTeX]

Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score. (Slovenian abstract: Machine learning with dependency trees is used to assess the readability of texts.)
@INPROCEEDINGS{vor:der:Brueck:Hartrumpf:Helbig:2008:a,
url={http://pi7.fernuni-hagen.de/brueck/papers/brueck_hartrumpf_helbig08.pdf},
booktitle={Proceedings of the 11th International Multiconference: Information Society - IS 2008 - Language Technologies},
pages={92--97},
author={vor der Brück, Tim and Hartrumpf, Sven and Helbig, Hermann},
editor={Erjavec, Tomaž and Gros, Jerneja Žganec},
month={October},
year={2008},
isbn={978-961-264-006-4},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.5878},
title={A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators},
abstract={Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score. (Slovenian abstract: Machine learning with dependency trees is used to assess the readability of texts.)},
address={Ljubljana, Slovenia}}
• T. vor der Brück, S. Hartrumpf, and H. Helbig, “A Readability Checker with Supervised Learning using Deep Indicators,” Informatica, vol. 32, iss. 4, pp. 429-435, 2008.
[Abstract] [BibTeX]

Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score.
@ARTICLE{vor:der:Brueck:Hartrumpf:Helbig:2008:b,
journal={Informatica},
pages={429--435},
number={4},
author={vor der Brück, Tim and Hartrumpf, Sven and Helbig, Hermann},
volume={32},
year={2008},
title={A Readability Checker with Supervised Learning using Deep Indicators},
abstract={Checking for readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators like sentence length, word length, word frequency, etc. However, in many cases, this information is not adequate to realistically approximate the cognitive difficulties a person can have to understand a text. Therefore we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are automatically generated by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm on 3,000 ratings of 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to the best surface-based indicators. The combination of deep and shallow indicators leads to an improvement over shallow indicators alone. Finally, a graphical user interface was developed which highlights difficult passages, depending on the individual indicator values, and displays a global readability score.},
}
• O. Pustylnikov and A. Mehler, “Text classification by means of structural features. What kind of information about texts is captured by their structure?,” in Proceedings of RUSSIR ’08, September 1-5, Taganrog, Russia, 2008.
[BibTeX]

@INPROCEEDINGS{Pustylnikov:Mehler:2008:c,
booktitle={Proceedings of RUSSIR '08, September 1-5, Taganrog, Russia},
author={Pustylnikov, Olga and Mehler, Alexander},
year={2008},
title={Text classification by means of structural features. What kind of information about texts is captured by their structure?},
pdf={http://www.hucompute.org/data/pdf/mehler_geibel_pustylnikov_2007.pdf}}
• U. Waltinger, A. Mehler, and M. Stührenberg, “An Integrated Model of Lexical Chaining: Applications, Resources and their Format,” in Proceedings of KONVENS 2008 – Ergänzungsband Textressourcen und lexikalisches Wissen, 2008, pp. 59-70.
[BibTeX]

@INPROCEEDINGS{Waltinger:Mehler:Stuehrenberg:2008,
booktitle={Proceedings of KONVENS 2008 – Erg{\"a}nzungsband Textressourcen und lexikalisches Wissen},
pages={59-70},
author={Waltinger, Ulli and Mehler, Alexander and Stührenberg, Maik},
editor={Storrer, Angelika and Geyken, Alexander and Siebert, Alexander and Würzner, Kay-Michael},
year={2008},
title={An Integrated Model of Lexical Chaining: Applications, Resources and their Format},
pdf={http://www.ulliwaltinger.de/pdf/Konvens_2008_Integrated_Model_of_Lexical_Chaining_WaltingerMehlerStuehrenberg.pdf}}
• A. Mehler, “A Model of the Distribution of the Distances of Alike Elements in Dialogical Communication,” in Proceedings of the International Conference on Information Theory and Statistical Learning (ITSL ’08), July 14-15, 2008, Las Vegas, 2008, pp. 45-50.
[BibTeX]

@INPROCEEDINGS{Mehler:2008:c,
booktitle={Proceedings of the International Conference on Information Theory and Statistical Learning (ITSL '08), July 14-15, 2008, Las Vegas},
pages={45-50},
author={Mehler, Alexander},
year={2008},
title={A Model of the Distribution of the Distances of Alike Elements in Dialogical Communication}}
• U. Waltinger, A. Mehler, and G. Heyer, “Towards Automatic Content Tagging: Enhanced Web Services in Digital Libraries Using Lexical Chaining,” in 4th Int. Conf. on Web Information Systems and Technologies (WEBIST ’08), 4-7 May, Funchal, Portugal, Barcelona, 2008, pp. 231-236.
[BibTeX]

@INPROCEEDINGS{Waltinger:Mehler:Heyer:2008,
publisher={INSTICC Press},
url={http://dblp.uni-trier.de/db/conf/webist/webist2008-2.html#WaltingerMH08},
booktitle={4th Int. Conf. on Web Information Systems and Technologies (WEBIST '08), 4-7 May, Funchal, Portugal},
pages={231-236},
author={Waltinger, Ulli and Mehler, Alexander and Heyer, Gerhard},
editor={Cordeiro, José and Filipe, Joaquim and Hammoudi, Slimane},
year={2008},
title={Towards Automatic Content Tagging: Enhanced Web Services in Digital Libraries Using Lexical Chaining},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.463.3097},
pdf={http://www.ulliwaltinger.de/pdf/Webist_2008_Towards_Automatic_Content_Tagging_WaltingerMehlerHeyer.pdf},
address={Barcelona}}
• A. Mehler, “A Short Note on Social-Semiotic Networks from the Point of View of Quantitative Semantics,” in Proceedings of the Dagstuhl Seminar on Social Web Communities, September 21-26, Dagstuhl, 2008.
[BibTeX]

@INPROCEEDINGS{Mehler:2008:f,
pdf={http://drops.dagstuhl.de/opus/volltexte/2008/1788/pdf/08391.MehlerAlexander.ExtAbstract.1788.pdf},
booktitle={Proceedings of the Dagstuhl Seminar on Social Web Communities, September 21-26, Dagstuhl},
author={Mehler, Alexander},
editor={Alani, Harith and Staab, Steffen and Stumme, Gerd},
year={2008},
title={A Short Note on Social-Semiotic Networks from the Point of View of Quantitative Semantics}}
• A. Mehler, R. Gleim, A. Ernst, and U. Waltinger, “WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases,” Sprache und Datenverarbeitung. International Journal for Language Data Processing, vol. 32, iss. 1, pp. 47-70, 2008.
[Abstract] [BibTeX]

This article describes an API for exploring the logical document and the logical network structure of wikis. It introduces an algorithm for the semantic preprocessing, filtering and typing of these building blocks. Further, this article models the process of wiki generation based on a unified format of syntactic, semantic and pragmatic representations. This three-level approach to make accessible syntactic, semantic and pragmatic aspects of wiki-based structure formation is complemented by a corresponding database model – called WikiDB – and an API operating thereon. Finally, the article provides an empirical study of using the three-fold representation format in conjunction with WikiDB.
@ARTICLE{Mehler:Gleim:Ernst:Waltinger:2008,
pdf={http://www.ulliwaltinger.de/pdf/Konvens_2008_WikiDB_Building_Semantic_Databases_MehlerGleimErnstWaltinger.pdf},
journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
pages={47-70},
number={1},
author={Mehler, Alexander and Gleim, Rüdiger and Ernst, Alexandra and Waltinger, Ulli},
volume={32},
year={2008},
title={WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases},
abstract={This article describes an API for exploring the logical document and the logical network structure of wikis. It introduces an algorithm for the semantic preprocessing, filtering and typing of these building blocks. Further, this article models the process of wiki generation based on a unified format of syntactic, semantic and pragmatic representations. This three-level approach to make accessible syntactic, semantic and pragmatic aspects of wiki-based structure formation is complemented by a corresponding database model – called WikiDB – and an API operating thereon. Finally, the article provides an empirical study of using the three-fold representation format in conjunction with WikiDB.}}
• U. Waltinger and A. Mehler, “Who is it? Context sensitive named entity and instance recognition by means of Wikipedia,” in Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI-2008), 2008, pp. 381–384.
[BibTeX]

@INPROCEEDINGS{Waltinger:Mehler:2008:a,
publisher={IEEE Computer Society},
booktitle={Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI-2008)},
pages={381–384},
author={Waltinger, Ulli and Mehler, Alexander},
year={2008},
title={Who is it? Context sensitive named entity and instance recognition by means of Wikipedia},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.324.5881},
pdf={http://www.ulliwaltinger.de/pdf/WI_2008_Context_Sensitive_Instance_Recognition_WaltingerMehler.pdf}}
• A. Lücking, A. Mehler, and P. Menke, “Taking Fingerprints of Speech-and-Gesture Ensembles: Approaching Empirical Evidence of Intrapersonal Alignment in Multimodal Communication,” in LONDIAL 2008: Proceedings of the 12th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL), King’s College London, 2008, pp. 157–164.
[BibTeX]

@INPROCEEDINGS{Luecking:Mehler:Menke:2008,
booktitle={LONDIAL 2008: Proceedings of the 12th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL)},
pages={157–164},
author={Lücking, Andy and Mehler, Alexander and Menke, Peter},
month={June 2–4},
year={2008},
title={Taking Fingerprints of Speech-and-Gesture Ensembles: Approaching Empirical Evidence of Intrapersonal Alignment in Multimodal Communication},
website={https://www.researchgate.net/publication/237305375_Taking_Fingerprints_of_Speech-and-Gesture_Ensembles_Approaching_Empirical_Evidence_of_Intrapersonal_Alignment_in_Multimodal_Communication},
address={King's College London}}
• A. Mehler and T. Sutter, “Interaktive Textproduktion in Wiki-basierten Kommunikationssystemen,” in Kommunikation, Partizipation und Wirkungen im Social Web – Weblogs, Wikis, Podcasts und Communities aus interdisziplinärer Sicht, A. Zerfaß, M. Welker, and J. Schmidt, Eds., Köln: Herbert von Halem, 2008, pp. 267-300.
[BibTeX]

@INCOLLECTION{Mehler:Sutter:2008,
publisher={Herbert von Halem},
booktitle={Kommunikation, Partizipation und Wirkungen im Social Web – Weblogs, Wikis, Podcasts und Communities aus interdisziplin{\"a}rer Sicht},
pages={267-300},
author={Mehler, Alexander and Sutter, Tilmann},
editor={Zerfa{\ss}, Ansgar and Welker, Martin and Schmidt, Jan},
year={2008},
title={Interaktive Textproduktion in Wiki-basierten Kommunikationssystemen}}
• A. Mehler, “On the Impact of Community Structure on Self-Organizing Lexical Networks,” in Proceedings of the 7th Evolution of Language Conference (Evolang 2008), March 11-15, 2008, Barcelona, 2008, pp. 227-234.
[Abstract] [BibTeX]

This paper presents a simulation model of self-organizing lexical networks. Its starting point is the notion of an association game in which the impact of varying community models is studied on the emergence of lexical networks. The paper reports on experiments whose results are in accordance with findings in the framework of the naming game. This is done by means of a multilevel network model in which the correlation of social and of linguistic networks is studied.
@INPROCEEDINGS{Mehler:2008:e,
publisher={World Scientific},
website={http://stel.ub.edu/evolang2008/evo10.htm},
booktitle={Proceedings of the 7th Evolution of Language Conference (Evolang 2008), March 11-15, 2008, Barcelona},
pages={227-234},
author={Mehler, Alexander},
editor={Smith, Andrew D. M. and Smith, Kenny and Cancho, Ramon Ferrer i},
year={2008},
title={On the Impact of Community Structure on Self-Organizing Lexical Networks},
abstract={This paper presents a simulation model of self-organizing lexical networks. Its starting point is the notion of an association game in which the impact of varying community models is studied on the emergence of lexical networks. The paper reports on experiments whose results are in accordance with findings in the framework of the naming game. This is done by means of a multilevel network model in which the correlation of social and of linguistic networks is studied.}}

### 2007 (21)

• T. vor der Brück and S. Hartrumpf, “A Semantically Oriented Readability Checker for German,” in Proceedings of the 3rd Language & Technology Conference, Z. Vetulani, Ed., Poznań, Poland: Wydawnictwo Poznańskie, 2007, pp. 270-274.
[Abstract] [BibTeX]

One major reason that readability checkers are still far away from judging the understandability of texts consists in the fact that no semantic information is used. Syntactic, lexical, or morphological information can only give limited access for estimating the cognitive difficulties for a human being to comprehend a text. In this paper however, we present a readability checker which uses semantic information in addition. This information is represented as semantic networks and is derived by a deep syntactico-semantic analysis. We investigate in which situations a semantic readability indicator can lead to superior results in comparison with ordinary surface indicators like sentence length. Finally, we compute the correlations and absolute errors for our semantic indicators related to user ratings collected in an online evaluation.
@INCOLLECTION{vor:der:Brueck:Hartrumpf:2007,
publisher={Wydawnictwo Poznańskie},
url={http://pi7.fernuni-hagen.de/papers/brueck_hartrumpf07_online.pdf},
booktitle={Proceedings of the 3rd Language \& Technology Conference},
pages={270--274},
author={vor der Brück, Tim and Hartrumpf, Sven},
editor={Zygmunt Vetulani},
month={October},
year={2007},
isbn={978-83-7177-407-2},
title={A Semantically Oriented Readability Checker for German},
abstract={One major reason that readability checkers are still far away from judging the understandability of texts consists in the fact that no semantic information is used. Syntactic, lexical, or morphological information can only give limited access for estimating the cognitive difficulties for a human being to comprehend a text. In this paper however, we present a readability checker which uses semantic information in addition. This information is represented as semantic networks and is derived by a deep syntactico-semantic analysis. We investigate in which situations a semantic readability indicator can lead to superior results in comparison with ordinary surface indicators like sentence length. Finally, we compute the correlations and absolute errors for our semantic indicators related to user ratings collected in an online evaluation.},
address={Poznań, Poland}}
• T. vor der Brück and S. Busemann, “Suggesting Error Corrections of Path Expressions and Categories for Tree-Mapping Grammars,” Zeitschrift für Sprachwissenschaft, vol. 26, iss. 2, 2007.
[Abstract] [BibTeX]

Tree mapping grammars are used in natural language generation (NLG) to map non-linguistic input onto a derivation tree from which the target text can be trivially read off as the terminal yield. Such grammars may consist of a large number of rules. Finding errors is quite tedious and sometimes very time-consuming. Often the generation fails because the relevant input subtree is not specified correctly. This work describes a method to detect and correct wrong assignments of input subtrees to grammar categories by cross-validating grammar rules with the given input structures. The method also detects and corrects the usage of a category in a grammar rule. The result is implemented in a grammar development workbench and accelerates the grammar writer's work considerably. The paper suggests the algorithms can be ported to other areas in which tree mapping is required.
@ARTICLE{vor:der:Brueck:Busemann:2007,
url={http://www.reference-global.com/doi/pdfplus/10.1515/ZFS.2007.021},
journal={Zeitschrift für Sprachwissenschaft},
number={2},
author={vor der Brück, Tim and Busemann, Stephan},
volume={26},
year={2007},
abstract={Tree mapping grammars are used in natural language generation (NLG) to map non-linguistic input onto a derivation tree from which the target text can be trivially read off as the terminal yield. Such grammars may consist of a large number of rules. Finding errors is quite tedious and sometimes very time-consuming. Often the generation fails because the relevant input subtree is not specified correctly. This work describes a method to detect and correct wrong assignments of input subtrees to grammar categories by cross-validating grammar rules with the given input structures. The method also detects and corrects the usage of a category in a grammar rule. The result is implemented in a grammar development workbench and accelerates the grammar writer's work considerably. The paper suggests the algorithms can be ported to other areas in which tree mapping is required.},
title={Suggesting Error Corrections of Path Expressions and Categories for Tree-Mapping Grammars}
}
• T. vor der Brück and J. Leveling, “Parameter Learning for a Readability Checking Tool,” in Proceedings of the LWA 2007 (Lernen-Wissen-Adaption), Workshop KDML, A. Hinneburg, Ed., Halle/Saale, Germany: Gesellschaft für Informatik, 2007.
[Abstract] [BibTeX]

This paper describes the application of machine learning methods to determine parameters for DeLite, a readability checking tool. DeLite pinpoints text segments that are difficult to understand and computes for a given text a global readability score, which is a weighted sum of normalized indicator values. Indicator values are numeric properties derived from linguistic units in the text, such as the distance between a verb and its complements or the number of possible antecedents for a pronoun. Indicators are normalized by means of a derivation of the Fermi function with two parameters. DeLite requires individual parameters for this normalization function and a weight for each indicator to compute the global readability score. Several experiments to determine these parameters were conducted, using different machine learning approaches. The training data consists of more than 300 user ratings of texts from the municipality domain. The weights for the indicators are learned using two approaches: i) robust regression with linear optimization and ii) an approximative iterative linear regression algorithm. For evaluation, the computed readability scores are compared to user ratings. The evaluation showed that iterative linear regression yields a smaller square error than robust regression although this method is only approximative. Both methods yield results outperforming a first manual setting, and for both methods, basically the same set of non-zero weights remain.
@INCOLLECTION{vor:der:Brueck:Leveling:2007,
publisher={Gesellschaft für Informatik},
booktitle={Proceedings of the LWA 2007 (Lernen-Wissen-Adaption), Workshop KDML},
author={vor der Brück, Tim and Leveling, Johannes},
editor={Alexander Hinneburg},
year={2007},
title={Parameter Learning for a Readability Checking Tool},
abstract={This paper describes the application of machine learning methods to determine parameters for DeLite, a readability checking tool. DeLite pinpoints text segments that are difficult to understand and computes for a given text a global readability score, which is a weighted sum of normalized indicator values. Indicator values are numeric properties derived from linguistic units in the text, such as the distance between a verb and its complements or the number of possible antecedents for a pronoun. Indicators are normalized by means of a derivation of the Fermi function with two parameters. DeLite requires individual parameters for this normalization function and a weight for each indicator to compute the global readability score. Several experiments to determine these parameters were conducted, using different machine learning approaches. The training data consists of more than 300 user ratings of texts from the municipality domain. The weights for the indicators are learned using two approaches: i) robust regression with linear optimization and ii) an approximative iterative linear regression algorithm. For evaluation, the computed readability scores are compared to user ratings. The evaluation showed that iterative linear regression yields a smaller square error than robust regression although this method is only approximative. Both methods yield results outperforming a first manual setting, and for both methods, basically the same set of non-zero weights remain.},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.69.6079}}
• C. Borr, M. Hielscher-Fastabend, and A. Lücking, “Reliability and Validity of Cervical Auscultation,” Dysphagia, vol. 22, pp. 225-234, 2007.
[Abstract] [BibTeX]

We conducted a two-part study that contributes to the discussion about cervical auscultation (CA) as a scientifically justifiable and medically useful tool to identify patients with a high risk of aspiration/penetration. We sought to determine (1) acoustic features that mark a deglutition act as dysphagic; (2) acoustic changes in healthy older deglutition profiles compared with those of younger adults; (3) the correctness and concordance of rater judgments based on CA; and (4) if education in CA improves individual reliability. The first part of the study focused on a comparison of the swallow morphology of dysphagic as opposed to healthy subjects' deglutition in terms of structure properties of the pharyngeal phase of deglutition. We obtained the following results. The duration of deglutition apnea is significantly higher in the older group than in the younger one. Comparing the younger group and the dysphagic group we found significant differences in duration of deglutition apnea, onset time, and number of gulps. Just one parameter, number of gulps, distinguishes significantly between the older and the dysphagic groups. The second part of the study aimed at evaluating the reliability of CA in detecting dysphagia measured as the concordance and the correctness of CA experts in classifying swallowing sounds. The interrater reliability coefficient AC1 resulted in a value of 0.46, which is to be interpreted as fair agreement. Furthermore, we found that comparison with radiologically defined aspiration/penetration for the group of experts (speech and language therapists) yielded 70% specificity and 94% sensitivity. We conclude that the swallowing sounds contain audible cues that should, in principle, permit reliable classification and view CA as an early warning system for identifying patients with a high risk of aspiration/penetration; however, it is not appropriate as a stand-alone tool.
@ARTICLE{Borr:Luecking:Hierlscher:2007,
publisher={Springer New York},
url={http://dx.doi.org/10.1007/s00455-007-9078-3},
journal={Dysphagia},
pages={225--234},
author={Borr, Christiane and Hielscher-Fastabend, Martina and Lücking, Andy},
volume={22},
doi={10.1007/s00455-007-9078-3},
year={2007},
abstract={We conducted a two-part study that contributes to the discussion about cervical auscultation (CA) as a scientifically justifiable and medically useful tool to identify patients with a high risk of aspiration/penetration. We sought to determine (1) acoustic features that mark a deglutition act as dysphagic; (2) acoustic changes in healthy older deglutition profiles compared with those of younger adults; (3) the correctness and concordance of rater judgments based on CA; and (4) if education in CA improves individual reliability. The first part of the study focused on a comparison of the swallow morphology of dysphagic as opposed to healthy subjects' deglutition in terms of structure properties of the pharyngeal phase of deglutition. We obtained the following results. The duration of deglutition apnea is significantly higher in the older group than in the younger one. Comparing the younger group and the dysphagic group we found significant differences in duration of deglutition apnea, onset time, and number of gulps. Just one parameter, number of gulps, distinguishes significantly between the older and the dysphagic groups. The second part of the study aimed at evaluating the reliability of CA in detecting dysphagia measured as the concordance and the correctness of CA experts in classifying swallowing sounds. The interrater reliability coefficient AC1 resulted in a value of 0.46, which is to be interpreted as fair agreement. Furthermore, we found that comparison with radiologically defined aspiration/penetration for the group of experts (speech and language therapists) yielded 70% specificity and 94% sensitivity. We conclude that the swallowing sounds contain audible cues that should, in principle, permit reliable classification and view CA as an early warning system for identifying patients with a high risk of aspiration/penetration; however, it is not appropriate as a stand-alone tool.},
title={Reliability and Validity of Cervical Auscultation},
issue={3},
pdf={http://www.shkim.eu/cborr/ca5manuscript.pdf}}
• A. Kranstedt, A. Lücking, T. Pfeiffer, H. Rieser, and M. Staudacher, Locating Objects by Pointing, 2007.
[BibTeX]

@MISC{Kranstedt:et:al:2007,
author={Kranstedt, Alfred and Lücking, Andy and Pfeiffer, Thies and Rieser, Hannes and Staudacher, Marc},
keywords={own},
month={6},
year={2007},
howpublished={3rd International Conference of the International Society for Gesture Studies. Evanston, IL, USA},
title={Locating Objects by Pointing}}
• M. Asadullah, M. Z. Islam, and M. Khan, “Error-tolerant Finite-state Recognizer and String Pattern Similarity Based Spell-Checker for Bengali,” in 5th International Conference on Natural Language Processing (ICON), poster, Hyderabad, India, January 2007.
[Abstract] [BibTeX]

A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also in how it ranks the suggestions for the word. Spelling checker algorithms using edit distance methods tend to produce a large number of possibilities for misspelled words. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word. FSA has proven to be an effective method for problems requiring probabilistic solution and high error tolerance. We start by using a finite state representation for all the words in the Bangla dictionary; the algorithm then uses the state tables to test a string, and in case of an erroneous string, try to find all possible solutions by attempting singular and multi-step transitions to consume one or more characters and using the subsequent characters as look-ahead; and finally, we use backtracking to add each possible solution to the suggestion list. The use of finite state representation for the word implies that the algorithm is much more efficient in the case of non-inflected forms; in case of nouns, it is even more significant as Bangla nouns are heavily used in the non-inflected form. In terms of error detection and correction, the algorithm uses the statistics of Bangla error pattern and thus produces a small number of significant suggestions. One notable limitation is the inability to handle transposition errors as single edit distance errors. This is not as significant as it may seem since transposition errors are not as common as other errors in Bangla. This paper presents the structure and the algorithm to implement a practical Bangla spell-checker, and discusses the results obtained from the prototype implementation.
@INPROCEEDINGS{Asadullah:Zahurul:Khan:2007,
owner={zahurul},
booktitle={5th International Conference on Natural Language Processing (ICON), poster, Hyderabad, India, January 2007},
author={Asadullah, Munshi and Islam, Md. Zahurul and Khan, Mumit},
timestamp={2011.08.02},
year={2007},
title={Error-tolerant Finite-state Recognizer and String Pattern Similarity Based Spell-Checker for Bengali},
abstract={A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also in how it ranks the suggestions for the word. Spelling checker algorithms using edit distance methods tend to produce a large number of possibilities for misspelled words. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word. FSA has proven to be an effective method for problems requiring probabilistic solution and high error tolerance. We start by using a finite state representation for all the words in the Bangla dictionary; the algorithm then uses the state tables to test a string, and in case of an erroneous string, try to find all possible solutions by attempting singular and multi-step transitions to consume one or more characters and using the subsequent characters as look-ahead; and finally, we use backtracking to add each possible solution to the suggestion list. The use of finite state representation for the word implies that the algorithm is much more efficient in the case of non-inflected forms; in case of nouns, it is even more significant as Bangla nouns are heavily used in the non-inflected form. In terms of error detection and correction, the algorithm uses the statistics of Bangla error pattern and thus produces a small number of significant suggestions. One notable limitation is the inability to handle transposition errors as single edit distance errors. This is not as significant as it may seem since transposition errors are not as common as other errors in Bangla. This paper presents the structure and the algorithm to implement a practical Bangla spell-checker, and discusses the results obtained from the prototype implementation.},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Error-tolerant_Finite-state_Recognizer_and_String_Pattern_Similarity_Based_Spell-Checker_for_Bengali.pdf}}
• M. Z. Islam, M. N. Uddin, and M. Khan, “A Light Weight Stemmer for Bengali and Its Use in Spelling Checker,” in 1st International Conference on Digital Communications and Computer Applications (DCCA2007), 2007.
[Abstract] [BibTeX]

Stemming is an operation that splits a word into the constituent root part and affix without doing complete morphological analysis. It is used to improve the performance of spelling checkers and information retrieval applications, where morphological analysis would be too computationally expensive. For spelling checkers specifically, using stemming may drastically reduce the dictionary size, often a bottleneck for mobile and embedded devices. This paper presents a computationally inexpensive stemming algorithm for Bengali, which handles suffix removal in a domain independent way. The evaluation of the proposed algorithm in a Bengali spelling checker indicates that it can be effectively used in information retrieval applications in general.
@INPROCEEDINGS{Zahurul:Uddin:Khan:2007,
owner={zahurul},
booktitle={1st International Conference on Digital Communications and Computer Applications (DCCA2007)},
author={Islam, Md. Zahurul and Uddin, Md. Nizam and Khan, Mumit},
timestamp={2011.08.02},
year={2007},
title={A Light Weight Stemmer for Bengali and Its Use in Spelling Checker},
abstract={Stemming is an operation that splits a word into the constituent root part and affix without doing complete morphological analysis. It is used to improve the performance of spelling checkers and information retrieval applications, where morphological analysis would be too computationally expensive. For spelling checkers specifically, using stemming may drastically reduce the dictionary size, often a bottleneck for mobile and embedded devices. This paper presents a computationally inexpensive stemming algorithm for Bengali, which handles suffix removal in a domain independent way. The evaluation of the proposed algorithm in a Bengali spelling checker indicates that it can be effectively used in information retrieval applications in general.}}
• M. Z. Islam and M. Khan, “Bangla Verb Morphology and a Multilingual Computational Morphology FrameWork for PC-KIMMO,” in The Proceedings of the Workshop on Morpho-Syntactic Analysis by the School of Asian Applied Natural Language Processing for Language Diversity and Language Resource Development (ADD), Bangkok, Thailand, 2007.
[BibTeX]

@INPROCEEDINGS{Zahurul:Khan:2007,
owner={zahurul},
booktitle={The Proceedings of the Workshop on Morpho-Syntactic Analysis by the School of Asian Applied Natural Language Processing for Language Diversity and Language Resource Development (ADD), Bangkok, Thailand},
author={Islam, Md. Zahurul and Khan, Mumit},
timestamp={2011.08.02},
year={2007},
title={Bangla Verb Morphology and a Multilingual Computational Morphology FrameWork for PC-KIMMO},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Bangla_Verb_Morphology_and_a_Multilingual_Computational_Morphology_FrameWork_for_PC-KIMMO-talk.pdf}}
• A. Mehler, P. Geibel, and O. Abramov, “Structural Classifiers of Text Types: Towards a Novel Model of Text Representation,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 22, iss. 2, pp. 51-66, 2007.
[Abstract] [BibTeX]

Texts can be distinguished in terms of their content, function, structure or layout (Brinker, 1992; Bateman et al., 2001; Joachims, 2002; Power et al., 2003). These reference points do not open necessarily orthogonal perspectives on text classification. As part of explorative data analysis, text classification aims at automatically dividing sets of textual objects into classes of maximum internal homogeneity and external heterogeneity. This paper deals with classifying texts into text types whose instances serve more or less homogeneous functions. Other than mainstream approaches, which rely on the vector space model (Sebastiani, 2002) or some of its descendants (Baeza-Yates and Ribeiro-Neto, 1999) and, thus, on content-related lexical features, we solely refer to structural differentiae. That is, we explore patterns of text structure as determinants of class membership. Our starting point are tree-like text representations which induce feature vectors and tree kernels. These kernels are utilized in supervised learning based on cross-validation as a method of model selection (Hastie et al., 2001) by example of a corpus of press communication. For a subset of categories we show that classification can be performed very well by structural differentia only.
@ARTICLE{Mehler:Geibel:Pustylnikov:2007,
journal={Journal for Language Technology and Computational Linguistics (JLCL)},
pages={51-66},
number={2},
author={Mehler, Alexander and Geibel, Peter and Abramov, Olga},
volume={22},
year={2007},
title={Structural Classifiers of Text Types: Towards a Novel Model of Text Representation},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.604},
abstract={Texts can be distinguished in terms of their content, function, structure or layout (Brinker, 1992; Bateman et al., 2001; Joachims, 2002; Power et al., 2003). These reference points do not open necessarily orthogonal perspectives on text classification. As part of explorative data analysis, text classification aims at automatically dividing sets of textual objects into classes of maximum internal homogeneity and external heterogeneity. This paper deals with classifying texts into text types whose instances serve more or less homogeneous functions. Other than mainstream approaches, which rely on the vector space model (Sebastiani, 2002) or some of its descendants (Baeza-Yates and Ribeiro-Neto, 1999) and, thus, on content-related lexical features, we solely refer to structural differentiae. That is, we explore patterns of text structure as determinants of class membership. Our starting point are tree-like text representations which induce feature vectors and tree kernels. These kernels are utilized in supervised learning based on cross-validation as a method of model selection (Hastie et al., 2001) by example of a corpus of press communication. For a subset of categories we show that classification can be performed very well by structural differentia only.}}
• O. Abramov and A. Mehler, “Structural Differentiae of Text Types. A Quantitative Model,” in Proceedings of the 31st Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl), 2007, pp. 655–662.
[BibTeX]

@INPROCEEDINGS{Abramov:Mehler:2007:b,
booktitle={Proceedings of the 31st Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl)},
pages={655–662},
author={Abramov, Olga and Mehler, Alexander},
year={2007},
title={Structural Differentiae of Text Types. A Quantitative Model},
pdf={http://wwwhomes.uni-bielefeld.de/opustylnikov/pustylnikov/pdfs/gfkl.pdf},
website={http://www.springerprofessional.de/077---structural-differentiae-of-text-types--a-quantitative-model/1957362.html}}
• A. Mehler and R. Köhler, Eds., Aspects of Automatic Text Analysis: Festschrift in Honor of Burghard Rieger, Berlin/New York: Springer, 2007.
[BibTeX]

@BOOK{Mehler:Koehler:2007:a,
publisher={Springer},
website={http://www.springer.com/de/book/9783540375203},
review={http://www.degruyter.com/view/j/zrs.2011.3.issue-2/zrs.2011.050/zrs.2011.050.xml},
review2={http://irsg.bcs.org/informer/Informer27.pdf},
series={Studies in Fuzziness and Soft Computing},
editor={Mehler, Alexander and Köhler, Reinhard},
year={2007},
pagetotal={464},
title={Aspects of Automatic Text Analysis: Festschrift in Honor of Burghard Rieger},
address={Berlin/New York}}
• A. Mehler and A. Storrer, “What are Ontologies Good For? Evaluating Terminological Ontologies in the Framework of Text Graph Classification,” in Proceedings of OTT ’06 – Ontologies in Text Technology: Approaches to Extract Semantic Knowledge from Structured Information, Osnabrück, 2007, pp. 11-18.
[BibTeX]

@INPROCEEDINGS{Mehler:Storrer:2007,
booktitle={Proceedings of OTT '06 – Ontologies in Text Technology: Approaches to Extract Semantic Knowledge from Structured Information},
pages={11-18},
author={Mehler, Alexander and Storrer, Angelika},
series={Publications of the Institute of Cognitive Science (PICS)},
editor={Mönnich, Uwe and Kühnberger, Kai-Uwe},
year={2007},
title={What are Ontologies Good For? Evaluating Terminological Ontologies in the Framework of Text Graph Classification},
pdf={http://cogsci.uni-osnabrueck.de/~ott06/ott06-abstracts/Mehler_Storrer_abstract.pdf},
website={http://citeseer.uark.edu:8080/citeseerx/viewdoc/summary?doi=10.1.1.91.2979},
address={Osnabrück}}
• M. Stührenberg, D. Goecke, N. Diewald, A. Mehler, and I. Cramer, “Web-based Annotation of Anaphoric Relations and Lexical Chains,” in Proceedings of the Linguistic Annotation Workshop, ACL 2007, 2007, pp. 140–147.
[BibTeX]

@INPROCEEDINGS{Stuehrenberg:Goecke:Diewald:Mehler:Cramer:2007:a,
booktitle={Proceedings of the Linguistic Annotation Workshop, ACL 2007},
pages={140–147},
author={Stührenberg, Maik and Goecke, Daniela and Diewald, Nils and Mehler, Alexander and Cramer, Irene},
year={2007},
title={Web-based Annotation of Anaphoric Relations and Lexical Chains},
pdf={http://www.aclweb.org/anthology/W07-1523},
website={https://www.researchgate.net/publication/234800610_Web-based_annotation_of_anaphoric_relations_and_lexical_chains}}
• R. Ferrer i Cancho, A. Mehler, O. Abramov, and A. Díaz-Guilera, “Correlations in the organization of large-scale syntactic dependency networks,” in Proceedings of Graph-based Methods for Natural Language Processing (TextGraphs-2) at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, New York, 2007, pp. 65-72.
[BibTeX]

@INPROCEEDINGS{Ferrer:i:Cancho:Mehler:Pustylnikov:Diaz-Guilera:2007:a,
booktitle={Proceedings of Graph-based Methods for Natural Language Processing (TextGraphs-2) at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, New York},
pages={65-72},
author={Ferrer i Cancho, Ramon and Mehler, Alexander and Abramov, Olga and Díaz-Guilera, Albert},
year={2007},
title={Correlations in the organization of large-scale syntactic dependency networks}}
• R. Gleim, A. Mehler, H. Eikmeyer, and H. Rieser, “Ein Ansatz zur Repräsentation und Verarbeitung großer Korpora multimodaler Daten,” in Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, 11.–13. April, Universität Tübingen, Tübingen, 2007, pp. 275-284.
[BibTeX]

@INPROCEEDINGS{Gleim:Mehler:Eikmeyer:Rieser:2007,
publisher={Narr},
booktitle={Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, 11.–13. April, Universit{\"a}t Tübingen},
pages={275-284},
author={Gleim, Rüdiger and Mehler, Alexander and Eikmeyer, Hans-Jürgen and Rieser, Hannes},
editor={Rehm, Georg and Witt, Andreas and Lemnitzer, Lothar},
year={2007},
title={Ein Ansatz zur Repr{\"a}sentation und Verarbeitung gro{\ss}er Korpora multimodaler Daten},
address={Tübingen}}
• A. Mehler, “Aspectos Metodológicos da Semiótica Computacional,” in Computação, Cognição e Semiose, J. Queiroz, R. Gudwin, and A. Loula, Eds., Federal University of Bahia: EDUFBA, 2007, pp. 145-157.
[BibTeX]

@INCOLLECTION{Mehler:2004:2007,
publisher={EDUFBA},
booktitle={Computação, Cognição e Semiose},
pages={145-157},
author={Mehler, Alexander},
editor={Queiroz, João and Gudwin, Ricardo and Loula, Angelo},
year={2007},
title={Aspectos Metodológicos da Semiótica Computacional},
address={Federal University of Bahia}}
• A. Mehler, “Compositionality in Quantitative Semantics. A Theoretical Perspective on Text Mining,” in Aspects of Automatic Text Analysis, A. Mehler and R. Köhler, Eds., Berlin/New York: Springer, 2007, pp. 139-167.
[Abstract] [BibTeX]

This chapter introduces a variant of the principle of compositionality in quantitative text semantics as an alternative to the bag-of-features approach. The variant includes effects of context-sensitive interpretation as well as processes of meaning constitution and change in the sense of usage-based semantics. Its starting point is a combination of semantic space modeling and text structure analysis. The principle is implemented by means of a hierarchical constraint satisfaction process which utilizes the notion of hierarchical text structure superimposed by graph-inducing coherence relations. The major contribution of the chapter is a conceptualization and formalization of the principle of compositionality in terms of semantic spaces which tackles some well known deficits of existing approaches. In particular this relates to the missing linguistic interpretability of statistical meaning representations. 
@INCOLLECTION{Mehler:2007:b,
publisher={Springer},
booktitle={Aspects of Automatic Text Analysis},
pages={139-167},
author={Mehler, Alexander},
series={Studies in Fuzziness and Soft Computing},
editor={Mehler, Alexander and Köhler, Reinhard},
year={2007},
title={Compositionality in Quantitative Semantics. A Theoretical Perspective on Text Mining},
abstract={This chapter introduces a variant of the principle of compositionality in quantitative text semantics as an alternative to the bag-of-features approach. The variant includes effects of context-sensitive interpretation as well as processes of meaning constitution and change in the sense of usage-based semantics. Its starting point is a combination of semantic space modeling and text structure analysis. The principle is implemented by means of a hierarchical constraint satisfaction process which utilizes the notion of hierarchical text structure superimposed by graph-inducing coherence relations. The major contribution of the chapter is a conceptualization and formalization of the principle of compositionality in terms of semantic spaces which tackles some well known deficits of existing approaches. In particular this relates to the missing linguistic interpretability of statistical meaning representations. }}
• M. Dehmer and A. Mehler, “A New Method of Measuring the Similarity for a Special Class of Directed Graphs,” Tatra Mountains Mathematical Publications, vol. 36, pp. 39-59, 2007.
[Abstract] [BibTeX]

The problem of graph similarity is challenging and important in many areas of science, e.g., mathematics [Sobik, F.: Graphmetriken und Klassifikation strukturierter Objekte, ZKI-Informationen, Akad. Wiss. DDR, 2, (1982), 63-122; Zelinka, B.: On a certain distance between isomorphism classes of graphs, Cas. Pest. Mat., 100, (1975), 371-373], biology [Koch, I., Lengauer, T., Wanke, E.: An algorithm for finding maximal common subtopologies in a set of protein structures, J. Comput. Biology, 3, (1996), 289-306], and chemistry [Skvortsova, M. I., Baskin, I. I., Stankevich, I. V., Palyulin, V. A., Zefirov, N. S.: Molecular similarity in structure-property relationship studies. Analytical description of the complete set of graph similarity measures, International Symposium CACR'96, (1996), pp. 542-646]. In this paper, we design a new method, which uses sequence alignment techniques [Altschul, S.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25, (1997), 3389-3402; Kilian, J., Hoos, H. H.: MusicBLAST – gapped sequence alignment for MIR, in: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), (2004)], to measure the structural similarity of unlabeled, hierarchical, and directed graphs. More precisely, if h denotes the maximal length of a path from the root to a leaf of a given hierarchical and directed graph Ĥ, we align out-degree and in-degree sequences induced by the vertex sequences on a level i, 0 ≤ i ≤ h. On the basis of the level alignments, we construct measured values and prove that they are similarity measures. In our algorithm, which uses the well-known technique of dynamic programming, the alignments of out-degree and in-degree sequences are decoupled. Therefore, we obtain a family (d_i(Ĥ1, Ĥ2)), 1 ≤ i ≤ 3, of graph similarity measures. As an application, we examine the measures on a graph corpus of 464 graphs, where the graphs represent web-based hypertext structures (websites).
2000 Mathematics Subject Classification: Primary 05C75, 05C20, 68R15; Secondary 90C39, 68R10, 91B82. Keywords: digraphs, similarity measures, sequence alignments, degree sequences.

@ARTICLE{Dehmer:Mehler:2007,
journal={Tatra Mountains Mathematical Publications},
pages={39-59},
author={Dehmer, Matthias and Mehler, Alexander},
volume={36},
year={2007},
title={A New Method of Measuring the Similarity for a Special Class of Directed Graphs},
website={https://www.researchgate.net/publication/228905939_A_new_method_of_measuring_similarity_for_a_special_class_of_directed_graphs}}

@INPROCEEDINGS{Mehler:2006:a,
booktitle={Proceedings of the 2006 International Conference on Bioinformatics & Computational Biology (BIOCOMP '06), June 26, 2006, Las Vegas, USA},
pages={496-500},
author={Mehler, Alexander},
editor={Arabnia, Hamid R. and Valafar, Homayoun},
year={2006},
title={In Search of a Bridge between Network Analysis in Computational Linguistics and Computational Biology – A Conceptual Note},
pdf={http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.9842&rep=rep1&type=pdf}}

@INPROCEEDINGS{Pustylnikov:Mehler:2008:a,
booktitle={Proceedings of First International Conference on Global Interoperability for Language Resources (ICGL 2008), Hong Kong SAR, January 9-11},
author={Abramov, Olga and Mehler, Alexander},
year={2008},
title={Towards a Uniform Representation of Treebanks: Providing Interoperability for Dependency Tree Data},
website={https://www.researchgate.net/publication/242681771_Towards_a_Uniform_Representation_of_Treebanks_Providing_Interoperability_for_Dependency_Tree_Data},
pdf={http://wwwhomes.uni-bielefeld.de/opustylnikov/pustylnikov/pdfs/acl07.1.0.pdf},
abstract={In this paper we present a corpus representation format which unifies the representation of a wide range of dependency treebanks within a single model. This approach provides interoperability and reusability of annotated syntactic data which in turn extends its applicability within various research contexts. We demonstrate our approach by means of dependency treebanks of 11 languages. Further, we perform a comparative quantitative analysis of these treebanks in order to demonstrate the interoperability of our approach.}}

@INPROCEEDINGS{Rehm:Santini:Mehler:Braslavski:Gleim:Stubbe:Symonenko:Tavosanis:Vidulin:2008,
pdf={https://hucompute.org/wp-content/uploads/2015/08/rehm_santini_mehler_braslavski_gleim_stubbe_symonenko_tavosanis_vidulin_2008.pdf},
booktitle={Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech (Morocco)},
author={Rehm, Georg and Santini, Marina and Mehler, Alexander and Braslavski, Pavel and Gleim, Rüdiger and Stubbe, Andrea and Symonenko, Svetlana and Tavosanis, Mirko and Vidulin, Vedrana},
year={2008},
title={Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems},
website={http://www.lrec-conf.org/proceedings/lrec2008/summaries/94.html},
abstract={We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.}}

@INPROCEEDINGS{Geibel:Krumnack:Pustylnikov:Mehler:Gust:Kuehnberger:2007,
publisher={Springer},
booktitle={Proceedings of AI 2007: Advances in Artificial Intelligence, 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007},
website={http://www.springerlink.com/content/w574377ww1h6m212/},
pages={642-646},
author={Geibel, Peter and Krumnack, Ulf and Abramov, Olga and Mehler, Alexander and Gust, Helmar and Kühnberger, Kai-Uwe},
series={Lecture Notes in Computer Science},
volume={4830},
editor={Orgun, Mehmet A. and Thornton, John},
year={2007},
title={Structure-Sensitive Learning of Text Types},
abstract={In this paper, we discuss the structure based classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method using predefined structural features and also four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles, and on the well-known SUSANNE corpus. We will demonstrate that, for the two corpora, many text types can be learned based on structural features only.}}

@ARTICLE{Mehler:2005:a,
journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
pages={29-53},
author={Mehler, Alexander},
publisher={GSCL},
volume={1},
year={2005},
title={Zur textlinguistischen Fundierung der Text- und Korpuskonversion},
abstract={Die automatische Konversion von Texten in Hypertexte ist mit der Erwartung verbunden, computerbasierte Rezeptionshilfen zu gewinnen. Dies betrifft insbesondere die Bewältigung der ungeheuren Menge an Fachliteratur im Rahmen der Wissenschaftskommunikation. Von einem thematisch relevanten Text zu einem thematisch verwandten Text per Hyperlink direkt gelangen zu können, stellt einen Anspruch dar, dessen Erfüllung mittels digitaler Bibliotheken näher gerückt zu sein scheint. Doch wie lassen sich die Kriterien, nach denen Texte automatisch verlinkt werden, genauer begründen? Dieser Beitrag geht dieser Frage aus der Sicht textlinguistischer Modellbildungen nach. Er zeigt, dass parallel zur Entwicklung der Textlinguistik, wenn auch mit einer gewissen Verzögerung, Konversionsansätze entwickelt wurden, die sich jeweils an einer bestimmten Stufe des Textbegriffs orientieren. Der Beitrag weist nicht nur das diesen Ansätzen gemeinsame Fundament in Form der so genannten Explikationshypothese nach, sondern verweist zugleich auf grundlegende Automatisierungsdefizite, die mit ihnen verbunden sind. Mit systemisch-funktionalen Hypertexten wird schließlich ein Ansatz skizziert, der darauf zielt, den Anspruch nach textlinguistischer Fundierung und Automatisierbarkeit zu vereinen.}}

@INPROCEEDINGS{Waltinger:Mehler:2009:c,
booktitle={IEEE/WIC/ACM International Conference on Web Intelligence, September 15–18, Milano},
author={Waltinger, Ulli and Mehler, Alexander},
website={http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5284920&abstractAccess=no&userType=inst},
year={2009},
title={Social Semantics and Its Evaluation By Means Of Semantic Relatedness And Open Topic Models},
abstract={This paper presents an approach using social semantics for the task of topic labelling by means of Open Topic Models. Our approach utilizes a social ontology to create an alignment of documents within a social network. Comprised category information is used to compute a topic generalization. We propose a feature-frequency-based method for measuring semantic relatedness which is needed in order to reduce the number of document features for the task of topic labelling. This method is evaluated against multiple human judgement experiments comprising two languages and three different resources. Overall the results show that social ontologies provide a rich source of terminological knowledge. The performance of the semantic relatedness measure with correlation values of up to .77 is quite promising. Results on the topic labelling experiment show, with an accuracy of up to .79, that our approach can be a valuable method for various NLP applications.}}

@INPROCEEDINGS{Stuehrenberg:Beisswenger:Kuehnberger:Mehler:Luengen:Metzing:Moennich:2008,
booktitle={Proceedings of the Post LREC-2008 Workshop: Sustainability of Language Resources and Tools for Natural Language Processing, Marrakech, Morocco},
author={Stührenberg, Maik and Beißwenger, Michael and Kühnberger, Kai-Uwe and Mehler, Alexander and Lüngen, Harald and Metzing, Dieter and Mönnich, Uwe},
year={2008},
title={Sustainability of Text-Technological Resources},
pdf={http://www.michael-beisswenger.de/pub/lrec-sustainability.pdf},
abstract={We consider that there are obvious relationships between research on sustainability of language and linguistic resources on the one hand and work undertaken in the Research Unit 'Text-Technological Modelling of Information' on the other. Currently the main focus in sustainability research is concerned with archiving methods of textual resources, i.e. methods for sustainability of primary and secondary data; these aspects are addressed in our work as well. However, we believe that there are additional aspects of sustainability on which new light is shed by procedures, algorithms and dynamic processes undertaken in our Research Unit.}}

@ARTICLE{Mehler:2002:l,
journal={Journal of Universal Computer Science (J.UCS)},
pages={924-943},
number={10},
author={Mehler, Alexander},
volume={8},
year={2002},
title={Components of a Model of Context-Sensitive Hypertexts},
website={http://www.jucs.org/jucs_8_10/components_of_a_model},
pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_components_2002.pdf},
abstract={On the background of rising Intranet applications the automatic generation of adaptable, context-sensitive hypertexts becomes more and more important [El-Beltagy et al., 2001]. This observation contradicts the literature on hypertext authoring, where Information Retrieval techniques prevail, which disregard any linguistic and context-theoretical underpinning. As a consequence, resulting hypertexts do not manifest those schematic structures, which are constitutive for the emergence of text types and the context-mediated understanding of their instances, i.e. natural language texts. This paper utilizes Systemic Functional Linguistics (SFL) and its context model as a theoretical basis of hypertext authoring. So called Systemic Functional Hypertexts (SFHT) are proposed, which refer to a stratified context layer as the proper source of text linkage. The purpose of this paper is twofold: First, hypertexts are reconstructed from a linguistic point of view as a kind of supersign, whose constituents are natural language texts and whose structuring is due to intra- and intertextual coherence relations and their context-sensitive interpretation. Second, the paper prepares a formal notion of SFHTs as a first step towards operationalization of fundamental text linguistic concepts. On this background, SFHTs serve to overcome the theoretical poverty of many approaches to link generation.}}

@INPROCEEDINGS{Gleim:Mehler:Dehmer:Abramov:2007,
booktitle={3rd International Conference on Web Information Systems and Technologies (WEBIST '07), March 3-6, 2007, Barcelona},
pages={142-149},
author={Gleim, Rüdiger and Mehler, Alexander and Dehmer, Matthias and Abramov, Olga},
editor={Filipe, Joaquim and Cordeiro, José and Encarnação, Bruno and Pedrosa, Vitor},
year={2007},
title={Aisles through the Category Forest – Utilising the Wikipedia Category System for Corpus Building in Machine Learning},
address={Barcelona},
abstract={The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods be developed in order to tackle the problem of finding and organising relevant information. It has often been motivated that semantic classifications of input documents help solving this task. But while approaches of supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches to web mining, respective corpora are needed. However, the composition of genre- or domain-specific web corpora is still an unsolved problem. It is time consuming to build large corpora of good quality because web pages typically lack reliable meta information. Wikipedia along with similar approaches of collaborative text production offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source of corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views to browse through the Wikipedia and to construct domain specific corpora for machine learning.},
pdf={https://hucompute.org/wp-content/uploads/2016/10/webist_2007-gleim_mehler_dehmer_pustylnikov.pdf}}

@INCOLLECTION{Mehler:Job:Blanchard:Eikmeyer:2008,
publisher={VS},
booktitle={Netzwerkanalyse und Netzwerktheorie},
pages={413-427},
author={Mehler, Alexander and Job, Barbara and Blanchard, Philippe and Eikmeyer, Hans-Jürgen},
editor={Stegbauer, Christian},
year={2008},
title={Sprachliche Netzwerke},
address={Wiesbaden},
abstract={In diesem Kapitel beschreiben wir so genannte sprachliche Netzwerke. Dabei handelt es sich um Netzwerke sprachlicher Einheiten, die in Zusammenhang mit ihrer Einbettung in das Netzwerk jener Sprachgemeinschaft analysiert werden, welche diese Einheiten und deren Vernetzung hervorgebracht hat. Wir erörtern ein Dreistufenmodell zur Analyse solcher Netzwerke und exemplifizieren dieses Modell anhand mehrerer Spezialwikis. Ein Hauptaugenmerk des Kapitels liegt dabei auf einem Mehrebenennetzwerkmodell, und zwar in Abkehr von den unipartiten Graphmodellen der Theorie komplexer Netzwerke.}}

@INPROCEEDINGS{Pustylnikov:Mehler:Gleim:2008,
booktitle={Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech (Morocco)},
author={Abramov, Olga and Mehler, Alexander and Gleim, Rüdiger},
year={2008},
title={A Unified Database of Dependency Treebanks. Integrating, Quantifying and Evaluating Dependency Data},
pdf={http://wwwhomes.uni-bielefeld.de/opustylnikov/pustylnikov/pdfs/LREC08_full.pdf},
abstract={This paper describes a database of 11 dependency treebanks which were unified by means of a two-dimensional graph format. The format was evaluated with respect to storage-complexity on the one hand, and efficiency of data access on the other hand. An example of how the treebanks can be integrated within a unique interface is given by means of the DTDB interface.}}

@INPROCEEDINGS{Mehler:Gleim:Wegner:2007,
booktitle={Proceedings of the Workshop "Towards Genre-Enabled Search Engines: The Impact of NLP", September 30, 2007, in conjunction with RANLP 2007, Borovets, Bulgaria},
pages={13-19},
author={Mehler, Alexander and Gleim, Rüdiger and Wegner, Armin},
editor={Rehm, Georg and Santini, Marina},
year={2007},
title={Structural Uncertainty of Hypertext Types. An Empirical Study},
pdf={https://hucompute.org/wp-content/uploads/2015/08/RANLP.pdf}}

@INPROCEEDINGS{Mehler:Gleim:Dehmer:2006,
publisher={Springer},
booktitle={Proceedings of the 29th Annual Conference of the German Classification Society, March 9-11, 2005, Universität Magdeburg},
website={http://www.springerlink.com/content/l7665tm3u241317l/},
pages={406-413},
author={Mehler, Alexander and Gleim, Rüdiger and Dehmer, Matthias},
editor={Spiliopoulou, Myra and Kruse, Rudolf and Borgelt, Christian and Nürnberger, Andreas and Gaul, Wolfgang},
year={2006},
title={Towards Structure-Sensitive Hypertext Categorization},
address={Berlin/New York},
abstract={Hypertext categorization is the task of automatically assigning category labels to hypertext units. Comparable to text categorization it stays in the area of function learning based on the bag-of-features approach. This scenario faces the problem of a many-to-many relation between websites and their hidden logical document structure. The paper argues that this relation is a prevalent characteristic which interferes any effort of applying the classical apparatus of categorization to web genres. This is confirmed by a threefold experiment in hypertext categorization. In order to outline a solution to this problem, the paper sketches an alternative method of unsupervised learning which aims at bridging the gap between statistical and structural pattern recognition (Bunke et al. 2001) in the area of web mining.}}

@INCOLLECTION{Mehler:2010:a,
publisher={Birkhäuser Publishing},
booktitle={Structural Analysis of Complex Networks},
author={Mehler, Alexander},
editor={Dehmer, Matthias},
pages={381-401},
year={2010},
title={Minimum Spanning Markovian Trees: Introducing Context-Sensitivity into the Generation of Spanning Trees},
address={Basel},
abstract={This chapter introduces a novel class of graphs: Minimum Spanning Markovian Trees (MSMTs). The idea behind MSMTs is to provide spanning trees that minimize the costs of edge traversals in a Markovian manner, that is, in terms of the path starting with the root of the tree and ending at the vertex under consideration. In a second part, the chapter generalizes this class of spanning trees in order to allow for damped Markovian effects in the course of spanning. These two effects, (1) the sensitivity to the contexts generated by consecutive edges and (2) the decreasing impact of more antecedent (or 'weakly remembered') vertices, are well known in cognitive modeling [6, 10, 21, 23]. In this sense, the chapter can also be read as an effort to introduce a graph model to support the simulation of cognitive systems. Note that MSMTs are not to be confused with branching Markov chains or Markov trees [20] as we focus on generating spanning trees from given weighted undirected networks.},
website={https://www.researchgate.net/publication/226700676_Minimum_Spanning_Markovian_Trees_Introducing_Context-Sensitivity_into_the_Generation_of_Spanning_Trees}}

@INCOLLECTION{Mehler:2006:d,
publisher={De Gruyter},
booktitle={Exact Methods in the Study of Language and Text},
pages={437-446},
author={Mehler, Alexander},
series={Quantitative Linguistics},
editor={Grzybek, Peter and Köhler, Reinhard},
year={2006},
title={A Network Perspective on Intertextuality},
address={Berlin/New York}}

@INCOLLECTION{Santini:Mehler:Sharoff:2009,
publisher={Springer},
booktitle={Genres on the Web: Computational Models and Empirical Studies},
crossref={Genres on the Web: Computational Models and Empirical Studies},
pages={3-32},
author={Santini, Marina and Mehler, Alexander and Sharoff, Serge},
editor={Mehler, Alexander and Sharoff, Serge and Santini, Marina},
year={2009},
title={Riding the Rough Waves of Genre on the Web: Concepts and Research Questions},
address={Berlin/New York},
abstract={This chapter outlines the state of the art of empirical and computational webgenre research. First, it highlights why the concept of genre is profitable for a range of disciplines. At the same time, it lists a number of recent interpretations that can inform and influence present and future genre research. Last but not least, it breaks down a series of open issues that relate to the modelling of the concept of webgenre in empirical and computational studies.}}

@ARTICLE{Mehler:2008:a,
pdf={https://hucompute.org/wp-content/uploads/2016/10/mehler_2008_Structural_Similarities_of_Complex_Networks.pdf},
journal={Applied Artificial Intelligence},
pages={619-683},
number={7&8},
author={Mehler, Alexander},
volume={22},
doi={10.1080/08839510802164085},
year={2008},
title={Structural Similarities of Complex Networks: A Computational Model by Example of Wiki Graphs},
website={https://www.researchgate.net/publication/200772675_Structural_similarities_of_complex_networks_A_computational_model_by_example_of_wiki_graphs},
abstract={This article elaborates a framework for representing and classifying large complex networks by example of wiki graphs. By means of this framework we reliably measure the similarity of document, agent, and word networks by solely regarding their topology. In doing so, the article departs from classical approaches to complex network theory which focuses on topological characteristics in order to check their small world property. This does not only include characteristics that have been studied in complex network theory, but also some of those which were invented in social network analysis and hypertext theory. We show that network classifications come into reach which go beyond the hypertext structures traditionally analyzed in web mining. The reason is that we focus on networks as a whole as units to be classified—above the level of websites and their constitutive pages. As a consequence, we bridge classical approaches to text and web mining on the one hand and complex network theory on the other hand. Last but not least, this approach also provides a framework for quantifying the linguistic notion of intertextuality.}}

@BOOK{Luengen:Mehler:Storrer:2008:a,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={23(2)},
editor={Lüngen, Harald and Mehler, Alexander and Storrer, Angelika},
author={Mehler, Alexander},
publisher={GSCL},
year={2008},
pagetotal={111},
image={https://hucompute.org/wp-content/uploads/2015/09/LexicalSemanticResources-300-20.png},
title={Lexical-Semantic Resources in Automated Discourse Analysis},
pdf={http://www.jlcl.org/2008_Heft2/JLCL23(2).pdf},
website={https://www.researchgate.net/publication/228956889_Lexical-Semantic_Resources_in_Automated_Discourse_Analysis}}

@INPROCEEDINGS{Gleim:Mehler:2010:b,
publisher={ELDA},
pdf={https://hucompute.org/wp-content/uploads/2015/08/gleim_mehler_2010.pdf},
booktitle={Proceedings of LREC 2010},
author={Gleim, Rüdiger and Mehler, Alexander},
year={2010},
title={Computational Linguistics for Mere Mortals – Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities},
address={Malta},
abstract={Delivering linguistic resources and easy-to-use methods to a broad public in the humanities is a challenging task. On the one hand users rightly demand easy to use interfaces but on the other hand want to have access to the full flexibility and power of the functions being offered. Even though a growing number of excellent systems exist which offer convenient means to use linguistic resources and methods, they usually focus on a specific domain, as for example corpus exploration or text categorization. Architectures which address a broad scope of applications are still rare. This article introduces the eHumanities Desktop, an online system for corpus management, processing and analysis which aims at bridging the gap between powerful command line tools and intuitive user interfaces.}}

@INPROCEEDINGS{Dehmer:Emmert:Streib:Mehler:Kilian:Muehlhaeuser:2005,
booktitle={Proceedings of VI. International Conference on Enformatika, Systems Sciences and Engineering, Budapest, Hungary, October 2005, International Academy of Sciences: Enformatika 8 (2005)},
pages={77-81},
author={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander and Kilian, Jürgen and Mühlhäuser, Max},
year={2005},
title={Application of a similarity measure for graphs to web-based document structures},
pdf={http://waset.org/publications/15299/application-of-a-similarity-measure-for-graphs-to-web-based-document-structures},
website={https://www.researchgate.net/publication/238687277_Application_of_a_Similarity_Measure_for_Graphs_to_Web-based_Document_Structures},
abstract={Due to the tremendous amount of information provided by the World Wide Web (WWW) developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments for solving a novel and challenging problem: Measuring the structural similarity of generalized trees. In other words: We first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem for developing an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.}}

@INPROCEEDINGS{Mehler:Clarke:2002,
booktitle={New Directions in Humanities Computing. The 14th Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities (ALLC/ACH '02), July 24-28, University of Tübingen},
pages={68-69},
author={Mehler, Alexander and Clarke, Rodney},
year={2002},
title={Systemic Functional Hypertexts. An Architecture for Socialsemiotic Hypertext Systems}}

@INCOLLECTION{Mehler:2004:h,
publisher={Stauffenburg},
booktitle={Texttechnologie. Perspektiven und Anwendungen},
pages={329-352},
author={Mehler, Alexander},
editor={Lobin, Henning and Lemnitzer, Lothar},
year={2004},
title={Textmining},
address={Tübingen}}

@INPROCEEDINGS{Mehler:2002:e,
publisher={Springer},
booktitle={Classification, Automation, and New Media. Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation, March 15-17, 2000, Universität Passau},
website={http://www.springerlink.com/content/x484814744877078/},
pages={199-206},
author={Mehler, Alexander},
editor={Gaul, Wolfgang and Ritter, Gunter},
year={2002},
title={Text Mining with the Help of Cohesion Trees},
address={Berlin/New York},
abstract={In the framework of automatic text processing, semantic spaces are used as a format for modeling similarities of natural language texts represented as vectors. They prove to be efficient in divergent areas, as information retrieval (Dumais 1995), computational psychology (Landauer, Dumais 1997), and computational linguistics (Rieger 1995; Mehler 1998). In order to group semantically similar texts, cluster analysis is used. A central problem of this method relates to the difficulty to name clusters, whereas lists neglect the polyhierarchical structure of semantic spaces. This paper introduces the concept of cohesion tree as an alternative tool for exploring similarity relations of texts represented in high dimensional spaces. Cohesion trees allow the perspective evaluation of numerically represented text similarities. They depart from minimal spanning trees (MST) by context-sensitively optimizing path costs. This central property underlies the linguistic interpretation of cohesion trees: instead of manifesting context-free associations, they model context priming effects.}}

@INPROCEEDINGS{Mehler:2005:c,
booktitle={Learning and Extending Lexical Ontologies. Proceedings of the Workshop at the 22nd International Conference on Machine Learning (ICML '05), August 7-11, 2005, Universität Bonn, Germany},
pages={41-47},
author={Mehler, Alexander},
editor={Biemann, Chris and Paaß, Gerhard},
year={2005},
title={Preliminaries to an Algebraic Treatment of Lexical Associations}}

@INPROCEEDINGS{Mehler:Gleim:2005:a,
booktitle={Proceedings of Corpus Linguistics '05, July 14-17, 2005, University of Birmingham, Great Britain},
author={Mehler, Alexander and Gleim, Rüdiger},
volume={Corpus Linguistics Conference Series 1(1)},
year={2005},
title={Polymorphism in Generic Web Units. A corpus linguistic study},
pdf={http://www.birmingham.ac.uk/Documents/college-artslaw/corpus/conference-archives/2005-journal/Thewebasacorpus/AlexanderMehlerandRuedigerGleimCorpusLinguistics2005.pdf},
issn={1747-9398},
abstract={Corpus linguistics and related disciplines which focus on statistical analyses of textual units have substantial need for large corpora. More specifically, genre or register specific corpora are needed which allow studying variations in language use. Along with the incredible growth of the internet, the web became an important source of linguistic data. Of course, web corpora face the same problem of acquiring genre specific corpora. Amongst other things, web mining is a framework of methods for automatically assigning category labels to web units and thus may be seen as a solution to this corpus acquisition problem as far as genre categories are applied. The paper argues that this approach is faced with the problem of a many-to-many relation between expression units on the one hand and content or function units on the other hand. A quantitative study is performed which supports the argumentation that functions of web-based communication are very often concentrated on single web pages and thus interfere any effort of directly applying the classical apparatus of categorization on web page level. The paper outlines a two-level algorithm as an alternative approach to category assignment which is sensitive to genre specific structures and thus may be used to tackle the problem of acquiring genre specific corpora.}}

@INPROCEEDINGS{Mehler:2007:d,
pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_2007_d.pdf},
booktitle={Proceedings of the Workshop on Language, Games, and Evolution at the 9th European Summer School in Logic, Language and Information (ESSLLI 2007), Trinity College, Dublin, 6-17 August},
pages={57-67},
author={Mehler, Alexander},
editor={Benz, Anton and Ebert, Christian and van Rooij, Robert},
year={2007},
title={Evolving Lexical Networks. A Simulation Model of Terminological Alignment},
abstract={In this paper we describe a simulation model of terminological alignment in a multiagent community. It is based on the notion of an association game which is used instead of the classical notion of a naming game (Steels, 1996). The simulation model integrates a small world-like agent community which restricts agent communication. We hypothesize that this restriction is decisive when it comes to simulate terminological alignment based on lexical priming. The paper presents preliminary experimental results in support of this hypothesis.}}

@INCOLLECTION{Mehler:Lobin:2004:b,
publisher={Verlag für Sozialwissenschaften},
booktitle={Automatische Textanalyse: Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte},
pages={1-21},
author={Mehler, Alexander and Lobin, Henning},
editor={Mehler, Alexander and Lobin, Henning},
year={2004},
title={Aspekte der texttechnologischen Modellierung},
address={Wiesbaden}}

@ARTICLE{Dehmer:Emmert:Streib:Mehler:Kilian:2006,
journal={International Journal of Computational Intelligence},
pages={1-7},
number={1},
author={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander and Kilian, Jürgen},
volume={3},
year={2006},
title={Measuring the Structural Similarity of Web-based Documents: A Novel Approach},
pdf={http://waset.org/publications/15928/measuring-the-structural-similarity-of-web-based-documents-a-novel-approach},
website={http://connection.ebscohost.com/c/articles/24839145/measuring-structural-similarity-web-based-documents-novel-approach},
abstract={Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees. We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings.}}
In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.    @ARTICLEMehler:1996:b,     journal=Journal of Quantitative Linguistics,     pages=113-127,     number=2,     author=Mehler, Alexander,     volume=3,     year=1996,     title=A Multiresolutional Approach to Fuzzy Text Meaning,     abstract=In diesem Beitrag beschreiben wir den eHumanities Desktop3. Es handelt sich dabei um eine rein webbasierte Umgebung für die texttechnologische Arbeit mit Korpora, welche von der standardisierten Repräsentation textueller Einheiten über deren computerlinguistische Vorverarbeitung bis hin zu Text Mining–Funktionalitäten eine große Zahl von Werkzeugen integriert. Diese Integrationsleistung betrifft neben den Textkorpora und den hierauf operierenden texttechnologischen Werkzeugen auch die je zum Einsatz kommenden lexikalischen Ressourcen. Aus dem Blickwinkel der geisteswissenschaftlichen Fachinformatik gesprochen fokussiert der Desktop somit darauf, eine Vielzahl heterogener sprachlicher Ressourcen mit grundlegenden texttechnologischen Methoden zu integrieren, und zwar so, dass das Integrationsresultat auch in den Händen von Nicht–Texttechnologen handhabbar bleibt. Wir exemplifizieren diese Handhabung an einem Beispiel aus der historischen Semantik, und damit an einem Bereich, der erst in jüngerer Zeit durch die Texttechnologie erschlossen wird.    
@ARTICLEMehler:Weiss:Luecking:2010:a,     journal=Entropy,     author=Mehler, Alexander and Lücking, Andy and Weiß, Petra,     year=2010,     title=A Network Model of Interpersonal Alignment,     volume=12,     pages=1440-1483,     number=6,     doi=10.3390/e12061440,     website=http://www.mdpi.com/1099-4300/12/6/1440/,     pdf=http://www.mdpi.com/1099-4300/12/6/1440/pdf,     abstract=In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutor’s dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.    
@ARTICLELuecking:Mehler:2011,     journal=International Journal of Signs and Semiotic Systems,     pdf=https://hucompute.org/wp-content/uploads/2015/08/luecking_mehler_article_IJSSS.pdf,     author=Lücking, Andy and Mehler, Alexander,     year=2011,     volume=1,     pages=18-38,     number=1,     title=A Model of Complexity Levels of Meaning Constitution in Simulation Models of Language Evolution,     abstract=Currently, some simulative accounts exist within dynamic or evolutionary frameworks that are concerned with the development of linguistic categories within a population of language users. Although these studies mostly emphasize that their models are abstract, the paradigm categorization domain is preferably that of colors. In this paper, the authors argue that color adjectives are special predicates in both linguistic and metaphysical terms: semantically, they are intersective predicates, metaphysically, color properties can be empirically reduced onto purely physical properties. The restriction of categorization simulations to the color paradigm systematically leads to ignoring two ubiquitous features of natural language predicates, namely relativity and context-dependency. Therefore, the models for simulation models of linguistic categories are not able to capture the formation of categories like perspective-dependent predicates ‘left’ and ‘right’, subsective predicates like ‘small’ and ‘big’, or predicates that make reference to abstract objects like ‘I prefer this kind of situation’. The authors develop a three-dimensional grid of ascending complexity that is partitioned according to the semiotic triangle. They also develop a conceptual model in the form of a decision grid by means of which the complexity level of simulation models of linguistic categorization can be assessed in linguistic terms.    
@INPROCEEDINGSMehler:Gleim:Waltinger:Ernst:Esch:Feith:2009,     pdf=https://hucompute.org/wp-content/uploads/2015/08/mehler_gleim_waltinger_ernst_esch_feith_2009.pdf,     booktitle=Proceedings of the Symposium "Sprachtechnologie und eHumanities", 26.–27. Februar, Duisburg-Essen University,     author=Mehler, Alexander and Gleim, Rüdiger and Waltinger, Ulli and Ernst, Alexandra and Esch, Dietmar and Feith, Tobias,     website=http://duepublico.uni-duisburg-essen.de/servlets/DocumentServlet?id=37041,     year=2009,     title=eHumanities Desktop – eine webbasierte Arbeitsumgebung für die geisteswissenschaftliche Fachinformatik    @INPROCEEDINGSClarke:Mehler:1999,     booktitle=Proceedings of the 7th International Congress of the IASS-AIS: International Association for Semiotic Studies – Sign Processes in Complex Systems, Dresden, University of Technology, October 6-11,     author=Clarke, Rodney and Mehler, Alexander,     year=1999,     title=Theorising Print Media in Contexts: A Systemic Semiotic Contribution to Computational Semiotics @BOOKMehler:Sharoff:Santini:2010:a,     publisher=Springer,     booktitle=Genres on the Web: Computational Models and Empirical Studies,     editor=Mehler, Alexander and Sharoff, Serge and Santini, Marina,     author=Mehler, Alexander and Sharoff, Serge and Santini, Marina,     year=2010,     image=https://hucompute.org/wp-content/uploads/2015/09/GenresOnTheWeb.jpg,     pagetotal=376,     title=Genres on the Web: Computational Models and Empirical Studies,     address=Dordrecht,     website=http://www.springer.com/computer/ai/book/978-90-481-9177-2,     review=http://www.springerlink.com/content/ym07440380524721/,     abstract=The volume 'Genres on the Web' has been designed for a wide audience, from the expert to the novice. It is a required book for scholars, researchers and students who want to become acquainted with the latest theoretical, empirical and computational advances in the expanding field of web genre research. 
The study of web genre is an overarching and interdisciplinary novel area of research that spans from corpus linguistics, computational linguistics, NLP, and text-technology, to web mining, webometrics, social network analysis and information studies. This book gives readers a thorough grounding in the latest research on web genres and emerging document types. The book covers a wide range of web-genre focussed subjects, such as: -The identification of the sources of web genres -Automatic web genre identification -The presentation of structure-oriented models -Empirical case studies One of the driving forces behind genre research is the idea of a genre-sensitive information system, which incorporates genre cues complementing the current keyword-based search and retrieval applications.    @BOOKSutter:Mehler:2010,     publisher=Verlag für Sozialwissenschaften,     editor=Sutter, Tilmann and Mehler, Alexander,     author=Sutter, Tilmann and Mehler, Alexander,     year=2010,     pagetotal=289,     image=https://hucompute.org/wp-content/uploads/2015/09/Medienwandel.jpg,     title=Medienwandel als Wandel von Interaktionsformen – von frühen Medienkulturen zum Web 2.0,     address=Wiesbaden,     website=http://www.springer.com/de/book/9783531156422,     abstract=Die Beiträge des Bandes untersuchen den Medienwandel von frühen europäischen Medienkulturen bis zu aktuellen Formen der Internetkommunikation unter soziologischer, kulturwissenschaftlicher und linguistischer Perspektive. Zwar haben sich die Massenmedien von den Beschränkungen sozialer Interaktionen gelöst, sie weisen dem Publikum aber eine distanzierte, bloß rezipierende Rolle zu. Dagegen eröffnen neue Formen 'interaktiver' Medien gesteigerte Möglichkeiten der Rückmeldung und der Mitgestaltung für die Nutzer. Der vorliegende Band fragt nach der Qualität dieses Medienwandels: Werden Medien tatsächlich interaktiv? Was bedeutet die Interaktivität neuer Medien? 
Werden die durch neue Medien eröffneten Beteiligungsmöglichkeiten realisiert?    @INPROCEEDINGSWagner:Mehler:Wolff:Dotzler:2009,     booktitle=Proceedings of the Symposium "Sprachtechnologie und eHumanities", 26.–27. Februar, Duisburg-Essen University,     author=Wagner, Benno and Mehler, Alexander and Wolff, Christian and Dotzler, Bernhard,     website=http://epub.uni-regensburg.de/6795/,     year=2009,     title=Bausteine eines Literary Memory Information System (LiMeS) am Beispiel der Kafka-Forschung,     abstract=In dem Paper beschreiben wir Bausteine eines Literary Memory Information System (LiMeS), das die literaturwissenschaftliche Erforschung von so genannten Matrixtexten – das sind Primärtexte eines bestimmten literarischen Gesamtwerks – unter dem Blickwinkel großer Mengen so genannter Echotexte (Topia 1984; Wagner/Reinhard 2007) – das sind Subtexte im Sinne eines literaturwissenschaftlichen Intertextualitätsbegriffs – ermöglicht. Den Ausgangspunkt dieses computerphilologischen Informationssystems bildet ein Text-Mining-Modell basierend auf dem Intertextualitätsbegriff in Verbindung mit dem Begriff des Semantic Web (Mehler, 2004b, 2005a, b, Wolff 2005). 
Wir zeigen, inwiefern dieses Modell über bestehende Informationssystemarchitekturen hinausgeht und schließen einen Brückenschlag zur derzeitigen Entwicklung von Arbeitsumgebungen in der geisteswissenschaftlichen Fachinformatik in Form eines eHumanities Desktop.,     pdf=https://hucompute.org/wp-content/uploads/2015/08/wagner_mehler_wolff_dotzler_2009.pdf    @ARTICLEMehler:Wolff:2005:b,     pdf=https://hucompute.org/wp-content/uploads/2015/08/mehler_wolff_2005_b.pdf,     journal=Journal for Language Technology and Computational Linguistics (JLCL),     pages=1-18,     number=1,     author=Mehler, Alexander and Wolff, Christian,     volume=20,     year=2005,     title=Einleitung: Perspektiven und Positionen des Text Mining,     website=http://epub.uni-regensburg.de/6844/,     abstract=Beiträge zum Thema Text Mining beginnen vielfach mit dem Hinweis auf die enorme Zunahme online verfügbarer Dokumente, ob nun im Internet oder in Intranets (Losiewicz et al. 2000; Merkl 2000; Feldman 2001; Mehler 2001; Joachims & Leopold 2002). Der hiermit einhergehenden „Informationsflut“ wird das Ungenügen des Information Retrieval (IR) bzw. seiner gängigen Verfahren der Informationsaufbereitung und Informationserschließung gegenübergestellt. Es wird bemängelt, dass sich das IR weitgehend darin erschöpft, Teilmengen von Textkollektionen auf Suchanfragen hin aufzufinden und in der Regel bloß listenförmig anzuordnen. Das auf diese Weise dargestellte Spannungsverhältnis von Informationsexplosion und Defiziten bestehender IR-Verfahren bildet den Hintergrund für die Entwicklung von Verfahren zur automatischen Verarbeitung textueller Einheiten, die sich stärker an den Anforderungen von Informationssuchenden orientieren. Anders ausgedrückt: Mit der Einführung der Neuen Medien wächst die Bedeutung digitalisierter Dokumente als Primärmedium für die Verarbeitung, Verbreitung und Verwaltung von Information in öffentlichen und betrieblichen Organisationen. 
Dabei steht wegen der Menge zu verarbeitender Einheiten die Alternative einer intellektuellen Dokumenterschließung nicht zur Verfügung. Andererseits wachsen die Anforderung an eine automatische Textanalyse, der das klassische IR nicht gerecht wird. Der Mehrzahl der hiervon betroffenen textuellen Einheiten fehlt die explizite Strukturiertheit formaler Datenstrukturen. Vielmehr weisen sie je nach Text- bzw. Dokumenttyp ganz unterschiedliche Strukturierungsgrade auf. Dabei korreliert die Flexibilität der Organisationsziele negativ mit dem Grad an explizierter Strukturiertheit und positiv mit der Anzahl jener Texte und Texttypen (E-Mails, Memos, Expertisen, technische Dokumentationen etc.), die im Zuge ihrer Realisierung produziert bzw. rezipiert werden. Vor diesem Hintergrund entsteht ein Bedarf an Texttechnologien, die ihren Benutzern nicht nur „intelligente“ Schnittstellen zur Textrezeption anbieten, sondern zugleich auf inhaltsorientierte Textanalysen zielen, um auf diese Weise aufgabenrelevante Daten explorieren und kontextsensitiv aufbereiten zu helfen.  Das Text Mining ist mit dem Versprechen verbunden, eine solche Technologie darzustellen bzw. sich als solche zu entwickeln.  Dieser einheitlichen Problembeschreibung stehen konkurrierende Textmining-Spezifikationen gegenüber, was bereits die Vielfalt der Namensgebungen verdeutlicht. So finden sich neben der Bezeichnung Text Mining (Joachims & Leopold 2002; Tan 1999) die Alternativen • Text Data Mining (Hearst 1999b; Merkl 2000), • Textual Data Mining (Losiewicz et al. 2000), • Text Knowledge Engineering (Hahn & Schnattinger 1998), Knowledge Discovery in Texts (Kodratoff 1999) oder Knowledge Discovery in Textual Databases (Feldman & Dagan 1995).  Dabei lässt bereits die Namensgebung erkennen, dass es sich um Analogiebildungen zu dem (nur unwesentlich älteren) Forschungsgebiet des Data Mining (DM; als Bestandteil des Knowledge Discovery in Databases – KDD) handelt. 
Diese Namensvielfalt findet ihre Entsprechung in widerstreitenden Aufgabenzuweisungen. So setzt beispielsweise Sebastiani (2002) Informationsextraktion und Text Mining weitgehend gleich, wobei er eine Schnittmenge zwischen Text Mining und Textkategorisierung ausmacht (siehe auch Dörre et al. 1999). Demgegenüber betrachten Kosala & Blockeel (2000) Informationsextraktion und Textkategorisierung lediglich als Teilbereiche des ihrer Ansicht nach umfassenderen Text Mining, während Hearst (1999a) im Gegensatz hierzu Informationsextraktion und Textkategorisierung explizit aus dem Bereich des explorativen Text Mining ausschließt.    @INPROCEEDINGSWaltinger:Mehler:Wegner:2009,     booktitle=Proceedings of the 5th International Conference on Web Information Systems and Technologies (WEBIST '09), March 23-26, 2009, Lisboa,     author=Waltinger, Ulli and Mehler, Alexander and Wegner, Armin,     year=2009,     title=A Two-Level Approach to Web Genre Classification,     abstract=This paper presents an approach of two-level categorization of web pages. In contrast to related approaches the model additionally explores and categorizes functionally and thematically demarcated segments of the hypertext types to be categorized. By classifying these segments conclusions can be drawn about the type of the corresponding compound web document.,     pdf=http://www.ulliwaltinger.de/pdf/Webist_2009_TwoLevel_Genre_Classification_WaltingerMehlerWegner.pdf    @ARTICLEMehler:Abramov:Diewald:2011:a,     journal=Computer Speech and Language,     website=http://www.sciencedirect.com/science/article/pii/S0885230810000434,     abstract=In this article, we test a variant of the Sapir-Whorf Hypothesis in the area of complex network theory. This is done by analyzing social ontologies as a new resource for automatic language classification. 
Our method is to solely explore structural features of social ontologies in order to predict family resemblances of languages used by the corresponding communities to build these ontologies. This approach is based on a reformulation of the Sapir-Whorf Hypothesis in terms of distributed cognition. Starting from a corpus of 160 Wikipedia-based social ontologies, we test our variant of the Sapir-Whorf Hypothesis by several experiments, and find out that we outperform the corresponding baselines. All in all, the article develops an approach to classify linguistic networks of tens of thousands of vertices by exploring a small range of mathematically well-established topological indices.,     author=Mehler, Alexander and Abramov, Olga and Diewald, Nils,     year=2011,     title=Geography of Social Ontologies: Testing a Variant of the Sapir-Whorf Hypothesis in the Context of Wikipedia,     volume=25,     number=3,     pages=716-740,     doi=10.1016/j.csl.2010.05.006    @INCOLLECTIONMehler:Gleim:2006:b,     publisher=Gedit,     booktitle=WaCky! 
Working Papers on the Web as Corpus,     pages=191-224,     author=Mehler, Alexander and Gleim, Rüdiger,     editor=Baroni, Marco and Bernardini, Silvia,     year=2006,     title=The Net for the Graphs – Towards Webgenre Representation for Corpus Linguistic Studies,     website=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.510.4125,     address=Bologna    @BOOKMehler:2005:e,     series=Journal for Language Technology and Computational Linguistics (JLCL),     volume=20(2),     editor=Mehler, Alexander,     author=Mehler, Alexander,     year=2005,     image=https://hucompute.org/wp-content/uploads/2015/09/Korpuslinguistik.png,     pagetotal=97,     title=Korpuslinguistik,     website=http://www.jlcl.org/2005_Heft2/LDV_Forum_Band_20_Heft_2.pdf    @INPROCEEDINGSMehler:Geibel:Gleim:Herold:Jain:Pustylnikov:2007,     pdf=http://ikw.uni-osnabrueck.de/~ott06/ott06-abstracts/Mehler_Geibel_abstract.pdf,     booktitle=Proceedings of OTT '06 – Ontologies in Text Technology: Approaches to Extract Semantic Knowledge from Structured Information,     pages=63-71,     author=Mehler, Alexander and Geibel, Peter and Gleim, Rüdiger and Herold, Sebastian and Jain, Brijnesh-Johannes and Abramov, Olga,     series=Publications of the Institute of Cognitive Science (PICS),     editor=Mönnich, Uwe and Kühnberger, Kai-Uwe,     year=2007,     title=Much Ado About Text Content. Learning Text Types Solely by Structural Differentiae,     address=Osnabrück,     abstract=In this paper, we deal with classifying texts into classes which denote text types whose textual instances serve more or less homogeneous functions. Other than mainstream approaches to text classification, which rely on the vector space model [30] or some of its descendants [2] and, thus, on content-related lexical features, we solely refer to structural differentiae, that is, to patterns of text structure as determinants of class membership. 
Further, we suppose that text types span a type hierarchy based on the type-subtype relation [31]. Thus, although we admit that class membership is fuzzy so that overlapping classes are inevitable, we suppose a non-overlapping type system structured into a rooted tree – whether solely based on functional or additional on, e.g., content- or mediabased criteria [1]. What regards criteria of goodness of classification, we perform a classical supervised categorization experiment [30] based on cross-validation as a method of model selection [11]. That is, we perform a categorization experiment in which for all training and test cases class membership is known ex ante. In summary, we perform a supervised experiment of text classification in order to learn functionally grounded text types where membership to these types is solely based on structural criteria.    @INPROCEEDINGSGleim:Mehler:Dehmer:2006:a,     booktitle=Proceedings of the EACL 2006 Workshop on Web as Corpus, April 3-7, 2006, Trento, Italy,     pages=67-74,     author=Gleim, Rüdiger and Mehler, Alexander and Dehmer, Matthias,     editor=Kilgariff, Adam and Baroni, Marco,     year=2006,     abstract=Workshop organizer: Adam Kilgarriff,     title=Web Corpus Mining by Instance of Wikipedia,     pdf=http://www.aclweb.org/anthology/W06-1710,     website=http://pub.uni-bielefeld.de/publication/1773538     @INPROCEEDINGSDehmer:Mehler:Emmert-Streib:2007:a,     booktitle=Proceedings of the 2007 International Conference on Machine Learning: Models, Technologies & Applications (MLMTA '07), June 25-28, 2007, Las Vegas,     pages=113-117,     author=Dehmer, Matthias and Mehler, Alexander and Emmert-Streib, Frank,     year=2007,     title=Graph-theoretical Characterizations of Generalized Trees,     website=https://www.researchgate.net/publication/221188591_Graph-theoretical_Characterizations_of_Generalized_Trees    @INPROCEEDINGSGleim:Mehler:Eikmeyer:2007:a,     
pdf=https://hucompute.org/wp-content/uploads/2015/08/gleim_mehler_eikmeyer_2007_a.pdf,     booktitle=Proceedings of the Corpus Linguistics 2007 Conference, Birmingham (UK),     author=Gleim, Rüdiger and Mehler, Alexander and Eikmeyer, Hans-Jürgen,     year=2007,     title=Representing and Maintaining Large Corpora    @INPROCEEDINGSMehler:2002:f,     publisher=Peter Lang,     booktitle=Sprachwissenschaft auf dem Weg in das dritte Jahrtausend. Proceedings of the 34th Linguistics Colloquium, September 7-10, 1999, Universität Mainz,     pages=725-733,     author=Mehler, Alexander,     editor=Rapp, Reinhard,     year=2002,     title=Cohesive Paths: Applying the Concept of Cohesion to Hypertext,     address=Frankfurt a. M.    @INCOLLECTIONMehler:2009:b,     publisher=Springer,     booktitle=Linguistic Modeling of Information and Markup Languages. Contributions to Language Technology,     website=http://www.springerlink.com/content/t27782w8j2125112/,     author=Mehler, Alexander,     series=Text, Speech and Language Technology,     editor=Witt, Andreas and Metzing, Dieter,     year=2009,     title=Structure Formation in the Web. A Graph-Theoretical Model of Hypertext Types,     address=Dordrecht,     abstract=In this chapter we develop a representation model of web document networks. Based on the notion of uncertain web document structures, the model is defined as a template which grasps nested manifestation levels of hypertext types. Further, we specify the model on the conceptual, formal and physical level and exemplify it by reconstructing competing web document models.    
@INPROCEEDINGSGleim:Mehler:Waltinger:Menke:2009,     booktitle=5th Corpus Linguistics Conference, University of Liverpool,     author=Gleim, Rüdiger and Mehler, Alexander and Waltinger, Ulli and Menke, Peter,     year=2009,     title=eHumanities Desktop – An extensible Online System for Corpus Management and Analysis,     abstract=This paper presents the eHumanities Desktop - an online system for corpus management and analysis in support of computing in the humanities. Design issues and the overall architecture are described, as well as an outline of the applications offered by the system.,     website=http://www.ulliwaltinger.de/ehumanities-desktop-an-extensible-online-system-for-corpus-management-and-analysis/,     pdf=http://www.ulliwaltinger.de/pdf/eHumanitiesDesktop-AnExtensibleOnlineSystem-CL2009.pdf    @INPROCEEDINGSMehler:Dehmer:Gleim:2005,     publisher=Lang,     booktitle=Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Frühjahrstagung '05, 10. März – 01. April 2005, Universität Bonn,     pages=158-174,     author=Mehler, Alexander and Dehmer, Matthias and Gleim, Rüdiger,     editor=Fisseni, Bernhard and Schmitz, Hans-Christina and Schröder, Bernhard and Wagner, Petra,     year=2005,     title=Zur Automatischen Klassifikation von Webgenres,     address=Frankfurt a. M.    
@INPROCEEDINGSMehler:Luecking:2009,     publisher=IEEE,     pdf=https://hucompute.org/wp-content/uploads/2015/08/mehler_luecking_2009.pdf,     website=http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?reload=true&arnumber=5308098,     booktitle=Proceedings of IEEE Africon 2009, September 23-25, Nairobi, Kenya,     author=Mehler, Alexander and Lücking, Andy,     year=2009,     title=A Structural Model of Semiotic Alignment: The Classification of Multimodal Ensembles as a Novel Machine Learning Task,     abstract=In addition to the well-known linguistic alignment processes in dyadic communication – e.g., phonetic, syntactic, semantic alignment – we provide evidence for a genuine multimodal alignment process, namely semiotic alignment. Communicative elements from different modalities 'routinize into' cross-modal 'super-signs', which we call multimodal ensembles. Computational models of human communication are in need of expressive models of multimodal ensembles. In this paper, we exemplify semiotic alignment by means of empirical examples of the building of multimodal ensembles. We then propose a graph model of multimodal dialogue that is expressive enough to capture multimodal ensembles. In line with this model, we define a novel task in machine learning with the aim of training classifiers that can detect semiotic alignment in dialogue. This model is in support of approaches which need to gain insights into realistic human-machine communication.    
@INCOLLECTION{Mehler:2009:c,
publisher={Wiley-VCH},
pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_2009_b.pdf},
booktitle={Analysis of Complex Networks: From Biology to Linguistics},
pages={175-220},
author={Mehler, Alexander},
editor={Dehmer, Matthias and Emmert-Streib, Frank},
website={https://www.researchgate.net/publication/255666602_1_Generalised_Shortest_Paths_Trees_A_Novel_Graph_Class_Applied_to_Semiotic_Networks},
year={2009},
title={Generalized Shortest Paths Trees: A Novel Graph Class Applied to Semiotic Networks},
address={Weinheim}}

@BOOK{Mehler:Lobin:2004:a,
publisher={Verlag für Sozialwissenschaften},
editor={Mehler, Alexander and Lobin, Henning},
author={Mehler, Alexander and Lobin, Henning},
year={2004},
pagetotal={290},
title={Automatische Textanalyse. Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte},
address={Wiesbaden},
website={http://www.v-r.de/de/Mehler-Lobin-Automatische-Textanalyse/t/352526527/}}

@BOOK{Mehler:Wolff:2005:a,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={20(1)},
editor={Mehler, Alexander and Wolff, Christian},
author={Mehler, Alexander and Wolff, Christian},
image={https://hucompute.org/wp-content/uploads/2015/09/TextMining.png},
publisher={GSCL},
year={2005},
pagetotal={143},
title={Text Mining},
website={http://www.jlcl.org/2005_Heft1/LDV-Forum1.2005.pdf}}

@INCOLLECTION{Mehler:2008:b,
publisher={De Gruyter},
pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_2007_a.pdf},
booktitle={Corpus Linguistics. An International Handbook of the Science of Language and Society},
pages={328-382},
author={Mehler, Alexander},
editor={Lüdeling, Anke and Kytö, Merja},
year={2008},
title={Large Text Networks as an Object of Corpus Linguistic Studies},
address={Berlin/New York}}

@INPROCEEDINGS{Geibel:Pustylnikov:Mehler:Gust:Kuehnberger:2007,
publisher={Springer},
booktitle={Proceedings of ICONIP 2007 (14th International Conference on Neural Information Processing)},
website={http://www.springerlink.com/content/x414002113425742/},
pages={779-788},
author={Geibel, Peter and Abramov, Olga and Mehler, Alexander and Gust, Helmar and Kühnberger, Kai-Uwe},
series={Lecture Notes in Computer Science 4985},
year={2007},
title={Classification of Documents Based on the Structure of Their DOM Trees},
abstract={In this paper, we discuss kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node might be labeled by a vector of attributes including its XML tag and the textual content. We describe five new kernels suitable for such structures: a kernel based on predefined structural features, a tree kernel derived from the well-known parse tree kernel, the set tree kernel that allows permutations of children, the string tree kernel being an extension of the so-called partial tree kernel, and the soft tree kernel as a more efficient alternative. We evaluate the kernels experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus.}}

@INPROCEEDINGS{Mehler:2002:k,
publisher={Morgan Kaufmann},
pdf={https://hucompute.org/wp-content/uploads/2015/08/mehler_2002_k.pdf},
booktitle={Proceedings of the 19th International Conference on Computational Linguistics (COLING '02), August 24 – September 1, 2002, Taipei, Taiwan},
pages={646-652},
author={Mehler, Alexander},
year={2002},
title={Hierarchical Orderings of Textual Units},
address={San Francisco},
abstract={Text representation is a central task for any approach to automatic learning from texts. It requires a format which allows to interrelate texts even if they do not share content words, but deal with similar topics. Furthermore, measuring text similarities raises the question of how to organize the resulting clusters. This paper presents cohesion trees (CT) as a data structure for the perspective, hierarchical organization of text corpora. CTs operate on alternative text representation models taking lexical organization, quantitative text characteristics, and text structure into account. It is shown that CTs realize text linkages which are lexically more homogeneous than those produced by minimal spanning trees.}}
@ARTICLE{Dehmer:Mehler:2007:a,
journal={Tatra Mountains Mathematical Publications},
pages={39-59},
author={Dehmer, Matthias and Mehler, Alexander},
volume={36},
year={2007},
title={A New Method of Measuring the Similarity for a Special Class of Directed Graphs},
abstract={The problem of graph similarity is challenging and important in many areas of science, e.g., mathematics [Sobik, F.: Graphmetriken und Klassifikation strukturierter Objekte, ZKI-Informationen, Akad. Wiss. DDR, 2, (1982), 63-122; Zelinka, B.: On a certain distance between isomorphism classes of graphs, Cas. Pest. Mat., 100, (1975), 371-373], biology [Koch, I., Lengauer, T., Wanke, E.: An algorithm for finding maximal common subtopologies in a set of protein structures, J. Comput. Biology, 3, (1996), 289-306], and chemistry [Skvortsova, M. I., Baskin, I. I., Stankevich, I. V., Palyulin, V. A., Zefirov, N. S.: Molecular similarity in structure-property relationship studies. Analytical description of the complete set of graph similarity measures, International Symposium CACR'96, (1996), pp. 542-646]. In this paper, we design a new method, which uses sequence alignment techniques [Altschul et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, (1997), 3389-3402; Kilian, J., Hoos, H. H.: MusicBLAST – gapped sequence alignment for MIR, in: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), (2004)], to measure the structural similarity of unlabeled, hierarchical, and directed graphs. More precisely, if h denotes the maximal length of a path from the root to a leaf of a given hierarchical and directed graph H, we align out-degree and in-degree sequences induced by the vertex sequences on a level i, 0 <= i <= h. On the basis of the level alignments, we construct measured values and prove that they are similarity measures. In our algorithm, which uses the well-known technique of dynamic programming, the alignments of out-degree and in-degree sequences are decoupled. Therefore, we obtain a family (d_i(H_1, H_2)), 1 <= i <= 3, of graph similarity measures. As an application, we examine the measures on a graph corpus of 464 graphs, where the graphs represent web-based hypertext structures (websites). 2000 Mathematics Subject Classification: Primary 05C75, 05C20, 68R15; Secondary 90C39, 68R10, 91B82. Keywords: digraphs, similarity measures, sequence alignments, degree sequences.},
website={https://www.researchgate.net/publication/228905939_A_new_method_of_measuring_similarity_for_a_special_class_of_directed_graphs}}

@INPROCEEDINGS{Mehler:2006:a,
booktitle={Proceedings of the 2006 International Conference on Bioinformatics \& Computational Biology (BIOCOMP '06), June 26, 2006, Las Vegas, USA},
pages={496-500},
author={Mehler, Alexander},
editor={Arabnia, Hamid R. and Valafar, Homayoun},
year={2006},
title={In Search of a Bridge between Network Analysis in Computational Linguistics and Computational Biology – A Conceptual Note},

}

@INPROCEEDINGS{Pustylnikov:Mehler:2008:a,
booktitle={Proceedings of First International Conference on Global Interoperability for Language Resources (ICGL 2008), Hong Kong SAR, January 9-11},
author={Abramov, Olga and Mehler, Alexander},
year={2008},
title={Towards a Uniform Representation of Treebanks: Providing Interoperability for Dependency Tree Data},
website={https://www.researchgate.net/publication/242681771_Towards_a_Uniform_Representation_of_Treebanks_Providing_Interoperability_for_Dependency_Tree_Data},
pdf={http://wwwhomes.uni-bielefeld.de/opustylnikov/pustylnikov/pdfs/acl07.1.0.pdf},
abstract={In this paper we present a corpus representation format which unifies the representation of a wide range of dependency treebanks within a single model. This approach provides interoperability and reusability of annotated syntactic data which in turn extends its applicability within various research contexts. We demonstrate our approach by means of dependency treebanks of 11 languages. Further, we perform a comparative quantitative analysis of these treebanks in order to demonstrate the interoperability of our approach. }}

@INPROCEEDINGS{Rehm:Santini:Mehler:Braslavski:Gleim:Stubbe:Symonenko:Tavosanis:Vidulin:2008,
booktitle={Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech (Morocco)},
author={Rehm, Georg and Santini, Marina and Mehler, Alexander and Braslavski, Pavel and Gleim, Rüdiger and Stubbe, Andrea and Symonenko, Svetlana and Tavosanis, Mirko and Vidulin, Vedrana},
year={2008},
title={Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems},
website={http://www.lrec-conf.org/proceedings/lrec2008/summaries/94.html},
abstract={We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres. }}

@INPROCEEDINGS{Geibel:Krumnack:Pustylnikov:Mehler:Gust:Kuehnberger:2007,
publisher={Springer},
booktitle={Proceedings of AI 2007: Advances in Artificial Intelligence, 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007},
pages={642-646},
author={Geibel, Peter and Krumnack, Ulf and Abramov, Olga and Mehler, Alexander and Gust, Helmar and Kühnberger, Kai-Uwe},
series={Lecture Notes in Computer Science},
volume={4830},
editor={Orgun, Mehmet A. and Thornton, John},
year={2007},
title={Structure-Sensitive Learning of Text Types},
abstract={In this paper, we discuss the structure based classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method using predefined structural features and also four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles, and on the well-known SUSANNE corpus. We will demonstrate that, for the two corpora, many text types can be learned based on structural features only.}}

@ARTICLE{Mehler:2005:a,
journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
pages={29-53},
author={Mehler, Alexander},
publisher={GSCL},
volume={1},
year={2005},
title={Zur textlinguistischen Fundierung der Text- und Korpuskonversion},
abstract={Die automatische Konversion von Texten in Hypertexte ist mit der Erwartung verbunden, computerbasierte Rezeptionshilfen zu gewinnen. Dies betrifft insbesondere die Bew{\"a}ltigung der ungeheuren Menge an Fachliteratur im Rahmen der Wissenschaftskommunikation. Von einem thematisch relevanten Text zu einem thematisch verwandten Text per Hyperlink direkt gelangen zu können, stellt einen Anspruch dar, dessen Erfüllung mittels digitaler Bibliotheken n{\"a}her gerückt zu sein scheint. Doch wie lassen sich die Kriterien, nach denen Texte automatisch verlinkt werden, genauer begründen? Dieser Beitrag geht dieser Frage aus der Sicht textlinguistischer Modellbildungen nach. Er zeigt, dass parallel zur Entwicklung der Textlinguistik, wenn auch mit einer gewissen Verzögerung, Konversionsans{\"a}tze entwickelt wurden, die sich jeweils an einer bestimmten Stufe des Textbegriffs orientieren. Der Beitrag weist nicht nur das diesen Ans{\"a}tzen gemeinsame Fundament in Form der so genannten Explikationshypothese nach, sondern verweist zugleich auf grundlegende Automatisierungsdefizite, die mit ihnen verbunden sind. Mit systemisch-funktionalen Hypertexten wird schlie{\ss}lich ein Ansatz skizziert, der darauf zielt, den Anspruch nach textlinguistischer Fundierung und Automatisierbarkeit zu vereinen.}}

@INPROCEEDINGS{Waltinger:Mehler:2009:c,
booktitle={IEEE/WIC/ACM International Conference on Web Intelligence, September 15–18, Milano},
author={Waltinger, Ulli and Mehler, Alexander},
website={http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5284920&abstractAccess=no&userType=inst},
year={2009},
title={Social Semantics and Its Evaluation By Means Of Semantic Relatedness And Open Topic Models},
abstract={This paper presents an approach using social semantics for the task of topic labelling by means of Open Topic Models. Our approach utilizes a social ontology to create an alignment of documents within a social network. Comprised category information is used to compute a topic generalization. We propose a feature-frequency-based method for measuring semantic relatedness which is needed in order to reduce the number of document features for the task of topic labelling. This method is evaluated against multiple human judgement experiments comprising two languages and three different resources. Overall the results show that social ontologies provide a rich source of terminological knowledge. The performance of the semantic relatedness measure with correlation values of up to .77 are quite promising. Results on the topic labelling experiment show, with an accuracy of up to .79, that our approach can be a valuable method for various NLP applications.}}

@INPROCEEDINGS{Stuehrenberg:Beisswenger:Kuehnberger:Mehler:Luengen:Metzing:Moennich:2008,
booktitle={Proceedings of the Post LREC-2008 Workshop: Sustainability of Language Resources and Tools for Natural Language Processing Marrakech, Morocco},
author={Stührenberg, Maik and Bei{\ss}wenger, Michael and Kühnberger, Kai-Uwe and Mehler, Alexander and Lüngen, Harald and Metzing, Dieter and Mönnich, Uwe},
year={2008},
title={Sustainability of Text-Technological Resources},
pdf={http://www.michael-beisswenger.de/pub/lrec-sustainability.pdf},
abstract={We consider that there are obvious relationships between research on sustainability of language and linguistic resources on the one hand and work undertaken in the Research Unit 'Text-Technological Modelling of Information' on the other. Currently the main focus in sustainability research is concerned with archiving methods of textual resources, i.e. methods for sustainability of primary and secondary data; these aspects are addressed in our work as well. However, we believe that there are additional certain aspects of sustainability on which new light is shed on by procedures, algorithms and dynamic processes undertaken in our Research Unit}}

@ARTICLE{Mehler:2002:l,
journal={Journal of Universal Computer Science (J.UCS)},
pages={924-943},
number={10},
author={Mehler, Alexander},
volume={8},
year={2002},
title={Components of a Model of Context-Sensitive Hypertexts},
website={http://www.jucs.org/jucs_8_10/components_of_a_model},
abstract={On the background of rising Intranet applications the automatic generation of adaptable, context-sensitive hypertexts becomes more and more important [El-Beltagy et al., 2001]. This observation contradicts the literature on hypertext authoring, where Information Retrieval techniques prevail, which disregard any linguistic and context-theoretical underpinning. As a consequence, resulting hypertexts do not manifest those schematic structures, which are constitutive for the emergence of text types and the context-mediated understanding of their instances, i.e. natural language texts. This paper utilizes Systemic Functional Linguistics (SFL) and its context model as a theoretical basis of hypertext authoring. So called Systemic Functional Hypertexts (SFHT) are proposed, which refer to a stratified context layer as the proper source of text linkage. The purpose of this paper is twofold: First, hypertexts are reconstructed from a linguistic point of view as a kind of supersign, whose constituents are natural language texts and whose structuring is due to intra- and intertextual coherence relations and their context-sensitive interpretation. Second, the paper prepares a formal notion of SFHTs as a first step towards operationalization of fundamental text linguistic concepts. On this background, SFHTs serve to overcome the theoretical poverty of many approaches to link generation.}}

@INPROCEEDINGS{Gleim:Mehler:Dehmer:Abramov:2007,
booktitle={3rd International Conference on Web Information Systems and Technologies (WEBIST '07), March 3-6, 2007, Barcelona},
pages={142-149},
author={Gleim, Rüdiger and Mehler, Alexander and Dehmer, Matthias and Abramov, Olga},
editor={Filipe, Joaquim and Cordeiro, José and Encarnação, Bruno and Pedrosa, Vitor},
year={2007},
title={Aisles through the Category Forest – Utilising the Wikipedia Category System for Corpus Building in Machine Learning},
abstract={The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods be developed in order to tackle the problem of finding and organising relevant information. It has often been motivated that semantic classifications of input documents help solving this task. But while approaches of supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches to web mining, respective corpora are needed. However, the composition of genre- or domain-specific web corpora is still an unsolved problem. It is time consuming to build large corpora of good quality because web pages typically lack reliable meta information. Wikipedia along with similar approaches of collaborative text production offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source of corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views to browse through the Wikipedia and to construct domain specific corpora for machine learning.}}

@INCOLLECTION{Mehler:Job:Blanchard:Eikmeyer:2008,
publisher={VS},
booktitle={Netzwerkanalyse und Netzwerktheorie},
pages={413-427},
author={Mehler, Alexander and Job, Barbara and Blanchard, Philippe and Eikmeyer, Hans-Jürgen},
editor={Stegbauer, Christian},
year={2008},
title={Sprachliche Netzwerke},
abstract={In diesem Kapitel beschreiben wir so genannte sprachliche Netzwerke. Dabei handelt es sich um Netzwerke sprachlicher Einheiten, die in Zusammenhang mit ihrer Einbettung in das Netzwerk jener Sprachgemeinschaft analysiert werden, welche diese Einheiten und deren Vernetzung hervorgebracht hat. Wir erörtern ein Dreistufenmodell zur Analyse solcher Netzwerke und exemplifizieren dieses Modell anhand mehrerer Spezialwikis. Ein Hauptaugenmerk des Kapitels liegt dabei auf einem Mehrebenennetzwerkmodell, und zwar in Abkehr von den unipartiten Graphmodellen der Theorie komplexer Netzwerke.}}

@INPROCEEDINGS{Pustylnikov:Mehler:Gleim:2008,
booktitle={Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech (Morocco)},
author={Abramov, Olga and Mehler, Alexander and Gleim, Rüdiger},
year={2008},
title={A Unified Database of Dependency Treebanks. Integrating, Quantifying and Evaluating Dependency Data},
pdf={http://wwwhomes.uni-bielefeld.de/opustylnikov/pustylnikov/pdfs/LREC08_full.pdf},
abstract={This paper describes a database of 11 dependency treebanks which were unified by means of a two-dimensional graph format. The format was evaluated with respect to storage-complexity on the one hand, and efficiency of data access on the other hand. An example of how the treebanks can be integrated within a unique interface is given by means of the DTDB interface. }}

@INPROCEEDINGS{Mehler:Gleim:Wegner:2007,
booktitle={Proceedings of the Workshop "Towards Genre-Enabled Search Engines: The Impact of NLP", September, 30, 2007, in conjunction with RANLP 2007, Borovets, Bulgaria},
pages={13-19},
author={Mehler, Alexander and Gleim, Rüdiger and Wegner, Armin},
editor={Rehm, Georg and Santini, Marina},
year={2007},
title={Structural Uncertainty of Hypertext Types. An Empirical Study}}

@INPROCEEDINGS{Mehler:Gleim:Dehmer:2006,
publisher={Springer},
booktitle={Proceedings of the 29th Annual Conference of the German Classification Society, March 9-11, 2005, Universit{\"a}t Magdeburg},
pages={406-413},
author={Mehler, Alexander and Gleim, Rüdiger and Dehmer, Matthias},
editor={Spiliopoulou, Myra and Kruse, Rudolf and Borgelt, Christian and Nürnberger, Andreas and Gaul, Wolfgang},
year={2006},
title={Towards Structure-Sensitive Hypertext Categorization},
abstract={Hypertext categorization is the task of automatically assigning category labels to hypertext units. Comparable to text categorization it stays in the area of function learning based on the bag-of-features approach. This scenario faces the problem of a many-to-many relation between websites and their hidden logical document structure. The paper argues that this relation is a prevalent characteristic which interferes any effort of applying the classical apparatus of categorization to web genres. This is confirmed by a threefold experiment in hypertext categorization. In order to outline a solution to this problem, the paper sketches an alternative method of unsupervised learning which aims at bridging the gap between statistical and structural pattern recognition (Bunke et al. 2001) in the area of web mining.}}

@INCOLLECTION{Mehler:2010:a,
publisher={Birkh{\"a}user Publishing},
booktitle={Structural Analysis of Complex Networks},
author={Mehler, Alexander},
editor={Dehmer, Matthias},
pages={381-401},
year={2010},
title={Minimum Spanning Markovian Trees: Introducing Context-Sensitivity into the Generation of Spanning Trees},
abstract={This chapter introduces a novel class of graphs: Minimum Spanning Markovian Trees (MSMTs). The idea behind MSMTs is to provide spanning trees that minimize the costs of edge traversals in a Markovian manner, that is, in terms of the path starting with the root of the tree and ending at the vertex under consideration. In a second part, the chapter generalizes this class of spanning trees in order to allow for damped Markovian effects in the course of spanning. These two effects, (1) the sensitivity to the contexts generated by consecutive edges and (2) the decreasing impact of more antecedent (or 'weakly remembered') vertices, are well known in cognitive modeling [6, 10, 21, 23]. In this sense, the chapter can also be read as an effort to introduce a graph model to support the simulation of cognitive systems. Note that MSMTs are not to be confused with branching Markov chains or Markov trees [20] as we focus on generating spanning trees from given weighted undirected networks.},
website={https://www.researchgate.net/publication/226700676_Minimum_Spanning_Markovian_Trees_Introducing_Context-Sensitivity_into_the_Generation_of_Spanning_Trees}}

@INCOLLECTION{Mehler:2006:d,
publisher={De Gruyter},
booktitle={Exact Methods in the Study of Language and Text},
pages={437-446},
author={Mehler, Alexander},
series={Quantitative Linguistics},
editor={Grzybek, Peter and Köhler, Reinhard},
year={2006},
title={A Network Perspective on Intertextuality}}

@INCOLLECTION{Santini:Mehler:Sharoff:2009,
publisher={Springer},
booktitle={Genres on the Web: Computational Models and Empirical Studies},
pages={3-32},
author={Santini, Marina and Mehler, Alexander and Sharoff, Serge},
editor={Mehler, Alexander and Sharoff, Serge and Santini, Marina},
year={2009},
title={Riding the Rough Waves of Genre on the Web: Concepts and Research Questions},
abstract={This chapter outlines the state of the art of empirical and computational webgenre research. First, it highlights why the concept of genre is profitable for a range of disciplines. At the same time, it lists a number of recent interpretations that can inform and influence present and future genre research. Last but not least, it breaks down a series of open issues that relate to the modelling of the concept of webgenre in empirical and computational studies.}}

@ARTICLE{Mehler:2008:a,
journal={Applied Artificial Intelligence},
pages={619–683},
number={7\&8},
author={Mehler, Alexander},
volume={22},
doi={10.1080/08839510802164085},
year={2008},
title={Structural Similarities of Complex Networks: A Computational Model by Example of Wiki Graphs},
website={https://www.researchgate.net/publication/200772675_Structural_similarities_of_complex_networks_A_computational_model_by_example_of_wiki_graphs},
abstract={This article elaborates a framework for representing and classifying large complex networks by example of wiki graphs. By means of this framework we reliably measure the similarity of document, agent, and word networks by solely regarding their topology. In doing so, the article departs from classical approaches to complex network theory which focuses on topological characteristics in order to check their small world property. This does not only include characteristics that have been studied in complex network theory, but also some of those which were invented in social network analysis and hypertext theory. We show that network classifications come into reach which go beyond the hypertext structures traditionally analyzed in web mining. The reason is that we focus on networks as a whole as units to be classified—above the level of websites and their constitutive pages. As a consequence, we bridge classical approaches to text and web mining on the one hand and complex network theory on the other hand. Last but not least, this approach also provides a framework for quantifying the linguistic notion of intertextuality.}}

@BOOK{Luengen:Mehler:Storrer:2008:a,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={23(2)},
editor={Lüngen, Harald and Mehler, Alexander and Storrer, Angelika},
author={Mehler, Alexander},
publisher={GSCL},
year={2008},
pagetotal={111},
title={Lexical-Semantic Resources in Automated Discourse Analysis},
pdf={http://www.jlcl.org/2008_Heft2/JLCL23(2).pdf},
website={https://www.researchgate.net/publication/228956889_Lexical-Semantic_Resources_in_Automated_Discourse_Analysis}}

@INPROCEEDINGS{Gleim:Mehler:2010:b,
publisher={ELDA},
booktitle={Proceedings of LREC 2010},
author={Gleim, Rüdiger and Mehler, Alexander},
year={2010},
title={Computational Linguistics for Mere Mortals – Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities},
abstract={Delivering linguistic resources and easy-to-use methods to a broad public in the humanities is a challenging task. On the one hand users rightly demand easy to use interfaces but on the other hand want to have access to the full flexibility and power of the functions being offered. Even though a growing number of excellent systems exist which offer convenient means to use linguistic resources and methods, they usually focus on a specific domain, as for example corpus exploration or text categorization. Architectures which address a broad scope of applications are still rare. This article introduces the eHumanities Desktop, an online system for corpus management, processing and analysis which aims at bridging the gap between powerful command line tools and intuitive user interfaces. }
}

@INPROCEEDINGS{Dehmer:Emmert:Streib:Mehler:Kilian:Muehlhaeuser:2005,
booktitle={Proceedings of VI. International Conference on Enformatika, Systems Sciences and Engineering, Budapest, Hungary, October 2005, International Academy of Sciences: Enformatika 8 (2005)},
pages={77-81},
author={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander and Kilian, Jürgen and Mühlh{\"a}user, Max},
year={2005},
title={Application of a similarity measure for graphs to web-based document structures},
pdf={http://waset.org/publications/15299/application-of-a-similarity-measure-for-graphs-to-web-based-document-structures},
website={https://www.researchgate.net/publication/238687277_Application_of_a_Similarity_Measure_for_Graphs_to_Web-based_Document_Structures},
abstract={Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments for solving a novel and challenging problem: Measuring the structural similarity of generalized trees. In other words: We first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem for developing an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.}}

@INPROCEEDINGS{Mehler:Clarke:2002,
booktitle={New Directions in Humanities Computing. The 14th Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities (ALLC/ACH '02), July 24-28, University of Tübingen},
pages={68-69},
author={Mehler, Alexander and Clarke, Rodney},
year={2002},
title={Systemic Functional Hypertexts. An Architecture for Socialsemiotic Hypertext Systems}}

@INCOLLECTION{Mehler:2004:h,
publisher={Stauffenburg},
booktitle={Texttechnologie. Perspektiven und Anwendungen},
pages={329-352},
author={Mehler, Alexander},
editor={Lobin, Henning and Lemnitzer, Lothar},
year={2004},
title={Textmining}}

@INPROCEEDINGS{Mehler:2002:e,
publisher={Springer},
booktitle={Classification, Automation, and New Media. Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation, March 15-17, 2000, Universit{\"a}t Passau},
pages={199-206},
author={Mehler, Alexander},
editor={Gaul, Wolfgang and Ritter, Gunter},
year={2002},
title={Text Mining with the Help of Cohesion Trees},
abstract={In the framework of automatic text processing, semantic spaces are used as a format for modeling similarities of natural language texts represented as vectors. They prove to be efficient in divergent areas, as information retrieval (Dumais 1995), computational psychology (Landauer, Dumais 1997), and computational linguistics (Rieger 1995; Mehler 1998). In order to group semantically similar texts, cluster analysis is used. A central problem of this method relates to the difficulty to name clusters, whereas lists neglect the polyhierarchical structure of semantic spaces. This paper introduces the concept of cohesion tree as an alternative tool for exploring similarity relations of texts represented in high dimensional spaces. Cohesion trees allow the perspective evaluation of numerically represented text similarities. They depart from minimal spanning trees (MST) by context-sensitively optimizing path costs. This central property underlies the linguistic interpretation of cohesion trees: instead of manifesting context-free associations, they model context priming effects.}}

@INPROCEEDINGS{Mehler:2005:c,
booktitle={Learning and Extending Lexical Ontologies. Proceedings of the Workshop at the 22nd International Conference on Machine Learning (ICML '05), August 7-11, 2005, Universit{\"a}t Bonn, Germany},
pages={41-47},
author={Mehler, Alexander},
editor={Biemann, Chris and Paa{\ss}, Gerhard},
year={2005},
title={Preliminaries to an Algebraic Treatment of Lexical Associations}}

@INPROCEEDINGS{Mehler:Gleim:2005:a,
booktitle={Proceedings of Corpus Linguistics '05, July 14-17, 2005, University of Birmingham, Great Britian},
author={Mehler, Alexander and Gleim, Rüdiger},
volume={Corpus Linguistics Conference Series 1(1)},
year={2005},
title={Polymorphism in Generic Web Units. A corpus linguistic study},
pdf={http://www.birmingham.ac.uk/Documents/college-artslaw/corpus/conference-archives/2005-journal/Thewebasacorpus/AlexanderMehlerandRuedigerGleimCorpusLinguistics2005.pdf},
issn={1747-9398},
abstract={Corpus linguistics and related disciplines which focus on statistical analyses of textual units have substantial need for large corpora. More specifically, genre or register specific corpora are needed which allow studying variations in language use. Along with the incredible growth of the internet, the web became an important source of linguistic data. Of course, web corpora face the same problem of acquiring genre specific corpora. Amongst other things, web mining is a framework of methods for automatically assigning category labels to web units and thus may be seen as a solution to this corpus acquisition problem as far as genre categories are applied. The paper argues that this approach is faced with the problem of a many-to-many relation between expression units on the one hand and content or function units on the other hand. A quantitative study is performed which supports the argumentation that functions of web-based communication are very often concentrated on single web pages and thus interfere any effort of directly applying the classical apparatus of categorization on web page level. The paper outlines a two-level algorithm as an alternative approach to category assignment which is sensitive to genre specific structures and thus may be used to tackle the problem of acquiring genre specific corpora.}}

@INPROCEEDINGS{Mehler:2007:d,
booktitle={Proceedings of the Workshop on Language, Games, and Evolution at the 9th European Summer School in Logic, Language and Information (ESSLLI 2007), Trinity College, Dublin, 6-17 August},
pages={57-67},
author={Mehler, Alexander},
editor={Benz, Anton and Ebert, Christian and van Rooij, Robert},
year={2007},
title={Evolving Lexical Networks. A Simulation Model of Terminological Alignment},
abstract={In this paper we describe a simulation model of terminological alignment in a multiagent community. It is based on the notion of an association game which is used instead of the classical notion of a naming game (Steels, 1996). The simulation model integrates a small world-like agent community which restricts agent communication. We hypothesize that this restriction is decisive when it comes to simulate terminological alignment based on lexical priming. The paper presents preliminary experimental results in support of this hypothesis.}}

@INCOLLECTION{Mehler:Lobin:2004:b,
publisher={Verlag für Sozialwissenschaften},
booktitle={Automatische Textanalyse: Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte},
pages={1-21},
author={Mehler, Alexander and Lobin, Henning},
editor={Mehler, Alexander and Lobin, Henning},
year={2004},
title={Aspekte der texttechnologischen Modellierung}}

@ARTICLE{Dehmer:Emmert:Streib:Mehler:Kilian:2006,
journal={International Journal of Computational Intelligence},
pages={1-7},
number={1},
author={Dehmer, Matthias and Emmert-Streib, Frank and Mehler, Alexander and Kilian, Jürgen},
volume={3},
year={2006},
title={Measuring the Structural Similarity of Web-based Documents: A Novel Approach},
pdf={http://waset.org/publications/15928/measuring-the-structural-similarity-of-web-based-documents-a-novel-approach},
website={http://connection.ebscohost.com/c/articles/24839145/measuring-structural-similarity-web-based-documents-novel-approach},
abstract={Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees. We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.}}
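The measure described in the abstract above reduces graph comparison to string comparison: each graph is encoded as integer property strings, and similarity is derived from their optimal alignment. As a minimal illustration of that idea (not the authors' implementation; the scoring parameters and the normalization by the best self-alignment are assumptions), the following sketch aligns two integer sequences with standard Needleman-Wunsch dynamic programming:

```python
def align_score(a, b, gap=-1, mismatch=-1, match=2):
    """Needleman-Wunsch global alignment score of two integer sequences."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # substitution/match
                           dp[i - 1][j] + gap,       # gap in b
                           dp[i][j - 1] + gap)       # gap in a
    return dp[n][m]

def similarity(a, b, **kw):
    """Normalize the alignment score to [0, 1] via the best self-alignment."""
    best = max(align_score(a, a, **kw), align_score(b, b, **kw))
    return align_score(a, b, **kw) / best if best else 1.0
```

In this reading, identical property strings yield similarity 1.0, and structurally divergent graphs yield lower scores.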

@ARTICLE{Mehler:1996:b,
journal={Journal of Quantitative Linguistics},
pages={113-127},
number={2},
author={Mehler, Alexander},
volume={3},
year={1996},
title={A Multiresolutional Approach to Fuzzy Text Meaning}}

@ARTICLE{Mehler:Weiss:Luecking:2010:a,
journal={Entropy},
author={Mehler, Alexander and Lücking, Andy and Wei{\ss}, Petra},
year={2010},
title={A Network Model of Interpersonal Alignment},
volume={12},
pages={1440-1483},
number={6},
doi={10.3390/e12061440},
website={http://www.mdpi.com/1099-4300/12/6/1440/},
pdf={http://www.mdpi.com/1099-4300/12/6/1440/pdf},
abstract={In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutor’s dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.}}
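The operationalization of alignment in the abstract above rests on the mutual information of the subgraphs representing the interlocutors' dialog lexica. As a toy illustration of the information-theoretic core only (not the two-layer time-aligned network model itself; the sample-based estimator is an assumption), the following computes the mutual information, in bits, of paired observations:

```python
from math import log2
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a sample of (x, y) pairs."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # joint counts c(x, y)
    px = Counter(x for x, _ in pairs)      # marginal counts c(x)
    py = Counter(y for _, y in pairs)      # marginal counts c(y)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy * log2(p_xy / (p_x * p_y)), with probabilities as counts / n
        mi += p_xy * log2(p_xy * n * n / (px[x] * py[y]))
    return mi
```

Perfectly coupled observations yield 1 bit for two equiprobable symbols; independent observations yield 0.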

@ARTICLE{Luecking:Mehler:2011,
journal={International Journal of Signs and Semiotic Systems},
author={Lücking, Andy and Mehler, Alexander},
year={2011},
volume={1},
pages={18-38},
number={1},
title={A Model of Complexity Levels of Meaning Constitution in Simulation Models of Language Evolution},
abstract={Currently, some simulative accounts exist within dynamic or evolutionary frameworks that are concerned with the development of linguistic categories within a population of language users. Although these studies mostly emphasize that their models are abstract, the paradigm categorization domain is preferably that of colors. In this paper, the authors argue that color adjectives are special predicates in both linguistic and metaphysical terms: semantically, they are intersective predicates, metaphysically, color properties can be empirically reduced onto purely physical properties. The restriction of categorization simulations to the color paradigm systematically leads to ignoring two ubiquitous features of natural language predicates, namely relativity and context-dependency. Therefore, the models for simulation models of linguistic categories are not able to capture the formation of categories like perspective-dependent predicates ‘left’ and ‘right’, subsective predicates like ‘small’ and ‘big’, or predicates that make reference to abstract objects like ‘I prefer this kind of situation’. The authors develop a three-dimensional grid of ascending complexity that is partitioned according to the semiotic triangle. They also develop a conceptual model in the form of a decision grid by means of which the complexity level of simulation models of linguistic categorization can be assessed in linguistic terms.}}

@INPROCEEDINGS{Mehler:Gleim:Waltinger:Ernst:Esch:Feith:2009,
booktitle={Proceedings of the Symposium "Sprachtechnologie und eHumanities", 26.–27. Februar, Duisburg-Essen University},
author={Mehler, Alexander and Gleim, Rüdiger and Waltinger, Ulli and Ernst, Alexandra and Esch, Dietmar and Feith, Tobias},
website={http://duepublico.uni-duisburg-essen.de/servlets/DocumentServlet?id=37041},
year={2009},
title={eHumanities Desktop – eine webbasierte Arbeitsumgebung für die geisteswissenschaftliche Fachinformatik},
abstract={In diesem Beitrag beschreiben wir den eHumanities Desktop. Es handelt sich dabei um eine rein webbasierte Umgebung für die texttechnologische Arbeit mit Korpora, welche von der standardisierten Repr{\"a}sentation textueller Einheiten über deren computerlinguistische Vorverarbeitung bis hin zu Text Mining–Funktionalit{\"a}ten eine gro{\ss}e Zahl von Werkzeugen integriert. Diese Integrationsleistung betrifft neben den Textkorpora und den hierauf operierenden texttechnologischen Werkzeugen auch die je zum Einsatz kommenden lexikalischen Ressourcen. Aus dem Blickwinkel der geisteswissenschaftlichen Fachinformatik gesprochen fokussiert der Desktop somit darauf, eine Vielzahl heterogener sprachlicher Ressourcen mit grundlegenden texttechnologischen Methoden zu integrieren, und zwar so, dass das Integrationsresultat auch in den H{\"a}nden von Nicht–Texttechnologen handhabbar bleibt. Wir exemplifizieren diese Handhabung an einem Beispiel aus der historischen Semantik, und damit an einem Bereich, der erst in jüngerer Zeit durch die Texttechnologie erschlossen wird.}}

@INPROCEEDINGS{Clarke:Mehler:1999,
booktitle={Proceedings of the 7th International Congress of the IASS-AIS: International Association for Semiotic Studies – Sign Processes in Complex Systems, Dresden, University of Technology, October 6-11},
author={Clarke, Rodney and Mehler, Alexander},
year={1999},
title={Theorising Print Media in Contexts: A Systemic Semiotic Contribution to Computational Semiotics}}

@BOOK{Mehler:Sharoff:Santini:2010:a,
publisher={Springer},
editor={Mehler, Alexander and Sharoff, Serge and Santini, Marina},
year={2010},
pagetotal={376},
title={Genres on the Web: Computational Models and Empirical Studies},
website={http://www.springer.com/computer/ai/book/978-90-481-9177-2},
abstract={The volume 'Genres on the Web' has been designed for a wide audience, from the expert to the novice. It is a required book for scholars, researchers and students who want to become acquainted with the latest theoretical, empirical and computational advances in the expanding field of web genre research. The study of web genre is an overarching and interdisciplinary novel area of research that spans from corpus linguistics, computational linguistics, NLP, and text-technology, to web mining, webometrics, social network analysis and information studies. This book gives readers a thorough grounding in the latest research on web genres and emerging document types. The book covers a wide range of web-genre focussed subjects, such as: -The identification of the sources of web genres -Automatic web genre identification -The presentation of structure-oriented models -Empirical case studies One of the driving forces behind genre research is the idea of a genre-sensitive information system, which incorporates genre cues complementing the current keyword-based search and retrieval applications.}}

@BOOK{Sutter:Mehler:2010,
publisher={Verlag für Sozialwissenschaften},
editor={Sutter, Tilmann and Mehler, Alexander},
year={2010},
pagetotal={289},
title={Medienwandel als Wandel von Interaktionsformen – von frühen Medienkulturen zum Web 2.0},
website={http://www.springer.com/de/book/9783531156422},
abstract={Die Beitr{\"a}ge des Bandes untersuchen den Medienwandel von frühen europ{\"a}ischen Medienkulturen bis zu aktuellen Formen der Internetkommunikation unter soziologischer, kulturwissenschaftlicher und linguistischer Perspektive. Zwar haben sich die Massenmedien von den Beschr{\"a}nkungen sozialer Interaktionen gelöst, sie weisen dem Publikum aber eine distanzierte, blo{\ss} rezipierende Rolle zu. Dagegen eröffnen neue Formen 'interaktiver' Medien gesteigerte Möglichkeiten der Rückmeldung und der Mitgestaltung für die Nutzer. Der vorliegende Band fragt nach der Qualit{\"a}t dieses Medienwandels: Werden Medien tats{\"a}chlich interaktiv? Was bedeutet die Interaktivit{\"a}t neuer Medien? Werden die durch neue Medien eröffneten Beteiligungsmöglichkeiten realisiert?}}

@INPROCEEDINGS{Wagner:Mehler:Wolff:Dotzler:2009,
booktitle={Proceedings of the Symposium "Sprachtechnologie und eHumanities", 26.–27. Februar, Duisburg-Essen University},
author={Wagner, Benno and Mehler, Alexander and Wolff, Christian and Dotzler, Bernhard},
website={http://epub.uni-regensburg.de/6795/},
year={2009},
title={Bausteine eines Literary Memory Information System (LiMeS) am Beispiel der Kafka-Forschung},
abstract={In dem Paper beschreiben wir Bausteine eines Literary Memory Information System (LiMeS), das die literaturwissenschaftliche Erforschung von so genannten Matrixtexten – das sind Prim{\"a}rtexte eines bestimmten literarischen Gesamtwerks – unter dem Blickwinkel gro{\ss}er Mengen so genannter Echotexte (Topia 1984; Wagner/Reinhard 2007) – das sind Subtexte im Sinne eines literaturwissenschaftlichen Intertextualit{\"a}tsbegriffs – ermöglicht. Den Ausgangspunkt dieses computerphilologischen Informationssystems bildet ein Text-Mining-Modell basierend auf dem Intertextualit{\"a}tsbegriff in Verbindung mit dem Begriff des Semantic Web (Mehler, 2004b, 2005a, b, Wolff 2005). Wir zeigen, inwiefern dieses Modell über bestehende Informationssystemarchitekturen hinausgeht und schlie{\ss}en einen Brückenschlag zur derzeitigen Entwicklung von Arbeitsumgebungen in der geisteswissenschaftlichen Fachinformatik in Form eines eHumanities Desktop.}}

@ARTICLE{Mehler:Wolff:2005:b,
journal={Journal for Language Technology and Computational Linguistics (JLCL)},
pages={1-18},
number={1},
author={Mehler, Alexander and Wolff, Christian},
volume={20},
year={2005},
title={Einleitung: Perspektiven und Positionen des Text Mining},
website={http://epub.uni-regensburg.de/6844/},
abstract={Beitr{\"a}ge zum Thema Text Mining beginnen vielfach mit dem Hinweis auf die enorme Zunahme online verfügbarer Dokumente, ob nun im Internet oder in Intranets (Losiewicz et al. 2000; Merkl 2000; Feldman 2001; Mehler 2001; Joachims \& Leopold 2002). Der hiermit einhergehenden „Informationsflut“ wird das Ungenügen des Information Retrieval (IR) bzw. seiner g{\"a}ngigen Verfahren der Informationsaufbereitung und Informationserschlie{\ss}ung gegenübergestellt. Es wird bem{\"a}ngelt, dass sich das IR weitgehend darin erschöpft, Teilmengen von Textkollektionen auf Suchanfragen hin aufzufinden und in der Regel blo{\ss} listenförmig anzuordnen. Das auf diese Weise dargestellte Spannungsverh{\"a}ltnis von Informationsexplosion und Defiziten bestehender IR-Verfahren bildet den Hintergrund für die Entwicklung von Verfahren zur automatischen Verarbeitung textueller Einheiten, die sich st{\"a}rker an den Anforderungen von Informationssuchenden orientieren. Anders ausgedrückt: Mit der Einführung der Neuen Medien w{\"a}chst die Bedeutung digitalisierter Dokumente als Prim{\"a}rmedium für die Verarbeitung, Verbreitung und Verwaltung von Information in öffentlichen und betrieblichen Organisationen. Dabei steht wegen der Menge zu verarbeitender Einheiten die Alternative einer intellektuellen Dokumenterschlie{\ss}ung nicht zur Verfügung. Andererseits wachsen die Anforderung an eine automatische Textanalyse, der das klassische IR nicht gerecht wird. Der Mehrzahl der hiervon betroffenen textuellen Einheiten fehlt die explizite Strukturiertheit formaler Datenstrukturen. Vielmehr weisen sie je nach Text- bzw. Dokumenttyp ganz unterschiedliche Strukturierungsgrade auf. Dabei korreliert die Flexibilit{\"a}t der Organisationsziele negativ mit dem Grad an explizierter Strukturiertheit und positiv mit der Anzahl jener Texte und Texttypen (E-Mails, Memos, Expertisen, technische Dokumentationen etc.), die im Zuge ihrer Realisierung produziert bzw. rezipiert werden. 
Vor diesem Hintergrund entsteht ein Bedarf an Texttechnologien, die ihren Benutzern nicht nur „intelligente“ Schnittstellen zur Textrezeption anbieten, sondern zugleich auf inhaltsorientierte Textanalysen zielen, um auf diese Weise aufgabenrelevante Daten explorieren und kontextsensitiv aufbereiten zu helfen.  Das Text Mining ist mit dem Versprechen verbunden, eine solche Technologie darzustellen bzw. sich als solche zu entwickeln.  Dieser einheitlichen Problembeschreibung stehen konkurrierende Textmining-Spezifikationen gegenüber, was bereits die Vielfalt der Namensgebungen verdeutlicht. So finden sich neben der Bezeichnung Text Mining (Joachims \& Leopold 2002; Tan 1999) die Alternativen • Text Data Mining (Hearst 1999b; Merkl 2000), • Textual Data Mining (Losiewicz et al. 2000), • Text Knowledge Engineering (Hahn \& Schnattinger 1998), Knowledge Discovery in Texts (Kodratoff 1999) oder Knowledge Discovery in Textual Databases (Feldman \& Dagan 1995).  Dabei l{\"a}sst bereits die Namensgebung erkennen, dass es sich um Analogiebildungen zu dem (nur unwesentlich {\"a}lteren) Forschungsgebiet des Data Mining (DM; als Bestandteil des Knowledge Discovery in Databases – KDD) handelt. Diese Namensvielfalt findet ihre Entsprechung in widerstreitenden Aufgabenzuweisungen. So setzt beispielsweise Sebastiani (2002) Informationsextraktion und Text Mining weitgehend gleich, wobei er eine Schnittmenge zwischen Text Mining und Textkategorisierung ausmacht (siehe auch Dörre et al. 1999). Demgegenüber betrachten Kosala \& Blockeel (2000) Informationsextraktion und Textkategorisierung lediglich als Teilbereiche des ihrer Ansicht nach umfassenderen Text Mining, w{\"a}hrend Hearst (1999a) im Gegensatz hierzu Informationsextraktion und Textkategorisierung explizit aus dem Bereich des explorativen Text Mining ausschlie{\ss}t.}}

@INPROCEEDINGS{Waltinger:Mehler:Wegner:2009,
booktitle={Proceedings of the 5th International Conference on Web Information Systems and Technologies (WEBIST '09), March 23-26, 2009, Lisboa},
author={Waltinger, Ulli and Mehler, Alexander and Wegner, Armin},
year={2009},
title={A Two-Level Approach to Web Genre Classification},
abstract={This paper presents an approach of two-level categorization of web pages. In contrast to related approaches the model additionally explores and categorizes functionally and thematically demarcated segments of the hypertext types to be categorized. By classifying these segments conclusions can be drawn about the type of the corresponding compound web document.},
pdf={http://www.ulliwaltinger.de/pdf/Webist_2009_TwoLevel_Genre_Classification_WaltingerMehlerWegner.pdf}}
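The abstract above describes a two-level scheme: segments of a web page are categorized first, and the compound document's genre is then inferred from its segment labels. As a hedged sketch of that control flow only (the keyword-count scorer and majority vote are stand-in assumptions, not the paper's classifier):

```python
from collections import Counter

def classify_segment(segment, keyword_map):
    """First level: label one segment by its best-matching keyword set."""
    scores = {genre: sum(w in segment.lower() for w in words)
              for genre, words in keyword_map.items()}
    return max(scores, key=scores.get)

def classify_document(segments, keyword_map):
    """Second level: aggregate segment labels into a document-level label."""
    labels = [classify_segment(s, keyword_map) for s in segments]
    return Counter(labels).most_common(1)[0][0]
```

For example, a page whose segments mostly score as news is labeled news even if one embedded segment looks like a shop.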

@ARTICLE{Mehler:Abramov:Diewald:2011:a,
journal={Computer Speech and Language},
website={http://www.sciencedirect.com/science/article/pii/S0885230810000434},
abstract={In this article, we test a variant of the Sapir-Whorf Hypothesis in the area of complex network theory. This is done by analyzing social ontologies as a new resource for automatic language classification. Our method is to solely explore structural features of social ontologies in order to predict family resemblances of languages used by the corresponding communities to build these ontologies. This approach is based on a reformulation of the Sapir-Whorf Hypothesis in terms of distributed cognition. Starting from a corpus of 160 Wikipedia-based social ontologies, we test our variant of the Sapir-Whorf Hypothesis by several experiments, and find out that we outperform the corresponding baselines. All in all, the article develops an approach to classify linguistic networks of tens of thousands of vertices by exploring a small range of mathematically well-established topological indices.},
author={Mehler, Alexander and Abramov, Olga and Diewald, Nils},
year={2011},
title={Geography of Social Ontologies: Testing a Variant of the Sapir-Whorf Hypothesis in the Context of Wikipedia},
volume={25},
number={3},
pages={716-740},
doi={10.1016/j.csl.2010.05.006}}

@INCOLLECTION{Mehler:Gleim:2006:b,
publisher={Gedit},
booktitle={WaCky! Working Papers on the Web as Corpus},
pages={191-224},
author={Mehler, Alexander and Gleim, Rüdiger},
editor={Baroni, Marco and Bernardini, Silvia},
year={2006},
title={The Net for the Graphs – Towards Webgenre Representation for Corpus Linguistic Studies},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.510.4125}}

@BOOK{Mehler:2005:e,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={20(2)},
editor={Mehler, Alexander},
year={2005},
pagetotal={97},
title={Korpuslinguistik},
website={http://www.jlcl.org/2005_Heft2/LDV_Forum_Band_20_Heft_2.pdf}}

@INPROCEEDINGS{Mehler:Geibel:Gleim:Herold:Jain:Pustylnikov:2007,
pdf={http://ikw.uni-osnabrueck.de/~ott06/ott06-abstracts/Mehler_Geibel_abstract.pdf},
booktitle={Proceedings of OTT '06 – Ontologies in Text Technology: Approaches to Extract Semantic Knowledge from Structured Information},
pages={63-71},
author={Mehler, Alexander and Geibel, Peter and Gleim, Rüdiger and Herold, Sebastian and Jain, Brijnesh-Johannes and Abramov, Olga},
series={Publications of the Institute of Cognitive Science (PICS)},
editor={Mönnich, Uwe and Kühnberger, Kai-Uwe},
year={2007},
title={Much Ado About Text Content. Learning Text Types Solely by Structural Differentiae},
abstract={In this paper, we deal with classifying texts into classes which denote text types whose textual instances serve more or less homogeneous functions. Other than mainstream approaches to text classification, which rely on the vector space model [30] or some of its descendants [2] and, thus, on content-related lexical features, we solely refer to structural differentiae, that is, to patterns of text structure as determinants of class membership. Further, we suppose that text types span a type hierarchy based on the type-subtype relation [31]. Thus, although we admit that class membership is fuzzy so that overlapping classes are inevitable, we suppose a non-overlapping type system structured into a rooted tree – whether solely based on functional or additional on, e.g., content- or mediabased criteria [1]. What regards criteria of goodness of classification, we perform a classical supervised categorization experiment [30] based on cross-validation as a method of model selection [11]. That is, we perform a categorization experiment in which for all training and test cases class membership is known ex ante. In summary, we perform a supervised experiment of text classification in order to learn functionally grounded text types where membership to these types is solely based on structural criteria.}}

@INPROCEEDINGS{Gleim:Mehler:Dehmer:2006:a,
booktitle={Proceedings of the EACL 2006 Workshop on Web as Corpus, April 3-7, 2006, Trento, Italy},
pages={67-74},
author={Gleim, Rüdiger and Mehler, Alexander and Dehmer, Matthias},
year={2006},
title={Web Corpus Mining by Instance of Wikipedia},
pdf={http://www.aclweb.org/anthology/W06-1710},
website={http://pub.uni-bielefeld.de/publication/1773538}}

@INPROCEEDINGS{Dehmer:Mehler:Emmert-Streib:2007:a,
booktitle={Proceedings of the 2007 International Conference on Machine Learning: Models, Technologies \& Applications (MLMTA '07), June 25-28, 2007, Las Vegas},
pages={113-117},
author={Dehmer, Matthias and Mehler, Alexander and Emmert-Streib, Frank},
year={2007},
title={Graph-theoretical Characterizations of Generalized Trees},
website={https://www.researchgate.net/publication/221188591_Graph-theoretical_Characterizations_of_Generalized_Trees}}

@INPROCEEDINGS{Gleim:Mehler:Eikmeyer:2007:a,
booktitle={Proceedings of the Corpus Linguistics 2007 Conference, Birmingham (UK)},
author={Gleim, Rüdiger and Mehler, Alexander and Eikmeyer, Hans-Jürgen},
year={2007},
title={Representing and Maintaining Large Corpora}}

@INPROCEEDINGS{Mehler:2002:f,
publisher={Peter Lang},
booktitle={Sprachwissenschaft auf dem Weg in das dritte Jahrtausend. Proceedings of the 34th Linguistics Colloquium, September 7-10, 1999, Universit{\"a}t Mainz},
pages={725-733},
author={Mehler, Alexander},
editor={Rapp, Reinhard},
year={2002},
title={Cohesive Paths: Applying the Concept of Cohesion to Hypertext}}

@INCOLLECTION{Mehler:2009:b,
publisher={Springer},
booktitle={Linguistic Modeling of Information and Markup Languages. Contributions to Language Technology},
author={Mehler, Alexander},
series={Text, Speech and Language Technology},
editor={Witt, Andreas and Metzing, Dieter},
year={2009},
title={Structure Formation in the Web. A Graph-Theoretical Model of Hypertext Types},
abstract={In this chapter we develop a representation model of web document networks. Based on the notion of uncertain web document structures, the model is defined as a template which grasps nested manifestation levels of hypertext types. Further, we specify the model on the conceptual, formal and physical level and exemplify it by reconstructing competing web document models.}}

@INPROCEEDINGS{Gleim:Mehler:Waltinger:Menke:2009,
booktitle={5th Corpus Linguistics Conference, University of Liverpool},
author={Gleim, Rüdiger and Mehler, Alexander and Waltinger, Ulli and Menke, Peter},
year={2009},
title={eHumanities Desktop – An extensible Online System for Corpus Management and Analysis},
abstract={This paper presents the eHumanities Desktop - an online system for corpus management and analysis in support of computing in the humanities. Design issues and the overall architecture are described, as well as an outline of the applications offered by the system.},
website={http://www.ulliwaltinger.de/ehumanities-desktop-an-extensible-online-system-for-corpus-management-and-analysis/},
pdf={http://www.ulliwaltinger.de/pdf/eHumanitiesDesktop-AnExtensibleOnlineSystem-CL2009.pdf}}

@INPROCEEDINGS{Mehler:Dehmer:Gleim:2005,
publisher={Lang},
booktitle={Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beitr{\"a}ge zur GLDV-Frühjahrstagung '05, 30. M{\"a}rz – 01. April 2005, Universit{\"a}t Bonn},
pages={158-174},
author={Mehler, Alexander and Dehmer, Matthias and Gleim, Rüdiger},
editor={Fisseni, Bernhard and Schmitz, Hans-Christian and Schröder, Bernhard and Wagner, Petra},
year={2005},
title={Zur Automatischen Klassifikation von Webgenres}}

@INPROCEEDINGS{Mehler:Luecking:2009,
publisher={IEEE},
booktitle={Proceedings of IEEE Africon 2009, September 23-25, Nairobi, Kenya},
author={Mehler, Alexander and Lücking, Andy},
year={2009},
title={A Structural Model of Semiotic Alignment: The Classification of Multimodal Ensembles as a Novel Machine Learning Task},
abstract={In addition to the well-known linguistic alignment processes in dyadic communication – e.g., phonetic, syntactic, semantic alignment – we provide evidence for a genuine multimodal alignment process, namely semiotic alignment. Communicative elements from different modalities 'routinize into' cross-modal 'super-signs', which we call multimodal ensembles. Computational models of human communication are in need of expressive models of multimodal ensembles. In this paper, we exemplify semiotic alignment by means of empirical examples of the building of multimodal ensembles. We then propose a graph model of multimodal dialogue that is expressive enough to capture multimodal ensembles. In line with this model, we define a novel task in machine learning with the aim of training classifiers that can detect semiotic alignment in dialogue. This model is in support of approaches which need to gain insights into realistic human-machine communication.}}

@INCOLLECTION{Mehler:2009:c,
publisher={Wiley-VCH},
booktitle={Analysis of Complex Networks: From Biology to Linguistics},
pages={175-220},
author={Mehler, Alexander},
editor={Dehmer, Matthias and Emmert-Streib, Frank},
website={https://www.researchgate.net/publication/255666602_1_Generalised_Shortest_Paths_Trees_A_Novel_Graph_Class_Applied_to_Semiotic_Networks},
year={2009},
title={Generalized Shortest Paths Trees: A Novel Graph Class Applied to Semiotic Networks}}

@BOOK{Mehler:Lobin:2004:a,
publisher={Verlag für Sozialwissenschaften},
editor={Mehler, Alexander and Lobin, Henning},
year={2004},
pagetotal={290},
title={Automatische Textanalyse. Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte},
website={http://www.v-r.de/de/Mehler-Lobin-Automatische-Textanalyse/t/352526527/}}

@BOOK{Mehler:Wolff:2005:a,
series={Journal for Language Technology and Computational Linguistics (JLCL)},
volume={20(1)},
editor={Mehler, Alexander and Wolff, Christian},
publisher={GSCL},
year={2005},
pagetotal={143},
title={Text Mining},
website={http://www.jlcl.org/2005_Heft1/LDV-Forum1.2005.pdf}}

@INCOLLECTION{Mehler:2008:b,
publisher={De Gruyter},
booktitle={Corpus Linguistics. An International Handbook of the Science of Language and Society},
pages={328-382},
author={Mehler, Alexander},
editor={Lüdeling, Anke and Kytö, Merja},
year={2008},
title={Large Text Networks as an Object of Corpus Linguistic Studies}}

@INPROCEEDINGS{Geibel:Pustylnikov:Mehler:Gust:Kuehnberger:2007,
publisher={Springer},
booktitle={Proceedings of ICONIP 2007 (14th International Conference on Neural Information Processing)},
pages={779-788},
author={Geibel, Peter and Abramov, Olga and Mehler, Alexander and Gust, Helmar and Kühnberger, Kai-Uwe},
series={Lecture Notes in Computer Science 4985},
year={2007},
title={Classification of Documents Based on the Structure of Their DOM Trees},
abstract={In this paper, we discuss kernels that can be applied for the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node might be labeled by a vector of attributes including its XML tag and the textual content. We describe five new kernels suitable for such structures: a kernel based on predefined structural features, a tree kernel derived from the well-known parse tree kernel, the set tree kernel that allows permutations of children, the string tree kernel being an extension of the so-called partial tree kernel, and the soft tree kernel as a more efficient alternative. We evaluate the kernels experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus.}}

@INPROCEEDINGS{Mehler:2002:k,
publisher={Morgan Kaufmann},
booktitle={Proceedings of the 19th International Conference on Computational Linguistics (COLING '02), August 24 – September 1, 2002, Taipei, Taiwan},
pages={646-652},
author={Mehler, Alexander},
year={2002},
title={Hierarchical Orderings of Textual Units},
abstract={Text representation is a central task for any approach to automatic learning from texts. It requires a format which allows to interrelate texts even if they do not share content words, but deal with similar topics. Furthermore, measuring text similarities raises the question of how to organize the resulting clusters. This paper presents cohesion trees (CT) as a data structure for the perspective, hierarchical organization of text corpora. CTs operate on alternative text representation models taking lexical organization, quantitative text characteristics, and text structure into account. It is shown that CTs realize text linkages which are lexically more homogeneous than those produced by minimal spanning trees.}}

• B. Jussen, A. Mehler, and A. Ernst, “A Corpus Management System for Historical Semantics,” Sprache und Datenverarbeitung. International Journal for Language Data Processing, vol. 31, iss. 1-2, pp. 81-89, 2007.
[Abstract] [BibTeX]

Der Beitrag beschreibt ein Korpusmanagementsystem für die historische Semantik. Die Grundlage hierfür bildet ein Bedeutungsbegriff, der – methodologisch gesprochen – auf der Analyse diachroner Korpora beruht. Das Ziel der Analyse dieser Korpora besteht darin, Bedeutungswandel als eine Bezugsgröße für den Wandel sozialer Systeme zu untersuchen. Das vorgestellte Korpusmanagementsystem unterstützt diese Art der korpusbasierten historischen Semantik.
@ARTICLE{Jussen:Mehler:Ernst:2007,
journal={Sprache und Datenverarbeitung. International Journal for Language Data Processing},
pages={81-89},
number={1-2},
author={Jussen, Bernhard and Mehler, Alexander and Ernst, Alexandra},
volume={31},
year={2007},
title={A Corpus Management System for Historical Semantics},
abstract={Der Beitrag beschreibt ein Korpusmanagementsystem für die historische Semantik. Die Grundlage hierfür bildet ein Bedeutungsbegriff, der – methodologisch gesprochen – auf der Analyse diachroner Korpora beruht. Das Ziel der Analyse dieser Korpora besteht darin, Bedeutungswandel als eine Bezugsgrö{\ss}e für den Wandel sozialer Systeme zu untersuchen. Das vorgestellte Korpusmanagementsystem unterstützt diese Art der korpusbasierten historischen Semantik.}}
• A. Mehler and R. Köhler, “Machine Learning in a Semiotic Perspective,” in Aspects of Automatic Text Analysis, A. Mehler and R. Köhler, Eds., Berlin/New York: Springer, 2007, pp. 1-29.
[Abstract] [BibTeX]

The subject of the following essay is the connotative aspect of the meaning of texts. The starting point of these reflections on the connotation of a text is the view that the constitution of word and text meaning is the result of a circular process which is responsible for the emergence of a hierarchy of nested linguistic units. The process of sign articulation proceeds along these levels and, by linking the (connotative) content plane with the expression plane at the text level, produces the text sign. In contrast to a strict interpretation of Frege's principle of compositionality, according to which the meanings of linguistic units must be presupposed as fixed, context-free quantities, the present approach treats even lexical meaning as a quantity that can vary depending on its context. From a semiotic perspective, it is above all its gestalt character that exempts connotative text meaning from an application of the Frege principle. In other words: the connotative meaning of a text can by no means be decomposed into a structure of 'atomic' representations. The hierarchical organization of texts proves to be complex insofar as their meanings result from a circular process that confirms and/or modifies the meanings of the text constituents. This circularity implies that texts are not only to be regarded as sites where word meaning structures are manifested, but at the same time serve as starting points for the modification and emergence of such structures. In what follows, drawing on Copenhagen structuralism, a model of the connotative meaning of texts is developed which is oriented, among other things, towards the glossematic notion of the constant. The model is formalized by means of the concept of the fuzzy set. To this end, the fuzzy usage regularities of words are analyzed by means of a two-stage procedure that takes into account the syntagmatic and paradigmatic regularities of word usage. The role of the sentence level within the process of the constitution of connotative text meaning is sketched. Finally, the algorithm is exemplified by the automatic analysis of a text corpus.
@INCOLLECTION{Mehler:Koehler:2007:b,
publisher={Springer},
booktitle={Aspects of Automatic Text Analysis},
website={http://rd.springer.com/chapter/10.1007/978-3-540-37522-7_1},
pages={1-29},
abstract={Gegenstand des folgenden Aufsatzes ist der konnotative Aspekt der Bedeutungen von Texten. Den Ausgangspunkt der {\"U}berlegungen zur Konnotation des Textes bildet die Auffassung, wonach Wort- und Textbedeutungskonstitution Ergebnis eines zirkul{\"a}ren Prozesses sind, der für die Emergenz einer Hierarchie ineinander geschachtelter Spracheinheiten verantwortlich zeichnet. Der Proze{\ss} der Zeichenartikulation erfolgt entlang dieser Ebenen und erzeugt durch Verbindung von (konnotativer) Inhalts- und Ausdrucksseite auf Textebene das Textzeichen. Im Gegensatz zu einer strikten Interpretation des Fregeschen Kompositionalit{\"a}tsprinzips, derzufolge die Bedeutungen sprachlicher Einheiten als fixierte, kontextfreie Grö{\ss}en vorauszusetzen sind, behandelt der vorliegende Ansatz bereits die lexikalische Bedeutung als Grö{\ss}e, die in Abh{\"a}ngigkeit von ihrem Kontext variieren kann. Aus semiotischer Perspektive ist es vor allem der Gestaltcharakter, welcher die konnotative Textbedeutung einer Anwendung des FregePrinzips entzieht. Anders ausgedrückt: Die konnotative Bedeutung eines Textes ist keineswegs in eine Struktur 'atomarer' Repr{\"a}sentationen zerlegbar. Die hierarchische Organisation von Texten erweist sich insofern als komplex, als ihre Bedeutungen aus einem zirkul{\"a}ren Proze{\ss} resultieren, der best{\"a}tigend und/oder ver{\"a}ndernd auf die Bedeutungen der Textkonstituenten einwirkt. Diese Zirkularit{\"a}t bedingt, da{\ss} Texte nicht nur als Orte der Manifestation von Wortbedeutungsstrukturen anzusehen sind, sondern zugleich als Ausgangspunkte für die Modifikation und Emergenz solcher Strukturen dienen. Im folgenden wird unter Rekurs auf den Kopenhagener Strukturalismus ein Modell der konnotativen Bedeutung von Texten entwickelt, das sich unter anderem an dem glossematischen Begriff der Konstante orientiert. Die Formalisierung des Modells erfolgt mit Hilfe des Konzeptes der unscharfen Menge. 
Zu diesem Zweck werden die unscharfen Verwendungsregularit{\"a}ten von Wörtern auf der Basis eines zweistufigen Verfahrens analysiert, welches die syntagmatischen und paradigmatischen Regularit{\"a}ten des Wortgebrauches berücksichtigt. Die Rolle der Satzebene innerhalb des Prozesses der konnotativen Textbedeutungskonstitution wird angedeutet. Abschlie{\ss}end erfolgt eine Exemplifizierung des Algorithmus anhand der automatischen Analyse eines Textcorpus.},
author={Mehler, Alexander and Köhler, Reinhard},
series={Studies in Fuzziness and Soft Computing},
editor={Mehler, Alexander and Köhler, Reinhard},
year={2007},
title={Machine Learning in a Semiotic Perspective},
address={Berlin/New York}}
• A. Mehler, U. Waltinger, and A. Wegner, “A Formal Text Representation Model Based on Lexical Chaining,” in Proceedings of the KI 2007 Workshop on Learning from Non-Vectorial Data (LNVD 2007) September 10, Osnabrück, Osnabrück, 2007, pp. 17-26.
[Abstract] [BibTeX]

This paper presents a formal text representation model as an alternative to the vector space model. It combines a tree-like model with graph-inducing lexical relations. The paper aims at formalizing two yet unrelated approaches, i.e. lexical chaining [3] and quantitative structure analysis [9], in order to combine content and structure modeling.
@INPROCEEDINGS{Mehler:Waltinger:Wegner:2007:a,
publisher={Universit{\"a}t Osnabrück},
booktitle={Proceedings of the KI 2007 Workshop on Learning from Non-Vectorial Data (LNVD 2007) September 10, Osnabrück},
pages={17-26},
author={Mehler, Alexander and Waltinger, Ulli and Wegner, Armin},
editor={Geibel, Peter and Jain, Brijnesh J.},
year={2007},
title={A Formal Text Representation Model Based on Lexical Chaining},
pdf={http://www.ulliwaltinger.de/pdf/LNVD07MehlerWaltingerWegner.pdf},
abstract={This paper presents a formal text representation model as an alternative to the vector space model. It combines a tree-like model with graph-inducing lexical relations. The paper aims at formalizing two yet unrelated approaches, i.e. lexical chaining [3] and quantitative structure analysis [9], in order to combine content and structure modeling.}}

### 2006 (16)

• A. Mehler, “In Search of a Bridge Between Network Analysis in Computational Linguistics and Computational Biology – A Conceptual Note,” in BIOCOMP, 2006, pp. 496-502.
[BibTeX]

@inproceedings{mehler:2006,
title={In Search of a Bridge Between Network Analysis in Computational Linguistics and Computational Biology-A Conceptual Note.},
author={Mehler, Alexander},
booktitle={BIOCOMP},
pages={496--502},
year={2006},
pdf={https://pdfs.semanticscholar.org/81aa/0b840ed413089d69908cff60628a92609ccd.pdf}
}
• T. vor der Brück and S. Busemann, “Automatic Error Correction for Tree-Mapping Grammars,” in Proceedings of KONVENS 2006, Konstanz, Germany, 2006, pp. 1-8.
[Abstract] [BibTeX]

Tree mapping grammars are used in natural language generation (NLG) to map non-linguistic input onto a derivation tree from which the target text can be trivially read off as the terminal yield. Such grammars may consist of a large number of rules. Finding errors is quite tedious and sometimes very time-consuming. Often the generation fails because the relevant input subtree is not specified correctly. This work describes a method to detect and correct wrong assignments of input subtrees to grammar categories by cross-validating grammar rules with the given input structures. The result is implemented in a grammar development workbench and helps accelerate the grammar writer's work considerably.
@INPROCEEDINGS{vor:der:Brueck:Busemann:2006,
url={http://pi7.fernuni-hagen.de/brueck/papers/brueck-busemann-konvens06.pdf},
booktitle={Proceedings of KONVENS 2006},
pages={1--8},
author={vor der Brück, Tim and Busemann, Stephan},
month={October},
year={2006},
isbn={3-89318-050-8},
title={Automatic Error Correction for Tree-Mapping Grammars},
abstract={Tree mapping grammars are used in natural language generation (NLG) to map non-linguistic input onto a derivation tree from which the target text can be trivially read off as the terminal yield. Such grammars may consist of a large number of rules. Finding errors is quite tedious and sometimes very time-consuming. Often the generation fails because the relevant input subtree is not specified correctly. This work describes a method to detect and correct wrong assignments of input subtrees to grammar categories by cross-validating grammar rules with the given input structures. The result is implemented in a grammar development workbench and helps accelerating the grammar writer's work considerably.},
website={http://www.dfki.de/lt/publication_show.php?id=3602},
annote={editor: Miriam Butt}}
• A. Kranstedt, A. Lücking, T. Pfeiffer, H. Rieser, and M. Staudacher, “Measuring and Reconstructing Pointing in Visual Contexts,” in brandial ’06 — Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, 2006, pp. 82-89.
[Abstract] [BibTeX]

We describe an experiment to gather original data on geometrical aspects of pointing. In particular, we are focusing upon the concept of the pointing cone, a geometrical model of a pointing’s extension. In our setting we employed methodological and technical procedures of a new type to integrate data from annotations as well as from tracker recordings. We combined exact information on position and orientation with rater’s classifications. Our first results seem to challenge classical linguistic and philosophical theories of demonstration in that they advise to separate pointings from reference.
@INPROCEEDINGS{Kranstedt:et:al:2006:c,
publisher={Universit{\"a}tsverlag Potsdam},
booktitle={brandial '06 -- Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue},
pages={82--89},
author={Kranstedt, Alfred and Lücking, Andy and Pfeiffer, Thies and Rieser, Hannes and Staudacher, Marc},
keywords={own},
editor={David Schlangen and Raquel Fernández},
month={9},
year={2006},
title={Measuring and Reconstructing Pointing in Visual Contexts},
abstract={We describe an experiment to gather original data on geometrical aspects of pointing. In particular, we are focusing upon the concept of the pointing cone, a geometrical model of a pointing’s extension. In our setting we employed methodological and technical procedures of a new type to integrate data from annotations as well as from tracker recordings. We combined exact information on position and orientation with rater’s classifications. Our first results seem to challenge classical linguistic and philosophical theories of demonstration in that they advise to separate pointings from reference.},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.8472},
address={Potsdam}}
• A. Lücking, H. Rieser, and M. Staudacher, “Multi-modal Integration for Gesture and Speech,” in brandial ’06 — Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, 2006, pp. 106-113.
[Abstract] [BibTeX]

Demonstratives, in particular gestures that 'only' accompany speech, are not a big issue in current theories of grammar. If we deal with gestures, fixing their function is one big problem, the other one is how to integrate the representations originating from different channels and, ultimately, how to determine their composite meanings. The growing interest in multi-modal settings, computer simulations, human-machine interfaces and VR-applications increases the need for theories of multi-modal structures and events. In our workshop-contribution we focus on the integration of multi-modal contents and investigate different approaches dealing with this problem such as Johnston et al. (1997) and Johnston (1998), Johnston and Bangalore (2000), Chierchia (1995), Asher (2005), and Rieser (2005).
@INPROCEEDINGS{Luecking:Rieser:Staudacher:2006:a,
publisher={Universit{\"a}tsverlag Potsdam},
booktitle={brandial '06 -- Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue},
pages={106--113},
author={Lücking, Andy and Rieser, Hannes and Staudacher, Marc},
keywords={own},
editor={David Schlangen and Raquel Fernández},
month={9},
year={2006},
title={Multi-modal Integration for Gesture and Speech},
abstract={Demonstratives, in particular gestures that 'only' accompany speech, are not a big issue in current theories of grammar. If we deal with gestures, fixing their function is one big problem, the other one is how to integrate the representations originating from different channels and, ultimately, how to determine their composite meanings. The growing interest in multi-modal settings, computer simulations, human-machine interfaces and VR-applications increases the need for theories of multi-modal structures and events. In our workshop-contribution we focus on the integration of multi-modal contents and investigate different approaches dealing with this problem such as Johnston et al. (1997) and Johnston (1998), Johnston and Bangalore (2000), Chierchia (1995), Asher (2005), and Rieser (2005).},
address={Potsdam}}
• A. Kranstedt, A. Lücking, T. Pfeiffer, H. Rieser, and I. Wachsmuth, “Deictic Object Reference in Task-oriented Dialogue,” in Situated Communication, G. Rickheit and I. Wachsmuth, Eds., Berlin: De Gruyter Mouton, 2006, pp. 155-207.
[Abstract] [BibTeX]

This chapter presents an original approach towards a detailed understanding of the usage of pointing gestures accompanying referring expressions. This effort is undertaken in the context of human-machine interaction integrating empirical studies, theory of grammar and logics, and simulation techniques. In particular, we take steps to classify the role of pointing in deictic expressions and to model the focussed area of pointing gestures, the so-called pointing cone. This pointing cone serves as a central concept in a formal account of multi-modal integration at the linguistic speech-gesture interface as well as in a computational model of processing multi-modal deictic expressions.
@INCOLLECTION{Kranstedt:et:al:2006:b,
publisher={De Gruyter Mouton},
booktitle={Situated Communication},
pages={155--207},
author={Kranstedt, Alfred and Lücking, Andy and Pfeiffer, Thies and Rieser, Hannes and Wachsmuth, Ipke},
keywords={own},
editor={Gert Rickheit and Ipke Wachsmuth},
year={2006},
title={Deictic Object Reference in Task-oriented Dialogue},
website={http://pub.uni-bielefeld.de/publication/1894485},
abstract={This chapter presents an original approach towards a detailed understanding of the usage of pointing gestures accompanying referring expressions. This effort is undertaken in the context of human-machine interaction integrating empirical studies, theory of grammar and logics, and simulation techniques. In particular, we take steps to classify the role of pointing in deictic expressions and to model the focussed area of pointing gestures, the so-called pointing cone. This pointing cone serves as a central concept in a formal account of multi-modal integration at the linguistic speech-gesture interface as well as in a computational model of processing multi-modal deictic expressions.},
address={Berlin}}
• A. Kranstedt, A. Lücking, T. Pfeiffer, H. Rieser, and I. Wachsmuth, “Deixis: How to Determine Demonstrated Objects Using a Pointing Cone,” in Gesture in Human-Computer Interaction and Simulation, S. Gibet, N. Courty, and J. Kamp, Eds., Berlin: Springer, 2006, pp. 300-311.
[Abstract] [BibTeX]

We present a collaborative approach towards a detailed understanding of the usage of pointing gestures accompanying referring expressions. This effort is undertaken in the context of human-machine interaction integrating empirical studies, theory of grammar and logics, and simulation techniques. In particular, we attempt to measure the precision of the focussed area of a pointing gesture, the so-called pointing cone. The pointing cone serves as a central concept in a formal account of multi-modal integration at the linguistic speech-gesture interface as well as in a computational model of processing multi-modal deictic expressions.
@INCOLLECTION{Kranstedt:et:al:2006:a,
publisher={Springer},
booktitle={Gesture in Human-Computer Interaction and Simulation},
pages={300--311},
annote={6th International Gesture Workshop, Berder Island, France, 2005, Revised Selected Papers},
author={Kranstedt, Alfred and Lücking, Andy and Pfeiffer, Thies and Rieser, Hannes and Wachsmuth, Ipke},
keywords={own},
editor={Sylvie Gibet and Nicolas Courty and Jean-Francois Kamp},
year={2006},
title={Deixis: How to Determine Demonstrated Objects Using a Pointing Cone},
abstract={We present a collaborative approach towards a detailed understanding of the usage of pointing gestures accompanying referring expressions. This effort is undertaken in the context of human-machine interaction integrating empirical studies, theory of grammar and logics, and simulation techniques. In particular, we attempt to measure the precision of the focussed area of a pointing gesture, the so-called pointing cone. The pointing cone serves as a central concept in a formal account of multi-modal integration at the linguistic speech-gesture interface as well as in a computational model of processing multi-modal deictic expressions.}}
• T. Pfeiffer, A. Kranstedt, and A. Lücking, “Sprach-Gestik Experimente mit IADE, dem Interactive Augmented Data Explorer,” in Proceedings: Dritter Workshop Virtuelle und Erweiterte Realität der GI-Fachgruppe VR/AR, Koblenz, 2006.
[Abstract] [BibTeX]

Empirical research on natural human communication relies on the acquisition and evaluation of extensive data. The modalities through which humans can express themselves are very diverse, and equally diverse are the representations by which they can be made available for empirical study. For an investigation of pointing behavior in referring to objects, we developed IADE, a framework for recording, analyzing, and re-simulating speech and gesture data. With its help we can achieve decisive advances in the methodology of linguistic experiments.
@INPROCEEDINGS{Pfeiffer:Kranstedt:Luecking:2006,
booktitle={Proceedings: Dritter Workshop Virtuelle und Erweiterte Realit{\"a}t der GI-Fachgruppe VR/AR},
author={Pfeiffer, Thies and Kranstedt, Alfred and Lücking, Andy},
keywords={own},
year={2006},
title={Sprach-Gestik Experimente mit IADE, dem Interactive Augmented Data Explorer},
abstract={Für die empirische Erforschung natürlicher menschlicher Kommunikation sind wir auf die Akquise und Auswertung umfangreicher Daten angewiesen. Die Modalit{\"a}ten, über die sich Menschen ausdrücken können, sind sehr unterschiedlich - und genauso verschieden sind die Repr{\"a}sentationen, mit denen sie für die Empirie verfügbar gemacht werden können. Für eine Untersuchung des Zeigeverhaltens bei der Referenzierung von Objekten haben wir mit IADE ein Framework für die Aufzeichnung, Analyse und Resimulation von Sprach-Gestik Daten entwickelt. Mit dessen Hilfe können wir für unsere Forschung entscheidende Fortschritte in der linguistischen Experimentalmethodik machen.},
website={http://pub.uni-bielefeld.de/publication/2426853},
address={Koblenz}}
• A. Lücking, H. Rieser, and M. Staudacher, “SDRT and Multi-modal Situated Communication,” in brandial ’06 — Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, 2006, pp. 72-79.
[BibTeX]

@INPROCEEDINGS{Luecking:Rieser:Staudacher:2006:b,
publisher={Universit{\"a}tsverlag Potsdam},
booktitle={brandial '06 -- Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue},
pages={72--79},
author={Lücking, Andy and Rieser, Hannes and Staudacher, Marc},
keywords={own},
editor={David Schlangen and Raquel Fernández},
month={9},
year={2006},
title={SDRT and Multi-modal Situated Communication},
website={http://publishup.uni-potsdam.de/opus4-ubp/frontdoor/index/index/docId/949},
address={Potsdam}}
• M. Z. Islam and M. Khan, “JKimmo: A Multilingual Computational Morphology Framework for PC-KIMMO,” in 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
[Abstract] [BibTeX]

Morphological analysis is of fundamental interest in computational linguistics and language processing. While there are established morphological analyzers for mostly Western and a few other languages using localized interfaces, the same cannot be said for Indic and other less-studied languages for which language processing is just beginning. There are three primary obstacles to computational morphological analysis of these less-studied languages: the generative rules that define the language morphology, the morphological processor, and the computational interface that a linguist can use to experiment with the generative rules. In this paper, we present JKimmo, a multilingual morphological open-source framework that uses the PC-KIMMO two-level morphological processor and provides a localized interface for Bangla morphological analysis. We then apply Jkimmo to Bangla computational morphology, demonstrating both its recognition and generation capabilities. Jkimmo’s internationalization (i18n) frame-work allows easy localization in other languages as well, using a property file for the interface definitions and a transliteration scheme for the analysis.
@INPROCEEDINGS{Zahurul:Khan:2006,
owner={zahurul},
booktitle={9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh},
author={Islam, Md. Zahurul and Khan, Mumit},
timestamp={2011.08.02},
year={2006},
abstract={Morphological analysis is of fundamental interest in computational linguistics and language processing. While there are established morphological analyzers for mostly Western and a few other languages using localized interfaces, the same cannot be said for Indic and other less-studied languages for which language processing is just beginning. There are three primary obstacles to computational morphological analysis of these less-studied languages: the generative rules that define the language morphology, the morphological processor, and the computational interface that a linguist can use to experiment with the generative rules. In this paper, we present JKimmo, a multilingual morphological open-source framework that uses the PC-KIMMO two-level morphological processor and provides a localized interface for Bangla morphological analysis. We then apply Jkimmo to Bangla computational morphology, demonstrating both its recognition and generation capabilities. Jkimmo’s internationalization (i18n) frame-work allows easy localization in other languages as well, using a property file for the interface definitions and a transliteration scheme for the analysis.},
title={JKimmo: A Multilingual Computational Morphology Framework for PC-KIMMO},
website={https://www.researchgate.net/publication/237728403_JKimmo_A_Multilingual_Computational_Morphology_Framework_for_PC-KIMMO},
pdf={https://hucompute.org/wp-content/uploads/2015/08/JKimmo_-A_Multilingual_Computational_Morphology_Framework_for_PC-KIMMO.pdf}}
• T. Rownok, M. Z. Islam, and M. Khan, “Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices,” in 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
[Abstract] [BibTeX]

Technology is involved in almost every aspect of our everyday life: communication, work, shopping, recreation, and so on. Communication through mobile devices is nowadays one of the most effective and convenient ways to stay in touch: it is fast, easy, and available anywhere at any time. Mobile messaging, or short message service, is one of the most popular ways to communicate using mobile devices. Writing and displaying Bangla characters on mobile devices is a considerable challenge. In this paper, we describe a Bangla text input method and rendering support for short message service on mobile devices.
@INPROCEEDINGS{Rownok:Zahurul:Khan:2006,
owner={zahurul},
booktitle={9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh},
author={Rownok, Tofazzal and Islam, Md. Zahurul and Khan, Mumit},
timestamp={2011.08.02},
year={2006},
title={Bangla Text Input and Rendering Support for Short Message Service on Mobile Devices},
abstract={Technology is the most important thing that involve in our everyday life. It is involving in almost every aspect of life like communication, work, shopping, recreation etc. Communication through mobile devices is the most effective and easy way now a day. It is faster, easier and you can communicate whenever you want from any-where. Mobile messaging or short message service is one of the popular ways to communicate using mobile devices. It is a big challenge to write and display Bangla characters on mobile devices. In this paper, we describe a Bangla text input method and rendering support on mobile devices for short message service.},
pdf={https://hucompute.org/wp-content/uploads/2015/08/Bangla_Text_Input_and_Rendering_Support_for_Short_Message_Service_on_Mobile_Devices.pdf}}
• Y. Arafat, M. Z. Islam, and M. Khan, “Analysis and Observations From a Bangla news corpus,” in 9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh, 2006.
[BibTeX]

@INPROCEEDINGS{Arafat:Zahurul:Khan:2006,
owner={zahurul},
booktitle={9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh},
author={Arafat, Yeasir and Islam, Md. Zahurul and Khan, Mumit},
timestamp={2011.08.02},
year={2006},
title={Analysis and Observations From a Bangla news corpus}}
• R. Gleim, “HyGraph – Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertextstrukturen,” in Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005, Universität Bonn, Frankfurt a. M., 2006, pp. 42-53.
[BibTeX]

@INPROCEEDINGS{Gleim:2006,
publisher={Lang},
pdf={http://www.hucompute.org/data/gleim/pdf/GLDV2005-HyGraph-Framework.pdf},
booktitle={Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beitr{\"a}ge zur GLDV-Tagung 2005, Universit{\"a}t Bonn},
pages={42-53},
author={Gleim, Rüdiger},
editor={Fisseni, Bernhard and Schmitz, Hans-Christian and Schröder, Bernhard and Wagner, Petra},
year={2006},
title={HyGraph - Ein Framework zur Extraktion, Repr{\"a}sentation und Analyse webbasierter Hypertextstrukturen},
website={https://www.researchgate.net/publication/268294000_HyGraph__Ein_Framework_zur_Extraktion_Reprsentation_und_Analyse_webbasierter_Hypertextstrukturen},
address={Frankfurt a. M.}}
• A. Mehler, “Text Linkage in the Wiki Medium – A Comparative Study,” in Proceedings of the EACL Workshop on New Text – Wikis and blogs and other dynamic text sources, April 3-7, 2006, Trento, Italy, 2006, pp. 1-8.
[Abstract] [BibTeX]

Workshop organizer: Jussi Karlgren
@INPROCEEDINGS{Mehler:2006:c,
booktitle={Proceedings of the EACL Workshop on New Text – Wikis and blogs and other dynamic text sources, April 3-7, 2006, Trento, Italy},
pages={1-8},
author={Mehler, Alexander},
editor={Karlgren, Jussi},
year={2006},
abstract={Workshop organizer: Jussi Karlgren},
title={Text Linkage in the Wiki Medium – A Comparative Study},
pdf={http://www.aclweb.org/anthology/W06-2801},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.6390}}
• A. Mehler, “Stratified Constraint Satisfaction Networks in Synergetic Multi-Agent Simulations of Language Evolution,” in Artificial Cognition Systems, A. Loula, R. Gudwin, and J. Queiroz, Eds., Hershey: Idea Group Inc., 2006, pp. 140-174.
[Abstract] [BibTeX]

Formerly published as Mehler:2005:e.
@INCOLLECTION{Mehler:2006:e,
publisher={Idea Group Inc.},
booktitle={Artificial Cognition Systems},
pages={140-174},
author={Mehler, Alexander},
editor={Loula, Angelo and Gudwin, Ricardo and Queiroz, João},
year={2006},
abstract={Ehedem = Mehler:2005:e},
title={Stratified Constraint Satisfaction Networks in Synergetic Multi-Agent Simulations of Language Evolution},
address={Hershey}}
• A. Mehler and L. Sichelschmidt, “Reconceptualizing Latent Semantic Analysis in Terms of Complex Network Theory. A Corpus-Linguistic Approach,” in 2nd International Conference of the German Cognitive Linguistics Association – Theme Session: Cognitive-Linguistic Approaches: What can we gain by computational treatment of data? 5.-7. Oktober 2006, Ludwig-Maximilians-Universität München, 2006, pp. 23-26.
[BibTeX]

@INPROCEEDINGS{Mehler:Sichelschmidt:2006,
booktitle={2nd International Conference of the German Cognitive Linguistics Association – Theme Session: Cognitive-Linguistic Approaches: What can we gain by computational treatment of data? 5.-7. Oktober 2006, Ludwig-Maximilians-Universit{\"a}t München},
pages={23-26},
author={Mehler, Alexander and Sichelschmidt, Lorenz},
editor={Alonge, Antonietta and Lönneker-Rodman, Birte},
year={2006},
title={Reconceptualizing Latent Semantic Analysis in Terms of Complex Network Theory. A Corpus-Linguistic Approach},
pdf={http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.5069&rep=rep1&type=pdf}}
• A. Mehler, M. Dehmer, and R. Gleim, “Towards Logical Hypertext Structure – A Graph-Theoretic Perspective,” in Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS ’04), Berlin/New York, 2006, pp. 136-150.
[Abstract] [BibTeX]

Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph-theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.
@INPROCEEDINGS{Mehler:Dehmer:Gleim:2006,
publisher={Springer},
booktitle={Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS '04)},
website={http://rd.springer.com/chapter/10.1007/11553762_14},
pages={136-150},
author={Mehler, Alexander and Dehmer, Matthias and Gleim, Rüdiger},
series={Lecture Notes in Computer Science 3473},
editor={Böhme, Thomas and Heyer, Gerhard},
year={2006},
title={Towards Logical Hypertext Structure - A Graph-Theoretic Perspective},
abstract={Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph-theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.}}

### 2005 (3)

• M. Z. Islam and M. Khan, “Teaching Compiler Development to Undergraduates using a Template Based Approach,” in 8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh, 2005.
[Abstract] [BibTeX]

Compiler Design remains one of the most dreaded courses in any undergraduate Computer Science curriculum, due in part to the complexity and the breadth of the material covered in a typical 14-15 week semester time frame. The situation is further complicated by the fact that most undergraduates have never implemented a large enough software package that is needed for a working compiler, and to do so in such a short time span is a challenge indeed. This necessitates changes in the way we teach compilers, and specifically in ways we set up the project for the Compiler Design course at the undergraduate level. We describe a template based method for teaching compiler design and implementation to the undergraduates, where the students fill in the blanks in a set of templates for each phase of the compiler, starting from the lexical scanner to the code generator. Compilers for new languages can be implemented by modifying only the parts necessary to implement the syntax and the semantics of the language, leaving much of the remaining environment as is. The students not only learn how to design the various phases of the compiler, but also learn the software design and engineering techniques for implementing large software systems. In this paper, we describe a compiler teaching methodology that implements a full working compiler for an imperative C-like programming language with backend code generators for MIPS, Java Virtual Machine (JVM) and Microsoft’s .NET Common Language Runtime (CLR).
@INPROCEEDINGS{Zahurul:Khan:2005,
booktitle={8th International Conference on Computer and Information Technology (ICCIT 2005), Dhaka, Bangladesh},
author={Islam, Md. Zahurul and Khan, Mumit},
year={2005},
title={Teaching Compiler Development to Undergraduates using a Template Based Approach},
abstract={Compiler Design remains one of the most dreaded courses in any undergraduate Computer Science curriculum, due in part to the complexity and the breadth of the material covered in a typical 14-15 week semester time frame. The situation is further complicated by the fact that most undergraduates have never implemented a large enough software package that is needed for a working compiler, and to do so in such a short time span is a challenge indeed. This necessitates changes in the way we teach compilers, and specifically in ways we set up the project for the Compiler Design course at the undergraduate level. We describe a template based method for teaching compiler design and implementation to the undergraduates, where the students fill in the blanks in a set of templates for each phase of the compiler, starting from the lexical scanner to the code generator. Compilers for new languages can be implemented by modifying only the parts necessary to implement the syntax and the semantics of the language, leaving much of the remaining environment as is. The students not only learn how to design the various phases of the compiler, but also learn the software design and engineering techniques for implementing large software systems. In this paper, we describe a compiler teaching methodology that implements a full working compiler for an imperative C-like programming language with backend code generators for MIPS, Java Virtual Machine (JVM) and Microsoft’s .NET Common Language Runtime (CLR).},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.1323}}
• A. Mehler, “Eigenschaften der textuellen Einheiten und Systeme / Properties of Textual Units and Systems,” in Quantitative Linguistik. Ein internationales Handbuch / Quantitative Linguistics. An International Handbook, R. Köhler, G. Altmann, and R. G. Piotrowski, Eds., Berlin/New York: De Gruyter, 2005, pp. 325-348.
[BibTeX]

@INCOLLECTION{Mehler:2005:b,
publisher={De Gruyter},
booktitle={Quantitative Linguistik. Ein internationales Handbuch / Quantitative Linguistics. An International Handbook},
pages={325-348},
author={Mehler, Alexander},
editor={Köhler, Reinhard and Altmann, Gabriel and Piotrowski, Rajmund G.},
year={2005},
title={Eigenschaften der textuellen Einheiten und Systeme / Properties of Textual Units and Systems},
address={Berlin/New York}}
• A. Mehler, “Lexical Chaining as a Source of Text Chaining,” in Proceedings of the 1st Computational Systemic Functional Grammar Conference, University of Sydney, Australia, 2005, pp. 12-21.
[BibTeX]

@INPROCEEDINGS{Mehler:2005:d,
booktitle={Proceedings of the 1st Computational Systemic Functional Grammar Conference, University of Sydney, Australia},
pdf={http://www.hucompute.org/media/pdf/CohesionTrees1.pdf},
pages={12-21},
author={Mehler, Alexander},
editor={Patrick, Jon and Matthiessen, Christian},
year={2005},
note={July 16, 2005},
title={Lexical Chaining as a Source of Text Chaining}}

### 2004 (8)

• A. Eisele and T. vor der Brück, “Error-Tolerant Finite-State Lookup for Trademark Search,” in 27th German Conference on Artificial Intelligence (KI), Ulm, Germany, 2004. Springer Best Paper Award
[Abstract] [BibTeX]

Error-tolerant lookup of words in large vocabularies has many potential uses, both within and beyond natural language processing (NLP). This work describes a generic library for finite-state-based lexical lookup, originally designed for NLP-related applications, that can be adapted to application-specific error metrics. We show how this tool can be used for searching existing trademarks in a database, using orthographic and phonetic similarity. We sketch a prototypical implementation of a trademark search engine and show results of a preliminary evaluation of this system.
@INPROCEEDINGS{Eisele:vor:der:Brueck:2004,
publisher={Springer},
booktitle={27th German Conference on Artificial Intelligence (KI)},
author={Eisele, Andreas and vor der Brück, Tim},
editor={Susanne Biundo},
month={October},
year={2004},
note={Springer Best Paper Award},
specialnote={Best Paper Award},
title={Error-Tolerant Finite-State Lookup for Trademark Search},
abstract={Error-tolerant lookup of words in large vocabularies has many potential uses, both within and beyond natural language processing (NLP). This work describes a generic library for finite-state-based lexical lookup, originally designed for NLP-related applications, that can be adapted to application-specific error metrics. We show how this tool can be used for searching existing trademarks in a database, using orthographic and phonetic similarity. We sketch a prototypical implementation of a trademark search engine and show results of a preliminary evaluation of this system.}}
• M. Rohn, W. Raatz, and T. vor der Brück, “Objektive Optimierung der lokalen Wettervorhersage,” in DACH Meteorologenkonferenz, Karlsruhe, Germany, 2004.
[Abstract] [BibTeX]

Die lokale Wettervorhersage umfaßt einen Zeitraum von 0 bis 178 Stunden und muß daher die unterschiedlichsten Punktinformationen aus den Ergebnissen der numerischen Modellierung, konventioneller Beobachtungen von Bodenwetterelementen sowie Nowcasting-Produkten integrieren. Dabei liefern die Verfahren oft unterschiedliche Punktprognosen. Um eine Endvorhersage oder Guidance abzuleiten, müssen alle verfügbaren Informationen bezüglich ihrer Qualität bewertet werden, sodann eine Auswahl getroffen, und abschließend zu einer einzigen Aussage kombiniert werden. Dieses Problem von Selektion und Kombination verschiedener Vorhersageinformationen wird anschaulich von Winkler 1989 aus der Perspektive der Entscheidungstheorie beschrieben. In der täglichen Routinearbeit des Vorhersagemeteorologen wird diese Integration 'intuitiv' vollzogen, basierend auf seiner meteorologischen Erfahrung über die synoptische Situation sowie seiner Kenntnisse der lokalen Charakteristika des Prognoseortes. Der DWD plant, den Vorhersageprozeß durch ein Verfahren 'Objektive Optimierung' zu unterstützen, welches eine sog. Objektiv Optimierte Guidance (OOG) erzeugt. Das Verfahren umfaßt objektive Ansätze zur Kombination verschiedener Vorhersagedaten sowie die kontinuierliche Aktualisierung durch Beobachtungs- und Nowcastingdaten.
@INPROCEEDINGS{Rohn:Raatz:vor:der:Brueck:2004,
url={http://pi7.fernuni-hagen.de/brueck/papers/021_RoRaBr.pdf},
booktitle={DACH Meteorologenkonferenz},
author={Rohn, Michael and Raatz, Wolfgang and vor der Brück, Tim},
month={October},
year={2004},
title={Objektive Optimierung der lokalen Wettervorhersage},
abstract={Die lokale Wettervorhersage umfa{\ss}t einen Zeitraum von 0 bis 178 Stunden und mu{\ss} daher die unterschiedlichsten Punktinformationen aus den Ergebnissen der numerischen Modellierung, konventioneller Beobachtungen von Bodenwetterelementen sowie Nowcasting-Produkten integrieren. Dabei liefern die Verfahren oft unterschiedliche Punktprognosen. Um eine Endvorhersage oder Guidance abzuleiten, müssen alle verfügbaren Informationen bezüglich ihrer Qualit{\"a}t bewertet werden, sodann eine Auswahl getroffen, und abschlie{\ss}end zu einer einzigen Aussage kombiniert werden. Dieses Problem von Selektion und Kombination verschiedener Vorhersageinformationen wird anschaulich von Winkler 1989 aus der Perspektive der Entscheidungstheorie beschrieben. In der t{\"a}glichen Routinearbeit des Vorhersagemeteorologen wird diese Integration 'intuitiv' vollzogen, basierend auf seiner meteorologischen Erfahrung über die synoptische Situation sowie seiner Kenntnisse der lokalen Charakteristika des Prognoseortes. Der DWD plant, den Vorhersageproze{\ss} durch ein Verfahren 'Objektive Optimierung' zu unterstützen, welches eine sog. Objektiv Optimierte Guidance (OOG) erzeugt. Das Verfahren umfa{\ss}t objektive Ans{\"a}tze zur Kombination verschiedener Vorhersagedaten sowie die kontinuierliche Aktualisierung durch Beobachtungs- und Nowcastingdaten.},
address={Karlsruhe, Germany}}
• A. Lücking, H. Rieser, and J. Stegmann, “Statistical Support for the Study of Structures in Multi-Modal Dialogue: Inter-Rater Agreement and Synchronization,” in Catalog ’04—Proceedings of the Eighth Workshop on the Semantics and Pragmatics of Dialogue, Barcelona, 2004, pp. 56-63.
[Abstract] [BibTeX]

We present a statistical approach to assess relations that hold among speech and pointing gestures in and between turns in task-oriented dialogue. The units quantified over are the time-stamps of the XML-based annotation of the digital video data. It was found that, on average, gesture strokes do not exceed, but are freely distributed over the time span of their linguistic affiliates. Further, the onset of the affiliate was observed to occur earlier than gesture initiation. Moreover, we found that gestures do obey certain appropriateness conditions and contribute semantic content ('gestures save words') as well. Gestures also seem to play a functional role wrt dialogue structure: There is evidence that gestures can contribute to the bundle of features making up a turn-taking signal. Some statistical results support a partitioning of the domain, which is also reflected in certain rating difficulties. However, our evaluation of the applied annotation scheme generally resulted in very good agreement.
@INPROCEEDINGS{Luecking:Rieser:Stegmann:2004,
organization={Department of Translation and Philology, Universitat Pompeu Fabra},
booktitle={Catalog '04---Proceedings of the Eighth Workshop on the Semantics and Pragmatics of Dialogue},
pages={56--63},
author={Lücking, Andy and Rieser, Hannes and Stegmann, Jens},
editor={Jonathan Ginzburg and Enric Vallduví},
year={2004},
title={Statistical Support for the Study of Structures in Multi-Modal Dialogue: Inter-Rater Agreement and Synchronization},
abstract={We present a statistical approach to assess relations that hold among speech and pointing gestures in and between turns in task-oriented dialogue. The units quantified over are the time-stamps of the XML-based annotation of the digital video data. It was found that, on average, gesture strokes do not exceed, but are freely distributed over the time span of their linguistic affiliates. Further, the onset of the affiliate was observed to occur earlier than gesture initiation. Moreover, we found that gestures do obey certain appropriateness conditions and contribute semantic content ('gestures save words') as well. Gestures also seem to play a functional role wrt dialogue structure: There is evidence that gestures can contribute to the bundle of features making up a turn-taking signal. Some statistical results support a partitioning of the domain, which is also reflected in certain rating difficulties. However, our evaluation of the applied annotation scheme generally resulted in very good agreement.},
address={Barcelona}}
• A. Mehler, “A Data-Oriented Model of Context in Hypertext Authoring,” in Proceedings of the 7th International Workshop on Organisational Semiotics (OS ’04), July 19-20, 2004, Setúbal, Portugal, Setúbal, 2004, pp. 24-45.
[BibTeX]

@INPROCEEDINGS{Mehler:2004:c,
publisher={INSTICC},
booktitle={Proceedings of the 7th International Workshop on Organisational Semiotics (OS '04), July 19-20, 2004, Setúbal, Portugal},
pages={24-45},
author={Mehler, Alexander},
editor={Filipe, Joaquim and Liu, Kecheng},
year={2004},
title={A Data-Oriented Model of Context in Hypertext Authoring},
website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.7944},
pdf={http://www.orgsem.org/papers/02.pdf},
address={Setúbal}}
• A. Mehler, “Automatische Synthese Internet-basierter Links für digitale Bibliotheken,” Osnabrücker Beiträge zur Sprachtheorie. Themenheft Internetbasierte Kommunikation, vol. 68, pp. 31-53, 2004.
[Abstract] [BibTeX]

Dieser Beitrag behandelt Verfahren zur automatischen Erzeugung von Hyperlinks, wie sie im WWW für die Informationssuche bereitstehen. Dabei steht die Frage im Vordergrund, auf welche Weise bestehende Verfahren suchrelevante Dokumente bestimmen und von diesen aus inhaltsverwandte Dokumente verlinken. Dieser Gegenstand verbindet den Bereich des klassischen Information Retrievals (IR) mit einem Anwendungsgebiet, das in der Wissenschaftskommunikation unter dem Stichwort der digitalen Bibliothek unter Nutzbarmachung des Hyperlink-basierten Browsings firmiert. Ein Beispiel hierfür bildet die digitale Bibliothek CiteSeer (Lawrence et al. 1999), welche das Boolesche Retrieval dadurch erweitert, dass ausgehend von Treffern einer Suche jene Dokumente per Link angesteuert werden können, welche die aufgefundenen Dokumente zitieren oder von diesen zitiert werden. CiteSeer ist also ein System, welches das Schlagwort-basierte Querying im Rahmen des klassischen IRs mit dem Hypertext-basierten Browsing von Zitaten verknüpft, und zwar zu dem Zweck, die Suche wissenschaftlicher Dokumente zu erleichtern. Darüber hinaus verwendet es die unter dem Stichwort des Vektorraummodells bekannt gewordene Technologie für den wortbasierten Vergleich von Texten. Der Beitrag setzt an dieser Stelle an. Er argumentiert, dass Verfahren bereitstehen, welche die Anforderung nach inhaltsorientiertem Retrieval mit dem inhaltsorientierten Browsing verbinden, mit der Forderung also, dass Hyperlinks, die E-Texte als digitalisierte Versionen von (wissenschaftlichen) Dokumenten verknüpfen (Storrer 2002), Inhalts- und nicht nur Zitat-basiert sind.
@ARTICLE{Mehler:2004:b,
journal={Osnabrücker Beitr{\"a}ge zur Sprachtheorie. Themenheft Internetbasierte Kommunikation},
pages={31-53},
author={Mehler, Alexander},
volume={68},
year={2004},
title={Automatische Synthese Internet-basierter Links für digitale Bibliotheken},
abstract={Dieser Beitrag behandelt Verfahren zur automatischen Erzeugung von Hyperlinks, wie sie im WWW für die Informationssuche bereitstehen. Dabei steht die Frage im Vordergrund, auf welche Weise bestehende Verfahren suchrelevante Dokumente bestimmen und von diesen aus inhaltsverwandte Dokumente verlinken. Dieser Gegenstand verbindet den Bereich des klassischen Information Retrievals (IR) mit einem Anwendungsgebiet, das in der Wissenschaftskommunikation unter dem Stichwort der digitalen Bibliothek unter Nutzbarmachung des Hyperlink-basierten Browsings firmiert. Ein Beispiel hierfür bildet die digitale Bibliothek CiteSeer (Lawrence et al. 1999), welche das Boolesche Retrieval dadurch erweitert, dass ausgehend von Treffern einer Suche jene Dokumente per Link angesteuert werden können, welche die aufgefundenen Dokumente zitieren oder von diesen zitiert werden. CiteSeer ist also ein System, welches das Schlagwort-basierte Querying im Rahmen des klassischen IRs mit dem Hypertext-basierten Browsing von Zitaten verknüpft, und zwar zu dem Zweck, die Suche wissenschaftlicher Dokumente zu erleichtern. Darüber hinaus verwendet es die unter dem Stichwort des Vektorraummodells bekannt gewordene Technologie für den wortbasierten Vergleich von Texten. Der Beitrag setzt an dieser Stelle an. Er argumentiert, dass Verfahren bereitstehen, welche die Anforderung nach inhaltsorientiertem Retrieval mit dem inhaltsorientierten Browsing verbinden, mit der Forderung also, dass Hyperlinks, die E-Texte als digitalisierte Versionen von (wissenschaftlichen) Dokumenten verknüpfen (Storrer 2002), Inhalts- und nicht nur Zitat-basiert sind.}}
• M. Dehmer, A. Mehler, and R. Gleim, “Aspekte der Kategorisierung von Webseiten,” in INFORMATIK 2004 – Informatik verbindet, Band 2, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Workshop Multimedia-Informationssysteme, 2004, pp. 39-43.
[Abstract] [BibTeX]

Im Zuge der Web-basierten Kommunikation tritt die Frage auf, inwiefern Webpages zum Zwecke ihrer inhaltsorientierten Filterung kategorisiert werden können. Diese Studie untersucht zwei Phänomene, welche die Bedingung der Möglichkeit einer solchen Kategorisierung betreffen (siehe [6]): Mit dem Begriff der funktionalen Äquivalenz beziehen wir uns auf das Phänomen, dass dieselbe Funktions- oder Inhaltskategorie durch völlig verschiedene Bausteine Web-basierter Dokumente manifestiert werden kann. Mit dem Begriff der Polymorphie beziehen wir uns auf das Phänomen, dass dasselbe Dokument zugleich mehrere Funktions- oder Inhaltskategorien manifestieren kann. Die zentrale Hypothese lautet, dass beide Phänomene für Web-basierte Hypertextstrukturen charakteristisch sind. Ist dies der Fall, so kann die automatische Kategorisierung von Hypertexten [2, 10] nicht mehr als eindeutige Zuordnung verstanden werden, bei der einem Dokument genau eine Kategorie zugeordnet wird. In diesem Sinne thematisiert das Papier die Frage nach der adäquaten Modellierung multimedialer Dokumente.
@INPROCEEDINGS{Dehmer:Mehler:Gleim:2004,
publisher={GI},
booktitle={INFORMATIK 2004 – Informatik verbindet, Band 2, Beitr{\"a}ge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Workshop Multimedia-Informationssysteme},
pages={39-43},
author={Dehmer, Matthias and Mehler, Alexander and Gleim, Rüdiger},
series={Lecture Notes in Informatics},
volume={51},
year={2004},
title={Aspekte der Kategorisierung von Webseiten},
pdf={http://subs.emis.de/LNI/Proceedings/Proceedings51/GI-Proceedings.51-11.pdf},
website={https://www.researchgate.net/publication/221385316_Aspekte_der_Kategorisierung_von_Webseiten},
abstract={Im Zuge der Web-basierten Kommunikation tritt die Frage auf, inwiefern Webpages zum Zwecke ihrer inhaltsorientierten Filterung kategorisiert werden können. Diese Studie untersucht zwei Ph{\"a}nomene, welche die Bedingung der Möglichkeit einer solchen Kategorisierung betreffen (siehe [6]): Mit dem Begriff der funktionalen {\"A}quivalenz beziehen wir uns auf das Ph{\"a}nomen, dass dieselbe Funktions- oder Inhaltskategorie durch völlig verschiedene Bausteine Web-basierter Dokumente manifestiert werden kann. Mit dem Begriff der Polymorphie beziehen wir uns auf das Ph{\"a}nomen, dass dasselbe Dokument zugleich mehrere Funktions- oder Inhaltskategorien manifestieren kann. Die zentrale Hypothese lautet, dass beide Ph{\"a}nomene für Web-basierte Hypertextstrukturen charakteristisch sind. Ist dies der Fall, so kann die automatische Kategorisierung von Hypertexten [2, 10] nicht mehr als eindeutige Zuordnung verstanden werden, bei der einem Dokument genau eine Kategorie zugeordnet wird. In diesem Sinne thematisiert das Papier die Frage nach der ad{\"a}quaten Modellierung multimedialer Dokumente.}}
• A. Mehler, “Textmodellierung: Mehrstufige Modellierung generischer Bausteine der Textähnlichkeitsmessung,” in Automatische Textanalyse: Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte, A. Mehler and H. Lobin, Eds., Wiesbaden: Verlag für Sozialwissenschaften, 2004, pp. 101-120.
[BibTeX]

@INCOLLECTION{Mehler:2003:d,
publisher={Verlag für Sozialwissenschaften},
booktitle={Automatische Textanalyse: Systeme und Methoden zur Annotation und Analyse natürlichsprachlicher Texte},
pages={101-120},
author={Mehler, Alexander},
editor={Mehler, Alexander and Lobin, Henning},
year={2004},
title={Textmodellierung: Mehrstufige Modellierung generischer Bausteine der Text{\"a}hnlichkeitsmessung},
address={Wiesbaden}}
• A. Mehler, “Quantitative Methoden,” in Texttechnologie. Perspektiven und Anwendungen, H. Lobin and L. Lemnitzer, Eds., Tübingen: Stauffenburg, 2004, pp. 83-107.
[BibTeX]

@INCOLLECTION{Mehler:2004:g,
publisher={Stauffenburg},
booktitle={Texttechnologie. Perspektiven und Anwendungen},
pages={83-107},
author={Mehler, Alexander},
editor={Lobin, Henning and Lemnitzer, Lothar},
year={2004},
title={Quantitative Methoden},
address={Tübingen}}

### 2003 (4)

• A. Mehler, “Ein Kompositionalitätsprinzip für numerische Textsemantiken,” Journal for Language Technology and Computational Linguistics (JLCL), vol. 18, iss. 1-2, pp. 321-337, 2003.
[Abstract] [BibTeX]

Der Beitrag beschreibt eine Variante des Kompositionalitätsprinzips der Bedeutung als Grundprinzip für die numerische Analyse unsystematischer Sinnrelationen komplexer Zeichen, das über das Phänomen der perspektivischen Interpretation hinaus gebrauchssemantische Bedeutungsaspekte berücksichtigt. Ziel ist es, ein theoretisches Fundament für korpusanalytische Ansätze in der Semantik, die oftmals die linguistische Interpretierbarkeit ihrer Analyseergebnisse vermissen lassen, zu umreißen. Die Spezifikation des Kompositionalitätsprinzips erfolgt unter Rekurs auf das Modell eines hierarchisch geordneten Constraint-Satisfaction-Prozesses. Hiermit ist das längerfristige Ziel verbunden, das Problem einer defizitären numerischen Textrepräsentation sowie die mangelnde Integration von propositionaler und strukturaler bzw. korpusanalytischer Semantik anzugehen. Die Erörterungen dieses Beitrags sind primär konzeptioneller Natur; sie betreffen die Konzeption einer numerischen Textsemantik zur Vermeidung von Defiziten bestehender Ansätze.
@ARTICLE{Mehler:2003:c,
journal={Journal for Language Technology and Computational Linguistics (JLCL)},
pages={321-337},
number={1-2},
author={Mehler, Alexander},
volume={18},
year={2003},
title={Ein Kompositionalit{\"a}tsprinzip für numerische Textsemantiken},
pdf={http://media.dwds.de/jlcl/2003_Doppelheft/321-337_Mehler.pdf},
abstract={Der Beitrag beschreibt eine Variante des Kompositionalit{\"a}tsprinzips der Bedeutung als Grundprinzip für die numerische Analyse unsystematischer Sinnrelationen komplexer Zeichen, das über das Ph{\"a}nomen der perspektivischen Interpretation hinaus gebrauchssemantische Bedeutungsaspekte berücksichtigt. Ziel ist es, ein theoretisches Fundament für korpusanalytische Ans{\"a}tze in der Semantik, die oftmals die linguistische Interpretierbarkeit ihrer Analyseergebnisse vermissen lassen, zu umrei{\ss}en. Die Spezifikation des Kompositionalit{\"a}tsprinzips erfolgt unter Rekurs auf das Modell eines hierarchisch geordneten Constraint-Satisfaction-Prozesses. Hiermit ist das l{\"a}ngerfristige Ziel verbunden, das Problem einer defizit{\"a}ren numerischen Textrepr{\"a}sentation sowie die mangelnde Integration von propositionaler und strukturaler bzw. korpusanalytischer Semantik anzugehen. Die Erörterungen dieses Beitrags sind prim{\"a}r konzeptioneller Natur; sie betreffen die Konzeption einer numerischen Textsemantik zur Vermeidung von Defiziten bestehender Ans{\"a}tze.}}
• A. Mehler, “Methodological Aspects of Computational Semiotics,” SEED Journal, vol. 3, iss. 3, pp. 71-80, 2003.
[Abstract] [BibTeX]

In the following, elementary constituents of models in computational semiotics are outlined. This is done by referring to computer simulations as a framework which neither aims to describe artificial sign systems (as done in computer semiotics), nor to realize semiotic functions in “artificial worlds” (as proposed in “artificial semiosis”). Rather, the framework referred to focuses on preconditions of computer-based simulations of semiotic processes. Following this approach, the paper focuses on methodological aspects of computational semiotics.
@ARTICLE{Mehler:2003:b,
journal={SEED Journal},
pages={71-80},
number={3},
author={Mehler, Alexander},
volume={3},
year={2003},
title={Methodological Aspects of Computational Semiotics},
abstract={In the following, elementary constituents of models in computational semiotics are outlined. This is done by referring to computer simulations as a framework which neither aims to describe artificial sign systems (as done in computer semiotics), nor to realize semiotic functions in “artificial worlds” (as proposed in “artificial semiosis”). Rather, the framework referred to focuses on preconditions of computer-based simulations of semiotic processes. Following this approach, the paper focuses on methodological aspects of computational semiotics.}}
• A. Mehler, “Konnotative Textbedeutungen: zur Modellierung struktureller Aspekte der Bedeutungen von Texten,” in Korpuslinguistische Untersuchungen zur quantitativen und systemtheoretischen Linguistik, R. Köhler, Ed., Sankt Augustin: Gardez! Verlag, 2003, pp. 320-347.
[BibTeX]

@INCOLLECTION{Mehler:2003,
publisher={Gardez! Verlag},
booktitle={Korpuslinguistische Untersuchungen zur quantitativen und systemtheoretischen Linguistik},
pages={320-347},
author={Mehler, Alexander},
editor={Köhler, Reinhard},
year={2003},
title={Konnotative Textbedeutungen: zur Modellierung struktureller Aspekte der Bedeutungen von Texten},
pdf={http://ubt.opus.hbz-nrw.de/volltexte/2004/279/pdf/10_mehler.pdf},
address={Sankt Augustin}}
• A. Mehler and S. Reich, “Guided Tours + Trails := Guided Trails,” in Poster at the 14th ACM Conference on Hypertext and Hypermedia (Hypertext ’03), Nottingham, August 26-30, 2003, pp. 1-2.
[BibTeX]

@INPROCEEDINGS{Mehler:Reich:2003,
booktitle={Poster at the 14th ACM Conference on Hypertext and Hypermedia (Hypertext '03), Nottingham, August 26-30},
website={http://www.sigweb.org/Ht03posters},
pages={1-2},
author={Mehler, Alexander and Reich, Siegfried},
year={2003},
title={Guided Tours + Trails := Guided Trails}}

### 2002 (2)

• A. Mehler, “Hierarchical Analysis of Text Similarity Data,” Künstliche Intelligenz (KI), vol. 2, pp. 12-16, 2002.
[Abstract] [BibTeX]

Semantic spaces are used as a representational format for modeling similarities of signs. As a multidimensional data structure they are bound to the question of how to explore similarity relations of signs mapped onto them. This paper introduces an abstract data structure called dependency scheme as a formal format which encapsulates two types of order relations, whose variable instantiation allows to derive different classes of trees for the hierarchical analysis of text similarity data derived from semantic spaces.
@ARTICLE{Mehler:2002:a,
journal={Künstliche Intelligenz (KI)},
pages={12-16},
author={Mehler, Alexander},
volume={2},
year={2002},
title={Hierarchical Analysis of Text Similarity Data},
abstract={Semantic spaces are used as a representational format for modeling similarities of signs. As a multidimensional data structure they are bound to the question of how to explore similarity relations of signs mapped onto them. This paper introduces an abstract data structure called dependency scheme as a formal format which encapsulates two types of order relations, whose variable instantiation allows to derive different classes of trees for the hierarchical analysis of text similarity data derived from semantic spaces.}}
• A. Mehler, “Textbedeutungsrekonstruktion. Grundzüge einer Architektur zur Modellierung der Bedeutungen von Texten,” in Prozesse der Bedeutungskonstruktion, I. Pohl, Ed., Frankfurt a. M.: Peter Lang, 2002, pp. 445-486.
[BibTeX]

@INCOLLECTION{Mehler:2002:b,
publisher={Peter Lang},
booktitle={Prozesse der Bedeutungskonstruktion},
pages={445-486},
author={Mehler, Alexander},
editor={Pohl, Inge},
year={2002},
title={Textbedeutungsrekonstruktion. Grundzüge einer Architektur zur Modellierung der Bedeutungen von Texten},
address={Frankfurt a. M.}}

### 2001 (3)

• A. Mehler, Textbedeutung. Zur prozeduralen Analyse und Repräsentation struktureller Ähnlichkeiten von Texten / Text Meaning – Procedural Analysis and Representation of Structural Similarities of Texts, Frankfurt a. M.: Peter Lang, 2001, vol. 5. Zugl. Diss. Univ. Trier
[BibTeX]

@BOOK{Mehler:2001:a,
publisher={Peter Lang},
author={Mehler, Alexander},
series={Computer Studies in Language and Speech},
volume={5},
year={2001},
pagetotal={401},
title={Textbedeutung. Zur prozeduralen Analyse und Repr{\"a}sentation struktureller {\"A}hnlichkeiten von Texten / Text Meaning – Procedural Analysis and Representation of Structural Similarities of Texts},
website={https://www.peterlang.com/view/product/39259?tab=toc&format=PBK},
note={Zugl. Diss. Univ. Trier}}
• A. Mehler and R. Clarke, “Systemic Functional Hypertexts (SFHT): Modeling Contexts in Hypertexts,” in Organizational Semiotics. Evolving a Science of Information Systems, K. Liu, R. J. Clarke, P. B. Andersen, and R. K. Stamper, Eds., Boston: Kluwer, 2001, pp. 153-170. IFIP TC8 / WG8.1 Working Conference on Organizational Semiotics, July 23-25, 2001, Montreal, Canada
[BibTeX]

@INCOLLECTION{Mehler:Clarke:2001,
publisher={Kluwer},
booktitle={Organizational Semiotics. Evolving a Science of Information Systems},
pages={153-170},
author={Mehler, Alexander and Clarke, Rodney},
editor={Liu, Kecheng and Clarke, Rodney J. and Andersen, Peter B. and Stamper, Ronald K.},
year={2001},
note={IFIP TC8 / WG8.1 Working Conference on Organizational Semiotics. July 23-25, 2001, Montreal, Canada},
title={Systemic Functional Hypertexts (SFHT): Modeling Contexts in Hypertexts},
website={http://link.springer.com/chapter/10.1007/978-0-387-35611-2_10}}
• A. Mehler, “Aspects of Text Mining. From Computational Semiotics to Systemic Functional Hypertexts,” Australasian Journal of Information Systems (AJIS), vol. 8, iss. 2, pp. 129-141, 2001.
[Abstract] [BibTeX]

The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content based processing of texts, which enables to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining - natural language lexis - the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content based relations of texts (and their annotation as typed links in corpora possibly organised as hypertexts). In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.
@ARTICLE{Mehler:2001:b,
journal={Australasian Journal of Information Systems (AJIS)},
pages={129-141},
number={2},
author={Mehler, Alexander},
volume={8},
year={2001},
title={Aspects of Text Mining. From Computational Semiotics to Systemic Functional Hypertexts},
website={http://journal.acs.org.au/index.php/ajis/article/view/249/220},
abstract={The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content based processing of texts, which enables to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining - natural language lexis - the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content based relations of texts (and their annotation as typed links in corpora possibly organised as hypertexts). In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.}}

### 1999 (1)

• A. Mehler, “Aspects of Text Semantics in Hypertext,” in Returning to our Diverse Roots. Proceedings of the 10th ACM Conference on Hypertext and Hypermedia (Hypertext ’99), February 21-25, 1999, Technische Universität Darmstadt, New York, 1999, pp. 25-26.
[BibTeX]

@INPROCEEDINGS{Mehler:1999,
publisher={ACM Press},
address={New York},
booktitle={Returning to our Diverse Roots. Proceedings of the 10th ACM Conference on Hypertext and Hypermedia (Hypertext '99), February 21-25, 1999, Technische Universit{\"a}t Darmstadt},
pages={25--26},
author={Mehler, Alexander},
editor={Tochtermann, Klaus and Westbomke, J{\"o}rg and Wiil, Uffe K. and Leggett, John J.},
year={1999},
title={Aspects of Text Semantics in Hypertext},
pdf={{http://dl.acm.org/ft_gateway.cfm?id=294477&ftid=30049&dwn=1&CFID=722943569&CFTOKEN=97409508}},
website={http://dl.acm.org/citation.cfm?id=294477}}

### 1998 (1)

• A. Mehler, “Toward Computational Aspects of Text Semiotics,” in Proceedings of the 1998 Joint Conference of IEEE ISIC, IEEE CIRA, and ISAS on the Science and Technology of Intelligent Systems, September 14-17, 1998, NIST, Gaithersburg, USA, Gaithersburg, 1998, pp. 807-813.
[BibTeX]

@INPROCEEDINGS{Mehler:1998,
publisher={IEEE},
address={Gaithersburg},
booktitle={Proceedings of the 1998 Joint Conference of IEEE ISIC, IEEE CIRA, and ISAS on the Science and Technology of Intelligent Systems, September 14-17, 1998, NIST, Gaithersburg, USA},
pages={807--813},
author={Mehler, Alexander},
editor={Albus, James and Meystel, Alex},
year={1998},
title={Toward Computational Aspects of Text Semiotics},
website={http://www.researchgate.net/publication/3766784_Toward_computational_aspects_of_text_semiotics}}

### 1996 (1)

• A. Mehler, “A Multiresolutional Approach to Fuzzy Text Meaning — a First Attempt,” in Proceedings of the 1996 International Multidisciplinary Conference on Intelligent Systems: A Semiotic Perspective, Gaithersburg, Maryland, October 20-23, Gaithersburg, 1996, pp. 261-273.
[BibTeX]

@INPROCEEDINGS{Mehler:1996:a,
publisher={National Institute of Standards and Technology (NIST)},
booktitle={Proceedings of the 1996 International Multidisciplinary Conference on Intelligent Systems: A Semiotic Perspective, Gaithersburg, Maryland, October 20-23},
pages={261--273},
author={Mehler, Alexander},
volume={I},
editor={Albus, James and Meystel, Alex and Quintero, Richard},
year={1996},
title={A Multiresolutional Approach to Fuzzy Text Meaning -- a First Attempt},
address={Gaithersburg}}