Technical-Reports

Total: 8

2016 (2)

  • [PDF] A. Mehler and G. Abrami, “Stolperwege WiSe 2015 / 2016, Praktikumsbericht,” Text Technology Lab, Goethe University Frankfurt, TTLab-TR-2016-04, 2016.
    [BibTeX]

    @TechReport{StolperwegeWiSe15,
    author= {Mehler, Alexander and Abrami, Giuseppe},
    title= {Stolperwege WiSe 2015 / 2016, {Praktikumsbericht}},
    institution={Text Technology Lab},
    year = {2016},
    number={TTLab-TR-2016-04},
    address={Goethe University Frankfurt},
    pdf={https://hucompute.org/wp-content/uploads/2016/05/Stolperwege-WiSe2015-Praktikumsbericht.pdf}
    }
  • [PDF] A. Lücking, D. Kurfürst, D. Walther, M. Mauri, and A. Mehler, “FIGURE: Gesture Annotation,” Text Technology Lab, Goethe University Frankfurt, TTLab-TR-2016-05, 2016.
    [BibTeX]

    @TechReport{FIGURE:annotation,
    author = {Andy L\"{u}cking and Dennis Kurf\"{u}rst and D\'{e}sir\'{e}e Walther and Marcel Mauri and Alexander Mehler},
    title = {{FIGURE:} Gesture Annotation},
    institution = {Text Technology Lab},
    year = {2016},
    number = {TTLab-TR-2016-05},
    address = {Goethe University Frankfurt},
    pdf = {https://hucompute.org/wp-content/uploads/2016/05/Figure-Annotation-Manual_TTLabReport.pdf}
    }

2014 (1)

  • [PDF] A. Mehler, R. Gaitsch, and G. Abrami, “Stolperwege – Eine App zur Realisierung einer Public History of the Holocaust,” Text Technology Lab, Goethe University Frankfurt, TTLab-TR-2014-11, 2014.
    [BibTeX]

    @TechReport{Stolperwege2014,
    author={Mehler, Alexander and Gaitsch, Regina and Abrami, Giuseppe},
    title={Stolperwege -- Eine App zur Realisierung einer Public History of the Holocaust},
    institution={Text Technology Lab},
    year={2014},
    number={TTLab-TR-2014-11},
    address={Goethe University Frankfurt},
    pdf={https://hucompute.org/wp-content/uploads/2016/11/Stolperwege-Report.pdf}
    }

2011 (1)

  • [PDF] S. Hartrumpf, H. Helbig, T. vor der Brück, and C. Eichhorn, “SemDupl: Semantic-based Duplicate Identification,” FernUniversität in Hagen, 359-07/2011, 2011.
    [BibTeX]

    @TECHREPORT{Hartrumpf:Helbig:vor:der:Brueck:Eichhorn:2011,
        institution={FernUniversit{\"a}t in Hagen},
        number={359-07/2011},
        author={Hartrumpf, Sven and Helbig, Hermann and vor der Br{\"u}ck, Tim and Eichhorn, Christian},
        year={2011},
        title={SemDupl: Semantic-based Duplicate Identification},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/semdupl-tr.pdf}}

2008 (2)

  • [PDF] [http://pi7.fernuni-hagen.de/brueck/papers/DeLite_techreport.pdf] T. vor der Brück, H. Helbig, and J. Leveling, “The Readability Checker DeLite,” Fakultät für Mathematik und Informatik, FernUniversität in Hagen, 345-5/2008, 2008.
    [Abstract] [BibTeX]

    This report describes the DeLite readability checker which automatically assesses the linguistic accessibility of Web documents. The system computes readability scores for an arbitrary German text and highlights those parts of the text causing difficulties with regard to readability. The highlighting is done at different linguistic levels, beginning with surface effects closely connected to morphology (like complex words) down to deep semantic phenomena (like semantic ambiguity). DeLite uses advanced NLP technology realized as Web services and accessed via a clearly defined interface. The system has been trained and evaluated with 315 users validating a corpus of 500 texts (6135 sentences). The results of the human judgments regarding the readability of the texts have been used as a basis for automatically learning the parameter settings of the DeLite component which computes the readability scores. To demonstrate the transfer of this approach to another language (in this case to English), a feasibility study has been carried out on the basis of a core lexicon for English, and the parser has been adapted to the most important linguistic phenomena of English. Finally, recommendations for further guidelines regarding the linguistic aspects of accessibility to the Web are derived.
    @TECHREPORT{vor:der:Brueck:Helbig:Leveling:2008,
        url={http://pi7.fernuni-hagen.de/brueck/papers/DeLite_techreport.pdf},
        institution={Fakult{\"a}t f{\"u}r Mathematik und Informatik, FernUniversit{\"a}t in Hagen},
        number={345-5/2008},
        author={vor der Br{\"u}ck, Tim and Helbig, Hermann and Leveling, Johannes},
        website={http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.5207},
        year={2008},
        title={The Readability Checker DeLite},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/DeLite_techreport.pdf},
        abstract={This report describes the DeLite readability checker which automatically assesses the linguistic accessibility of Web documents. The system computes readability scores for an arbitrary German text and highlights those parts of the text causing difficulties with regard to readability. The highlighting is done at different linguistic levels, beginning with surface effects closely connected to morphology (like complex words) down to deep semantic phenomena (like semantic ambiguity). DeLite uses advanced NLP technology realized as Web services and accessed via a clearly defined interface. The system has been trained and evaluated with 315 users validating a corpus of 500 texts (6135 sentences). The results of the human judgments regarding the readability of the texts have been used as a basis for automatically learning the parameter settings of the DeLite component which computes the readability scores. To demonstrate the transfer of this approach to another language (in this case to English), a feasibility study has been carried out on the basis of a core lexicon for English, and the parser has been adapted to the most important linguistic phenomena of English. Finally, recommendations for further guidelines regarding the linguistic aspects of accessibility to the Web are derived.}}
  • [PDF] T. vor der Brück, “Application of Machine Learning Algorithms for Automatic Knowledge Acquisition and Readability Analysis,” FernUniversität in Hagen, 346-6/2008, 2008.
    [Abstract] [BibTeX]

    A large knowledge base is a prerequisite for a lot of tasks in natural language processing (NLP). To build a handcrafted knowledge base, which is applicable to real world scenarios, a vast amount of effort is required. Furthermore, experts are needed with a strong background in linguistics, artificial intelligence and knowledge representation which may not be available to the extent necessary (knowledge acquisition bottleneck). For these reasons, machine learning techniques are widely used to construct a knowledge base automatically. Learning techniques are also relevant to many other areas, e.g., for readability analysis. In the latter area, a lot of work is needed to find the optimal settings for a readability formula and it usually involves a large amount of trial and error iterations. Thus, it is preferable to learn the necessary parameter settings automatically. This report investigates the application of machine learning techniques in both areas. Finally, several freely available machine learning tools, which can be employed to accomplish both tasks, are introduced and compared with each other.
    @TECHREPORT{vor:der:Brueck:2008,
        institution={FernUniversit{\"a}t in Hagen},
        number={346-6/2008},
        author={vor der Br{\"u}ck, Tim},
        year={2008},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/knowaqu_en.pdf},
        title={Application of Machine Learning Algorithms for Automatic Knowledge Acquisition and Readability Analysis},
        abstract={A large knowledge base is a prerequisite for a lot of tasks in natural language processing (NLP). To build a handcrafted knowledge base, which is applicable to real world scenarios, a vast amount of effort is required. Furthermore, experts are needed with a strong background in linguistics, artificial intelligence and knowledge representation which may not be available to the extent necessary (knowledge acquisition bottleneck). For these reasons, machine learning techniques are widely used to construct a knowledge base automatically. Learning techniques are also relevant to many other areas, e.g., for readability analysis. In the latter area, a lot of work is needed to find the optimal settings for a readability formula and it usually involves a large amount of trial and error iterations. Thus, it is preferable to learn the necessary parameter settings automatically. This report investigates the application of machine learning techniques in both areas. Finally, several freely available machine learning tools, which can be employed to accomplish both tasks, are introduced and compared with each other.}}

2005 (2)

  • [PDF] J. Stegmann and A. Lücking, “Assessing Reliability on Annotations (1): Theoretical Considerations,” 360 Situierte Künstliche Kommunikatoren, Universität Bielefeld, 2, 2005.
    [Abstract] [BibTeX]

    This is the first part of a two-report mini-series focussing on issues in the evaluation of annotations. In this theoretically-oriented report we lay out the relevant statistical background for reliability studies, evaluate some pertaining approaches and also sketch some arguments that may lend themselves to the development of an original statistic. A description of the project background, including the documentation of the annotation scheme at stake and the empirical data collected, as well as results from the practical application of the relevant statistics and the discussion of our respective results are contained in the second, more empirically-oriented report [Lücking and Stegmann, 2005]. The following points are dealt with in detail here: we summarize and contribute to an argument by Gwet [2001] which indicates that the popular pi and kappa statistics [Carletta, 1996] are generally not appropriate for assessing the degree of agreement between raters on categorical type-ii data. We propose the use of AC1 [Gwet, 2001] instead, since it has desirable mathematical properties that make it more appropriate for assessing the results of expert raters in general. As far as type-i data are concerned, we make use of conventional correlation statistics which, unlike their AC1 and kappa cousins, do not deliver results that are adjusted with respect to agreements due to chance. Furthermore, we discuss issues in the interpretation of the results of the different statistics. Finally, we take up some loose ends from the previous chapters and sketch some advanced ideas pertaining to inter-rater agreement statistics. Therein, some differences as well as common ground concerning Gwet’s perspective and our own stance will be highlighted. We conclude with some preliminary suggestions regarding the development of the original statistic omega that will be different in nature from those discussed before.
    @TECHREPORT{Stegmann:Luecking:2005,
        institution={360 Situierte K{\"u}nstliche Kommunikatoren},
        number={2},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/report-05-02.pdf},
        website={http://www.sfb360.uni-bielefeld.de/reports/2005/2005-02.html},
        author={Stegmann, Jens and L{\"u}cking, Andy},
        keywords={own},
        year={2005},
        title={Assessing Reliability on Annotations (1): Theoretical Considerations},
        address={Universit{\"a}t Bielefeld},
        abstract={This is the first part of a two-report mini-series focussing on issues in the evaluation of annotations. In this theoretically-oriented report we lay out the relevant statistical background for reliability studies, evaluate some pertaining approaches and also sketch some arguments that may lend themselves to the development of an original statistic. A description of the project background, including the documentation of the annotation scheme at stake and the empirical data collected, as well as results from the practical application of the relevant statistics and the discussion of our respective results are contained in the second, more empirically-oriented report [Lücking and Stegmann, 2005]. The following points are dealt with in detail here: we summarize and contribute to an argument by Gwet [2001] which indicates that the popular pi and kappa statistics [Carletta, 1996] are generally not appropriate for assessing the degree of agreement between raters on categorical type-ii data. We propose the use of AC1 [Gwet, 2001] instead, since it has desirable mathematical properties that make it more appropriate for assessing the results of expert raters in general. As far as type-i data are concerned, we make use of conventional correlation statistics which, unlike their AC1 and kappa cousins, do not deliver results that are adjusted with respect to agreements due to chance. Furthermore, we discuss issues in the interpretation of the results of the different statistics. Finally, we take up some loose ends from the previous chapters and sketch some advanced ideas pertaining to inter-rater agreement statistics. Therein, some differences as well as common ground concerning Gwet’s perspective and our own stance will be highlighted. We conclude with some preliminary suggestions regarding the development of the original statistic omega that will be different in nature from those discussed before.}}
  • [PDF] A. Lücking and J. Stegmann, “Assessing Reliability on Annotations (2): Statistical Results for the DeiKon Scheme,” 360 Situierte Künstliche Kommunikatoren, Universität Bielefeld, 3, 2005.
    [Abstract] [BibTeX]

    This is the second part of a two-report mini-series focussing on issues in the evaluation of annotations. In this empirically-oriented report we lay out the documentation of the annotation scheme used in the DeiKon project, discuss the results obtained in a respective reliability study and conclude with some suggestions regarding forthcoming versions of the scheme. Relevant statistical background, theoretical considerations in reliability statistics and an evaluation of some pertaining approaches are given in the first, more theoretically-oriented report [Stegmann and Lücking, 2005]. The following points are dealt with in detail here: we describe the setting that was used to elicit the empirical data. The annotation scheme that is put to scrutiny is documented and exemplified. Aspects of our theoretical work in linguistics are mentioned en passant. Then we present, discuss, and interpret the actual results obtained for our scheme. We find a high degree of correlation on the exact placement of time-stretched entities (word and gesture phase boundaries), mildly good results pertaining to agreement concerning time-related categories that appeal to structural configurations (e.g. the position of a gesture with respect to the parts of accompanying speech), but rather weak agreement with respect to the determination of gesture function. Therefore, the results for time-based type-i data look more promising than those obtained for the more theoretically-framed type-ii categories. However, the type-i results must not be compared with the type-ii ones on superficial grounds, since the statistics are of a different kind (correlation vs. agreement, i.e. not chance-adjusted vs. chance-adjusted) and, hence, the results have to be interpreted in different terms, respectively. Finally, we discuss some issues in the future make-up of the annotation scheme with a focus on its dialogue parts. Our respective suggestions amount to a shift towards a more theory-oriented annotation.
    @TECHREPORT{Luecking:Stegmann:2005,
        institution={360 Situierte K{\"u}nstliche Kommunikatoren},
        number={3},
        pdf={https://hucompute.org/wp-content/uploads/2015/08/report-05-03.pdf},
        author={L{\"u}cking, Andy and Stegmann, Jens},
        website={http://www.sfb360.uni-bielefeld.de/reports/2005/2005-03.html},
        year={2005},
        title={Assessing Reliability on Annotations (2): Statistical Results for the DeiKon Scheme},
        abstract={This is the second part of a two-report mini-series focussing on issues in the evaluation of annotations. In this empirically-oriented report we lay out the documentation of the annotation scheme used in the DeiKon project, discuss the results obtained in a respective reliability study and conclude with some suggestions regarding forthcoming versions of the scheme. Relevant statistical background, theoretical considerations in reliability statistics and an evaluation of some pertaining approaches are given in the first, more theoretically-oriented report [Stegmann and Lücking, 2005]. The following points are dealt with in detail here: we describe the setting that was used to elicit the empirical data. The annotation scheme that is put to scrutiny is documented and exemplified. Aspects of our theoretical work in linguistics are mentioned en passant. Then we present, discuss, and interpret the actual results obtained for our scheme. We find a high degree of correlation on the exact placement of time-stretched entities (word and gesture phase boundaries), mildly good results pertaining to agreement concerning time-related categories that appeal to structural configurations (e.g. the position of a gesture with respect to the parts of accompanying speech), but rather weak agreement with respect to the determination of gesture function. Therefore, the results for time-based type-i data look more promising than those obtained for the more theoretically-framed type-ii categories. However, the type-i results must not be compared with the type-ii ones on superficial grounds, since the statistics are of a different kind (correlation vs. agreement, i.e. not chance-adjusted vs. chance-adjusted) and, hence, the results have to be interpreted in different terms, respectively. Finally, we discuss some issues in the future make-up of the annotation scheme with a focus on its dialogue parts. Our respective suggestions amount to a shift towards a more theory-oriented annotation.},
        address={Universit{\"a}t Bielefeld}}