The English textbook corpus has been extracted from the English versions of textbooks used for teaching in public schools in Bangladesh. The corpus was collected in 2012 to support research on multilingual text readability analysis. It contains 519 documents, 95,470 sentences, and 1,184,124 tokens. The corpus is encoded in TEI P5. For more details on this corpus, please refer to:
If you use this corpus, please cite the publications above.
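Because the corpus is distributed in TEI P5, an XML format, its document, sentence and token counts can be recomputed with a standard XML parser. The following is a minimal Python sketch under stated assumptions. It uses the standard TEI namespace, but the element choices are hypothetical, since the corpus markup is not described here: each document is assumed to be a `<TEI>` element (optionally wrapped in `<teiCorpus>`), each sentence an `<s>` element, and each token a `<w>` or `<pc>` element. Adjust these to match the actual files.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace URI, in ElementTree's {uri}tag notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical markup choices (not confirmed for this corpus):
# sentences as <s>, tokens as <w> (words) or <pc> (punctuation).
SENTENCE_TAG = TEI + "s"
TOKEN_TAGS = (TEI + "w", TEI + "pc")


def corpus_stats(root):
    """Count documents, sentences and tokens below a parsed TEI root element."""
    # A file may hold a single <TEI> document or a <teiCorpus> of several.
    if root.tag == TEI + "TEI":
        docs = [root]
    else:
        docs = root.findall(".//" + TEI + "TEI")

    sentences = 0
    tokens = 0
    for doc in docs:
        # iter() walks the document element and all of its descendants.
        sentences += sum(1 for _ in doc.iter(SENTENCE_TAG))
        tokens += sum(1 for el in doc.iter() if el.tag in TOKEN_TAGS)

    return {"documents": len(docs), "sentences": sentences, "tokens": tokens}
```

For example, `corpus_stats(ET.parse("corpus.xml").getroot())` would report the counts for a file named `corpus.xml` (a hypothetical name). Whether the result matches the published figures depends on how the corpus actually marks sentences and tokens, for instance whether punctuation is counted as tokens.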