1st Edition

Corpus Annotation: Linguistic Information from Computer Text Corpora

    292 Pages
    by Routledge

    Corpus Annotation gives an up-to-date picture of this fascinating new area of research, and will provide essential reading for newcomers to the field as well as those already involved in corpus annotation. Early chapters introduce the different levels and techniques of corpus annotation. Later chapters deal with software developments, applications, and the development of standards for the evaluation of corpus annotation. While the book takes detailed account of research world-wide, its focus is particularly on the work of the UCREL (University Centre for Computer Corpus Research on Language) team at Lancaster University, which has been at the forefront of developments in the field of corpus annotation since its beginnings in the 1970s.

    List of contributors
    1. Introducing corpus annotation
    Geoffrey Leech
    2. Grammatical tagging
    Geoffrey Leech
    3. Syntactic annotation: treebanks
    Geoffrey Leech and Elizabeth Eyes
    4. Semantic annotation
    Andrew Wilson and Jenny Thomas
    5. Discourse annotation: anaphoric relations in corpora
    Roger Garside, Steve Fligelstone and Simon Botley
    6. Further levels of annotation
    Geoffrey Leech, Anthony McEnery and Martin Wynne
    7. A hybrid grammatical tagger: CLAWS4
    Roger Garside and Nicholas Smith
    8. How to generalise the task of annotation
    Steve Fligelstone, Mike Pacey and Paul Rayson
    9. Improving a tagger
    Nicholas Smith
    10. Retargeting a tagger
    Fernando Sánchez León and Amalio F. Nieto-Serrano
    11. The use of syntactic annotation tools: partial and full parsing
    Jeremy Bateman, Jean Forrest and Tim Willis
    12. Higher-level annotation tools
    Roger Garside and Paul Rayson
    13. A corpus/annotation toolbox
    Anthony McEnery and Paul Rayson
    14. A corpus-based grammar tool
    Anthony McEnery, John Paul Baker and John Hutchinson
    15. The exploitation of multilingual annotated corpora for term extraction
    Anthony McEnery, Jean-Marc Langé, Michael Oakes and Jean Véronis
    16. Cross-linguistic guidelines for the annotation of corpora
    Peter Kahrel, Ruthanna Barnett and Geoffrey Leech
    17. Consistency and accuracy in correcting automatically-tagged corpora
    John Paul Baker

    Appendix I: Sources for further information (WWW and e-mail addresses)
    Appendix II: Abbreviations and acronyms
    Appendix III: Specimen annotation practices: the C7 and C5 tagsets


    R.G. Garside, Geoffrey Leech, Anthony Mark McEnery