This textbook introduces the fundamental concepts and methods of corpus linguistics for students approaching this topic for the first time, putting specific emphasis on the enormous linguistic diversity represented by approximately 7,000 human languages and broadening the scope of current concerns in general corpus linguistics.
Including a basic toolkit to help the reader investigate language in different usage contexts, this book:
- Shows the relevance of corpora to a range of linguistic areas from phonology to sociolinguistics and discourse
- Covers recent developments in the application of corpus linguistics to the study of understudied languages and linguistic typology
- Features exercises, short problems, and questions
- Includes examples from real studies in over 15 languages plus multilingual corpora
Providing the necessary corpus linguistics skills to critically evaluate and replicate studies, this book is essential reading for anyone studying corpus linguistics.
Table of Contents
2 Basic concepts in corpus linguistics
3 Corpus composition and corpus types
4 Levels of linguistic representation in corpus linguistic research
5 Corpus queries
6 Corpus building
7 Corpus annotation
8 Statistical description and analysis
9 Corpora in sociolinguistics
10 Corpus linguistics and language documentation
11 Corpus-based typology
Danielle Barth is a university lecturer in the School of Culture, History, and Language at the Australian National University and in the ARC Centre of Excellence for the Dynamics of Language. She works with a language community in Matukar, Papua New Guinea. Her areas of research are quantitative corpus linguistics, typology, and linguistic variation. She is a co-developer of the multilingual corpus SCOPIC.
Stefan Schnell is a senior researcher in the Department of Comparative Language Science at the University of Zurich. He has undertaken documentary work on Oceanic languages from North Vanuatu (South Pacific). In recent years, he has been engaged in cross-corpus research into typological questions of morphosyntactic patterns in language use and its interaction with discourse structure. He is a co-developer and co-editor of the multilingual corpus Multi-CAST.
'Corpora – large and small – are central to linguistics and we need to be able to recognise their potential and limitations. This excellent book is a storehouse of such wisdom. Barth and Schnell offer practical guidance for building and exploring a corpus, and they draw extensively on international scholarship which demonstrates the richness of corpus-based enquiry.'
Miriam Meyerhoff, University of Oxford, UK