1st Edition

Corpus Linguistics for Writing Development: A Guide for Research

By Philip Durrant. Copyright 2023
    194 Pages, 147 B/W Illustrations
    by Routledge

    Corpus Linguistics for Writing Development provides a practical introduction to using corpora in the study of first and second language learners’ written language over time and across different levels of proficiency. Focusing on development in the use of vocabulary, formulaic language, and grammar, this book

    • discusses how corpus research can contribute to our understanding of writing development and to pedagogical practice;

    • reviews a range of corpus techniques for studying writing development from the perspectives of vocabulary, grammar, and formulaic language and interrogates the methodological bases of those techniques; and

    • guides readers to perform practical analyses of learner writing using the R open-source programming language.

    Aimed at the novice researcher, this book will be key reading for advanced undergraduate and postgraduate students in the fields of education, language, and linguistics. It will particularly appeal to those interested in first or second language writing, language assessment, and learner corpus research.

    Table of Contents

    Part One: Foundations

    Chapter 1. Studying Writing Development with a Corpus

    1. Introduction

    2. Using a corpus to study writing development

    3. How does writing development relate to vocabulary, grammar, formulaic language?

    4. Outline of the book

    Chapter 2. Learner Corpus Analysis in Practice: Some Basics

    1. Introduction

    2. Some housekeeping: getting your computer ready

    3. Getting to know R and RStudio

    3.1 Introduction: why learn R?

    3.2 Entering commands: the Console and Scripts

    3.3 Functions

    3.4 Vectors

    3.5 Getting help

    4. Some fundamentals of corpus research: encoding, markup, annotation, and metadata

    5. Corpora used in this book

    6. Automatically annotating your corpus for part of speech and syntactic relationships

    6.1 Introduction

    6.2 Make sure you have the required software

    6.3 Prepare the corpus for parsing

    6.4 Make a list of the files you want to process

    6.5 Run the CoreNLP pipeline

    7. Conclusion

    Part Two: Studying Vocabulary in Writing Development

    Chapter 3. Understanding Vocabulary in Learner Writing

    1. Introduction

    2. Theorizing development in vocabulary

    2.1 Introduction

    2.2 Breadth, depth, and fluency

    2.3 Aspects of word knowledge

    3. Measures of vocabulary development

    3.1 Introduction

    3.2 Lexical diversity

    3.3 Lexical sophistication

    3.3.1 Word length

    3.3.2 Word frequency

    3.3.3 Register-based measures

    3.3.4 Contextual distinctiveness

    3.3.5 Semantic measures

    3.3.6 Psycholinguistic measures

    4. Complicating factors

    4.1 Introduction

    4.2 What is a ‘word’?

    4.2.1 Defining words

    4.2.2 Defining word tokens

    4.2.3 Defining word types

    4.3 Choosing a suitable reference corpus

    4.4 Relationships between measures of diversity and sophistication

    4.5 Vocabulary knowledge depth

    5. Conclusion

    6. Taking it further

    Chapter 4. Vocabulary Research in Practice: Diversity and Academic Vocabulary

    1. Introduction

    2. Measuring vocabulary diversity

    2.1 Getting the metadata and corpus filenames

    2.2 Generating CTTR scores

    2.3 Recording the results

    2.4 Analysing vocabulary diversity

    3. Studying academic vocabulary

    3.1 Preparing the list of academic vocabulary

    3.2 Converting the parsed corpus to an easier-to-use format

    3.3 Identifying AVL words in the learner corpus

    3.4 Visualizing variation in measures

    3.5 Investigating the patterns

    4. Conclusion

    Part Three: Studying Grammar in Writing Development

    Chapter 5. Understanding Grammar in Learner Writing

    1. Introduction

    2. Studying development through grammar

    2.1 Models of grammar

    2.2 Selecting and interpreting grammatical features

    3. Approaches to grammatical development

    3.1 Varieties of grammatical approaches

    3.2 Development in grammatical complexity

    3.3 Multi-dimensional analysis

    3.4 Usage-based models of development

    4. Conclusion

    5. Taking it further

    Chapter 6. Grammar Research in Practice: Evaluating Parser Accuracy

    1. Introduction

    2. Reading a parsed corpus

    3. Accuracy evaluation and fixtagging: an introduction

    4. Accuracy evaluation and fixtagging: a worked example

    4.1 Hand-annotating a sample of texts

    4.2 Getting metadata and filenames

    4.3 Identifying and counting adjectives

    4.4 Identifying true positives, false positives, and false negatives

    4.5 Calculating precision and recall

    4.6 Identifying matches and differences in hand vs. computer parses

    4.7 Identifying and fixing parsing errors

    5. Tracing development in a grammatical feature

    5.1 Counting a feature in texts

    5.2 Visualizing variation across learner groups

    6. Conclusion

    Part Four: Studying Formulaic Language in Writing Development

    Chapter 7. Understanding Formulaic Language in Learner Writing

    1. Introduction

    2. Defining formulaic language

    3. How can we study formulaic language in a corpus?

    3.1 A frequency-based approach to studying formulaic language

    3.2 Lexical bundles

    3.3 Collocations

    4. Conclusion

    5. Taking it further

    Chapter 8. Formulaic Language Research in Practice: Academic Collocations

    1. Introduction

    2. Identifying collocations in a reference corpus

    2.1 Editing the parsed corpus

    2.2 Identifying lemmas and verb + noun combinations

    2.3 Identifying collocations

    3. Quantifying the use of academic collocations across learner groups

    3.1 Preparing the learner corpus

    3.2 Identifying academic collocations in the learner corpus

    3.3 Understanding use of academic collocations across levels

    4. Conclusion

    Biography

    Philip Durrant is Associate Professor in Language Education at the University of Exeter, United Kingdom.