Statistics for Linguists: An Introduction Using R
1st Edition

ISBN 9781138056091
Published November 13, 2019 by Routledge
326 Pages

USD $56.95


Book Description

Statistics for Linguists: An Introduction Using R is the first statistics textbook on linear models for linguistics. The book progresses from simple uses of linear models through generalized models to more advanced approaches, keeping its focus on conceptual issues and avoiding excessive mathematical detail. It contains many applied examples using the R statistical programming environment. Written in an accessible tone and style, this text is the ideal main resource for graduate and advanced undergraduate students in Linguistics statistics courses, as well as for those in other fields, including Psychology, Cognitive Science, and Data Science.

Table of Contents

0. Preface: Approach and how to use this book

0.1. Strategy of the book

0.2. Why R?

0.3. Why the tidyverse?

0.4. R packages required for this book

0.5. What this book is not

0.6. How to use this book

0.7. Information for teachers

1. Introduction to base R

1.1. Introduction

1.2. Baby steps: simple math with R

1.3. Your first R script

1.4. Assigning variables

1.5. Numeric vectors

1.6. Indexing

1.7. Logical vectors

1.8. Character vectors

1.9. Factor vectors

1.10. Data frames

1.11. Loading in files

1.12. Plotting

1.13. Installing, loading, and citing packages

1.14. Seeking help

1.15. A note on keyboard shortcuts

1.16. Your R journey: The road ahead

2. Tidy functions and reproducible R workflows

2.1. Introduction

2.2. tibble and readr

2.3. dplyr

2.4. ggplot2

2.5. Piping with magrittr

2.6. A more extensive example: iconicity and the senses

2.7. R markdown

2.8. Folder structure for analysis projects

2.9. Readme files and more markdown

2.10. Open and reproducible research

3. Models and distributions

3.1. Models

3.2. Distributions

3.3. The normal distribution

3.4. Thinking of the mean as a model

3.5. Other summary statistics: median and range

3.6. Boxplots and the interquartile range

3.7. Summary statistics in R

3.8. Exploring the emotional valence ratings

3.9. Chapter conclusions

4. Introduction to the linear model: Simple linear regression

4.1. Word frequency effects

4.2. Intercepts and slopes

4.3. Fitted values and residuals

4.4. Assumptions: Normality and constant variance

4.5. Measuring model fit with R²

4.6. A simple linear model in R

4.7. Linear models with tidyverse functions

4.8. Model formula notation: Intercept placeholders

4.9. Chapter conclusions

5. Correlation, linear, and nonlinear transformations

5.1. Centering

5.2. Standardizing

5.3. Correlation

5.4. Using logarithms to describe magnitudes

5.5. Example: Response durations and word frequency

5.6. Centering and standardization in R

5.7. Terminological note on the term ‘normalizing’

5.8. Chapter conclusions

6. Multiple regression

6.1. Regression with more than one predictor

6.2. Multiple regression with standardized coefficients

6.3. Assessing assumptions

6.4. Collinearity

6.5. Adjusted R²

6.6. Chapter conclusions

7. Categorical predictors

7.1. Introduction

7.2. Modeling the emotional valence of taste and smell words

7.3. Processing the taste and smell data

7.4. Treatment coding in R

7.5. Doing dummy coding ‘by hand’

7.6. Changing the reference level

7.7. Sum coding in R

7.8. Categorical predictors with more than two levels

7.9. Assumptions again

7.10. Other coding schemes

7.11. Chapter conclusions

8. Interactions and nonlinear effects

8.1. Introduction

8.2. Categorical * continuous interactions

8.3. Categorical * categorical interactions

8.4. Continuous * continuous interactions

8.5. Continuous interactions and regression planes

8.6. Higher-order interactions

8.7. Chapter conclusions

9. Inferential statistics 1: Significance testing

9.1. Introduction

9.2. Effect size: Cohen’s d

9.3. Cohen’s d in R

9.4. Standard errors and confidence intervals

9.5. Null hypotheses

9.6. Using t to measure the incompatibility with the null hypothesis

9.7. Using the t-distribution to compute p-values

9.8. Chapter conclusions

10. Inferential statistics 2: Issues in significance testing

10.1. Common misinterpretations of p-values

10.2. Statistical power and Type I, II, M, and S errors

10.3. Multiple testing

10.4. Stopping rules

10.5. Chapter conclusions

11. Inferential statistics 3: Significance testing in a regression context

11.1. Introduction

11.2. Standard errors and confidence intervals for regression coefficients

11.3. Significance tests with multi-level categorical predictors

11.4. Another example: the absolute valence of taste and smell words

11.5. Communicating uncertainty for categorical predictors

11.6. Communicating uncertainty for continuous predictors

11.7. Chapter conclusions

12. Generalized linear models: Logistic regression

12.1. Motivating generalized linear models

12.2. Theoretical background: Data-generating processes

12.3. The log odd function and interpreting logits

12.4. Speech errors and blood alcohol concentration

12.5. Predicting the dative alternation

12.6. Analyzing gesture perception: Hassemer & Winter (2016)

12.6.1. Exploring the dataset

12.6.2. Logistic regression analysis

12.7. Chapter conclusions

13. Generalized linear models 2: Poisson regression

13.1. Motivating Poisson regression

13.2. The Poisson distribution

13.3. Analyzing linguistic diversity using Poisson regression

13.4. Adding exposure variables

13.5. Negative binomial regression for overdispersed count data

13.6. Overview and summary of the generalized linear model framework

13.7. Chapter conclusions

14. Mixed models 1: Conceptual introduction

14.1. Introduction

14.2. The independence assumption

14.3. Dealing with non-independence via experimental design and averaging

14.4. Mixed models: Varying intercepts and varying slopes

14.5. More on varying intercepts and varying slopes

14.6. Interpreting random effects and random effect correlations

14.7. Specifying mixed effects models: lme4 syntax

14.8. Reasoning about your mixed model: The importance of varying slopes

14.9. Chapter conclusions

15. Mixed models 2: Extended example, significance testing, convergence issues

15.1. Introduction

15.2. Simulating vowel durations for a mixed model analysis

15.3. Analyzing the simulated vowel durations with mixed models

15.4. Extracting information out of lme4 objects

15.5. Messing up the model

15.6. Likelihood ratio tests

15.7. Remaining issues

15.7.1. R-squared for mixed models

15.7.2. Predictions from mixed models

15.7.3. Convergence issues

15.8. Mixed logistic regression: Ugly selfies

15.9. Shrinkage and individual differences

15.10. Chapter conclusions

16. Outlook and strategies for model building

16.1. What you have learned so far

16.2. Model choice

16.3. The cookbook approach

16.4. Stepwise regression

16.5. A plea for subjective and theory-driven statistical modeling

16.6. Reproducible research

16.7. Closing words


Appendix A. Correspondences between significance tests and linear models

Appendix B. Reading recommendations


Bodo Winter is Lecturer in Cognitive Linguistics in the Department of English Language and Applied Linguistics at the University of Birmingham, UK.