# R Companion to Elementary Applied Statistics

## Book Description

The *R Companion to Elementary Applied Statistics* includes traditional applications covered in elementary statistics courses as well as some additional methods that address questions that might arise during or after the application of commonly used methods. Beginning with basic tasks and computations with R, readers are then guided through ways to bring data into R, manipulate the data as needed, perform common statistical computations and elementary exploratory data analysis tasks, prepare customized graphics, and take advantage of R for a wide range of methods that find use in many elementary applications of statistics.

Features:

- Requires no familiarity with R or programming to begin using this book.
- Can be used as a resource for a project-based elementary applied statistics course, or by researchers and professionals who wish to delve more deeply into R.
- Contains an extensive array of examples illustrating various ways to use pre-packaged routines, as well as ways to develop individualized code.
- Presents a number of methods that may be considered non-traditional or advanced.
- Includes carefully documented accompanying script files that contain code for all examples presented, and more.

R is a powerful and free product that is gaining popularity across the scientific community in both the professional and academic arenas. Statistical methods discussed in this book are used to introduce the fundamentals of using R functions and provide ideas for developing further skills in writing R code. These ideas are illustrated through an extensive collection of examples.

About the Author:

**Christopher Hay-Jahans** received his Doctor of Arts in mathematics from Idaho State University in 1999. After spending three years at the University of South Dakota, he moved to Juneau, Alaska, in 2002, where he has taught a wide range of undergraduate courses at the University of Alaska Southeast.

## Table of Contents

- Preliminaries
- Bringing Data Into and Out of R
- Accessing Contents of Data Structures
- Altering and Manipulating Data
- Summaries and Statistics
- More on Computing with R
- Basic Charts for Categorical Data
- Basic Plots for Numeric Data
- Scatterplots, Lines, and Curves
- More Graphics Tools
- Tests for One and Two Proportions
- Tests for More than Two Proportions
- Tests of Variances and Spread
- Tests for One or Two Means
- Tests for More than Two Means
- Selected Tests for Medians, and More
- Dependence and Independence

First Steps

Running Code in R

Some Terminology

Hierarchy of Data Classes

Data Structures

Operators

Functions

R Packages

Probability Distributions

Coding Conventions

Some Book-keeping and Other Tips

Getting Quick Coding Help

Entering Data Through Coding

Number and Sample Generating Tricks

The R Data Editor

Reading Text Files

Reading Data from Other File Formats

Reading Data from the Keyboard

Saving and Exporting Data

Extracting Data from Vectors

Conducting Data Searches in Vectors

Working with Factors

Navigating Data Frames

Lists

Choosing an Access/Extraction Method

Additional Notes

More About the attach Function

About Functions and their Arguments

Alternative Argument Assignments in Function Calls

Altering Entries in Vectors

Transformations

Manipulating Character Strings

Sorting Vectors and Factors

Altering Data Frames

Sorting Data Frames

Moving Between Lists and Data Frames

Additional Notes on the merge Function

Univariate Frequency Distributions

Bivariate Frequency Distributions

Statistics for Univariate Samples

Measures of Central Tendency

Measures of Spread

Measures of Position

Measures of Shape

Five-Number Summaries and Outliers

Elementary Five-Number Summary

Tukey’s Five-Number

The boxplot.stats Function

Computing with Numeric Vectors

Working with Lists, Data Frames and Arrays

The sapply Function

The tapply Function

The by Function

The aggregate Function

The apply Function

The sweep Function

For-loops

Conditional Statements and the switch Function

The if-then Statement

The if-then-else Statement

The switch Function

Preparing Your Own Functions

Preliminary Comments

Bar Charts

Dot Charts

Pie Charts

Exporting Graphics Images

Additional Notes

Customizing Plotting Windows

The plot.new and plot.window Functions

More on the paste Function

The title Function

More on the legend Function

More on the mtext Function

The text Function

Histograms

Boxplots

Stripcharts

QQ-Plots

Normal Probability QQ-Plots

Interpreting Normal Probability QQ-Plots

More on Reference Lines for QQ-Plots

QQ-Plots for Other Distributions

Additional Notes

More on the ifelse Function

Revisiting the axis Function

Frequency Polygons and Ogives

Scatterplots

Basic Plots

Manipulating Plotting Characters

Plotting Transformed Data

Matrix Scatterplots

The matplot Function

Graphs of Lines

Graphs of Curves

Superimposing Multiple Lines and/or Curves

Time-series Plots

Partitioning Graphics Windows

The layout Function

The split.screen Function

Customizing Plotted Text and Symbols

Inserting Mathematical Annotation in Plots

More Low-level Graphics Functions

The points and symbols Functions

The grid, segments and arrows Functions

Boxes, Rectangles and Polygons

Error Bars

Computing Bounds for Error Bars

The errorBarplot Function

Purpose and Interpretation of Error Bars

More R Graphics Resources

Relevant Probability Distributions

Binomial Distributions

Hypergeometric Distributions

Normal Distributions

Chi-square Distributions

Single Population Proportions

Estimating a Population Proportion

Hypotheses for Single Proportion Tests

A Normal Approximation Test

A Chi-square Test

An Exact Test

Which Approach Should be Used?

Two Population Proportions

Estimating Differences Between Proportions

Hypotheses for Two Proportions Tests

A Normal Approximation Test

A Chi-square Test

Fisher’s Exact Test

Which Approach Should be Used?

Additional Notes

Normal Approximations of Binomial Distributions

One- versus Two-sided Hypothesis Tests

Equality of Three or More Proportions

Pearson’s Homogeneity of Proportions Test

Marascuilo’s Large Sample Procedure

Cohen’s Small Sample Procedure

Simultaneous Pairwise Comparisons

Marascuilo’s Large Sample Procedure

Cohen’s Small Sample Procedure

Linear Contrasts of Proportions

Marascuilo’s Large Sample Approach

Cohen’s Small Sample Approach

The Chi-square Goodness-of-Fit Test

Relevant Probability Distributions

F Distributions

Using a Sample to Assess Normality

Single Population Variances

Estimating a Variance

Testing a Variance

Exactly Two Population Variances

Estimating the Ratio of Two Variances

Testing the Ratio of Two Variances

What if the Normality Assumption is Violated?

Two or More Population Variances

Assessing Spread Graphically

Levene’s Test

Levene’s Test with Trimmed Means

Brown-Forsythe Test

Fligner-Killeen Test

Student’s t-Distribution

Single Population Means

Verifying the Normality Assumption

Estimating a Mean

Testing a Mean

Can a Normal Approximation be Used Here?

Exactly Two Population Means

Verifying Assumptions

The Test for Dependent Samples

Tests for Independent Samples

Relevant Probability Distributions

Studentized Range Distribution

Dunnett’s Test Distribution

Studentized Maximum Modulus Distribution

Setting the Stage

Equality of Means — Equal Variances Case

Pairwise Comparisons — Equal Variances

Bonferroni’s Procedure

Tukey’s Procedure

t Tests and Comparisons with a Control

Dunnett’s Test and Comparisons with a Control

Which Procedure to Choose

Equality of Means — Unequal Variances Case

Large-sample Chi-square Test

Welch’s F Test

Hotelling’s T² Test

Pairwise Comparisons — Unequal Variances

Large-sample Chi-square Test

Dunnett’s C Procedure

Dunnett’s T3 Procedure

Comparisons with a Control

Which Procedure to Choose

The Nature of Differences Found

All Possible Pairwise Comparisons

Comparisons with a Control

Relevant Probability Distributions

Distribution of the Signed Rank Statistic

Distribution of the Rank Sum Statistic

The One-sample Sign Test

The Exact Test

The Normal Approximation

Paired Samples Sign Test

Independent Samples Median Test

Equality of Medians

Pairwise Comparisons of Medians

Single Sample Signed Rank Test

The Exact Test

The Normal Approximation

Paired Samples Signed Rank Test

Rank Sum Test of Medians

The Exact Mann-Whitney Test

The Normal Approximation

The Wilcoxon Rank Sum Test

Using the Kruskal-Wallis Test to Test Medians

Working with Ordinal Data

Paired Samples

Independent Samples

More than Two Independent Samples

Some Comments on the Use of Ordinal Data

Assessing Bivariate Normality

Pearson’s Correlation Coefficient

An Interval Estimate of ρ

Testing the Significance of ρ

Testing a Null Hypothesis with ρ ≠ 0

Kendall’s Correlation Coefficient

An Interval Estimate of τ

Exact Test of the Significance of τ

Approximate Test of the Significance of τ

Spearman’s Rank Correlation Coefficient

Exact Test of the Significance of ρ_S

Approximate Test of the Significance of ρ_S

Correlations in General — Comments and Cautions

Chi-square Test of Independence

For the Curious — Distributions of *r*_K and *r*_S

## Author(s)


## Reviews

"This book is written by a Professor of Mathematics with much experience in teaching statistics applied to the natural sciences. As mentioned in the Preface, the book addresses students (and teachers) of elementary statistics courses. Only basic preliminary statistical knowledge is necessary to start using the book, it is perfect for anyone jumping into R, and it could readily serve as a reference manual rather than to be read from beginning to end... Several simple applied examples with detailed explanations are presented (coded in R) in order to make the methods more deeply understandable, and in some cases to compare different types of application (e.g. when different assumptions are fulfilled, different research questions are of interest, or different types of data are recorded). All the richly-commented script files used in the book are available on the publisher’s website... At the end of the book, a highly informative Index aids quick searches. Nevertheless, the book can be ordered as an e-book as well... This second book of Professor Hay-Jahans, particularly together with the first one, is appropriate for undergraduate students as an introductory book on statistics using R, but it could successfully be used also by PhD students, researchers, and teachers requiring a consistent and thorough reference."

-Márta Ladányi, ISCB December 2019