1st Edition

# Statistical Analysis of Financial Data: With Examples in R


**Statistical Analysis of Financial Data** covers the use of statistical analysis and the methods of data science to model and analyze financial data. The first chapter is an overview of financial markets, describing market operations and using exploratory data analysis to illustrate the nature of financial data. R is the software used to obtain the data for the examples in the first chapter, to perform all computations, and to produce the graphs. However, discussion of R is deferred to an appendix to the first chapter, where the basics of R, especially those most relevant in financial applications, are presented and illustrated. The appendix also describes how to use R to obtain current financial data from the internet.

Chapter 2 describes the methods of exploratory data analysis, especially graphical methods, and illustrates them on real financial data. Chapter 3 covers probability distributions useful in financial analysis, especially heavy-tailed distributions, and describes methods of computer simulation of financial data. Chapter 4 covers basic methods of statistical inference, especially the use of linear models in analysis, and Chapter 5 describes methods of time series analysis, with special emphasis on models and methods applicable to financial data.

Features

* Covers statistical methods for analyzing models appropriate for financial data, especially models with outliers or heavy-tailed distributions.

* Describes both the basics of R and advanced techniques useful in financial data analysis.

* Driven by real, current financial data, not just stale data deposited on some static website.

* Includes a large number of exercises, many requiring the use of open-source software to acquire real financial data from the internet and to analyze it.

1. The Nature of Financial Data

Financial Time Series

Autocorrelations

Stationarity

Time Scales and Data Aggregation

Financial Assets and Markets

Markets and Regulatory Agencies

Interest

Returns on Assets

Stock Prices; Fair Market Value

Splits, Dividends, and Return of Capital

Indexes and "the Market"

Derivative Assets

Short Positions

Portfolios of Assets: Diversification and Hedging

Frequency Distributions of Returns

Location and Scale

Skewness

Kurtosis

Multivariate Data

The Normal Distribution

Q-Q Plots

Outliers

Other Statistical Measures

Volatility

The Time Series of Returns

Measuring Volatility: Historical and Implied

Volatility Indexes: The VIX

The Curve of Implied Volatility

Risk Assessment and Management

Market Dynamics

Stylized Facts about Financial Data

Notes and Further Reading

Exercises and Questions for Review

Appendix A: Accessing and Analyzing Financial Data in R

R Basics

Data Repositories and Inputting Data into R

Time Series and Financial Data in R

Data Cleansing

Notes, Comments, and Further Reading on R

Exercises in R

2. Exploratory Financial Data Analysis

Data Reduction

Simple Summary Statistics

Centering and Standardizing Data

Simple Summary Statistics for Multivariate Data

Transformations

Identifying Outlying Observations

The Empirical Cumulative Distribution Function

Nonparametric Probability Density Estimation

Binned Data

Kernel Density Estimator

Multivariate Kernel Density Estimator

Graphical Methods in Exploratory Analysis

Time Series Plots

Histograms

Boxplots

Density Plots

Bivariate Data

Q-Q Plots

Graphics in R

Notes and Further Reading

Exercises

3. Probability Distributions in Models of Observable Events

Random Variables and Probability Distributions

Discrete Random Variables

Continuous Random Variables

Multivariate Distributions

Measures of Association in Multivariate Distributions

Copulas

Transformations of Multivariate Random Variables

Distributions of Order Statistics

Asymptotic Distributions; The Central Limit Theorem

The Tails of Probability Distributions

Sequences of Random Variables; Stochastic Processes

Diffusion of Stock Prices and Pricing of Options

Some Useful Probability Distributions

Discrete Distributions

Continuous Distributions

Multivariate Distributions

General Families of Distributions Useful in Modeling

Constructing Multivariate Distributions

Modeling of Data-Generating Processes

R Functions for Probability Distributions

Simulating Observations of a Random Variable

Uniform Random Numbers

Generating Nonuniform Random Numbers

Simulating Data in R

Notes and Further Reading

Exercises

4. Statistical Models and Methods of Inference

Models

Fitting Statistical Models

Measuring and Partitioning Observed Variation

Linear Models

Nonlinear Variance-Stabilizing Transformations

Parametric and Nonparametric Models

Bayesian Models

Models for Time Series

Criteria and Methods for Statistical Modeling

Estimators and Their Properties

Methods of Statistical Modeling

Optimization in Statistical Modeling; Least Squares and Other Applications

The General Optimization Problem

Least Squares

Maximum Likelihood

R Functions for Optimization

Statistical Inference

Confidence Intervals

Testing Statistical Hypotheses

Prediction

Inference in Bayesian Models

Resampling Methods; The Bootstrap

Robust Statistical Methods

Estimation of the Tail Index

Estimation of VaR and Expected Shortfall

Models of Relationships among Variables

Principal Components

Regression Models

Linear Regression Models

Linear Regression Models: The Regressors

Linear Regression Models: Individual Observations and Residuals

Linear Regression Models: An Example

Nonlinear Models

Specifying Models in R

Assessing the Adequacy of Models

Goodness-of-Fit Tests; Tests for Normality

Cross Validation

Model Selection and Model Complexity

Notes and Further Reading

Exercises

5. Discrete Time Series Models and Analysis

Basic Linear Operations

The Backshift Operator

The Difference Operator

The Integration Operator

Summation of an Infinite Geometric Series

Linear Difference Equations

Trends and Detrending

Cycles and Seasonal Adjustment

Analysis of Discrete Time Series Models

Stationarity

Sample Autocovariance and Autocorrelation Functions; Estimators

Statistical Inference in Stationary Time Series

Autoregressive and Moving Average Models

Moving Average Models; MA(q)

Autoregressive Models; AR(p)

The Partial Autocorrelation Function (PACF)

ARMA and ARIMA Models

Simulation of ARMA and ARIMA Models

Statistical Inference in ARMA and ARIMA Models

Selection of Orders in ARIMA Models

Forecasting in ARIMA Models

Analysis of ARMA and ARIMA Models in R

Robustness of ARMA Procedures; Innovations with Heavy Tails

Financial Data

Linear Regression with ARMA Errors

Conditional Heteroscedasticity

ARCH Models

GARCH Models and Extensions

Unit Roots and Cointegration

Spurious Correlations; The Distribution of the Correlation Coefficient

Unit Roots

Cointegrated Processes

Notes and Further Reading

Exercises

### Biography

**James E. Gentle** is University Professor Emeritus at George Mason University. He is a Fellow of the American Statistical Association (ASA) and of the American Association for the Advancement of Science. He is the author of *Random Number Generation and Monte Carlo Methods* and *Matrix Algebra*.

"The book is very well written, and fills an important need for an up-to-date textbook about statistical techniques applied to finance. The book explains the theory behind the statistical techniques very well, with good detail. The mathematical notation is appealing and elegant."

~Jerzy Pawlowski, New York University Tandon School of Engineering

"I thoroughly enjoyed reading the first two chapters of the book. Often, the first couple of chapters of a book provide a 'boilerplate' discussion of the characteristics of the data and R. Here, the first two chapters are very well developed, to the point that they provide a good general resource to readers approaching the analysis of financial data from several different perspectives. For example, students in statistics usually approach the entire analysis of time series having in mind the potential application to the analysis of financial data, but they know nothing about the characteristics of the data and the financial markets... Just like the previous chapters, I broadly enjoyed reading this chapter. Prof. Gentle explains the topics clearly and often uses simulations to convey the intuition. That's also the way I like to teach these concepts and I think it enhances understanding among economics and finance students. I also commend the way he discusses the lag and difference operators and how they are implemented in R. He devotes quite some space to them, and I believe that is good as many texts go over these concepts too quickly for many students. Likewise, the discussion of the AR(I)MA models is very detailed and clear."

~Jan Annaert, University of Antwerp and Antwerp Management School