This textbook shows how to bring theoretical concepts from finance and econometrics to the data. Focusing on coding and data analysis with R, we show how to conduct research in empirical finance from scratch. We start by introducing the concepts of tidy data and coding principles using the tidyverse family of R packages. Code is provided to prepare common open-source and proprietary financial data sources (CRSP, Compustat, Mergent FISD, TRACE) and organize them in a database. We reuse these data in all the subsequent chapters, which we keep as self-contained as possible. The empirical applications range from key concepts of empirical asset pricing (beta estimation, portfolio sorts, performance analysis, Fama-French factors) to modeling and machine learning applications (fixed effects estimation, clustering standard errors, difference-in-differences estimators, ridge regression, Lasso, elastic net, random forests, neural networks) and portfolio optimization techniques.
1. Self-contained chapters on the most important applications and methodologies in finance, which can easily be used for the reader’s research or as a reference for courses on empirical finance.
2. Each chapter is reproducible in the sense that the reader can replicate every single figure, table, or number by simply copying and pasting the code we provide.
3. A full-fledged introduction to machine learning with tidymodels, based on tidy principles, that shows how factor selection and option pricing can benefit from machine learning methods.
4. Chapter 2 on accessing and managing financial data shows how to retrieve and prepare the most important datasets in financial economics: CRSP and Compustat. The chapter also contains detailed explanations of the most relevant data characteristics.
5. Each chapter provides exercises, based on established lectures and classes, that are designed to help students dig deeper. The exercises can be used for self-study or as a source of inspiration for teaching exercises.
1. Introduction to Tidy Finance
2. Accessing & Managing Financial Data
3. WRDS, CRSP, and Compustat
4. TRACE and FISD
5. Other Data Providers
6. Beta Estimation
7. Univariate Portfolio Sorts
8. Size Sorts and P-Hacking
9. Value and Bivariate Sorts
10. Replicating Fama and French Factors
11. Fama-MacBeth Regressions
12. Fixed Effects and Clustered Standard Errors
13. Difference in Differences
14. Factor Selection via Machine Learning
15. Option Pricing via Machine Learning
16. Parametric Portfolio Policies
17. Constrained Optimization and Backtesting
Appendix A. Cover Design
Appendix B. Clean Enhanced TRACE with R