Machine Learning for Factor Investing: R Version

Description

Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stock selection. The technicality of the subject can make it hard for non-specialists to join the bandwagon, as the jargon and coding requirements may seem out of reach.

4. Factor investing and asset pricing anomalies
Introduction
Detecting anomalies
Simple portfolio sorts
Factors
Predictive regressions, sorts, and p-value issues
Fama-MacBeth regressions
Factor competition
Advanced techniques
Factors or characteristics?
Hot topics: momentum, timing and ESG
Factor momentum
Factor timing
The green factors
The link with machine learning
A short list of recent references
Explicit connections with asset pricing models
Coding exercises

5. Data preprocessing
Know your data
Missing data
Outlier detection
Feature engineering
Feature selection
Scaling the predictors
Labelling
Simple labels
Categorical labels
The triple barrier method
Filtering the sample
Return horizons
Handling persistence
Extensions
Transforming features
Macro-economic variables
Active learning
Additional code and results
Impact of rescaling: graphical representation
Impact of rescaling: toy example
Coding exercises

II Common supervised algorithms

6. Penalized regressions and sparse hedging for minimum variance portfolios
Penalized regressions
Simple regressions
Forms of penalizations
Illustrations
Sparse hedging for minimum variance portfolios
Presentation and derivations
Example
Predictive regressions
Literature review and principle
Code and results
Coding exercise

7. Tree-based methods
Simple trees
Principle
Further details on classification
Pruning criteria
Code and interpretation
Random forests
Principle
Code and results
Boosted trees: Adaboost
Methodology
Illustration
Boosted trees: extreme gradient boosting
Managing loss
Penalization
Aggregation
Tree structure
Extensions
Code and results
Instance weighting
Discussion
Coding exercises

8. Neural networks
The original perceptron
Multilayer perceptron (MLP)
Introduction and notations
Universal approximation
Learning via back-propagation
Further details on classification
How deep should we go? And other practical issues
Architectural choices
Frequency of weight updates and learning duration
Penalizations and dropout
Code samples and comments for vanilla MLP
Regression example
Classification example
Custom losses
Recurrent networks
Presentation
Code and results
Other common architectures
Generative adversarial networks
Auto-encoders
A word on convolutional networks
Advanced architectures
Coding exercise

9. Support vector machines
SVM for classification
SVM for regression
Practice
Coding exercises

10. Bayesian methods
The Bayesian framework
Bayesian sampling
Gibbs sampling
Metropolis-Hastings sampling
Bayesian linear regression
Naive Bayes classifier
Bayesian additive trees
General formulation
Priors
Sampling and predictions
Code

III From predictions to portfolios

11. Validating and tuning
Learning metrics
Regression analysis
Classification analysis
Validation
The variance-bias tradeoff: theory
The variance-bias tradeoff: illustration
The risk of overfitting: principle
The risk of overfitting: some solutions
The search for good hyperparameters
Methods
Example: grid search
Example: Bayesian optimization
Short discussion on validation in backtests

12. Ensemble models
Linear ensembles
Principles
Example
Stacked ensembles
Two-stage training
Code and results
Extensions
Exogenous variables
Shrinking inter-model correlations
Exercise

13. Portfolio backtesting
Setting the protocol
Turning signals into portfolio weights
Performance metrics
Discussion
Pure performance and risk indicators
Factor-based evaluation
Risk-adjusted measures
Transaction costs and turnover
Common errors and issues
Forward-looking data
Backtest overfitting
Simple safeguards
Implication of non-stationarity: forecasting is hard
General comments
The no free lunch theorem
Example
Coding exercises

IV Further important topics

14. Interpretability
Global interpretations
Simple models as surrogates
Variable importance (tree-based)
Variable importance (agnostic)
Partial dependence plot
Local interpretations
LIME
Shapley values
Breakdown

15. Two key concepts: causality and non-stationarity
Causality
Granger causality
Causal additive models
Structural time-series models
Dealing with changing environments
Non-stationarity: yet another illustration
Online learning
Homogeneous transfer learning

16. Unsupervised learning
The problem with correlated predictors
Principal component analysis and autoencoders
A bit of algebra
PCA
Autoencoders
Application
Clustering via k-means
Nearest neighbors
Coding exercise

17. Reinforcement learning
Theoretical layout
General framework
Q-learning
SARSA
The curse of dimensionality
Policy gradient
Principle
Extensions
Simple examples
Q-learning with simulations
Q-learning with market data
Concluding remarks
Exercises

V Appendix

Data Description

Solution to exercises

Author(s)

Biography

Guillaume Coqueret is associate professor of finance and data science at EMLYON Business School. His recent research revolves around applications of machine learning tools in financial economics.

Tony Guida is executive director at RAM Active Investments. He serves as chair of the machineByte think tank and is the author of Big Data and Machine Learning in Quantitative Investment.

Critics' Reviews

"This book is the perfect one for any data scientist on financial markets. It is well written, with lots of illustrations, examples, pieces of code, tips on the different statistical package available to perform the various algos. This book requires for sure a strong knowledge in quantitative finance and Machine Learning, so it cannot be put in any hands. But for those who are familiar with quantitative finance, this book can be a reference, as Hull's book is as regards to derivatives products. I liked the good and detailed analysis of the different Machine Learning algos, and the different examples used throughout the book. This book is perfect for assets managers having to run backtests and searching for innovative ways to enhance the return of their portfolios. I spent quite a good time reading this manuscript, and I would recommend it."
-Frédéric Girod, Union of European Football Associations
