1st Edition

Machine Learning for Factor Investing: R Version




  • Available for pre-order. Item will ship after September 1, 2020
ISBN 9780367545864
Forthcoming September 1, 2020 from Chapman and Hall/CRC
319 Pages

 
USD $63.96 (was $79.95, save $15.99)

Book Description

Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stock selection. The technicality of the subject can make it hard for non-specialists to join the bandwagon, as the jargon and coding requirements may seem out of reach. Machine Learning for Factor Investing: R Version bridges this gap. It provides a comprehensive tour of modern ML-based investment strategies that rely on firm characteristics.

The book covers a wide array of subjects, which range from economic rationales to rigorous portfolio back-testing and encompass both data processing and model interpretability. Common supervised learning algorithms such as tree models and neural networks are explained in the context of style investing, and the reader can also dig into more complex techniques like autoencoder asset returns, Bayesian additive trees, and causal models.

All topics are illustrated with self-contained R code samples and snippets that are applied to a large public dataset that contains over 90 predictors. The material, along with the content of the book, is available online so that readers can reproduce and enhance the examples at their convenience. If you have even a basic knowledge of quantitative finance, this combination of theoretical concepts and practical illustrations will help you learn quickly and deepen your financial and technical expertise.

Table of Contents

I Introduction

1. Preface
 What this book is not about                          
 The targeted audience                              
 How this book is structured                          
 Companion website                               
 Why R?                                      
 Coding instructions                               
 Acknowledgements                                
 Future developments                              
 
2. Notations and data
 Notations                                     
 Dataset                                      

3. Introduction
 Context                                      
 Portfolio construction: the workflow                      
 Machine learning is no magic wand

4. Factor investing and asset pricing anomalies
 Introduction                                   
 Detecting anomalies                               
 Simple portfolio sorts                          
 Factors                                  
 Predictive regressions, sorts, and p-value issues            
 Fama-MacBeth regressions
 Factor competition                            
 Advanced techniques                           
 Factors or characteristics?                           
 Hot topics: momentum, timing and ESG                   
 Factor momentum                            
 Factor timing                               
 The green factors                             
 The link with machine learning                         
 A short list of recent references                     
 Explicit connections with asset pricing models            
 Coding exercises                                 
 
5. Data preprocessing

 Know your data                                 
 Missing data                                   
 Outlier detection                                 
 Feature engineering                               
 Feature selection                             
 Scaling the predictors                          
 Labelling                                     
 Simple labels                               
 Categorical labels                             
 The triple barrier method                        
 Filtering the sample                           
 Return horizons                             
 Handling persistence                               
 Extensions                                    
 Transforming features                          
 Macro-economic variables                        
 Active learning                              
 Additional code and results                           
 Impact of rescaling: graphical representation             
 Impact of rescaling: toy example                    
 Coding exercises                                 

II Common supervised algorithms

6. Penalized regressions and sparse hedging for minimum variance portfolios
 
 Penalized regressions
 Simple regressions                            
 Forms of penalizations                          
 Illustrations                                
 Sparse hedging for minimum variance portfolios               
 Presentation and derivations                      
 Example                                  
 Predictive regressions                              
 Literature review and principle                     
 Code and results                             
 Coding exercise                                 

7. Tree-based methods
 Simple trees                                   
 Principle                                 
 Further details on classification                     
 Pruning criteria                              
 Code and interpretation                         
 Random forests                                 
 Principle                                 
 Code and results                             
 Boosted trees: Adaboost                            
 Methodology                               
 Illustration                                
 Boosted trees: extreme gradient boosting                   
 Managing loss
 Penalization
 Aggregation                                
 Tree structure                               
 Extensions                                
 Code and results                             
 Instance weighting                            
 Discussion                                    
 Coding exercises                                 
 
8. Neural networks
 The original perceptron                             
 Multilayer perceptron (MLP)                          
 Introduction and notations                       
 Universal approximation                         
 Learning via back-propagation                     
 Further details on classification                     
 How deep should we go? And other practical issues             
 Architectural choices                           
 Frequency of weight updates and learning duration          
 Penalizations and dropout                        
 Code samples and comments for vanilla MLP                 
 Regression example                           
 Classification example                          
 Custom losses                               
 Recurrent networks                               
 Presentation                               
 Code and results                             
 Other common architectures                          
 Generative adversarial networks                    
 Auto-encoders                              
 A word on convolutional networks                   
 Advanced architectures                         
 Coding exercise                                 
 
9. Support vector machines
 SVM for classification                              
 SVM for regression                               
 Practice                                      
 Coding exercises                                 
 
10. Bayesian methods

 The Bayesian framework                            
 Bayesian sampling                                
 Gibbs sampling                              
 Metropolis-Hastings sampling                      
 Bayesian linear regression                            
 Naive Bayes classifier                              
 Bayesian additive trees                             
 General formulation                           
 Priors                                   
 Sampling and predictions                        
 Code                                    

III From predictions to portfolios
 
11. Validating and tuning

 Learning metrics                                 
 Regression analysis                            
 Classification analysis                          
 Validation                                    
 The variance-bias tradeoff: theory                   
 The variance-bias tradeoff: illustration                 
 The risk of overfitting: principle                     
 The risk of overfitting: some solutions                 
 The search for good hyperparameters                     
 Methods                                  
 Example: grid search                           
 Example: Bayesian optimization                    
 Short discussion on validation in backtests                  

12. Ensemble models
 Linear ensembles                                 
 Principles                                 
 Example                                  
 Stacked ensembles                                
 Two stage training                            
 Code and results                             
 Extensions                                    
 Exogenous variables                           
 Shrinking inter-model correlations                   
 Exercise                                      
 
13. Portfolio backtesting
 Setting the protocol                               
 Turning signals into portfolio weights                     
 Performance metrics                               
 Discussion                                 
 Pure performance and risk indicators                  
 Factor-based evaluation                         
 Risk-adjusted measures                         
 Transaction costs and turnover                     
 Common errors and issues                           
 Forward looking data                          
 Backtest overfitting                           
 Simple safeguards                            
 Implication of non-stationarity: forecasting is hard              
 General comments                            
 The no free lunch theorem                        
 Example                                     
 Coding exercises                                 

IV Further important topics

14. Interpretability
 Global interpretations                              
 Simple models as surrogates                       
 Variable importance (tree-based)                    
 Variable importance (agnostic)                     
 Partial dependence plot                         
 Local interpretations                              
 LIME                                   
 Shapley values                              
 Breakdown                                
 
15. Two key concepts: causality and non-stationarity
 Causality                                     
 Granger causality                             
 Causal additive models                         
 Structural time-series models                      
 Dealing with changing environments                      
 Non-stationarity: yet another illustration               
 Online learning                              
 Homogeneous transfer learning                     
 
16. Unsupervised learning
 The problem with correlated predictors                    
 Principal component analysis and autoencoders               
 A bit of algebra                              
 PCA                                    
 Autoencoders                               
 Application                                
 Clustering via k-means                             
 Nearest neighbors                                
 Coding exercise                                 
 
17. Reinforcement learning

 Theoretical layout                                
 General framework                            
 Q-learning                                 
 SARSA                                  
 The curse of dimensionality                           
 Policy gradient                                  
 Principle                                 
 Extensions                                
 Simple examples                                 
 Q-learning with simulations                       
 Q-learning with market data                      
 Concluding remarks                               
 Exercises                                     

V Appendix

Data description
 
Solutions to exercises
 


Author(s)

Biography

Guillaume Coqueret is associate professor of finance and data science at EMLYON Business School. His recent research revolves around applications of machine learning tools in financial economics.

Tony Guida is executive director at RAM Active Investments. He serves as chair of the machineByte think tank and is the author of Big Data and Machine Learning in Quantitative Investment.