1st Edition

Probability, Statistics and Other Frightening Stuff

By Alan Jones Copyright 2019

    Probability, Statistics and Other Frightening Stuff (Volume II of the Working Guides to Estimating & Forecasting series) considers many of the commonly used Descriptive Statistics in the world of estimating and forecasting. It considers values that are representative of the ‘middle ground’ (Measures of Central Tendency), and the degree of data scatter (Measures of Dispersion and Shape) around the ‘middle ground’ values.

    A number of Probability Distributions and where they might be used are discussed, along with some fascinating and useful ‘rules of thumb’ or short-cut properties that estimators and forecasters can exploit in plying their trade. With the help of a ‘Correlation Chicken’, the concept of partial correlation is explained, including how the estimator or forecaster can exploit this in reflecting varying levels of independence and imperfect dependence between an output or predicted value (such as cost) and an input or predictor variable such as size.

    Under the guise of ‘Tails of the unexpected’ the book concludes with two chapters devoted to Hypothesis Testing (or knowing when to accept or reject the validity of an assumed estimating relationship), and a number of statistically-based tests to help the estimator to decide whether to include or exclude a data point as an ‘outlier’, one that appears not to be representative of that which the estimator is tasked to produce. This is a valuable resource for estimators, engineers, accountants, project risk specialists as well as students of cost engineering.

    Volume II Table of Contents

    1 Introduction and Objectives, 1.1 Why write this book? Who might find it useful? Why Five Volumes? 1.1.1 Why write this series? Who might find it useful? 1.1.2 Why Five Volumes? 1.2 Features you'll find in this book and others in this series, 1.2.1 Chapter Context, 1.2.2 The Lighter Side (humour), 1.2.3 Quotations, 1.2.4 Definitions, 1.2.5 Discussions and Explanations with a Mathematical Slant for Formula-philes, 1.2.6 Discussions and Explanations without a Mathematical Slant for Formula-phobes, 1.2.7 Caveat Augur, 1.2.8 Worked Examples, 1.2.9 Useful Microsoft Excel Functions and Facilities, 1.2.10 References to Authoritative Sources, 1.2.11 Chapter Reviews, 1.3 Overview of Chapters in this Volume, 1.4 Elsewhere in the 'Working Guide to Estimating & Forecasting' Series, 1.4.1 Volume I: Principles, Process and Practice of Professional Number Juggling, 1.4.2 Volume II: Probability, Statistics and other Frightening Stuff, 1.4.3 Volume III: Best Fit Lines & Curves, and some Mathe-Magical Transformations, 1.4.4 Volume IV: Learning, Unlearning and Re-Learning Curves, 1.4.5 Volume V: Risk, Opportunity, Uncertainty and Other Random Models, 1.5 Final Thoughts and Musings on this Volume and Series, References

    2 Measures of Central Tendency: Means, Modes, Medians, 2.1 'S' is for Shivers, Statistics and Spin, 2.1.1 Cutting through the Mumbo-Jumbo: What is or are Statistics? 2.1.2 Are there any types of Statistics that are not 'Descriptive'? 2.1.3 Samples, Populations and the Dreaded Statistical Bias, 2.2 Measures of Central Tendency, 2.2.1 What do we mean by ‘Mean’? 2.2.2 Can we take the Average of an Average? 2.3 Arithmetic Mean - The Simple Average, 2.3.1 Properties of Arithmetic Means: A Potentially Unachievable Value! 2.3.2 Properties of Arithmetic Means: An Unbiased Representative Value of the Whole, 2.3.3 Why would we not want to use the Arithmetic Mean? 2.3.4 Is an Arithmetic Mean useful where there is an upward or downward trend? 2.3.5 Average of Averages: Can we take the Arithmetic Mean of an Arithmetic Mean? 2.4 Geometric Mean, 2.4.1 Basic Rules and Properties of a Geometric Mean, 2.4.2 When might we want to use a Geometric Mean? 2.4.3 Finding a steady state rate of growth or decay with a Geometric Mean, 2.4.4 Using a Geometric Mean as a Cross-Driver Comparator, 2.4.5 Using a Geometric Mean with Certain Non-Linear Regressions, 2.4.6 Average of Averages: Can we take the Geometric Mean of a Geometric Mean? 2.5 Harmonic Mean, 2.5.1 Surely Estimators would never use the Harmonic Mean? 2.5.2 Cases where the Harmonic Mean and the Arithmetic Mean are both inappropriate, 2.5.3 Average of Averages: Can we take the Harmonic Mean of a Harmonic Mean? 2.6 Quadratic Mean: Root Mean Square, 2.6.1 When would we ever use a Quadratic Mean? 2.7 Comparison of Arithmetic, Geometric, Harmonic and Quadratic Means, 2.8 Mode, 2.8.1 When would we use the Mode instead of the Arithmetic Mean? 2.8.2 What does it mean if we observe more than one Mode? 2.8.3 What if we have two modes that occur at adjacent values? 2.8.4 Approximating the Theoretical Mode when there is no Real Observable Mode! 2.9 Median, 2.9.1 Primary Use of the Median, 2.9.2 Finding the Median, 2.10 Choosing a Representative Value: The 5-Ms, 2.10.1 Some Properties of the 5-Ms, 2.11 Chapter Review, References

    3 Measures of Dispersion and Shape, 3.1 Measures of Dispersion or Scatter around a Central Value, 3.2 Minimum, Maximum and Range, 3.3 Absolute Deviations, 3.3.1 Mean or Average Absolute Deviation (AAD), 3.3.2 Median Absolute Deviation (MAD), 3.3.3 Is there a Mode Absolute Deviation? 3.3.4 When would we use an Absolute Deviation? 3.4 Variance and Standard Deviation, 3.4.1 Variance and Standard Deviation - Compensating for Small Samples, 3.4.2 Coefficient of Variation, 3.4.3 The Range Rule - Is it Myth or Magic? 3.5 Comparison of Deviation-Based Measures of Dispersion, 3.6 Confidence Levels, Limits and Intervals, 3.6.1 Open and Closed Confidence Level Ranges, 3.7 Quantiles: Quartiles, Quintiles, Deciles and Percentiles, 3.7.1 A few more words about Quartiles, 3.7.2 A few thoughts about Quintiles, 3.7.3 And a few words about Deciles, 3.7.4 Finally, a few words about Percentiles, 3.8 Other Measures of Shape: Skewness and Peakedness, 3.8.1 Measures of Skewness, 3.8.2 Measures of Peakedness or Flatness - Kurtosis, 3.9 Chapter Review, References

    4 Probability Distributions, 4.1 Probability, 4.1.1 Discrete Distributions, 4.1.2 Continuous Distributions, 4.1.3 Bounding Distributions, 4.2 Normal Distributions, 4.2.1 What is a Normal Distribution? 4.2.2 Key Properties of a Normal Distribution, 4.2.3 Where is the Normal Distribution observed? When can, or should, it be used? 4.2.4 Probability Density Function and Cumulative Distribution Function, 4.2.5 Key Stats and Facts about the Normal Distribution, 4.3 Uniform Distributions, 4.3.1 Discrete Uniform Distributions, 4.3.2 Continuous Uniform Distributions, 4.3.3 Key Properties of a Uniform Distribution, 4.3.4 Where is the Uniform Distribution observed? When can, or should, it be used? 4.3.5 Key Stats and Facts about the Uniform Distribution, 4.4 Binomial and Bernoulli Distributions, 4.4.1 What is a Binomial Distribution? 4.4.2 What is a Bernoulli Distribution? 4.4.3 Probability Mass Function and Cumulative Distribution Function, 4.4.4 Key Properties of a Binomial Distribution, 4.4.5 Where is the Binomial Distribution observed? When can, or should, it be used? 4.4.6 Key Stats and Facts about the Binomial Distribution, 4.5 Beta Distributions, 4.5.1 What is a Beta Distribution? 4.5.2 Probability Density Function and Cumulative Distribution Function, 4.5.3 Key Properties of a Beta Distribution, 4.5.4 PERT-Beta or Project Beta Distributions, 4.5.5 Where is the Beta Distribution observed? When can, or should, it be used? 4.5.6 Key Stats and Facts about the Beta Distribution, 4.6 Triangular Distributions, 4.6.1 What is a Triangular Distribution? 4.6.2 Probability Density Function and Cumulative Distribution Function, 4.6.3 Key Properties of a Triangular Distribution, 4.6.4 Where is the Triangular Distribution observed? When can, or should, it be used? 4.6.5 Key Stats and Facts about the Triangular Distribution, 4.7 Lognormal Distributions, 4.7.1 What is a Lognormal Distribution? 4.7.2 Probability Density Function and Cumulative Distribution Function, 4.7.3 Key Properties of a Lognormal Distribution, 4.7.4 Where is the Lognormal Distribution observed? When can, or should, it be used? 4.7.5 Key Stats and Facts about the Lognormal Distribution, 4.8 Weibull Distributions, 4.8.1 What is a Weibull Distribution? 4.8.2 Probability Density Function and Cumulative Distribution Function, 4.8.3 Key Properties of a Weibull Distribution, 4.8.4 Where is the Weibull Distribution observed? When can, or should, it be used? 4.8.5 Key Stats and Facts about the Weibull Distribution, 4.9 Poisson Distributions, 4.9.1 What is a Poisson Distribution? 4.9.2 Probability Mass Function and Cumulative Distribution Function, 4.9.3 Key Properties of a Poisson Distribution, 4.9.4 Where is the Poisson Distribution observed? When can, or should, it be used? 4.9.5 Key Stats and Facts about the Poisson Distribution, 4.10 Gamma and Chi-Squared Distributions, 4.10.1 What is a Gamma Distribution? 4.10.2 What is a Chi-Squared Distribution? 4.10.3 Probability Density Function and Cumulative Distribution Function, 4.10.4 Key Properties of Gamma and Chi-Squared Distributions, 4.10.5 Where are the Gamma and Chi-Squared Distributions used? 4.10.6 Key Stats and Facts about the Gamma and Chi-Squared Distributions, 4.11 Exponential Distributions, 4.11.1 What is an Exponential Distribution? 4.11.2 Probability Density Function and Cumulative Distribution Function, 4.11.3 Key Properties of an Exponential Distribution, 4.11.4 Where is the Exponential Distribution observed? When can, or should, it be used? 4.11.5 Key Stats and Facts about the Exponential Distribution, 4.12 Pareto Distributions, 4.12.1 What is a Pareto Distribution? 4.12.2 Probability Density Function and Cumulative Distribution Function, 4.12.3 The Pareto Principle: How does it fit in with the Pareto Distribution? 4.12.4 Key Properties of a Pareto Distribution, 4.12.5 Where is the Pareto Distribution observed? When can, or should, it be used? 4.12.6 Key Stats and Facts about the Pareto Distribution, 4.13 Choosing an Appropriate Distribution, 4.14 Chapter Review, References

    5 Measures of Linearity, Dependence and Correlation, 5.1 Covariance, 5.2 Linear Correlation or Measures of Linear Dependence, 5.2.1 Pearson's Correlation Coefficient, 5.2.2 Pearson's Correlation Coefficient - Key Properties and Limitations, 5.2.3 Correlation is not Causation, 5.2.4 Partial Correlation: Time for some Correlation Chicken, 5.2.5 Coefficient of Determination, 5.3 Rank Correlation, 5.3.1 Spearman's Rank Correlation Coefficient, 5.3.2 If Spearman's Rank Correlation is so much trouble, why bother? 5.3.3 Interpreting Spearman's Rank Correlation Coefficient, 5.3.4 Kendall's Tau Rank Correlation Coefficient, 5.3.5 If Kendall's Tau Rank Correlation is so much trouble, why bother? 5.4 Correlation: What if you want to 'Push' it not 'Pull' it? 5.4.1 The Pushy Pythagorean Technique or Restricting the Scatter around a Straight Line, 5.4.2 ‘Controlling Partner’ Technique, 5.4.3 Equivalence of the Pushy Pythagorean and Controlling Partner Techniques, 5.4.4 ‘Equal Partners’ Technique, 5.4.5 Copulas, 5.5 Chapter Review, References

    6 Tails of the Unexpected (1): Hypothesis Testing, 6.1 Hypothesis Testing, 6.1.1 Tails of the Unexpected, 6.2 Z-Scores and Z-Tests, 6.2.1 Standard Error, 6.2.2 Example: Z-Testing the Mean Value of a Normal Distribution, 6.2.3 Example: Z-Testing the Median Value of a Beta Distribution, 6.3 Student's t-Distribution and t-Tests, 6.3.1 Student's t-Distribution, 6.3.2 t-Tests, 6.3.3 Performing a t-Test in Microsoft Excel on a Single Sample, 6.3.4 Performing a t-Test in Microsoft Excel to Compare Two Samples, 6.4 Mann-Whitney U-Tests, 6.5 Chi-Squared Tests or χ²-Tests, 6.5.1 Chi-Squared Distribution Revisited, 6.5.2 Chi-Squared Test, 6.6 F-Distribution and F-Tests, 6.6.1 F-Distribution, 6.6.2 F-Test, 6.6.3 Primary Use of the F-Distribution, 6.7 Checking for Normality, 6.7.1 Q-Q Plots, 6.7.2 Using a Chi-Square Test for Normality, 6.7.3 Using the Jarque-Bera Test for Normality, 6.8 Chapter Review, References

    7 Tails of the Unexpected (2): Outing the Outliers, 7.1 Outing the Outliers: Detecting and Dealing with Outliers, 7.1.1 Mitigation of Type I and Type II Outlier Errors, 7.2 Tukey Fences, 7.2.1 Tukey Slimline Fences - For Larger Samples and Less Tolerance of Outliers? 7.3 Chauvenet's Criterion, 7.3.1 Variation on Chauvenet's Criterion for Small Sample Sizes (SSS), 7.3.2 Taking a Q-Q Perspective on Chauvenet's Criterion for Small Sample Sizes (SSS), 7.4 Peirce's Criterion, 7.5 Iglewicz and Hoaglin's MAD Technique, 7.6 Grubbs' Test, 7.7 Generalised Extreme Studentised Deviate (GESD), 7.8 Dixon's Q-Test, 7.9 Doing the JB Swing - Using Skewness and Excess Kurtosis to identify Outliers, 7.10 Outlier Tests - A Comparison, 7.11 Chapter Review, References

    Glossary of Estimating Terms

    Biography

    Alan R. Jones is Principal Consultant at Estimata Limited, an estimating consultancy service. He is a Certified Cost Estimator/Analyst (US) and Certified Cost Engineer (CCE) (UK). Prior to setting up his own business, he enjoyed a 40-year career in the UK aerospace and defence industry as an estimator, culminating in the role of Chief Estimator at BAE Systems. Alan is a Fellow of the Association of Cost Engineers and a Member of the International Cost Estimating and Analysis Association. Historically (some four decades ago), Alan graduated in Mathematics from Imperial College of Science and Technology in London, and was an MBA Prize-winner at the Henley Management College (... that was slightly more recent, being only two decades ago). Oh, how time flies when you are enjoying yourself.

    "In the Working Guides to Estimating and Forecasting Alan has managed to capture the full spectrum of relevant topics with simple explanations, practical examples and academic rigor, while injecting humour into the narrative." - Dale Shermon, Chairman, Society of Cost Analysis and Forecasting (SCAF).

    "If estimating has always baffled you, this innovative well illustrated and user friendly book will prove a revelation to its mysteries. To confidently forecast, minimise risk and reduce uncertainty we need full disclosure into the science and art of estimating. Thankfully, and at long last the "Working Guides to Estimating & Forecasting" are exactly that, full of practical examples giving clarity, understanding and validity to the techniques. These are comprehensive step by step guides in understanding the principles of estimating using experientially based models to analyse the most appropriate, repeatable, transparent and credible outcomes. Each of the five volumes affords a valuable tool for both corporate reference and an outstanding practical resource for the teaching and training of this elusive and complex subject. I wish I had access to such a thorough reference when I started in this discipline over 15 years ago, I am looking forward to adding this to my library and using it with my team." - Tracey L Clavell, Head of Estimating & Pricing, BAE Systems Australia

    "At last, a comprehensive compendium on these engineering math subjects, essential to both the new and established "cost engineer"! As expected the subjects are presented with the author’s usual wit and humour on complex and daunting "mathematically challenging" subjects. As a professional trainer within the MOD Cost Engineering community trying to embed this into my students, I will be recommending this series of books as essential bedtime reading." - Steve Baker, Senior Cost Engineer, DE&S MOD

    "Alan has been a highly regarded member of the Cost Estimating and forecasting profession for several years. He is well known for an ability to reduce difficult topics and cost estimating methods down to something that is easily digested. As a master of this communication he would most often be found providing training across the cost estimating and forecasting tools and at all levels of expertise. With this 5-volume set, Working Guides to Estimating and Forecasting, Alan has brought his normal verbal training method into a written form. Within their covers Alan steers away from the usual dry academic script into establishing an almost 1:1 relationship with the reader. For my money a recommendable read for all levels of the Cost Estimating and forecasting profession and those who simply want to understand what is in the ‘blackbox’ just a bit more." - Prof Robert Mills, Margin Engineering, Birmingham City University. MACOSTE, SCAF, ICEAA.

    "Finally, a book to fill the gap in cost estimating and forecasting! Although other publications exist in this field, they tend to be light on detail whilst also failing to cover many of the essential aspects of estimating and forecasting. Jones covers all this and more from both a theoretical and practical point of view, regularly drawing on his considerable experience in the defence industry to provide many practical examples to support his comments. Heavily illustrated throughout, and often presented in a humorous fashion, this is a must read for those who want to understand the importance of cost estimating within the broader field of project management." - Dr Paul Blackwell, Lecturer in Management of Projects, The University of Manchester, UK.

    "Alan Jones provides a useful guidebook and navigation aid for those entering the field of estimating as well as an overview for more experienced practitioners. His humorous asides supplement a thorough explanation of techniques to liven up and illuminate an area which has little attention in the literature, yet is the basis of robust project planning and successful delivery. Alan’s talent for explaining the complicated science and art of estimating in practical terms is testament to his knowledge of the subject and to his experience in teaching and training." - Therese Lawlor-Wright, Principal Lecturer in Project Management at the University of Cumbria

    "Alan Jones has created an in depth guide to estimating and forecasting that I have not seen historically. Anyone wishing to improve their awareness in this field should read this and learn from the best." - Richard Robinson, Technical Principal for Estimating, Mott MacDonald

    "The book series of ‘Working Guides to Estimating and Forecasting’ is an essential read for students, academics and practitioners who are interested in developing a good understanding of cost estimating and forecasting from real-life perspectives." - Professor Essam Shehab, Professor of Digital Manufacturing and Head of Cost Engineering, Cranfield University, UK.

    "In creating the Working Guides to Estimating and Forecasting, Alan has captured the core approaches and techniques required to deliver robust and reliable estimates in a single series. Some of the concepts can be challenging, however, Alan has delivered them to the reader in a very accessible way that supports lifelong learning. Whether you are an apprentice, academic or a seasoned professional, these working guides will enhance your ability to understand the alternative approaches to generating a well-executed, defensible estimate, increasing your ability to support competitive advantage in your organisation." - Professor Andrew Langridge, Royal Academy of Engineering Visiting Professor in Whole Life Cost Engineering and Cost Data Management, University of Bath, UK.

    "Alan Jones’s "Working Guides to Estimating and Forecasting" provides an excellent guide for all levels of cost estimators from the new to the highly experienced. Not only does he cover the underpinning good practice for the field, his books will take you on a journey from cost estimating basics through to how estimating should be used in manufacturing the future – reflecting on a whole life cycle approach. He has written a must-read book for anyone starting cost estimating as well as for those who have been doing estimates for years. Read this book and learn from one of the best." - Linda Newnes, Professor of Cost Engineering, University of Bath, UK.