1st Edition

Performance, Reliability, and Availability Evaluation of Computational Systems, Volume 2: Reliability, Availability Modeling, Measuring, and Data Analysis

By Paulo Romero Martins Maciel
Copyright 2023
    748 Pages 300 B/W Illustrations
    by Chapman & Hall

    This textbook is intended as a comprehensive and substantially self-contained two-volume work covering performance, reliability, and availability evaluation. The volumes focus on computing systems, although the methods may also be applied to other systems. The first volume covers Chapters 1 to 14 and is subtitled "Performance Modeling and Background". The second volume encompasses Chapters 15 to 25 and is subtitled "Reliability and Availability Modeling, Measuring and Workload, and Lifetime Data Analysis".

    This text is helpful for computer performance professionals who plan, design, configure, and tune the performance, reliability, and availability of computing systems. Such professionals may use these volumes to get acquainted with specific subjects by consulting the relevant chapters. The many computing-system examples in the textbook will help them understand the concepts covered in each chapter. The text may also be helpful for instructors who teach performance, reliability, and availability evaluation. Many possible threads could be configured according to the interest of the audience and the duration of the course. Chapter 1 presents a number of possible course programs that could be organized using this text.

    Volume II is composed of the last two parts. Part III examines reliability and availability modeling by covering a set of fundamental notions, definitions, redundancy procedures, and modeling methods such as Reliability Block Diagrams (RBD) and Fault Trees (FT) with their respective evaluation methods. It then adopts Markov chains, stochastic Petri nets, and hierarchical and heterogeneous modeling to represent more complex systems. Part IV discusses performance measurements and reliability data analysis. It first describes some basic measuring mechanisms applied in computer systems and then discusses workload generation. Next, it examines failure monitoring and fault injection, and finally it discusses a set of techniques for reliability and maintainability data analysis.

    PART III Reliability and Availability Modeling

    Chapter 15 Fundamentals of Dependability

    Chapter 16 Redundancy

    Chapter 17 Reliability Block Diagram

    Chapter 18 Fault Tree

    Chapter 19 Combinatorial Model Analysis

    Chapter 20 Modeling Availability, Reliability and Capacity with CTMC

    Chapter 21 Modeling Availability, Reliability and Capacity with SPN

    PART IV Measuring and Data Analysis

    Chapter 22 Performance Measuring

    Chapter 23 Workload Characterization

    Chapter 24 Lifetime Data Analysis

    Chapter 25 Fault Injection and Failure Monitoring

    Appendix A Whetstone

    Appendix B Linpack_Bench

    Appendix C Livermore Loops

    Appendix D MMP - CTMC Trace Generator

    Biography

    Paulo Romero Martins Maciel is a Full Professor at Centro de Informática da Universidade Federal de Pernambuco (UFPE), Brazil.