1st Edition

Mechanizing Hypothesis Formation: Principles and Case Studies

    362 Pages 10 Color & 213 B/W Illustrations
    by CRC Press

    Mechanizing hypothesis formation is an approach to exploratory data analysis. Its development started in the 1960s, inspired by the question “Can computers formulate and verify scientific hypotheses?” The development resulted in a general theory of the logic of discovery. The theory comprises theoretical calculi, which deal with theoretical statements, and observational calculi, which deal with observational statements concerning finite results of observation. The two kinds of calculi are related through statistical hypothesis tests. A GUHA method is a tool of the logic of discovery. It uses a one-to-one relation between theoretical and observational statements to get all interesting theoretical statements. A GUHA procedure generates all interesting observational statements and verifies them in the given observational data. The output of the procedure consists of all interesting observational statements that are true in the given data. Several GUHA procedures, dealing with association rules, couples of association rules, action rules, histograms, couples of histograms, and patterns based on general contingency tables, are implemented in the LISp-Miner system developed at the Prague University of Economics and Business. Various results about observational calculi have been achieved and applied together with the LISp-Miner system.
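
    The generate-and-verify principle of a GUHA procedure described above can be illustrated with a minimal Python sketch. It is not the LISp-Miner or CleverMiner implementation: the function names, the toy data, and the thresholds are assumptions made for illustration only. The sketch enumerates simple candidate rules of the form "attribute = value implies attribute = value", computes a four-fold contingency table (a, b, c, d) for each, and keeps the rules that satisfy a founded-implication style condition, a / (a + b) >= p and a >= base.

```python
from itertools import product


def four_fold_table(rows, antecedent, succedent):
    """Count the four-fold table (a, b, c, d) of two Boolean conditions."""
    a = b = c = d = 0
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            a += 1      # antecedent and succedent both hold
        elif ant:
            b += 1      # antecedent holds, succedent does not
        elif suc:
            c += 1      # succedent holds, antecedent does not
        else:
            d += 1      # neither holds
    return a, b, c, d


def founded_implication(table, p, base):
    """Illustrative verification step: a / (a + b) >= p and a >= base."""
    a, b, _, _ = table
    return a + b > 0 and a >= base and a / (a + b) >= p


def mine_rules(rows, antecedent_attrs, succedent_attrs, p=0.9, base=2):
    """Generate every candidate rule and keep those true in the data."""
    results = []
    for ant_attr, suc_attr in product(antecedent_attrs, succedent_attrs):
        ant_values = sorted({row[ant_attr] for row in rows})
        suc_values = sorted({row[suc_attr] for row in rows})
        for ant_val, suc_val in product(ant_values, suc_values):
            table = four_fold_table(
                rows,
                lambda r: r[ant_attr] == ant_val,
                lambda r: r[suc_attr] == suc_val,
            )
            if founded_implication(table, p, base):
                results.append(((ant_attr, ant_val), (suc_attr, suc_val), table))
    return results


# Hypothetical toy data, invented for this sketch.
rows = [
    {"smoker": "yes", "exercise": "low", "risk": "high"},
    {"smoker": "yes", "exercise": "low", "risk": "high"},
    {"smoker": "yes", "exercise": "high", "risk": "high"},
    {"smoker": "no", "exercise": "high", "risk": "low"},
    {"smoker": "no", "exercise": "high", "risk": "low"},
    {"smoker": "no", "exercise": "low", "risk": "high"},
]

rules = mine_rules(rows, ["smoker", "exercise"], ["risk"], p=0.9, base=2)
for (ant_attr, ant_val), (suc_attr, suc_val), table in rules:
    print(f"{ant_attr}({ant_val}) => {suc_attr}({suc_val})  4ft-table={table}")
```

    The real GUHA procedures work with much richer sets of candidate statements (for example, conjunctions over several attributes and subsets of categories) and with further kinds of verification conditions, including ones based on statistical tests; the sketch only shows the overall shape of the search.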

    The book gives a brief overview of the logic of discovery. It presents many examples of applying the GUHA procedures to real problems relevant to data mining and business intelligence. It also provides an overview of recent research results on dealing with domain knowledge in data mining and on automating that process. Firsthand experiences with implementing the GUHA method in the Python language are presented.

    1. Introduction
    2. Data Sets

    SECTION I: THE GUHA PROCEDURES
    3. Principle and Simple Examples
    4. Common Features
    5. LISp-Miner System

    SECTION II: APPLYING THE GUHA PROCEDURES
    6. Examples Overview
    7. 4ft-Miner – GUHA Association Rules
    8. CF-Miner – Histograms
    9. KL-Miner – Pairs of Categorical Attributes
    10. SD-4ft-Miner – Couples of GUHA Association Rules
    11. SDCF-Miner – Couples of Histograms
    12. SDKL-Miner – Couples of Pairs of Categorical Attributes
    13. Ac4ft-Miner – Action Rules
    14. GUHA Procedures and Business Intelligence
    15. CleverMiner – GUHA and Python

    SECTION III: RELATED RESEARCH AND THEORY
    16. Artificial Data Generation and LM ReverseMiner Module
    17. Applying Domain Knowledge
    18. Observational Calculi

    Biography

    Jan Rauch graduated from the Faculty of Mathematics and Physics of Charles University in Prague. He received his Ph.D. in Mathematical Logic in 1987 from the Institute of Mathematics of the Czechoslovak Academy of Sciences. He has been a full professor at the Department of Information and Knowledge Engineering, Prague University of Economics and Business, since 2011.

    Milan Šimůnek has been an associate professor at the Faculty of Informatics and Statistics, Prague University of Economics and Business, since 2012. His research activities include data mining, databases, virtual reality, and software project development. He has been the software project leader of the LISp-Miner system since its launch in 1996 and is the author of the implementation of its core modules.

    David Chudán is an assistant professor of Applied Informatics at the Faculty of Informatics and Statistics, Prague University of Economics and Business. He received his Ph.D. in 2015 in the field of Applied Informatics. His research interests include data mining and machine learning on different tools and platforms. Another area of his research is GUHA association rules and their complementary use with business intelligence.

    Petr Máša graduated from the Prague University of Economics and Business and the Faculty of Mathematics and Physics of Charles University in Prague. He received his Ph.D. in 2006. He also works on business projects, where he uses data mining, data science, and data analytics and also holds business responsibility.