Knowledge Discovery from Data Streams

1st Edition

By João Gama

Chapman and Hall/CRC

255 pages | 62 B/W Illus.

Purchasing Options ($ = USD)
Hardback: 9781439826119
pub: 2010-05-25
$98.95
eBook (VitalSource): 9780429103797
pub: 2010-05-25
from $28.98


Description

Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they are also valid for other areas of machine learning and data mining.

Reviews

… this book is the first authored text (that is, not an edited collection) about the area … The book covers a lot of ground in just 200 pages, including discussion of relatively advanced methods such as wavelets, bagging, boosting, dynamic time warping, and symbolic representation of time series. There is also, I was pleased to see, a chapter on evaluating streaming algorithms … . Evaluation, in general, deserves more attention than it generally receives, so I was delighted to see the focus on it here. … a good introduction to an area of data analysis which is going to be very important indeed.

—David J. Hand, International Statistical Review, 2012

Gama is one of the leading investigators in the hottest research topic in machine learning and data mining: data streams. … This book is the first book to didactically cover in a clear, comprehensive and mathematically rigorous way the main machine learning related aspects of this relevant research field. … an up-to-date, broad and useful source of reference for all those interested in knowledge acquisition by learning techniques.

—From the Foreword by André Ponce de Leon Ferreira de Carvalho, University of São Paulo, Brazil

Table of Contents

Knowledge Discovery from Data Streams

Introduction

An Illustrative Example

A World in Movement

Data Mining and Data Streams

Introduction to Data Streams

Data Stream Models

Basic Streaming Methods

Illustrative Applications

Change Detection

Introduction

Tracking Drifting Concepts

Monitoring the Learning Process

Final Remarks

Maintaining Histograms from Data Streams

Introduction

Histograms from Data Streams

The Partition Incremental Discretization (PiD) Algorithm

Applications to Data Mining

Evaluating Streaming Algorithms

Introduction

Learning from Data Streams

Evaluation Issues

Lessons Learned and Open Issues

Clustering from Data Streams

Introduction

Clustering Examples

Clustering Variables

Frequent Pattern Mining

Introduction to Frequent Itemset Mining

Heavy Hitters

Mining Frequent Itemsets from Data Streams

Sequence Pattern Mining

Decision Trees from Data Streams

Introduction

The Very Fast Decision Tree Algorithm

Extensions to the Basic Algorithm

OLIN: Info-Fuzzy Algorithms

Novelty Detection in Data Streams

Introduction

Learning and Novelty

Novelty Detection as a One-Class Classification Problem

Learning New Concepts

The Online Novelty and Drift Detection Algorithm

Ensembles of Classifiers

Introduction

Linear Combination of Ensembles

Sampling from a Training Set

Ensembles of Trees

Adapting to Drift Using Ensembles of Classifiers

Mining Skewed Data Streams with Ensembles

Time Series Data Streams

Introduction to Time Series Analysis

Time Series Prediction

Similarity between Time Series

Symbolic Approximation (SAX)

Ubiquitous Data Mining

Introduction to Ubiquitous Data Mining

Distributed Data Stream Monitoring

Distributed Clustering

Algorithm Granularity

Final Comments

The Next Generation of Knowledge Discovery

Where We Want to Go

Appendix: Resources

Bibliography

Index

Notes appear at the end of each chapter.

About the Author

João Gama is an associate professor and senior researcher in the Laboratory of Artificial Intelligence and Decision Support (LIAAD) at the University of Porto in Portugal.

About the Series

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Subject Categories

BISAC Subject Codes/Headings:
BUS061000
BUSINESS & ECONOMICS / Statistics
COM000000
COMPUTERS / General
COM021030
COMPUTERS / Database Management / Data Mining