
Feature Engineering for Machine Learning and Data Analytics

1st Edition

Edited by Guozhu Dong, Huan Liu

CRC Press

400 pages | 76 B/W Illus.

Purchasing Options ($ = USD)
Hardback: 9781138744387 | pub: 2018-04-04 | $99.95
eBook (VitalSource): 9781315181080 | pub: 2018-03-14 | from $49.98



Description

Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of results of those algorithms largely depends on the quality of the available features. Feature Engineering for Machine Learning and Data Analytics provides a comprehensive introduction to feature engineering, including feature generation, feature extraction, feature transformation, feature selection, and feature analysis and evaluation.

The book presents key concepts, methods, examples, and applications, as well as chapters on feature engineering for major data types such as texts, images, sequences, time series, graphs, streaming data, software engineering data, Twitter data, and social media data. It also contains generic feature generation approaches, as well as methods for generating tried-and-tested, hand-crafted, domain-specific features.

The first chapter defines the concepts of features and feature engineering, offers an overview of the book, and provides pointers to topics not covered in this book. The next six chapters are devoted to feature engineering, including feature generation, for specific data types. The subsequent four chapters cover generic approaches for feature engineering, namely feature selection, feature-transformation-based feature engineering, deep-learning-based feature engineering, and pattern-based feature generation and engineering. The last three chapters discuss feature engineering for social bot detection, software management, and Twitter-based applications, respectively.

This book can be used as a reference for data analysts, big data scientists, data preprocessing workers, project managers, project developers, prediction modelers, professors, researchers, graduate students, and upper-level undergraduate students. It can also be used as the primary text for courses on feature engineering, or as a supplement for courses on machine learning, data mining, and big data analytics.

Table of Contents

1 Preliminaries and Overview

Guozhu Dong and Huan Liu

Preliminaries

Overview of the Chapters

Beyond this Book

2 Feature Engineering for Text Data

Chase Geigle, Qiaozhu Mei, and ChengXiang Zhai

Overview of Text Representation

Text as Strings

Sequence of Words Representation

Bag of Words Representation

Structural Representation of Text

Latent Semantic Representation

Explicit Semantic Representation

Embeddings for Text Representation

Context-Sensitive Text Representation

3 Feature Extraction and Learning for Visual Data

Parag S. Chandakkar, Ragav Venkatesan, and Baoxin Li

Classical Visual Feature Representations

Latent-feature Extraction

Deep Image Features

4 Feature-based time-series analysis

Ben D. Fulcher

Feature-based representations of time series

Global features

Subsequence features

Combining time-series representations

Feature-based forecasting

5 Feature Engineering for Data Streams

Yao Ma, Jiliang Tang, and Charu Aggarwal

Streaming Settings

Linear Methods for Streaming Feature Construction

Non-linear Methods for Streaming Feature Construction

Feature Selection for Data Streams with Streaming Features

Feature Selection for Data Streams with Streaming Instances

Discussions and Challenges

6 Feature Generation and Feature Engineering for Sequences

Guozhu Dong, Lei Duan, Jyrki Nummenmaa, and Peng Zhang

Basics on Sequence Data and Sequence Patterns

Approaches to Using Patterns in Sequence Features

Traditional Pattern-Based Sequence Features

Mined Sequence Patterns for Use in Sequence Features

Sequence Features Not Defined by Patterns

Sequence Databases

7 Feature Generation for Graphs and Networks

Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu

Feature Types

Feature Generation

Feature Usages

Future Directions

8 Feature Selection and Evaluation

Yun Li and Tao Li

Feature Selection Frameworks

Advanced Topics for Feature Selection

Future Work and Conclusion

9 Automating Feature Engineering in Supervised Learning

Udayan Khurana

A Few Simple Approaches

Hierarchical Exploration of Feature Transformations

Learning Optimal Traversal Policy

Finding Effective Features without Model Training

Miscellaneous

10 Pattern based Feature Generation

Yunzhe Jia, James Bailey, Ramamohanarao Kotagiri, and Christopher Leckie

Preliminaries

Framework of pattern based feature generation

Pattern mining algorithms

Pattern selection approaches

Pattern based feature generation

Pattern based feature generation for classification

Pattern based feature generation for clustering

11 Deep Learning for Feature Representation

Suhang Wang and Huan Liu

Restricted Boltzmann Machine

AutoEncoder

Convolutional Neural Networks

Word Embedding and Recurrent Neural Networks

Generative Adversarial Networks and Variational Autoencoder

Discussion and Further Readings

12 Feature Engineering for Social Bot Detection

Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini

Social bot detection

Online bot detection framework

13 Feature Generation and Engineering for Software Analytics

Xin Xia and David Lo

Features for Defect Prediction

Features for Crash Release Prediction for Apps

Features from Mining Monthly Reports to Predict Developer Turnover

14 Feature Engineering for Twitter-based Applications

Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir Hossein Yazdavar, and Krishnaprasad Thirunarayan

About the Editors

Dr. Guozhu Dong is a professor of Computer Science and Engineering at Wright State University. He obtained his Ph.D. in Computer Science from the University of Southern California and his B.S. in Mathematics from Shandong University. Before joining Wright State University, he was a faculty member at Flinders University and then at the University of Melbourne. At Wright State University, he was recognized for Excellence in Research in the College of Engineering and Computer Science. His research interests are in data mining, machine learning, databases, data science, and artificial intelligence. He co-authored a book on Sequence Data Mining and co-edited a book on Contrast Data Mining. He has served on numerous conference program committees.

Dr. Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science from the University of Southern California and his B.Eng. in Computer Science and Electrical Engineering from Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at the National University of Singapore. At Arizona State University, he was recognized for excellence in teaching and research in Computer Science and Engineering and received the 2014 President's Award for Innovation. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating interdisciplinary problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms such as social media. His well-cited publications include books, book chapters, and encyclopedia entries, as well as conference and journal papers. He is a co-author of Social Media Mining: An Introduction, published by Cambridge University Press. He serves on journal editorial boards and numerous conference program committees, and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction. He is an IEEE Fellow. More can be found at http://www.public.asu.edu/~huanliu.

About the Series

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series


Subject Categories

BISAC Subject Codes/Headings:
BUS061000: BUSINESS & ECONOMICS / Statistics
COM021030: COMPUTERS / Database Management / Data Mining
COM037000: COMPUTERS / Machine Theory