1st Edition

Computational Methods of Feature Selection

Edited by Huan Liu and Hiroshi Motoda
    Copyright 2007
    440 Pages, 91 B/W Illustrations
    Published by Chapman & Hall/CRC

    Due to increasing demands for dimensionality reduction, research on feature selection has expanded deeply and widely into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the basic concepts and principles, state-of-the-art algorithms, and novel applications of this tool.

    The book begins by exploring unsupervised, randomized, and causal feature selection. It then reports on some recent results of empowering feature selection, including active feature selection, decision-border estimate, the use of ensembles with independent probes, and incremental feature selection. This is followed by discussions of weighting and local methods, such as the ReliefF family, k-means clustering, local feature relevance, and a new interpretation of Relief. The book subsequently covers text classification, a new feature selection score, and both constraint-guided and aggressive feature selection. The final section examines applications of feature selection in bioinformatics, including feature construction as well as redundancy-, ensemble-, and penalty-based feature selection.

    Through a clear, concise, and coherent presentation of topics, this volume systematically covers the key concepts, underlying principles, and inventive applications of feature selection, illustrating how this powerful tool can efficiently harness massive, high-dimensional data and turn it into valuable, reliable information.

    PREFACE
    Introduction and Background
    Less Is More
    Huan Liu and Hiroshi Motoda
    Background and Basics
    Supervised, Unsupervised, and Semi-Supervised Feature Selection
    Key Contributions and Organization of the Book
    Looking Ahead
    Unsupervised Feature Selection
    Jennifer G. Dy
    Introduction
    Clustering
    Feature Selection
    Feature Selection for Unlabeled Data
    Local Approaches
    Summary
    Randomized Feature Selection
    David J. Stracuzzi
    Introduction
    Types of Randomizations
    Randomized Complexity Classes
    Applying Randomization to Feature Selection
    The Role of Heuristics
    Examples of Randomized Selection Algorithms
    Issues in Randomization
    Summary
    Causal Feature Selection
    Isabelle Guyon, Constantin Aliferis, and André Elisseeff
    Introduction
    Classical “Non-Causal” Feature Selection
    The Concept of Causality
    Feature Relevance in Bayesian Networks
    Causal Discovery Algorithms
    Examples of Applications
    Summary, Conclusions, and Open Problems
    Extending Feature Selection
    Active Learning of Feature Relevance
    Emanuele Olivetti, Sriharsha Veeramachaneni, and Paolo Avesani
    Introduction
    Active Sampling for Feature Relevance Estimation
    Derivation of the Sampling Benefit Function
    Implementation of the Active Sampling Algorithm
    Experiments
    Conclusions and Future Work
    A Study of Feature Extraction Techniques Based on Decision Border Estimate
    Claudia Diamantini and Domenico Potena
    Introduction
    Feature Extraction Based on Decision Boundary
    Generalities about Labeled Vector Quantizers
    Feature Extraction Based on Vector Quantizers
    Experiments
    Conclusions
    Ensemble-Based Variable Selection Using Independent Probes
    Eugene Tuv, Alexander Borisov, and Kari Torkkola
    Introduction
    Tree Ensemble Methods in Feature Ranking
    The Algorithm: Ensemble-Based Ranking against Independent Probes
    Experiments
    Discussion
    Efficient Incremental-Ranked Feature Selection in Massive Data
    Roberto Ruiz, Jesús S. Aguilar-Ruiz, and José C. Riquelme
    Introduction
    Related Work
    Preliminary Concepts
    Incremental Performance over Ranking
    Experimental Results
    Conclusions
    Weighting and Local Methods
    Non-Myopic Feature Quality Evaluation with (R)ReliefF
    Igor Kononenko and Marko Robnik-Šikonja
    Introduction
    From Impurity to Relief
    ReliefF for Classification and RReliefF for Regression
    Extensions
    Interpretation
    Implementation Issues
    Applications
    Conclusion
    Weighting Method for Feature Selection in k-Means
    Joshua Zhexue Huang, Jun Xu, Michael Ng, and Yunming Ye
    Introduction
    Feature Weighting in k-Means
    W-k-Means Clustering Algorithm
    Feature Selection
    Subspace Clustering with k-Means
    Text Clustering
    Related Work
    Discussions
    Local Feature Selection for Classification
    Carlotta Domeniconi and Dimitrios Gunopulos
    Introduction
    The Curse of Dimensionality
    Adaptive Metric Techniques
    Large Margin Nearest Neighbor Classifiers
    Experimental Comparisons
    Conclusions
    Feature Weighting through Local Learning
    Yijun Sun
    Introduction
    Mathematical Interpretation of Relief
    Iterative Relief Algorithm
    Extension to Multiclass Problems
    Online Learning
    Computational Complexity
    Experiments
    Conclusion
    Text Classification and Clustering
    Feature Selection for Text Classification
    George Forman
    Introduction
    Text Feature Generators
    Feature Filtering for Classification
    Practical and Scalable Computation
    A Case Study
    Conclusion and Future Work
    A Bayesian Feature Selection Score Based on Naïve Bayes Models
    Susana Eyheramendy and David Madigan
    Introduction
    Feature Selection Scores
    Classification Algorithms
    Experimental Settings and Results
    Conclusion
    Pairwise Constraints-Guided Dimensionality Reduction
    Wei Tang and Shi Zhong
    Introduction
    Pairwise Constraints-Guided Feature Projection
    Pairwise Constraints-Guided Co-Clustering
    Experimental Studies
    Conclusion and Future Work
    Aggressive Feature Selection by Feature Ranking
    Masoud Makrehchi and Mohamed S. Kamel
    Introduction
    Feature Selection by Feature Ranking
    Proposed Approach to Reducing Term Redundancy
    Experimental Results
    Summary
    Feature Selection in Bioinformatics
    Feature Selection for Genomic Data Analysis
    Lei Yu
    Introduction
    Redundancy-Based Feature Selection
    Empirical Study
    Summary
    A Feature Generation Algorithm with Applications to Biological Sequence Classification
    Rezarta Islamaj Dogan, Lise Getoor, and W. John Wilbur
    Introduction
    Splice-Site Prediction
    Feature Generation Algorithm
    Experiments and Discussion
    Conclusions
    An Ensemble Method for Identifying Robust Features for Biomarker Discovery
    Diana Chan, Susan M. Bridges, and Shane C. Burgess
    Introduction
    Biomarker Discovery from Proteome Profiles
    Challenges of Biomarker Identification
    Ensemble Method for Feature Selection
    Feature Selection Ensemble
    Results and Discussion
    Conclusion
    Model Building and Feature Selection with Genomic Data
    Hui Zou and Trevor Hastie
    Introduction
    Ridge Regression, Lasso, and Bridge
    Drawbacks of the Lasso
    The Elastic Net
    The Elastic-Net Penalized SVM
    Sparse Eigen-Genes
    Summary
    INDEX

    Biography

    Huan Liu, Hiroshi Motoda

    Reviews

    This book is a really comprehensive review of the modern techniques designed for feature selection in very large datasets. Dozens of algorithms and their comparisons in experiments with synthetic and real data are presented, which can be very helpful to researchers and students working with large data stores.
    —Stan Lipovetsky, Technometrics, November 2010

    Overall, we enjoyed reading this book. It presents state-of-the-art guidance and tutorials on methodologies and algorithms in computational methods in feature selection. Enhanced by the editors' insights, and based on previous work by these leading experts in the field, the book forms another milestone of relevant research and development in feature selection.
    —Longbing Cao and David Taniar, IEEE Intelligent Informatics Bulletin, 2008, Vol. 99, No. 99