Data Mining Tools for Malware Detection

1st Edition

By Mehedy Masud, Latifur Khan, Bhavani Thuraisingham

Auerbach Publications

450 pages | 131 B/W Illus.

Hardback: 9781439854549
pub: 2011-12-07

Description

Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets.

The authors describe the systems they have designed and developed: email worm detection using data mining, a scalable multi-level feature extraction technique to detect malicious executables, detecting remote exploits using data mining, and flow-based identification of botnet traffic by mining multiple log files. For each of these tools, they detail the system architecture, algorithms, performance results, and limitations.

  • Discusses data mining for emerging applications, including adaptable malware detection, insider threat detection, firewall policy analysis, and real-time data mining
  • Includes four appendices that provide a firm foundation in data management, secure systems, and the semantic web
  • Describes the authors’ tools for stream data mining

From algorithms to experimental results, this is one of the few books that will be equally valuable to those in industry, government, and academia. Technologists will learn which tools to select for specific applications, managers will learn how to determine whether to proceed with a data mining project, and developers will find innovative alternative designs for a range of applications.

Table of Contents

Introduction

Trends

Data Mining and Security Technologies

Data Mining for Email Worm Detection

Data Mining for Malicious Code Detection

Data Mining for Detecting Remote Exploits

Data Mining for Botnet Detection

Stream Data Mining

Emerging Data Mining Tools for Cyber Security Applications

Organization of This Book

Next Steps

Part I: DATA MINING AND SECURITY

Introduction to Part I: Data Mining and Security

Data Mining Techniques

Introduction

Overview of Data Mining Tasks and Techniques

Artificial Neural Network

Support Vector Machines

Markov Model

Association Rule Mining (ARM)

Multi-class Problem

2.7.1 One-VS-One

2.7.2 One-VS-All

Image Mining

2.8.1 Feature Selection

2.8.2 Automatic Image Annotation

2.8.3 Image Classification

Summary

References

Malware

Introduction

Viruses

Worms

Trojan Horses

Time and Logic Bombs

Botnet

Spyware

Summary

References

Data Mining for Security Applications

Overview

Data Mining for Cyber Security

4.2.1 Overview

4.2.2 Cyber-terrorism, Insider Threats, and External Attacks

4.2.3 Malicious Intrusions

4.2.4 Credit Card Fraud and Identity Theft

4.2.5 Attacks on Critical Infrastructures

4.2.6 Data Mining for Cyber Security

Current Research and Development

Summary

References

Design and Implementation of Data Mining Tools

Introduction

Intrusion Detection

Web Page Surfing Prediction

Image Classification

Summary and Directions

References

Conclusion to Part I

Part II: DATA MINING FOR EMAIL WORM DETECTION

Introduction to Part II

Email Worm Detection

Introduction

Architecture

Related Work

Overview of Our Approach

Summary

References

Design of the Data Mining Tool

Introduction

Architecture

Feature Description

7.3.1 Per-Email Features

7.3.2 Per-Window Features

Feature Reduction Techniques

7.4.1 Dimension Reduction

7.4.2 Two-Phase Feature Selection (TPS)

7.4.2.1 Phase I

7.4.2.2 Phase II

Classification Techniques

Summary

References

Evaluation and Results

Introduction

Dataset

Experimental Setup

Results

8.4.1 Results from Unreduced Data

8.4.2 Results from PCA-Reduced Data

8.4.3 Results from Two-Phase Selection

Summary

References

Conclusion to Part II

Part III: DATA MINING FOR DETECTING MALICIOUS EXECUTABLES

Introduction to Part III

Malicious Executables

Introduction

Architecture

Related Work

Hybrid Feature Retrieval (HFR) Model

Summary and Directions

References

Design of the Data Mining Tool

Introduction

Feature Extraction Using n-Gram Analysis

10.2.1 Binary n-Gram Feature

10.2.2 Feature Collection

10.2.3 Feature Selection

10.2.4 Assembly n-Gram Feature

10.2.5 DLL Function Call Feature

The Hybrid Feature Retrieval Model

10.3.1 Description of the Model

10.3.2 The Assembly Feature Retrieval (AFR) Algorithm

10.3.3 Feature Vector Computation and Classification

Summary and Directions

References

Evaluation and Results

Introduction

Experiments

Dataset

Experimental Setup

Results

11.5.1 Accuracy

11.5.1.1 Dataset1

11.5.1.2 Dataset2

11.5.1.3 Statistical Significance Test

11.5.1.4 DLL Call Feature

11.5.2 ROC Curves

11.5.3 False Positive and False Negative

11.5.4 Running Time

11.5.5 Training and Testing with Boosted J48

Example Run

Summary and Directions

References

Conclusion to Part III

Part IV: DATA MINING FOR DETECTING REMOTE EXPLOITS

Introduction to Part IV

Detecting Remote Exploits

Introduction

Architecture

Related Work

Overview of Our Approach

Summary and Directions

References

Design of the Data Mining Tool

Introduction

DExtor Architecture

Disassembly

Feature Extraction

13.4.1 Useful Instruction Count (UIC)

13.4.2 Instruction Usage Frequencies (IUF)

13.4.3 Code vs. Data Length (CDL)

Combining Features and Compute Combined Feature Vector

Classification

Summary and Directions

References

Evaluation and Results

Introduction

Dataset

Experimental Setup

14.3.1 Parameter Settings

14.3.2 Baseline Techniques

Results

14.4.1 Running Time

Analysis

Robustness and Limitations

14.6.1 Robustness against Obfuscations

14.6.2 Limitations

Summary and Directions

References

Conclusion to Part IV

Part V: DATA MINING FOR DETECTING BOTNETS

Introduction to Part V

Detecting Botnets

Introduction

Botnet Architecture

Related Work

Our Approach

Summary and Directions

References

Design of the Data Mining Tool

Introduction

Architecture

System Setup

Data Collection

Bot Command Categorization

Feature Extraction

16.6.1 Packet-level Features

16.6.2 Flow-level Features

Log File Correlation

Classification

Packet Filtering

Summary and Directions

References

Evaluation and Results

Introduction

17.1.1 Baseline Techniques

17.1.2 Classifiers

Performance on Different Datasets

Comparison with Other Techniques

Further Analysis

Summary and Directions

References

Conclusion to Part V

Part VI: STREAM MINING FOR SECURITY APPLICATIONS

Introduction to Part VI

Stream Mining

Introduction

Architecture

Related Work

Our Approach

Overview of the Novel Class Detection Algorithm

Classifiers Used

Security Applications

Summary

References

Design of the Data Mining Tool

Introduction

Definitions

Novel Class Detection

19.3.1 Saving the Inventory of Used Spaces during Training

19.3.1.1 Clustering

19.3.1.2 Storing the Cluster Summary Information

19.3.2 Outlier Detection and Filtering

19.3.2.1 Filtering

19.3.2.2 Detecting Novel Class

Security Applications

Summary and Directions

Reference

Evaluation and Results

Introduction

Datasets

20.2.1 Synthetic Data with Only Concept-Drift (SynC)

20.2.2 Synthetic Data with Concept-Drift and Novel Class (SynCN)

20.2.3 Real Data—KDDCup 99 Network Intrusion Detection

20.2.4 Real Data—Forest Cover (UCI Repository)

Experimental Setup

20.3.1 Baseline Method

Performance Study

20.4.1 Evaluation Approach

20.4.2 Results

20.4.3 Running Time

Summary and Directions

References

Conclusion for Part VI

Part VII: EMERGING APPLICATIONS

Introduction to Part VII

Data Mining for Active Defense

Introduction

Related Work

Architecture

A Data Mining–Based Malware Detection Model

21.4.1 Our Framework

21.4.2 Feature Extraction

21.4.2.1 Binary n-Gram Feature Extraction

21.4.2.2 Feature Selection

21.4.2.3 Feature Vector Computation

21.4.3 Training

21.4.4 Testing

Model-Reversing Obfuscations

21.5.1 Path Selection

21.5.2 Feature Insertion

21.5.3 Feature Removal

Experiments

Summary and Directions

References

Data Mining for Insider Threat Detection

Introduction

The Challenges, Related Work, and Our Approach

Data Mining for Insider Threat Detection

22.3.1 Our Solution Architecture

22.3.2 Feature Extraction and Compact Representation

22.3.3 RDF Repository Architecture

22.3.4 Data Storage

22.3.4.1 File Organization

22.3.4.2 Predicate Split (PS)

22.3.4.3 Predicate Object Split (POS)

22.3.5 Answering Queries Using Hadoop MapReduce

22.3.6 Data Mining Applications

Comprehensive Framework

Summary and Directions

References

Dependable Real-Time Data Mining

Introduction

Issues in Real-Time Data Mining

Real-Time Data Mining Techniques

Parallel, Distributed, Real-Time Data Mining

Dependable Data Mining

Mining Data Streams

Summary and Directions

References

Firewall Policy Analysis

Introduction

Related Work

Firewall Concepts

24.3.1 Representation of Rules

24.3.2 Relationship between Two Rules

24.3.3 Possible Anomalies between Two Rules

Anomaly Resolution Algorithms

24.4.1 Algorithms for Finding and Resolving Anomalies

24.4.1.1 Illustrative Example

24.4.2 Algorithms for Merging Rules

24.4.2.1 Illustrative Example of the Merge Algorithm

Summary and Directions

References

Conclusion to Part VII

Summary and Directions

Overview

Summary of This Book

Directions for Data Mining Tools for Malware Detection

Where Do We Go from Here?

Appendix A: Data Management Systems: Developments and Trends

Overview

Developments in Database Systems

Status, Vision, and Issues

Data Management Systems Framework

Building Information Systems from the Framework

Relationship between the Texts

Summary and Directions

References

Appendix B: Trustworthy Systems

Overview

Secure Systems

B.2.1 Overview

B.2.2 Access Control and Other Security Concepts

B.2.3 Types of Secure Systems

B.2.4 Secure Operating Systems

B.2.5 Secure Database Systems

B.2.6 Secure Networks

B.2.7 Emerging Trends

B.2.8 Impact of the Web

B.2.9 Steps to Building Secure Systems

Web Security

Building Trusted Systems from Untrusted Components

Dependable Systems

B.5.1 Overview

B.5.2 Trust Management

B.5.3 Digital Rights Management

About the Authors

Mehedy Masud is a postdoctoral fellow at the University of Texas at Dallas (UTD), where he earned his PhD in computer science in December 2009. He has published in premier journals and conferences, including IEEE Transactions on Knowledge and Data Engineering and the IEEE Data Mining Conference. He will be appointed as a research assistant professor at UTD in Fall 2012. Masud’s research projects include reactively adaptive malware; data mining for detecting malicious executables, botnets, and remote exploits; and cloud data mining. He has a patent pending on stream mining for novel class detection.

Latifur Khan is an associate professor in the computer science department at the University of Texas at Dallas, where he has been teaching and conducting research since September 2000. He received his PhD and MS degrees in computer science from the University of Southern California in August 2000 and December 1996, respectively. Khan is (or has been) supported by grants from NASA, the National Science Foundation (NSF), Air Force Office of Scientific Research (AFOSR), Raytheon, NGA, IARPA, Tektronix, Nokia Research Center, Alcatel, and the SUN academic equipment grant program. In addition, Khan is the director of the state-of-the-art DML@UTD, UTD Data Mining/Database Laboratory, which is the primary center of research related to data mining, semantic web, and image/video annotation at the University of Texas at Dallas. Khan has published more than 100 papers, including articles in several IEEE Transactions journals, the Journal of Web Semantics, and the VLDB Journal and conference proceedings such as IEEE ICDM and PKDD. He is a senior member of IEEE.

Bhavani Thuraisingham joined the University of Texas at Dallas (UTD) in October 2004 as a professor of computer science and director of the Cyber Security Research Center in the Erik Jonsson School of Engineering and Computer Science and is currently the Louis Beecherl Jr. Distinguished Professor. She is an elected Fellow of three professional organizations: the IEEE (Institute of Electrical and Electronics Engineers), the AAAS (American Association for the Advancement of Science), and the BCS (British Computer Society) for her work in data security. She received the IEEE Computer Society’s prestigious 1997 Technical Achievement Award for "outstanding and innovative contributions to secure data management." Prior to joining UTD, Thuraisingham worked for the MITRE Corporation for 16 years, which included an IPA (Intergovernmental Personnel Act) at the National Science Foundation as Program Director for Data and Applications Security. Her work in information security and information management has resulted in more than 100 journal articles, more than 200 refereed conference papers, more than 90 keynote addresses, and 3 U.S. patents. She is the author of ten books in data management, data mining, and data security.

Subject Categories

BISAC Subject Codes/Headings:
COM021000: COMPUTERS / Database Management / General
COM021030: COMPUTERS / Database Management / Data Mining
COM053000: COMPUTERS / Security / General