
Cybersecurity Analytics

1st Edition

By Rakesh M. Verma, David J. Marchette

Chapman and Hall/CRC

352 pages

Hardback ISBN: 9780367346010

Publication date: 2019-11-15

List price: $119.95 USD

Description

Cybersecurity Analytics is for cybersecurity students and professionals who want to learn the data science techniques critical for tackling cybersecurity challenges, and for data science students and professionals who want to learn how those techniques are adapted to cybersecurity. Whether you are trying to build a malware detector or a phishing email detector, or are just interested in finding patterns in your datasets, this book shows you how to do it on your own. Numerous examples and links to datasets are included so that the reader can "learn by doing." Anyone who has taken a basic college-level calculus course and has some knowledge of probability should be able to follow most of the material.

The book includes chapters on unsupervised learning, semi-supervised learning, supervised learning, text mining, natural language processing, and more. It also includes background on security, statistics, and linear algebra. The website for the book contains a listing of datasets, updates, and other resources for serious practitioners.

Table of Contents

Preface

Introduction

What is Data Analytics?

Data Ingestion

Data Processing and Cleaning

Visualization and Exploratory Analysis

Scatterplots

Pattern Recognition

Classification

Clustering

Feature Extraction

Feature Selection

Random Projections

Modeling

Model Specification

Model Selection and Fitting

Evaluation

Strengths and Limitations

The Curse of Dimensionality

Security: Basics and Security Analytics

Basics of Security

Know Thy Enemy – Attackers and Their Motivations

Security Goals

Mechanisms for Ensuring Security Goals

Confidentiality

Integrity

Availability

Authentication

Access Control

Accountability

Non-repudiation

Threats, Attacks and Impacts

Passwords

Malware

Spam, Phishing and its Variants

Intrusions

Internet Surfing

System Maintenance and Firewalls

Other Vulnerabilities

Protecting Against Attacks

Applications of Data Science to Security Challenges

Cybersecurity Datasets

Data Science Applications

Passwords

Malware

Intrusions

Spam/Phishing

Credit Card Fraud/Financial Fraud

Opinion Spam

Denial of Service

Security Analytics and Why Do We Need It

Statistics

Probability Density Estimation

Models

Poisson

Uniform

Normal

Parameter Estimation

The Bias-Variance Trade-Off

The Law of Large Numbers and the Central Limit Theorem

Confidence Intervals

Hypothesis Testing

Bayesian Statistics

Regression

Logistic Regression

Regularization

Principal Components

Multidimensional Scaling

Procrustes

Nonparametric Statistics

Time Series

Data Mining – Unsupervised Learning

Data Collection

Types of Data and Operations

Properties of Datasets

Data Exploration and Preprocessing

Data Exploration

Data Preprocessing/Wrangling

Data Representation

Association Rule Mining

Variations on the Apriori Algorithm

Clustering

Partitional Clustering

Choosing K

Variations on K-means Algorithm

Hierarchical Clustering

Other Clustering Algorithms

Measuring the Clustering Quality

Clustering Miscellany: Clusterability, Robustness, Incremental,

Manifold Discovery

Spectral Embedding

Anomaly Detection

Statistical Methods

Distance-based Outlier Detection

kNN based approach

Density-based Outlier Detection

Clustering-based Outlier Detection

One-class learning based Outliers

Security Applications and Adaptations

Data Mining for Intrusion Detection

Malware Detection

Stepping-stone Detection

Malware Clustering

Directed Anomaly Scoring for Spear Phishing Detection

Concluding Remarks and Further Reading

Machine Learning – Supervised Learning

Fundamentals of Supervised Learning

The Bayes Classifier

Naïve Bayes

Nearest Neighbors Classifiers

Linear Classifiers

Decision Trees and Random Forests

Random Forest

Support Vector Machines

Semi-Supervised Classification

Neural Networks and Deep Learning

Perceptron

Neural Networks

Deep Networks

Topological Data Analysis

Ensemble Learning

Majority

Adaboost

One-class Learning

Online Learning

Adversarial Machine Learning

Adversarial Examples

Adversarial Training

Adversarial Generation

Beyond Continuous Data

Evaluation of Machine Learning

Cost-sensitive Evaluation

New Metrics for Unbalanced Datasets

Security Applications and Adaptations

Intrusion Detection

Malware Detection

Spam and Phishing Detection

For Further Reading

Text Mining

Tokenization

Preprocessing

Bag-Of-Words

Vector Space Model

Weighting

Latent Semantic Indexing

Embedding

Topic Models: Latent Dirichlet Allocation

Sentiment Analysis

Natural Language Processing

Challenges of NLP

Basics of Language Study and NLP Techniques

Text Preprocessing

Feature Engineering on Text Data

Morphological, Word and Phrasal Features

Clausal and Sentence Level Features

Statistical Features

Corpus-based Analysis

Advanced NLP Tasks

Part of Speech Tagging

Word Sense Disambiguation

Language Modeling

Topic Modeling

Sequence to Sequence Tasks

Knowledge Bases and Frameworks

Natural Language Generation

Issues with Pipelining

Security Applications of NLP

Password Checking

Email Spam Detection

Phishing Email Detection

Malware Detection

Attack Generation

Big Data Techniques and Security

Key Terms

Ingesting the Data

Persistent Storage

Computing and Analyzing

Techniques for Handling Big Data

Visualizing

Streaming Data

Big Data Security

Implications of Big Data Characteristics on Security and Privacy

Mechanisms for Big Data Security Goals

Linear Algebra Basics

Vectors

Matrices

Eigenvectors and Eigenvalues

The Singular Value Decomposition

Graphs

Graph Invariants

The Laplacian

Probability

Probability

Conditional Probability and Bayes’ Rule

Base Rate Fallacy

Expected Values and Moments

Distribution Functions and Densities

Models

Bernoulli and Binomial

Multinomial

Uniform

Bibliography

Author Index

Index

About the Authors

Rakesh Verma is a professor of computer science at the University of Houston, where he leads a research group that applies reasoning and data science to cybersecurity challenges. He teaches a course on security analytics that includes some of the material here. Since 2015, he has been co-organizing and editing the proceedings of the ACM International Workshop on Security and Privacy Analytics. He is an editor of Frontiers of Big Data in the Cybersecurity Area, an ACM Distinguished Speaker (2011-2018), and the winner of two Best Paper Awards. He received the Lifetime Mentoring Award from the University of Houston and is a Fulbright Senior Specialist in Computer Science.

David Marchette is a principal scientist at the Naval Surface Warfare Center, Dahlgren Division, where he leads basic and applied research projects in computational statistics, graph theory, network analysis, pattern recognition, computer intrusion detection, and text analysis. He is a fellow of the American Statistical Association (ASA) and the American Association for the Advancement of Science (AAAS) and an elected member of the International Statistical Institute (ISI).

About the Series

Chapman & Hall/CRC Data Science Series

Reflecting the interdisciplinary nature of the field, this new data science book series brings together researchers, practitioners, and instructors from statistics, computer science, machine learning, and analytics. The series will publish cutting-edge research, industry applications, and textbooks in data science.

Features:
* Presents the latest research and applications in the field, including new statistical and computational techniques
* Covers a broad range of interdisciplinary topics
* Provides guidance on the use of software for data science, including R, Python, and Julia
* Includes both introductory and advanced material for students and professionals
* Presents concepts while assuming minimal theoretical background

The scope of the series is broad, including titles in machine learning, pattern recognition, artificial intelligence, predictive analytics, business analytics, visualization, programming, software, learning analytics, data collection and wrangling, interactive graphics, reproducible research, and more. The inclusion of examples, applications, and code implementation is essential.


Subject Categories

BISAC Subject Codes/Headings:
BUS061000: BUSINESS & ECONOMICS / Statistics
COM037000: COMPUTERS / Machine Theory
COM043050: COMPUTERS / Networking / Security
COM053000: COMPUTERS / Security / General
COM083000: COMPUTERS / Security / Cryptography
MAT000000: MATHEMATICS / General
MAT029000: MATHEMATICS / Probability & Statistics / General