Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity.
After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures that use the distance information to efficiently solve similarity search queries. Focusing on the difficult dimensionality phenomenon, he also presents several indexing methods that specifically deal with high-dimensional spaces. In addition, the book covers data reduction techniques, including embedding, various data transforms, and histograms.
Through numerous real-world examples, this book explores how to effectively index and search for information in large collections of data. Requiring only a basic computer science background, it is accessible to practitioners and advanced undergraduate students.
Basics
Database Queries
Basic Setting
Exact Search
Similarity Search
Join
Errors
Low-Dimensional Index Structures
Hashing
Static Hashing
Dynamic Hashing
Locality Sensitive Hashing (LSH)
Multi-Dimensional Hashing
Space-Filling Curves
Memory-Based Index Structures
Index Structures
Binary Search Tree (BST)
Quadtree
K-D-Tree
Range Tree
Voronoi Diagram
Tries
Suffix Tree
Bitmap Index
Disk-Based Index Structures
Hierarchical Structures
B-Tree and B+-Tree
K-D-B-Tree
General Framework
R-Tree
R*-Tree
R+-Tree
Hilbert R-Tree
SS-Tree
SR-Tree
P-Tree
Bulk-Loading
Distances
Distance Functions
Metric Spaces
Lp Norm
Quadratic Form Distance
Cosine Similarity
Statistical Distance Measures
Distances between Sets of Objects
Earth Mover’s Distance
Edit Distance
Distance-Based Structures
Triangular Inequality
VP-Tree
GH-Tree
GNAT
M-Tree
SA-Tree
AESA
Linear AESA (LAESA)
AESA for Vector Spaces
High-Dimensional Spaces
Curse of Dimensionality
Analysis of Search for High-Dimensional Data
Expected Nearest Neighbor Distance
Expected Number of Page Accesses
Curse of Dimensionality
High-Dimensionality Structures
X-Tree
Pyramid Technique
IMinMax
VA-File
A-Tree
IQ-Tree
Data Reduction Techniques
Dimensionality Reduction Techniques
Properties Useful for Similarity Search
Quality Measures
Embedding
Singular Value Decomposition (SVD)
Principal Component Analysis (PCA)
Multi-Dimensional Scaling (MDS)
IsoMap
FastMap
Embedding Methods
Bounds on Distortion
Data Representation Techniques
Discrete Fourier Transform (DFT)
Discrete Cosine Transform (DCT)
Discrete Wavelet Transform (DWT)
V-Optimal Histogram
Appendices
A Memory and Disk Accesses
Memory Access
Disks
Flash
B Distances of Bounding Boxes
Distance of a Point from a Rectangle
Distance of a Point from a Sphere
Distance of a Sphere from a Rectangle
Distance of a Sphere from a Sphere
Distance of a Rectangle from a Rectangle
C Vectors and Matrices
Vector Spaces
Matrices
Properties of Matrices
Dimensionality
D Probability and Statistics
Random Variable
Probability Distribution
Statistical Parameters
Biography
Arnab Bhattacharya
"The author analyzes a very large list of database indexing and searching techniques … the book can be consulted to get inspiration from state-of-the-art techniques for database indexing and searching."
—Computing Reviews, August 2015