1st Edition

Fundamentals of Database Indexing and Searching

By Arnab Bhattacharya Copyright 2015
    280 Pages 59 B/W Illustrations
    by Chapman & Hall

    284 Pages 59 B/W Illustrations
    by Chapman & Hall

    Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity.

    After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures that use the distance information to efficiently solve similarity search queries. Focusing on the difficult dimensionality phenomenon, he also presents several indexing methods that specifically deal with high-dimensional spaces. In addition, the book covers data reduction techniques, including embedding, various data transforms, and histograms.

    Through numerous real-world examples, this book explores how to effectively index and search for information in large collections of data. Requiring only a basic computer science background, it is accessible to practitioners and advanced undergraduate students.

    Basics
    Database Queries
    Basic Setting
    Exact Search
    Similarity Search
    Join
    Errors

    Low-Dimensional Index Structures
    Hashing
    Static Hashing
    Dynamic Hashing
    Locality Sensitive Hashing (LSH)
    Multi-Dimensional Hashing
    Space-Filling Curves

    Memory-Based Index Structures
    Index Structures
    Binary Search Tree (BST)
    Quadtree
    K-D-Tree
    Range Tree
    Voronoi Diagram
    Tries
    Suffix Tree
    Bitmap Index

    Disk-Based Index Structures
    Hierarchical Structures
    B-Tree and B+-Tree
    K-D-B-Tree
    General Framework
    R-Tree
    R*-Tree
    R+-Tree
    Hilbert R-Tree
    SS-Tree
    SR-Tree
    P-Tree
    Bulk-Loading

    Distances
    Distance Functions
    Metric Spaces
    Lp Norm
    Quadratic Form Distance
    Cosine Similarity
    Statistical Distance Measures
    Distances between Sets of Objects
    Earth Mover’s Distance
    Edit Distance

    Distance-Based Structures
    Triangular Inequality
    VP-Tree
    GH-Tree
    GNAT
    M-Tree
    SA-Tree
    AESA
    Linear AESA (LAESA)
    AESA for Vector Spaces

    High-Dimensional Spaces
    Curse of Dimensionality

    Analysis of Search for High-Dimensional Data
    Expected Nearest Neighbor Distance
    Expected Number of Page Accesses
    Curse of Dimensionality

    High-Dimensionality Structures
    X-Tree
    Pyramid Technique
    IMinMax
    VA-File
    A-Tree
    IQ-Tree

    Data Reduction Techniques
    Dimensionality Reduction Techniques
    Properties Useful for Similarity Search
    Quality Measures
    Embedding
    Singular Value Decomposition (SVD)
    Principal Component Analysis (PCA)
    Multi-Dimensional Scaling (MDS)
    IsoMap
    FastMap
    Embedding Methods
    Bounds on Distortion

    Data Representation Techniques
    Discrete Fourier Transform (DFT)
    Discrete Cosine Transform (DCT)
    Discrete Wavelet Transform (DWT)
    V-Optimal Histogram

    Appendices
    A Memory and Disk Accesses
    Memory Access
    Disks
    Flash

    B Distances of Bounding Boxes
    Distance of a Point from a Rectangle
    Distance of a Point from a Sphere
    Distance of a Sphere from a Rectangle
    Distance of a Sphere from a Sphere
    Distance of a Rectangle from a Rectangle

    C Vectors and Matrices
    Vector Spaces
    Matrices
    Properties of Matrices
    Dimensionality

    D Probability and Statistics
    Random Variable
    Probability Distribution
    Statistical Parameters

    Biography

    Arnab Bhattacharya

    "The author analyzes a very large list of database indexing and searching techniques … the book can be consulted to get inspiration from state-of-the-art techniques for database indexing and searching."
    Computing Reviews, August 2015