The Beauty of Mathematics in Computer Science explains the mathematical fundamentals of the information technology products and services we use every day, from Google Web Search to GPS navigation, and from speech recognition to CDMA mobile services. The book was published in Chinese in 2011 and has sold more than 600,000 copies. Readers were surprised to find that many everyday information technologies are so closely tied to mathematical principles. For example, the automatic classification of news articles relies on the law of cosines taught in high school.
The book covers many topics in computer applications and applied mathematics, including:
Natural language processing
Speech recognition and machine translation
Statistical language modeling
Quantitative measurement of information
Graph theory and web crawlers
PageRank for web search
Matrix operations and document classification
Mathematical background of big data
Neural networks and Google’s deep learning
Jun Wu was a staff research scientist at Google who invented Google’s Chinese, Japanese, and Korean web search algorithms and was responsible for many Google machine learning projects. From 2006 to 2010 he wrote official blogs that introduced the technologies behind Google's products in very simple language for Chinese Internet users. The blogs had more than 2 million followers. Wu received a PhD in computer science from Johns Hopkins University and has been working on speech recognition and natural language processing for more than 20 years. He was one of the earliest engineers at Google, managed many of the company's products, and was awarded 19 US patents during his 10-year tenure there. Wu became a full-time VC investor and co-founded Amino Capital in Palo Alto in 2014, and he is the author of eight books.
1. Words and languages, numbers and information
Information
Words and numbers
The mathematics behind language
2. Natural language processing: From rules to statistics
Machine intelligence
From rules to statistics
3. Statistical language model
Describing language through mathematics
Extended reading: Implementation caveats
Higher order language models
Training methods, zero-probability problems, and smoothing
Corpus selection
4. Word segmentation
Evolution of Chinese word segmentation
Extended reading: evaluating results
Consistency
Granularity
5. Hidden Markov model
Communication models
Hidden Markov model
Extended reading: HMM training
6. Quantifying information
Information entropy
Role of information
Mutual information
Extended reading: Relative entropy
7. Jelinek and modern language processing
Early life
From Watergate to Monica Lewinsky
An old man's miracle
8. Boolean algebra and search engines
Boolean algebra
Indexing
9. Graph theory and web crawlers
Graph theory
Web crawlers
Extended reading: two topics in graph theory
Euler's proof of the Königsberg bridges
The engineering of a web crawler
10. PageRank: Google's democratic ranking technology
The PageRank algorithm
Extended reading: PageRank calculations
11. Relevance in web search
TF-IDF
Extended reading: TF-IDF and information theory
12. Finite state machines and dynamic programming: Navigation in Google Maps
Address analysis and finite state machines
Global navigation and dynamic programming
Finite state transducer
13. Google's AK-47 designer, Dr. Amit Singhal
14. Cosines and news classification
Feature vectors for news
Vector distance
Extended reading: The art of computing cosines
Cosines in big data
Positional weighting
15. Solving classification problems in text processing with matrices
Matrices of words and texts
Extended reading: Singular value decomposition method and applications
16. Information fingerprinting and its application
Information fingerprint
Applications of information fingerprint
Determining identical sets
Detecting similar sets
YouTube's anti-piracy
Extended reading: Information fingerprint's repeatability and SimHash
Probability of repeated information fingerprint
SimHash
17. Thoughts inspired by the Chinese TV series Plot: The mathematical principles of cryptography
The spontaneous era of cryptography
Cryptography in the information age
18. Not all that glitters is gold: Search engine's anti-SPAM problem and search result authoritativeness question
Search engine anti-SPAM
Authoritativeness of search results
Summary
19. Discussion on the importance of mathematical models
20. Don't put all your eggs in one basket: The principle of maximum entropy
Principle of maximum entropy and maximum entropy model
Extended reading: Maximum entropy model training
21. Mathematical principles of pinyin input method
Input method and coding
How many keystrokes to type a Chinese character?
Discussion on Shannon's First Theorem
The algorithm of phonetic transcription
Extended reading: Personalized language models
22. Bloom filters
The principle of Bloom filters
Extended reading: The false alarm problem of Bloom filters
23. Bayesian network: Extension of Markov chain
Bayesian network
Bayesian network's application in word classification
Extended reading: Training a Bayesian network
24. Conditional random fields, syntactic parsing, and more
Syntactic parsing: The evolution of computer algorithms
Conditional random fields
Conditional random fields' applications in other fields
25. Andrew Viterbi and the Viterbi algorithm
The Viterbi algorithm
CDMA technology: The foundation of 3G mobile communication
26. God's algorithm: The expectation maximization algorithm
Self-converged document classification
Extended reading: Convergence of expectation-maximization algorithms
27. Logistic regression and web search advertisement
The evaluation of web search advertisement
The logistic model
28. Google Brain and artificial neural networks
Artificial neural network
Training an artificial neural network
The relationship between artificial neural networks and Bayesian networks
Extended reading: "Google Brain"
29. The power of big data
The importance of data
Statistics and information technology
Why we need big data
"This volume originates from a series of blog articles by the author, who works as senior staff research scientist for Google China. The blog articles have been rewritten to make them more accessible to uninitiated readers. As a result, the book contains 29 chapters which may be read independently. The aim is to provide evidence for the beauty of mathematics and the wealth of its applications to the layman . . . The volume may be quite valuable for readers who want to get some insight into how enterprises like Google achieve their performance, and how much mathematics is at work in the background of many commonplace services . . . "
~Dieter Riebesehl (Lüneburg), zbMath