Speech and audio processing has undergone a revolution in recent decades, and progress has accelerated in the last few years, generating game-changing technologies such as truly successful speech recognition systems, a goal that had remained out of reach until very recently. This book gives the reader a comprehensive overview of contemporary speech and audio processing techniques, with an emphasis on practical implementations and illustrations using MATLAB code. Core concepts are covered first, with an introduction to the physics of audio and vibration together with their representations using complex numbers, Z transforms and frequency analysis transforms such as the FFT.
Later chapters describe the human auditory system and the fundamentals of psychoacoustics. The insights, results and analyses from these chapters then form the basis for understanding the middle section of the book, which covers wideband audio compression (MP3 audio etc.), speech recognition and speech coding.
The final chapter covers musical synthesis and applications, describing methods such as AM, FM and ring modulation, with accompanying MATLAB examples. It closes with an example of time-frequency modification used to implement a so-called phase vocoder for time stretching (in MATLAB).
- A comprehensive overview of contemporary speech and audio processing techniques, from perceptual and physical acoustic models to a thorough background in relevant digital signal processing techniques, together with an exploration of speech and audio applications.
- A carefully paced progression in the complexity of the described methods, building in many cases from first principles.
- Speech and wideband audio coding together with a description of associated standardised codecs (e.g. MP3, AAC and GSM).
- Speech recognition: feature extraction (e.g. MFCC features), Hidden Markov Models (HMMs) and deep learning techniques such as Long Short-Term Memory (LSTM) methods.
- Book and computer-based problems at the end of each chapter.
- Numerous real-world examples backed up by many MATLAB functions and code.
Table of Contents
Preface. Introduction. Acoustics. Perception. Psychoacoustic Audio Coding. Features for Automatic Speech Recognition. Automatic Speech Recognition (ASR). Musical Applications. 3D Audio.
Dr Paul Hill received his B.Sc. degree from the Open University (1996), an M.Sc. degree from the University of Bristol, Bristol, U.K. (1998) and a Ph.D., also from the University of Bristol (2002). His research interests include image and video analysis, compression, fusion and multiscale transforms, together with audio applications such as compression, retrieval and signal separation. He is currently a senior research fellow at the Department of Electrical and Electronic Engineering at the University of Bristol. He has taught the speech and audio processing course at the university for over 8 years and has supervised numerous audio M.Sc. projects over that time. He has published over 30 academic papers and is also an amateur musician and composer, often reflecting his passion for electronic music in his lectures and presentations.
"Audio and Speech Processing with MATLAB is a very welcome and precisely realized introduction to the field of audio and speech processing. The initial chapters give numerous, novel and well-organized insights into the background of the subject. The combination of engineering, mathematics and perceptual analysis of the audio processing will give the reader a unique understanding of the subject and its applications. Contemporary approaches such as speech recognition using deep learning are also a very timely addition. This book would form an important reference text for undergraduate and masters courses in the field."
— Dave Bull, The University of Bristol, UK
"This constitutes an excellent introduction to the subject. The first chapters give a structured and comprehensive coverage of the core subjects necessary to understand the processing and analysis of audio: signal processing, acoustics, psychoacoustics, frequency analysis, and machine learning, all illustrated with MATLAB code. Throughout the book, important audio and speech applications are explored in detail, such as musical sound processing, speech coding, and wideband audio coding. I would thoroughly recommend this book as an excellent introduction to audio and speech processing for both undergraduate and postgraduate students."
— Alin Achim, University of Bristol, UK