1st Edition

Document Processing Using Machine Learning

ISBN 9780367218478
Published December 2, 2019 by Chapman & Hall
182 Pages 97 B/W Illustrations



Book Description

Document Processing Using Machine Learning presents a handful of resources for students and researchers who apply machine learning in the document image analysis (DIA) domain, covering multiple document processing problems. Starting with an explanation of how Artificial Intelligence (AI) plays an important role in this domain, the book then discusses how different machine learning algorithms can be applied to classification/recognition and clustering problems regardless of the type of input data: images or text.

In brief, the book offers comprehensive coverage of the most essential topics, including:

· The role of AI for document image analysis

· Optical character recognition

· Machine learning algorithms for document analysis

· Extreme learning machines and their applications

· Mathematical foundation for Web text document analysis

· Social media data analysis

· Modalities for document dataset generation

This book serves both undergraduate and graduate scholars in Computer Science/Information Technology/Electrical and Computer Engineering. Further, it is a great fit for early career research scientists and industrialists in the domain.

Table of Contents




1. Artificial Intelligence for Document Image Analysis

Himadri Mukherjee, Payel Rakshit, Ankita Dhar, Sk Md Obaidullah, KC Santosh, Santanu Phadikar and Kaushik Roy

2. An Approach toward Character Recognition of Bangla Handwritten Isolated Characters

Payel Rakshit, Chayan Halder and Kaushik Roy

3. Artistic Multi-Character Script Identification

Mridul Ghosh, Himadri Mukherjee, Sk Md Obaidullah, KC Santosh, Nibaran Das and Kaushik Roy

4. A Study on the Extreme Learning Machine and Its Applications

Himadri Mukherjee, Sahana Das, Subhashmita Ghosh, Sk Md Obaidullah, KC Santosh, Nibaran Das and Kaushik Roy

5. A Graph-Based Text Classification Model for Web Text Documents

Ankita Dhar, Niladri Sekhar Dash and Kaushik Roy

6. A Study of Distance Metrics in Document Classification

Ankita Dhar, Niladri Sekhar Dash and Kaushik Roy

7. A Study of Proximity of Domains for Text Categorization

Ankita Dhar, Niladri Sekhar Dash and Kaushik Roy

8. Supervised Learning for Aggression Identification and Author Profiling over Twitter Dataset

Kashyap Raiyani and Roy Bayot

9. The Effect of Using Features Computed from Generated Offline Images for Online Bangla Handwritten Character Recognition

Shibaprasad Sen, Ankan Bhattacharyya and Kaushik Roy

10. Handwritten Character Recognition for Palm-Leaf Manuscripts

Papangkorn Inkeaw, Jeerayut Chaijaruwanich and Jakramate Bootkrajang





Sk Md Obaidullah received his Ph.D. (Engg.) from Jadavpur University, his M.Tech in Computer Sc. & Application from the University of Calcutta, and his B.E. in Computer Sc. & Engineering from Vidyasagar University in 2017, 2009, and 2004, respectively. He was an Erasmus Post-Doctoral fellow funded by the European Commission at the University of Evora, Portugal, from Nov. 2017 to Sept. 2018. He has more than eleven years of professional experience, including two years in industry and nine years in academia, of which five years were devoted to research. He is presently an Assistant Professor in the Dept. of Computer Science & Engineering, Aliah University, Kolkata. He has published more than 60 research articles in renowned journals and reputed national/international conferences. He is an active researcher in the fields of Document Image Processing, Medical Image Analysis, Pattern Recognition, and Machine Learning.

K.C. Santosh (Senior Member, IEEE) is an Assistant Professor and Graduate Program Director for the Department of Computer Science at the University of South Dakota (USD). Dr. Santosh also serves the School of Computing and IT, Taylor's University, as a Visiting Associate Professor. Before joining USD, Dr. Santosh worked as a research fellow at the U.S. National Library of Medicine (NLM), National Institutes of Health (NIH). He worked as a postdoctoral research scientist at the LORIA research centre, Universite de Lorraine, in direct collaboration with industrial partner ITESOFT, France. He also worked as a research scientist at the INRIA Nancy Grand Est research centre, France, where he completed his PhD in Computer Science. Before that, he worked as a graduate research scholar at SIIT, Thammasat University, Thailand. He has published more than 120 peer-reviewed research articles, authored 2 books (Springer), and edited 10 books, journal issues, and conference proceedings. Dr. Santosh serves as an associate editor for the International Journal of Machine Learning & Cybernetics. Dr. Santosh has demonstrated expertise in artificial intelligence, machine learning, pattern recognition, computer vision, image processing, data mining, and big data with various application domains, such as healthcare and medical imaging, document information content exploitation, biometrics, forensics, speech/audio analysis, satellite imaging, robotics, and Internet of Things.

Teresa Gonçalves is an assistant professor in the Department of Informatics at the University of Évora, Portugal. She received her PhD in Informatics from the University of Évora in 2008, and holds a five-year degree and a master's degree in Informatics Engineering, both from the Faculty of Sciences and Technology, New University of Lisbon, awarded in 1992 and 1996, respectively. She has published more than 60 research papers in reputed journals and conferences and has served as organizing and programme committee chair of various international conferences. She has worked as PI for different research and mobility projects funded by the Portuguese government and the European Commission. Her research interests include machine learning and data mining, namely with textual data and images, recommendation systems, and evolutionary algorithms. She is responsible for several courses at the undergraduate, master's, and doctorate levels in Computer Science. Having successfully supervised two doctorate and six master's students, she currently supervises six PhD and six master's students, mainly on applying and adapting machine learning approaches to text- or image-related problems.

Dr. Nibaran Das is an Associate Professor in the Department of Computer Science and Engineering at Jadavpur University. Before joining Jadavpur University, Dr. Das worked as a lecturer at Techno India, Saltlake, from 2005 to 2006. He worked as a postdoctoral research scientist at the University of Evora for six months between 2012 and 2014. He also worked as a research intern at the Competence Center Multimedia Analysis and Data Mining (MADM) at the DFKI, University of Kaiserslautern, Germany, in 2007. Dr. Das serves as an associate editor for the journal Sadhana: Academy Proceedings in Engineering Sciences. Dr. Das has demonstrated expertise in deep learning, pattern recognition, image processing, and machine learning, with various applications in handwriting recognition (especially character recognition) and medical image analysis. Dr. Das has published more than 125 research articles, including the book Handbook of Research on Recent Developments in Intelligent Communication Application (IGI Global), and has co-authored several conference proceedings. He has guided more than 30 master's degree students in his department. Dr. Das served as chairperson of the young professional affinity group, IEEE Kolkata section, from 2014 to 2015. He is the founder editor of the Bangla monthly computer magazine “Computer Jagat”. He is a regular reviewer for high-quality journals (IEEE, Springer, and Elsevier) and high-quality conferences and workshops (sponsored by IEEE and Springer) in the domain.

Prof. Kaushik Roy received his B.E. in Computer Science & Engineering from NIT Silchar, and his M.E. and PhD (Engg.) in Computer Science & Engg. from Jadavpur University, in 1998, 2002, and 2008, respectively. He has worked as project-linked personnel at ISI-Kolkata and as a Scientific Officer at CDAC-Kolkata. He has also worked as an Assistant Professor at Maulana Abul Kalam Azad University of Technology, India, formerly known as West Bengal University of Technology. He is currently a Professor and Head of the Department of Computer Science, West Bengal State University, Barasat, India. In 2004 he received the Young IT Professional award from the Computer Society of India. He has published more than 150 research papers/book chapters in reputed conferences and journals. His research interests include pattern recognition, document image processing, medical image analysis, online handwriting recognition, speech recognition, and audio signal processing. He is a Life Member of IUPRAI (a unit of IAPR) and the Computer Society of India.