The authors provide an understanding of big data and MapReduce by clearly presenting the basic terminology and concepts. They employ over 100 illustrations and many worked examples to convey the concepts and methods used in big data, the inner workings of MapReduce, and single-node and multi-node installation on physical and virtual machines. This book covers almost all the necessary information on Hadoop MapReduce for most online certification exams. Upon completing this book, readers will find it easy to understand other big data processing tools such as Spark and Storm.
Ultimately, readers will be able to:
• understand what big data is and the factors that are involved
• understand the inner workings of MapReduce, which is essential for certification exams
• learn the features and weaknesses of MapReduce
• set up Hadoop clusters with hundreds of physical/virtual machines
• create a virtual machine in AWS
• write MapReduce programs with Eclipse in a simple way
• understand other big data processing tools and their applications
Table of Contents
Preface. 1. Introduction to Big Data. 2. Hadoop Framework. 3. Hadoop 1.2.1 Installation. 4. Hadoop Ecosystem. 5. Hadoop 2.7.0. 6. Hadoop 2.7.0 Installation. 7. Data Science. 8. MapReduce Exercise. 9. Case Study: Application Development for NYSE Dataset.
Rathinaraja Jeyaraj is a Research Scholar in the Department of Information Technology at the National Institute of Technology Karnataka, India. He recently worked as a visiting researcher at the Connected Computing and Media Processing Lab, Kyungpook National University, South Korea. His research interests include big data processing tools, cloud computing, IoT, and machine learning.
Ganeshkumar Pugalendhi, PhD, is an Assistant Professor in the Department of Information Technology, Anna University Regional Campus, Coimbatore, India. He serves as a resource person, delivering technical talks and seminars sponsored by various organizations, including the University Grants Commission of India, the All India Council for Technical Education, the Technical Education Quality Improvement Programme of the Government of India, the Indian Council of Medical Research, and many others. He has written two research-oriented textbooks: Data Classification Using Soft Computing and Soft Computing for Microarray Data Analysis.
Anand Paul, PhD, is an Associate Professor at the School of Computer Science and Engineering at Kyungpook National University, South Korea. He was a delegate representing South Korea for the M2M focus group in 2010–2012 and is serving as associate editor for the journals IEEE Access, IET Wireless Sensor Systems, ACM Applied Computing Reviews, Cyber Physical Systems, Human Behaviour and Emerging Technology, and the Journal of Platform Technology. He is the track chair for smart human-computer interaction at the Association for Computing Machinery Symposium on Applied Computing (2014–2019), and general chair for the 8th International Conference on Orange Technology (ICOT 2020).