Large Scale and Big Data: Processing and Management provides readers with a central source of reference on the data management techniques currently available for large-scale data processing. Presenting chapters written by leading researchers, academics, and practitioners, it addresses the fundamental challenges associated with Big Data processing tools and techniques across a range of computing environments.
The book begins by discussing the basic concepts and tools of large-scale Big Data processing and cloud computing. It also provides an overview of different programming models and cloud-based deployment models. The book’s second section examines the use of advanced Big Data processing techniques in different domains, including the semantic web, graph processing, and stream processing. The third section discusses advanced topics in Big Data processing such as consistency management, privacy, and security.
Supplying a comprehensive summary from both the research and applied perspectives, the book covers recent research discoveries and applications, making it an ideal reference for a wide range of audiences, including researchers and academics working on databases, data mining, and web-scale data processing.
After reading this book, you will have a fundamental understanding of how to use Big Data processing tools and techniques effectively across application domains. Coverage includes cloud data management architectures, Big Data analytics visualization, data management, analytics for vast amounts of unstructured data, clustering, classification, link analysis of Big Data, scalable data mining, and machine learning techniques.
Table of Contents
Distributed Programming for the Cloud: Models, Challenges, and Analytics Engines; Mohammad Hammoud and Majd F. Sakr
MapReduce Family of Large-Scale Data-Processing Systems; Sherif Sakr, Anna Liu, and Ayman G. Fayoumi
iMapReduce: Extending MapReduce for Iterative Processing; Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang
Incremental MapReduce Computations; Pramod Bhatotia, Alexander Wieder, Umut A. Acar, and Rodrigo Rodrigues
Large-Scale RDF Processing with MapReduce; Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, and Georg Lausen
Algebraic Optimization of RDF Graph Pattern Queries on MapReduce; Kemafor Anyanwu, Padmashree Ravindra, and HyeongSik Kim
Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud; Rishan Chen, Xuetian Weng, Bingsheng He, Byron Choi, and Mao Yang
PEGASUS: A System for Large-Scale Graph Processing; Charalampos E. Tsourakakis
An Overview of the NoSQL World; Liang Zhao, Sherif Sakr, and Anna Liu
Consistency Management in Cloud Storage Systems; Houssem-Eddine Chihoub, Shadi Ibrahim, Gabriel Antoniu, and Maria S. Perez
CloudDB AutoAdmin: A Consumer-Centric Framework for SLA Management of Virtualized Database Servers; Sherif Sakr, Liang Zhao, and Anna Liu
An Overview of Large-Scale Stream Processing Engines; Radwa Elshawi and Sherif Sakr
Advanced Algorithms for Efficient Approximate Duplicate Detection in Data Streams Using Bloom Filters; Sourav Dutta and Ankur Narang
Large-Scale Network Traffic Analysis for Estimating the Size of IP Addresses and Detecting Traffic Anomalies; Ahmed Metwally, Fabio Soldo, Matt Paduano, and Meenal Chhabra
Recommending Environmental Big Data Using Semantically Guided Machine Learning; Ritaban Dutta, Ahsan Morshed, and Jagannath Aryal
Virtualizing Resources for the Cloud; Mohammad Hammoud and Majd F. Sakr
Toward Optimal Resource Provisioning for Economical and Green MapReduce Computing in the Cloud; Keke Chen, Shumin Guo, James Powers, and Fengguang Tian
Performance Analysis for Large IaaS Clouds; Rahul Ghosh, Francesco Longo, and Kishor S. Trivedi
Security in Big Data and Cloud Computing: Challenges, Solutions, and Open Problems; Ragib Hasan
Dr. Sherif Sakr is a Senior Researcher at National ICT Australia (NICTA), Sydney, Australia. He is also a Conjoint Senior Lecturer at the University of New South Wales (UNSW). He received his PhD degree in Computer and Information Science from Konstanz University, Germany, in 2007. He received his BSc and MSc degrees in Computer Science from Cairo University, Egypt, in 2000 and 2003, respectively. In 2011, Sherif held a Visiting Researcher position at the eXtreme Computing Group, Microsoft Research, USA. In 2012, he held a Research MTS position at Alcatel-Lucent Bell Labs. Dr. Sakr has published more than 60 refereed research publications in international journals and conferences such as the IEEE TSC, ACM CSUR, JCSS, IEEE COMST, VLDB, SIGMOD, ICDE, WWW, and CIKM. He has served on the organizing and program committees of numerous conferences and workshops.
Dr. Mohamed Medhat Gaber is a reader in the School of Computing Science and Digital Media of Robert Gordon University, UK. Mohamed received his PhD from Monash University, Australia, in 2006. He then held appointments with the University of Sydney, CSIRO, Monash University, and the University of Portsmouth. Dr. Gaber has published over 100 papers, coauthored one monograph-style book, and edited or coedited four books on data mining and knowledge discovery. He has served on the program committees of major conferences related to data mining, including ICDM, PAKDD, ECML/PKDD, and ICML. He has also been a member of the organizing committees of numerous conferences and workshops.