Distributed System Design

Description

Future requirements for computing speed, system reliability, and cost-effectiveness entail the development of alternative computers to replace the traditional von Neumann organization. As computing networks come into being, one of the latest dreams is now possible - distributed computing. Distributed computing brings transparent access to as much computer power and data as the user needs for... Read more

Future requirements for computing speed, system reliability, and cost-effectiveness entail the development of alternative computers to replace the traditional von Neumann organization. As computing networks come into being, one of the latest dreams is now possible - distributed computing.
Distributed computing brings transparent access to as much computer power and data as the user needs for accomplishing any given task - simultaneously achieving high performance and reliability.
The subject of distributed computing is diverse, and many researchers are investigating various issues concerning the structure of hardware and the design of distributed software. Distributed System Design defines a distributed system as one that looks to its users like an ordinary system, but runs on a set of autonomous processing elements (PEs) where each PE has a separate physical memory space and the message transmission delay is not negligible. With close cooperation among these PEs, the system supports an arbitrary number of processes and dynamic extensions.
Distributed System Design outlines the main motivations for building a distributed system, including:

inherently distributed applications

performance/cost

resource sharing

flexibility and extendibility

availability and fault tolerance

scalability
Presenting basic concepts, problems, and possible solutions, this reference serves graduate students in distributed system design as well as computer professionals analyzing and designing distributed/open/parallel systems.
Chapters discuss:

the scope of distributed computing systems

general distributed programming languages and a CSP-like distributed control description language (DCDL)

expressing parallelism, interprocess communication and synchronization, and fault-tolerant design

two approaches describing a distributed system: the time-space view and the interleaving view

mutual exclusion and related issues, including election, bidding, and self-stabilization

prevention and detection of deadlock

reliability, safety, and security as well as various methods of handling node, communication, Byzantine, and software faults

efficient interprocessor communication mechanisms as well as these mechanisms without specific constraints, such as adaptiveness, deadlock-freedom, and fault-tolerance

virtual channels and virtual networks

load distribution problems

synchronization of access to shared data while supporting a high degree of concurrency

Read less

Introduction
Motivation
Basic Computer Organizations
Definition of a Distributed System
Our Model
Interconnection Networks
Applications and Standards
Scope
Source of References
Distributed Programming Languages
Requirement for Distributed Programming Support
An Overview of Parallel/Distributed Programming Language
Expressing Parallelism
Process Communication and Synchronization
Remote Procedure Calls
Robustness
Formal Approaches to Distributed Systems Design
Introduction to Models
Causally Related Events
Global States
Logical Clocks
Applications
Classification of Distributed Control Algorithms
The Complexity of Distributed Algorithms
Mutual Exclusion and Election Algorithms
Mutual Exclusion
Non-Token-Based Solutions
Token-Based Solutions
Election
Bidding
Self-Stabilization
Prevention, Avoidance, and Detection of Deadlock
The Deadlock Problem
Deadlock Prevention
A Deadlock Prevention Example: Distributed Database Systems
Deadlock Avoidance
A Deadlock Prevention Example: Multi-Robot Flexible Assembly Cells
Deadlock Detection and Recovery
Deadlock Detection and Recovery Examples
Distributed Routing Algorithms
Introduction
General-Purpose Shortest Path Routing
Unicasting in Special-Purpose Networks
Broadcasting in Special-Purpose Networks
Multicasting in Special-Purpose Networks
Adaptive Deadlock-Free, and Fault-Tolerant Routing
Virtual Channels and Virtual Networks
Fully Adaptive and Deadlock-Free Routing
Partially Adaptive and Deadlock-Free Routing
Fault-Tolerant Unicasting: General Approaches
Fault-Tolerant Unicasting in 2-D Meshes and Tori
Fault-Tolerant Unicasting in Hypercubes
Fault-Tolerant Broadcasting
Fault-Tolerant Multicasting
Reliability in Distributed Systems
Basic Models
Building Blocks of Fault-Tolerant Design
Handling of Node Faults
Issues in Backward Recovery
Handling of Byzantine Faults
Handling of Communication Faults
Handling of Storage Faults
Static Load Distribution
Classification of Load Distribution
Static Load Distribution
An Overview of Different Scheduling Models
Task Scheduling Based on Task Precedent Graphs
Case Study: Two Optimal Scheduling Algorithms
Task Scheduling Based on Task Interaction Graphs
Case Study: Domain Partition
Scheduling Using Other Models and Objectives
Future Directions
Dynamic Load Distribution
Dynamic Load Distribution
Load Balancing Design Decisions
Migration Policies: Sender-Initiated and Receiver-Initiated
Parameters Used for Load Balancing
Other Relevant Issues
Sample Load Balancing Algorithms
Case Study: Load Balancing on Hypercube Multicomputers
Future Directions
Distributed Data Management
Basic Concepts
Serializability Theory
Concurrency Control
Replica and Consistency Management
Distributed Reliability Protocols
Distributed System Applications
Distributed Operating Systems
Distributed File Systems
Distributed Shared Memory
Distributed Database Systems
Heterogeneous Processing
Future Directions of Distributed Systems

Author(s)

Biography

Jie Wu

Distributed System Design

Description

Table of Contents

Author(s)

Biography

SOCIAL NETWORKS

Secure Shopping Payment Options