1st Edition

Introduction to Data Technologies

By Paul Murrell Copyright 2009
    448 Pages 179 B/W Illustrations
    by Chapman & Hall

    Providing key information on how to work with research data, Introduction to Data Technologies presents ideas and techniques for performing critical, behind-the-scenes tasks that take up so much time and effort yet typically receive little attention in formal education. Focusing on computational tools, the book makes readers aware of which tasks can be achieved and describes the correct approach to performing them.

    Practical examples demonstrate the most important points
    The author first discusses how to write computer code, using HTML as a concrete example. He then covers a variety of data storage topics, including different file formats, XML, and the structure and design issues of relational databases. After illustrating how to extract data from a relational database using SQL, the book presents tools and techniques for searching, sorting, tabulating, and manipulating data. It also introduces some very basic programming concepts as well as the R language for statistical computing. Each of these topics has supporting chapters that offer reference material on HTML, CSS, XML, DTD, SQL, R, and regular expressions.

    One-stop shop of introductory computing information
    Written by a member of the R Development Core Team, this resource shows readers how to apply data technologies to tasks within a research setting. Collecting material otherwise scattered across many books and the web, it explores how to publish information via the web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.

    Introduction
        Case Study: Point Nemo
    Writing Computer Code
        Case Study: Point Nemo (continued)
        Syntax
        Semantics
        Writing Code
        Checking Code
        Running Code
        The DRY Principle
    HTML Reference
        HTML Syntax
        HTML Semantics
    CSS Reference
        CSS Syntax
        CSS Semantics
        Linking CSS to HTML
        CSS Tips
    Data Storage
        Case Study: YBC 7289
        Plain Text Formats
        Binary Formats
        Spreadsheets
        XML
        Databases
    XML Reference
        XML Syntax
        Document Type Definitions
    Data Queries
        Case Study: The Data Expo (continued)
        Querying Databases
        Querying XML
    SQL Reference
        SQL Syntax
        SQL Queries
        Other SQL Commands
    Data Processing
        Case Study: The Population Clock
        The R Environment
        The R Language
        Data Types and Data Structures
        Subsetting
        More on Data Structures
        Data Import/Export
        Data Manipulation
        Text Processing
        Data Display
        Programming
        Other Software
    R Reference
        R Syntax
        Data Types and Data Structures
        Functions
        Getting Help
        Packages
        Searching for Functions
    Regular Expressions Reference
        Literals
        Metacharacters
    Conclusion
    Attributions
    Bibliography
    Index
    Further Reading appears at the end of each chapter.

    Biography

    Paul Murrell is a Senior Lecturer in the Department of Statistics at the University of Auckland, New Zealand. Author of the bestselling R Graphics (2006), he is also part of the development team for the R and Omegahat statistical computing projects. Dr. Murrell’s research interests include computational and graphical statistics.

    Paul Murrell, best known for his R Graphics book, has delivered a second masterpiece for people who have the difficult task to clean and prepare raw data for further use in common statistical software packages. … provides the perfect basis for a course on data literacy … Moreover, the book also is an excellent basis for advanced M.S. and Ph.D. students as well as practitioners in academia and industry who are confronted with the task to clean and preprocess their own or their colleagues’ data.
    —Jürgen Symanzik, Technometrics, May 2011

    Introduction to Data Technologies introduces various computer-related topics, including markup languages, statistical computing languages, coding, storage, and querying, in a systematic manner. … the book may serve as an introduction to readers with general interest who plan to supplement their knowledge in specific computer-related topics, in addition to R programming.
    Journal of the American Statistical Association, Vol. 105, No. 492, December 2010

    This is a very gentle book. It enables students and statisticians, particularly those just entering the profession, to begin to familiarize themselves with important concepts and tools from the world of databases … it is encouraging that such topics are finding their way into statistics courses at all. … I found the style of the book very engaging … . It has the Paul Murrell light touch, first evident to me in his eminently readable and comprehensive book on R graphics. Like that one, the present book has interesting, occasionally slightly unusual examples and an easy and elegant writing style. The book does not hesitate to offer plain, direct advice in contexts in which other authors might simply let readers exercise their personal preferences. For students, particularly, I think this is a good thing. …
    —Bill Venables, CSIRO, Australian & New Zealand Journal of Statistics, 2010