1st Edition

Data Stewardship for Open Science: Implementing FAIR Principles

By Barend Mons Copyright 2018
    244 Pages 19 B/W Illustrations
    by Chapman & Hall

    Data Stewardship for Open Science: Implementing FAIR Principles has been written with the intention of making scientists, funders, and innovators in all disciplines and stages of their professional activities broadly aware of the need, complexity, and challenges associated with open science, modern science communication, and data stewardship. The FAIR principles are used as a guide throughout the text, and this book should leave experimentalists consciously incompetent about data stewardship and motivated to respect data stewards as representatives of a new profession, while possibly motivating others to consider a career in the field.

    The ebook, available at no additional cost when you buy the paperback, will be updated every 6 months on average (provided that significant updates are needed or available). Readers will have the opportunity to contribute material towards these updates, and to develop their own data management plans, via the free Data Stewardship Wizard.

    Chapter 1. Introduction

    1.1 Data stewardship for open science

    1.2 Introduction by the author

    1.3 Definitions and context

    1.4 The lines of thinking

    1.5 The basics of good data stewardship

    Chapter 2. Data cycle step 1: Design of experiment

    2.1 Is there preexisting data?

    2.2 Will you use preexisting data (including opedas)?

    2.3 Will you use reference data?

    2.4 Where is it available?

    2.5 What format?

    2.6 Is the data resource versioned?

    2.7 Will you be using any existing (nonreference) data sets?

    2.8 Will owners of that data work with you on this study?

    2.9 Is reconsent needed?

    2.10 Do you need to harmonize different sources of opedas?

    2.11 What/how/who will integrate existing data?

    2.12 Will reference data be created?

    2.13 Will you be storing physical samples?

    2.14 Will you be collecting experimental data?

    2.15 Are there data formatting considerations?

    2.16 Are there potential issues regarding data ownership and access control?

    Chapter 3. Data cycle step 2: Data design and planning

    3.1 Are you using data types used by others too?

    3.1.1 What format(s) will you use for the data?

    3.2 Will you be using new types of data?

    3.3 How will you be storing metadata?

    3.4 Method stewardship

    3.5 Storage (how will you store your data?)

    3.6 Is there (critical) software in the workspace?

    3.7 Do you need the storage close to compute capacity?

    3.8 Compute capacity planning

    Chapter 4. Data cycle step 3: Data Capture (equipment phase)

    4.1 Where does the data come from? Who will need the data?

    4.2 Capacity and harmonisation planning

    Chapter 5. Data cycle step 4: Data Processing and Curation

    5.1 Workflow development

    5.2 Choose the workflow engine

    5.3 Workflow running

    5.4 Tools and data directory (for the experiment)

    Chapter 6. Data cycle step 5: Data Linking and ‘Integration’

    6.1 What is the approach you will use for data integration?

    6.2 Will you make your output semantically interoperable data?

    6.3 Will you use a workflow e.g. with tools for database access or conversion?

    Chapter 7. Data cycle step 6: Data Analysis, Interpretation

    7.1 Will you use static or dynamic (systems) models?

    7.2 Machine learning?

    7.3 Will you be building kinetic models?

    7.4 How will you make sure the analysis is best suited to answer your biological question?

    7.5 How will you ensure reproducibility?

    7.6 Will you be doing (automated) knowledge discovery?

    Chapter 8. Data cycle step 7: Information and insight in publishing

    8.1 How much will be open data/access?

    8.2 Who will pay for open access data publishing?

    8.3 Legal issues

    8.4 What technical issues are associated with hpr?

    8.5 Will you publish also if the results are negative?


    Barend Mons is a molecular biologist by training (PhD, Leiden University, 1986) and spent over 15 years in malaria research. After that he gained two decades of experience in computer-assisted knowledge discovery, which is still his research focus at the Leiden University Medical Centre.

    Data Stewardship for Open Science is especially recommended for corporate, college, and university library Computer Science and Engineering collections. It should be noted for the personal reading lists of students, academia, and non-specialist general readers that "Data Stewardship for Open Science" is also available in a paperback edition.

    —James A. Cox, The Midwest Book Review, Science Shelf

    The practice of science in the information age is changing in ways that have not been more pronounced since the advent of mathematics. Data Stewardship for Open Science provides a comprehensive inventory of what scientists need to know about putting data into public repositories in a manner that will allow their teams and those of other investigators to understand what experiments actually have been done, to explore online data in search of new discoveries, to build on the datasets of other scientists, and to prepare for a world in which the dissemination of research objects in digital form will become the primary means by which scientists communicate with one another. Mons does not offer a cookbook for managing experimental data, but rather a rich enumeration of what scientists need to do—and not do—to be successful in addressing the opportunities and the challenges provided by open science. This book provides essential advice for anyone who needs to generate, to access, or to manage scientific data for broad, public consumption—which is now just about anyone in the scientific community.

    —Mark A. Musen, Stanford University

    Prof. Barend Mons is a visionary of a new age of data-driven and machine-assisted science: the era of FAIR data, where the immensely valuable data is not lost any more, but it persists, and is reusable to enable currently imaginable avenues to increase human knowledge. This book is the first comprehensive publication containing the essential vision on a way forward, a book that sets a new standard of effective data stewardship and that should be on a shelf of every responsible data steward. We are honoured to be among the authors of a supporting software project: the Data Stewardship Web portal. [https://dmp.fairdata.solutions]

    —Dr. Robert Pergl and Dr. Jiri Vondrasek, Elixir Node Czech Republic

    Data stewardship, a concept that involves all those data management issues related to long-term data reusability and interoperability, requires careful planning and thought from the beginning of a research project. Producing research that complies with FAIR principles is an ethical responsibility for all scientists, and a plan for reuse should be an obligatory and fundamental part of study design, especially for those working with public funding. Beyond the ethical responsibility to produce transparent and reproducible research, young scientists today should view cultivation of data stewardship skills as an opportunity to participate in the exciting, innovative research of tomorrow. This book will greatly help early career researchers to get acquainted with the basics of modern data stewardship, as we described in our common paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1351-7

    —The collective students of the 2016 Summer School of the League of European Research Universities

    This book is an excellent source to learn and practice exploration, explanation, and interpretation of meanings hidden in the data using what the author names ‘FAIR’ principles.

    -Ramalingam Shanmugam, Journal of Statistical Computation and Simulation