1st Edition
Managing Your Biological Data with Python
Take Control of Your Data and Use Python with Confidence
Requiring no prior programming experience, Managing Your Biological Data with Python empowers biologists and other life scientists to work with biological data on their own using the Python language. The book teaches them not only how to program but also how to manage their data. It shows how to read data from files in different formats, analyze and manipulate the data, and write the results to a file or computer screen.
The first part of the text introduces the Python language and teaches readers how to write their first programs. The second part presents the basic elements of the language, enabling readers to write small programs independently. The third part explains how to create bigger programs using techniques to write well-organized, efficient, and error-free code. The fourth part on data visualization shows how to plot data and draw a figure for an article or slide presentation. The fifth part covers the Biopython programming library for reading and writing several biological file formats, querying the NCBI online databases, and retrieving biological records from the web. The last part provides a cookbook of 20 specific programming "recipes," ranging from secondary structure prediction and multiple sequence alignment analyses to superimposing protein three-dimensional structures.
Tailoring the programming topics to the everyday needs of biologists, the book helps them easily analyze data and ultimately make better discoveries. Every piece of code in the text is aimed at solving real biological problems.
Getting Started
The Python Shell
In This Chapter You Will Learn
Story: Calculating the ΔG of ATP Hydrolysis
What Do the Commands Mean?
Examples
Testing Yourself
Your First Python Program
In This Chapter You Will Learn
Story: How to Calculate the Frequency of Amino Acids from Insulin
What Do the Commands Mean?
Examples
Testing Yourself
Data Management
Analyzing a Data Column
In This Chapter You Will Learn
Story: Dendritic Lengths
What Do the Commands Mean?
Examples
Testing Yourself
Parsing Data Records
In This Chapter You Will Learn
Story: Integrating Mass Spectrometry Data into Metabolic Pathways
What Do the Commands Mean?
Examples
Testing Yourself
Searching Data
In This Chapter You Will Learn
Story: Translating an RNA Sequence into the Corresponding Protein Sequence
What Do the Commands Mean?
Examples
Testing Yourself
Filtering Data
In This Chapter You Will Learn
Story: Working with RNA-Seq Output Data
What Do the Commands Mean?
Examples
Testing Yourself
Managing Tabular Data
In This Chapter You Will Learn
Story: Determining Protein Concentrations
What Do the Commands Mean?
Examples
Testing Yourself
Sorting Data
In This Chapter You Will Learn
Story: Sort a Data Table
What Do the Commands Mean?
Examples
Testing Yourself
Pattern Matching and Text Mining
In This Chapter You Will Learn
Story: Search a Phosphorylation Motif in a Protein Sequence
What Do the Commands Mean?
Examples
Testing Yourself
Modular Programming
Divide a Program into Functions
In This Chapter You Will Learn
Story: Working with Three-Dimensional Coordinate Files
What Do the Commands Mean?
Examples
Testing Yourself
Managing Complexity with Classes
In This Chapter You Will Learn
Story: Mendelian Inheritance
What Do the Commands Mean?
Examples
Testing Yourself
Debugging
In This Chapter You Will Learn
Story: When Your Program Does Not Work
What Do the Commands Mean?
Examples
Testing Yourself
Using External Modules: The Python Interface to R
In This Chapter You Will Learn
Story: Reading Numbers from a File and Calculating Their Mean Value Using R with Python
What Do the Commands Mean?
Examples
Testing Yourself
Building Program Pipelines
In This Chapter You Will Learn
Story: Building an NGS Pipeline
What Do the Commands Mean?
Examples
Testing Yourself
Writing Good Programs
In This Chapter You Will Learn
Problem Description: Uncertainty
What Do the Commands Mean?
Examples
Testing Yourself
Data Visualization
Creating Scientific Diagrams
In This Chapter You Will Learn
Story: Nucleotide Frequencies in the Ribosome
What Do the Commands Mean?
Examples
Testing Yourself
Creating Molecule Images with PyMOL
In This Chapter You Will Learn
Story: The Zinc Finger
Seven Steps to Create a High-Resolution Image
Examples
Testing Yourself
Manipulating Images
In This Chapter You Will Learn
Story: Plot a Plasmid
What Do the Commands Mean?
Examples
Testing Yourself
Biopython
Working with Sequence Data
In This Chapter You Will Learn
Story: How to Translate a DNA Coding Sequence into the Corresponding Protein Sequence and Write It to a FASTA File
What Do the Commands Mean?
Examples
Testing Yourself
Retrieving Data from Web Resources
In This Chapter You Will Learn
Story: Searching Publications by Keywords in PubMed, Downloading the Corresponding Records, and Writing Papers Published in a Given Year to a File
What Do the Commands Mean?
Examples
Testing Yourself
Working with 3D Structure Data
In This Chapter You Will Learn
Story: Extracting Atom Names and Three-Dimensional Coordinates from a PDB File
What Do the Commands Mean?
Examples
Testing Yourself
Cookbook
Recipe 1: The PyCogent Library
Recipe 2: Reversing and Randomizing a Sequence
Recipe 3: Creating a Random Sequence with Probabilities
Recipe 4: Parsing Multiple Sequence Alignments Using Biopython
Recipe 5: Calculating a Consensus Sequence from a Multiple Sequence Alignment
Recipe 6: Calculating the Distance between Phylogenetic Tree Nodes
Recipe 7: Codon Frequencies in a Nucleotide Sequence
Recipe 8: Parsing RNA 2D Structures in the Vienna Format
Recipe 9: Parsing BLAST XML Output
Recipe 10: Parsing SBML Files
Recipe 11: Running BLAST
Recipe 12: Accessing, Downloading, and Reading Web Pages in Python
Recipe 13: Parsing HTML Files
Recipe 14: Split a PDB File into PDB Chain Files
Recipe 15: Find the Two Closest Cα Atoms in a PDB Structure
Recipe 16: Extract the Interface between Two PDB Chains
Recipe 17: Building Homology Models Using Modeller
Recipe 18: RNA 3D Homology Modeling with ModeRNA
Recipe 19: Calculating RNA Base Pairs from a 3D Structure
Recipe 20: A Real Case of Structural Superimposition: The Serine Protease Catalytic Triad
Appendix A: Command Overview
Appendix B: Python Resources
Appendix C: Record Samples
Appendix D: Handling Directories and Programs with UNIX
Biography
Allegra Via, Kristian Rother, Anna Tramontano
“… a significant step forward … The book is cleverly designed to cover a wide range of subjects in a pleasant, easy-to-follow sequence of chapters. These have been carefully prepared so that the minimum level of interdependence is kept, making it possible to begin working at virtually any level without falling into intricate cross-references. A beginner will find the first chapters quite welcoming while a person with medium or even high levels of programming experience can easily find a suitable entry point in the middle.
The book is written using an entertaining style that pushes the reader into a naturally built engaging experience … the authors have chosen a collection of underlying subject areas that cover a very wide variety of interests, ensuring that mixed audiences are kept engaged. In that sense, the content becomes adaptable to the wide diversity of learners that are found in today's communities of specialised biologists.
… also usable as a reference guide, due to the richness of its worked examples that will prove valuable as seeds for code development for programmers at any level. … as a single book to support learning Python for problem solvers in the life sciences, this book is certainly a very smart choice. It is also ready for creative teachers to develop more in the same direction.”
—Pedro L. Fernandes, Instituto Gulbenkian de Ciência
"Reading Managing Your Biological Data with Python brings back memories of the time I started writing my first lines of code nearly a decade ago. As a beginning structural biologist without any coding experience, this book would have been a welcome companion to quickly get me started on my bioinformatical projects with Python. It is this, often pragmatic, attitude scientists have towards programming that makes Python the language of choice for many. A clear syntax, powerful built-in functions, and a lively ecosystem of user-contributed modules allow you to do advanced things with only a few lines of code.
The book introduces you to the basic principles of programming in Python using the many built-in functions. It does so using practical examples that you can start using right away in your day-to-day research.
Python’s modular design principles can even be seen in the organization of this book. If you have never written a line of code in your life, the first chapters are indispensable for teaching you basic coding principles, but if you have some experience, you can safely skip these. I would, however, recommend reading the ones introducing the built-in functions. It never hurts to refresh your memory on the many powerful built-ins Python actually has; I certainly forgot about one or two of them. Working your way through the first chapters will help you get comfortable with Python and lay the foundation for writing more advanced programs in the remaining chapters. These chapters introduce some of the powerful community-contributed Python modules that make your life as a biologist a whole lot easier. Again, the example code introducing these modules is of high practical value, and together with the coding recipes in the ‘cookbook’ chapter they provide a solid blueprint for you to build your own code upon.
I’m confident that reading Managing Your Biological Data with Python will quickly allow you to get the most out of your data and start answering those thrilling scientific questions you have, and do all of that while having fun."
—Marc van Dijk, Structural biologist, bioinformatician, and eScience entrepreneur, Bijvoet Center for Biomolecular Research, Utrecht University, The Netherlands
"For many biologists faced with computational challenges, Python has become the language of choice, due to its power, elegance, and simplicity. Managing Your Biological Data with Python by Allegra Via et al. teaches Python using biological examples and discusses important Python-driven applications, such as PyMOL and Biopython. The book is an excellent resource for any biologist needing relevant programming skills."
—Thomas Hamelryck, Associate Professor, Bioinformatics Center, University of Copenhagen, Denmark
"Biological data volumes are growing rapidly as high-throughput technologies (e.g., DNA microarrays or DNA/RNA sequencing) improve. Managing and analyzing biological data are becoming more demanding, and the application of programming techniques has simply become a standard. Managing Your Biological Data with Python is one of very few user-friendly books for biologists. It is amazing how clearly the authors explain the possible applications of Python for data management (parsing data records, filtering and sorting data) and data visualization (also using the Python interface to R). The book also offers a description of modular programming, which is simply excellent! It guides readers from writing simple functions through writing classes to building program pipelines—everything according to Python coding standards and in an easy-to-follow way. This is absolutely the best book to start learning Python. Intermediate Python users can use this book to learn some new tricks that they could implement in their own code. I can highly recommend this book to researchers, students, and their lecturers."
—Dr. Barbara Uszczynska, Centre de Regulació Genòmica (CRG), Barcelona, Spain