You're reading from Bioinformatics with Python Cookbook Use modern Python libraries and applications to solve real-world computational biology problems

Product type Paperback

Published in Sep 2022

Publisher Packt

ISBN-13 9781803236421

Length 360 pages

Edition 3rd Edition

Languages

Python

Tools

Dask

Concepts

Bioinformatics

Author (1):

Tiago Antao

Table of Contents (15) Chapters

Preface

1. Chapter 1: Python and the Surrounding Software Ecology

2. Chapter 2: Getting to Know NumPy, pandas, Arrow, and Matplotlib

3. Chapter 3: Next-Generation Sequencing

4. Chapter 4: Advanced NGS Data Processing

5. Chapter 5: Working with Genomes

6. Chapter 6: Population Genetics

7. Chapter 7: Phylogenetics

8. Chapter 8: Using the Protein Data Bank

9. Chapter 9: Bioinformatics Pipelines

10. Chapter 10: Machine Learning for Bioinformatics

11. Chapter 11: Parallel Processing with Dask and Zarr

12. Chapter 12: Functional Programming for Bioinformatics

13. Index

14. Other Books You May Enjoy

Getting to Know NumPy, pandas, Arrow, and Matplotlib

One of Python’s biggest strengths is its profusion of high-quality scientific and data processing libraries. At the core of them is NumPy, which provides efficient array and matrix support, and almost all other scientific libraries are built on top of it; in our field, for example, there is Biopython. Generic data analysis libraries are just as useful in bioinformatics. pandas is the de facto standard for processing tabular data. More recently, Apache Arrow has provided efficient implementations of some of pandas’ functionality, along with language interoperability. Finally, Matplotlib is the most common plotting library in the Python space and is appropriate for scientific computing. While these are general libraries with wide applicability, they are fundamental for bioinformatics processing, so we will study them in this chapter.

We will start with pandas, as it is a high-level library with very broad practical applicability. Then, we’ll introduce Arrow, which we will use only in the scope of supporting pandas. After that, we’ll discuss NumPy, the workhorse behind almost everything we do. Finally, we’ll introduce Matplotlib.
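To make the relationship between pandas and NumPy concrete before the recipes begin, here is a minimal sketch. It is not taken from the recipes: the sample names and read depths are made up, and Arrow and Matplotlib are left out to keep it short. It only shows that the same values can be handled as a NumPy array or as a labeled pandas column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: read depths for four made-up samples.
depths = np.array([12, 30, 7, 21], dtype=np.int64)

# NumPy: vectorized arithmetic over the whole array, with no Python loop.
mean_depth = depths.mean()

# pandas: label the same values and add a derived column.
df = pd.DataFrame({"sample": ["s1", "s2", "s3", "s4"], "depth": depths})
df["above_mean"] = df["depth"] > mean_depth

# A numeric pandas column can be handed back as a NumPy array.
as_numpy = df["depth"].to_numpy()

print(mean_depth)
print(df)
print(type(as_numpy).__name__)
```

The comparison `df["depth"] > mean_depth` is applied element-wise to the whole column at once, which is the style of computation used throughout the recipes that follow.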

Our recipes are very introductory: each of these libraries could easily occupy a full book, but the recipes should be enough to help you through this book. Because all these libraries are fundamental for data analysis, they are included in the tiagoantao/bioinformatics_base Docker image from Chapter 1, so if you are using Docker, they are already available.

In this chapter, we will cover the following recipes:

Using pandas to process vaccine-adverse events
Dealing with the pitfalls of joining pandas DataFrames
Reducing the memory usage of pandas DataFrames
Accelerating pandas processing with Apache Arrow
Understanding NumPy as the engine behind Python data science and bioinformatics
Introducing Matplotlib for chart generation
