Packt+ | Advance your knowledge in tech

0

Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Free Learning

Bioinformatics with Python Cookbook

You're reading from Bioinformatics with Python Cookbook Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781789344691

Length 360 pages

Edition 2nd Edition

Languages

Python

Tools

Biopython

Concepts

Bioinformatics

Author (1):

Tiago Antao

View More author details

Table of Contents (12) Chapters

Preface

1. Python and the Surrounding Software Ecology

2. Next-Generation Sequencing FREE CHAPTER

3. Working with Genomes

4. Population Genetics

5. Population Genetics Simulation

6. Phylogenetics

7. Using the Protein Data Bank

8. Bioinformatics Pipelines

9. Python for Big Genomics Datasets

10. Other Topics in Bioinformatics

11. Advanced NGS Processing

Using high-performance data formats – HDF5

VCF processing is very slow: if you do an empty for loop over a big VCF file, it can easily take days just to parse it. This is because text parsing is very demanding. Alternatively, using NumPy arrays is fast, but you are limited to whatever fits in memory. There are several alternatives to deal with both of these problems (and we will explore more than one in this chapter). Here, we will consider representing our data in HDF5 format.

We will use an existing HDF5 file that was exported from a VCF file and do some basic extraction of data.

Getting ready

We will revisit the Anopheles gambiae dataset that we used in previous chapters. There is a HDF5 version of the VCF file that we used previously. You can download chromosome arm 3L from ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/AR3/variation/main/hdf5/ag1000g.phase1.ar3.pass.3L.h5. Remember that we are dealing with a VCF representation of 765 mosquitoes that can be carriers of Plasmodium falciparum...

The rest of the chapter is locked

Register for a free Packt account to unlock a world of extra content!

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at €18.99/month. Cancel anytime

Authors (1)

Tiago Antao

Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). Postdoctoral, Tiago has worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

See other products by Tiago Antao

Other recommended products

Related to this chapter

R Bioinformatics Cookbook

R Bioinformatics Cookbook

In the R Bioinformatics Cookbook, you encounter common and not-so-common challenges in the bioinformatics domain and solve them using real-world examples. The book guides you through varied bioinformatics analysis, from raw data to clean results. It shows you how to import, explore and evaluate your data and how to report it.

Oct 2019 10h 32m

R Bioinformatics Cookbook

R Bioinformatics Cookbook

In the R Bioinformatics Cookbook, you encounter common and not-so-common challenges in the bioinformatics domain and solve them using real-world examples. The book guides you through varied bioinformatics analysis, from raw data to clean results. It shows you how to import, explore and evaluate your data and how to report it.

Oct 2019 10h 32m

R Bioinformatics Cookbook

R Bioinformatics Cookbook

In the R Bioinformatics Cookbook, you encounter common and not-so-common challenges in the bioinformatics domain and solve them using real-world examples. The book guides you through varied bioinformatics analysis, from raw data to clean results. It shows you how to import, explore and evaluate your data and how to report it.

Oct 2019 10h 32m

Personalised recommendations for you

Based on your interests and search pattern

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m

Mastering Customer Success

Mastering Customer Success

This guide unveils strategies to cultivate enduring customer relationships. Grounded in effective communication and problem-solving, it shows you how to harness cross-functional collaboration for competitive advantage.

May 2024 5h 40m