Subscription

Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Learning Hub

Conferences

Free Learning

You're reading from Mastering Numerical Computing with NumPy Master scientific computing and perform complex operations with ease

Product type Paperback

Published in Jun 2018

Publisher Packt

ISBN-13 9781788993357

Length 248 pages

Edition 1st Edition

Languages

Python

Tools

NumPy

Concepts

Scientific Computing

Authors (3):

Tiago Antao

Mert Cuhadaroglu

Umit Mert Cakmak

View More author details

Table of Contents (11) Chapters

Preface

1. Working with NumPy Arrays FREE CHAPTER

2. Linear Algebra with NumPy

3. Exploratory Data Analysis of Boston Housing Data with NumPy Statistics

4. Predicting Housing Prices Using Linear Regression

5. Clustering Clients of a Wholesale Distributor Using NumPy

6. NumPy, SciPy, Pandas, and Scikit-Learn

7. Advanced Numpy

8. Overview of High-Performance Numerical Computing Libraries

9. Performance Benchmarks

10. Other Books You May Enjoy

Leave a review - let other readers know what you think

Trimmed statistics

As you will have noticed in the previous section, the distributions of our features are very dispersed. Handling the outliers in your model is a very important part of your analysis. It is also very crucial when you look at descriptive statistics. You can be easily confused and misinterpret the distribution because of these extreme values. SciPy has very extensive statistical functions for calculating your descriptive statistics in regards to trimming your data. The main idea of using the trimmed statistics is to remove the outliers (tails) in order to reduce their effect on statistical calculations. Let's see how we can use these functions and how they will affect our feature distribution:

In [58]: np.set_printoptions(suppress= True, linewidth= 125)
         samples = dataset.data
         CRIM = samples[:,0:1]
         minimum = np.round(np.amin(CRIM), decimals...

The rest of the chapter is locked

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at €18.99/month. Cancel anytime

Authors (3)

Mert Cakmak

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.

See other products by Mert Cakmak

Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). Postdoctoral, Tiago has worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

See other products by Tiago Antao

Cuhadaroglu

Mert Cuhadaroglu is a BI Developer in EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.

See other products by Cuhadaroglu