Geospatial Data Science Quick Start Guide: Effective techniques for performing smarter geospatial analysis using location intelligence

By Abdishakur Hassan and Jayakrishnan Vijayaraghavan
$32.99
Rating: 4 out of 5 (6 Ratings)
Paperback May 2019 170 pages 1st Edition
eBook
$15.99 $22.99
Paperback
$32.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

  • Instant access to your digital eBook copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Geospatial Data Science Quick Start Guide

Consuming Location Data Like a Data Scientist

Location comes in different forms, but what if it comes in a simple structured data format and we have overlooked it all this time? Most machine learning algorithms, such as random forests, are geared toward creating insights from structured data in tabular form. In this chapter, we will discuss how to leverage spatial data that is masquerading as tabular data and apply machine learning techniques to it as any data scientist would. We will use New York taxi trip data to predict the duration of any given New York taxi trip. We chose this dataset for the following reasons:

  • Predicting trip duration has the right mix of geospatial analytics and machine learning
  • Finding the time it takes to travel from point A to point B is a routing problem, which will be dealt with in Chapter 6, Let's Build a Routing...

Exploratory data analysis

For this chapter, we will be using curated data from the New York taxi trip dataset provided by the city of New York. The original source for this data can be found here: https://data.cityofnewyork.us/api/odata/v4/hvrh-b6nb.

Visit the following website for more details about the data that's included in this dataset: https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb.

For starters, let's have a peek at the data at hand using pandas. The curated data (NYC_sample.csv) that we will be using here can be found at the following download link: https://drive.google.com/file/d/1OkkYZJEcsdCkU0V42eP6pj6YaK2WCGCE/view.

import pandas as pd
df = pd.read_csv("NYC_sample.csv")
# Transpose the first five rows so that each column is shown as a row
df.head().T

The curated New York taxi trip data that we are using has around 1.14 million records and has columns related to taxi fare, as well as trip duration, as...

Spatial data processing

We will be discussing three things in this section: taxi zones, spatial joins, and the calculation of distances.
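
The excerpt does not show how distances are calculated, so the following is only a sketch under stated assumptions. It uses the haversine (great-circle) formula, which is one common choice for straight-line distance between two latitude/longitude points and is not necessarily the method used in the book. The function name `haversine_km`, the Earth radius constant, and the sample coordinates are all illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical pickup and dropoff points in New York (illustrative values only)
pickup = (40.7580, -73.9855)
dropoff = (40.6413, -73.7781)
distance_km = haversine_km(*pickup, *dropoff)
```

A straight-line distance like this ignores the street network, so it can only serve as a rough feature for a duration model.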

Taxi zones in New York

Analyzing and processing taxi zone spatial data helps us achieve two objectives:

  • Substitute the missing coordinates for pickup and dropoff locations with the taxi zone's centroid
  • Use the taxi zone as a feature in the model
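
The first objective can be sketched in plain Python as follows. This is a minimal illustration, not the book's code: the zone polygons, the trip records, and the names `polygon_centroid`, `zones`, and `trips` are hypothetical, and the centroid is computed with the planar shoelace formula on toy coordinates.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical zone polygons keyed by zone id (toy coordinates, not real taxi zones)
zones = {
    1: [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)],
    2: [(4.0, 0.0), (6.0, 0.0), (6.0, 6.0), (4.0, 6.0)],
}
centroids = {zone_id: polygon_centroid(poly) for zone_id, poly in zones.items()}

# Hypothetical trip records; None marks a missing pickup coordinate
trips = [
    {"pickup_zone": 1, "pickup_xy": (1.0, 1.5)},
    {"pickup_zone": 2, "pickup_xy": None},
]
for trip in trips:
    if trip["pickup_xy"] is None:
        # Substitute the centroid of the trip's taxi zone
        trip["pickup_xy"] = centroids[trip["pickup_zone"]]
```

With real zone geometries loaded into a GeoDataFrame, the `centroid` attribute that GeoPandas exposes on geometries would likely replace the hand-written function.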

Visualization of taxi zones

We have provided the shapefile for the taxi zones in the data repository. Shapefiles can be read as (Geo)DataFrames with the Python library known as GeoPandas...

Error metric

If we visit the evaluation section of the Kaggle competition, we see that the evaluation metric is defined as the Root Mean Squared Logarithmic Error (RMSLE). In the competition, the objective is to minimize this metric for the test data. An error is simply the difference between actual values and predicted values:

error = predicted value - actual value

The Root Mean Squared Error (RMSE) would literally be the square root applied over the mean of all the squared error terms for each observation.

However, our metric in the Kaggle competition needs to be a log error:

log_error = log(predicted value + 1) - log(actual value + 1)

Therefore, it is important to apply a log transform over the trip_duration column as we did earlier:

import numpy as np
df["trip_duration"] = np.log(df["trip_duration"] + 1)

Now, we can use a function that calculates RMSE rather than a function that calculates RMSLE:

import math 
def rmse(x,y...
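
The reasoning in this section can be checked with a small standalone sketch. It is not a reconstruction of the truncated function above: the names `root_mean_squared_error` and `root_mean_squared_log_error` and the sample durations are assumptions made for illustration. It shows that RMSE computed on log(value + 1) targets gives the same number as RMSLE computed on the raw values.

```python
import math


def root_mean_squared_error(predicted, actual):
    """Square root of the mean squared difference between two sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def root_mean_squared_log_error(predicted, actual):
    """Same as RMSE, but on log(value + 1) instead of the raw values."""
    squared = [(math.log(p + 1) - math.log(a + 1)) ** 2
               for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical trip durations in seconds (illustrative values only)
actual = [300.0, 900.0, 1800.0]
predicted = [360.0, 800.0, 2000.0]

# Apply the same log transform used on the trip_duration column
log_actual = [math.log(a + 1) for a in actual]
log_predicted = [math.log(p + 1) for p in predicted]

rmse_on_logs = root_mean_squared_error(log_predicted, log_actual)
rmsle_on_raw = root_mean_squared_log_error(predicted, actual)
```

Because the two quantities agree, a model trained on the log-transformed target can be scored with plain RMSE and still be compared against the competition metric.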

Building the model

Let's build the final model using a random forest regressor. A random forest is a universal machine learning technique; that is, it can handle different kinds of data. The target could be a category (classification) or a continuous variable (regression), and the features can be of any kind, such as an image, price, time, post codes, and so on (that is, both structured and unstructured data). It doesn't generally overfit too much, and it is very easy to stop it from overfitting. For these reasons, random forest is a versatile ML technique that we can use effectively to solve our problem.

Validation data and error metrics

Our initial step is choosing a suitable size for validation data. Before delineating the validation...

Summary

In this chapter, we chose a pertinent problem that had both analytics and geospatial components and tried to apply a very robust ML technique known as random forest to it. Before building the model, we had to handle the date component, the spatial component of data, as well as the categorical and continuous variables. We were able to achieve a good score in our first pass and build a world-class model with a few lines of code and a little bit of spatial data processing.

In the next chapter, we will discuss more accurate real-world distance metrics and perform other spatial computations, such as intersection, to make the model better.


Key benefits

  • Manipulate location-based data and create intelligent geospatial data models
  • Build effective location recommendation systems used by popular companies such as Uber
  • A hands-on guide to help you consume spatial data and parallelize GIS operations effectively

Description

Data scientists, who have access to vast data streams, are a bit myopic when it comes to intrinsic and extrinsic location-based data and are missing out on the intelligence it can provide to their models. This book demonstrates effective techniques for using the power of data science and geospatial intelligence to build intelligent data models that make use of location-based data to give useful predictions and analyses. This book begins with a quick overview of the fundamentals of location-based data and how techniques such as Exploratory Data Analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, buffer polygons, intersecting geometries, geocoding, and more, which add context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to leverage geospatial analysis techniques in popular recommendation systems such as collaborative filtering and location-based recommendations. By the end of the book, you will be a rockstar when it comes to performing geospatial analysis with ease.

Who is this book for?

Data Scientists who would like to leverage location-based data and want to use location-based intelligence in their data models will find this book useful. This book is also for GIS developers who wish to incorporate data analysis in their projects. Knowledge of Python programming and some basic understanding of data analysis are all you need to get the most out of this book.

What you will learn

  • Learn how companies now use location data
  • Set up your Python environment and install Python geospatial packages
  • Visualize spatial data as graphs
  • Extract geometry from spatial data
  • Perform spatial regression from scratch
  • Build web applications that dynamically reference geospatial data
Estimated delivery fee Deliver to Ecuador

Standard delivery 10 - 13 business days

$19.95

Premium delivery 3 - 6 business days

$40.95
(Includes tracking information)

Product Details

Publication date: May 31, 2019
Length: 170 pages
Edition: 1st
Language: English
ISBN-13: 9781789809411


Packt Subscriptions

See our plans and pricing

$19.99 billed monthly

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together

  • Learning Geospatial Analysis with Python: $59.99
  • Mastering Geospatial Development with QGIS 3.x: $43.99
  • Geospatial Data Science Quick Start Guide: $32.99

Total: $136.97

Table of Contents

8 Chapters

  1. Introducing Location Intelligence
  2. Consuming Location Data Like a Data Scientist
  3. Performing Spatial Operations Like a Pro
  4. Making Sense of Humongous Location Datasets
  5. Nudging Check-Ins with Geofences
  6. Let's Build a Routing Engine
  7. Getting Location Recommender Systems
  8. Other Books You May Enjoy

Customer reviews

Rating distribution
4 out of 5 (6 Ratings)
5 star 66.7%
4 star 0%
3 star 0%
2 star 33.3%
1 star 0%

Iyyanki Jun 16, 2022
Rating: 5 out of 5
Contents are good
Amazon Verified review
Kurt Jan 02, 2021
Rating: 5 out of 5
This book is a straightforward guide about geospatial analysis with Python. A GitHub repository accompanies the book as well. The only drawback is the black-and-white printout.
Amazon Verified review
Josh Brown Oct 02, 2020
Rating: 2 out of 5
Two chapters in I'm finding that portions of this book rely on FastAI, which is a cutting-edge Python ML library with a rapidly changing API and documentation that isn't versioned. The book uses a lot of the extremely useful helper functions included in fastai, but the modules in this library where these functions are stored have clearly been changed a few times by now. The author suggests using Google Colab to make getting started easier for the reader, but even the GitHub code repo is not kept current with the version of fastai that is pip installed through Colab, or the most current version of fastai, which would allow you to use their documentation to sort this out. I've gotten some of the notebooks to work locally in Jupyter, but the amount of RAM on my laptop struggles in the WSL Linux environment, which is the easiest place to install fastai. Also, some of the data source links are broken, meaning you have to look elsewhere for the data. This is publicly available data so it's not a huge problem, but this also stands in the way of learning the core concepts.
I plan to struggle through this because I recognize there is a bit of a dependency struggle in this domain, and I think that is worth working through for my own learning, but it's worth pointing out that this book forces you to learn a lot of technical detail that is arbitrary to the concepts it is selling. Considering that this is marketed as a quick start guide, I think they've failed magnificently in that regard.
If you want to hack away at exploring an out-of-date, undocumented API while learning geospatial fundamentals then proceed, because the use cases in this book seem valuable, but if you don't want to struggle with fixing the broken code in this book, look for a book that uses less cutting-edge or better documented libraries.
Amazon Verified review
Vinicius Oct 29, 2019
Rating: 5 out of 5
A basic book for a field that lacks sources. Very good!
Amazon Verified review
Johann H. Aug 11, 2019
Rating: 2 out of 5
I really wanted to like the book, but I didn't and can't really recommend it for Data Scientists.
The book reads like a Medium article. That means very little to no theory explained, and a lot of how to implement this and that, what tools to use, etc. For my personal taste this book has too little content. I felt like I learned close to nothing, even though I am not an experienced GIS user. Hence I wonder what the target group is for this book: Data Scientists know how to use pandas, and people who don't know Data Science won't learn it from this book since there is too little theory.
Furthermore, there are a lot of spelling errors, repetitions, etc. that make me wonder if this book had an editor.
One thing made me particularly angry: pretty much every chapter starts with a formulation like "In this chapter we will learn to do Task X", then they call a Python function from a loaded package task_x(your_data_goes_here) and close the chapter with a sentence: "In this chapter we learned Task X". No you did not. You simply loaded a damn library; that has nothing to do with learning.
There might be a subset of people for whom this book is useful; for me personally it was not.
Amazon Verified review

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time start printing on the next business day, so the estimated delivery times start from the next day as well. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the second business day after the order. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

Customs duty or localized taxes may apply to shipments to recipient countries outside the EU27. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay the courier service an additional import tax of 19% to receive your package (for example, $9.50 on a declared value of $50).
  • If you live in Turkey and the declared value of your ordered items is over €22, you will have to pay the courier service an additional import tax of 18% to receive your package (for example, €3.96 on a declared value of €22).
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact the Customer Relations Team at customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered an item (eBook, Video or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), contact the Customer Relations Team at customercare@packt.com within 14 days of purchase, and they will resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are requesting a refund for only one book from a multiple-item order, we will refund the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal