What is GIS?
GIS stands for Geographic Information Systems. These are computerized systems used to create, collect, organize, analyze, and visualize geospatial data. Geospatial data is a representation of the real world that is rooted in geography: the study of the physical features of the Earth and its atmosphere, as well as how human activity impacts both. Human activity can be examined through many lenses, such as population distribution and land usage.
To represent the Earth in a GIS, you will leverage one of two data formats: vectors or rasters. Figure 1.1 shows a stylized version of how real-world data can be represented in vector and raster formats. We’ll define and discuss both of these terms in more detail in Chapter 2, What Is Geospatial Data and Where Can I Find It?
Figure 1.1 – Real-world data in vector and raster format
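To make the distinction concrete, here is a minimal sketch in plain Python of the same land parcel held in both formats. The coordinates, the 5 x 5 grid, and the helper names `point_in_polygon` and `rasterize` are illustrative assumptions rather than part of any GIS library; in practice, a geospatial package would typically handle this work for you.

```python
# Hypothetical coordinates on a 5 x 5 unit area; all names are illustrative.

# Vector format: features are stored as explicit coordinate geometry.
fire_station = (2.5, 2.5)                                   # a point
road = [(0.0, 0.5), (2.0, 0.5), (5.0, 2.0)]                 # a line
parcel = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]   # a polygon


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, n_cols, n_rows, cell_size=1.0):
    """Raster format: a grid where each cell is 1 if its center is inside."""
    grid = []
    for row in range(n_rows):
        y = (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon((col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


raster = rasterize(parcel, n_cols=5, n_rows=5)

# Print the last row first so that "north" appears at the top.
for row in reversed(raster):
    print(" ".join(str(cell) for cell in row))
```

The vector version stores only the four corners of the parcel, while the raster version stores a value for every cell in the grid, so its precision depends on the cell size you choose.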
A typical GIS enables you to query and combine data assets based on the spatial relationships between them, such as which assets are near, within, or overlapping one another. The results are then visualized as a static or interactive map or within a mapping application.
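As a rough illustration of a spatial query, the sketch below finds every facility within 5 km of a location and then combines that result with an attribute filter. The facility names and coordinates are invented for the example, and `facilities_within` is a hypothetical helper. Distances use the haversine great-circle formula with a mean Earth radius of 6,371 km, which is an approximation and not how any particular GIS necessarily computes distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical data assets: each has attributes plus a location.
facilities = [
    {"name": "Hospital A", "kind": "hospital", "lat": 40.01, "lon": -75.00},
    {"name": "Hospital B", "kind": "hospital", "lat": 40.10, "lon": -75.00},
    {"name": "Fire Station C", "kind": "fire_station", "lat": 40.00, "lon": -75.02},
]


def facilities_within(assets, lat, lon, max_km):
    """Spatial query: assets within max_km of (lat, lon), nearest first."""
    hits = []
    for asset in assets:
        distance = haversine_km(lat, lon, asset["lat"], asset["lon"])
        if distance <= max_km:
            hits.append({**asset, "distance_km": round(distance, 2)})
    return sorted(hits, key=lambda asset: asset["distance_km"])


# Query by spatial relationship, then combine with an attribute filter.
nearby = facilities_within(facilities, lat=40.0, lon=-75.0, max_km=5.0)
nearby_hospitals = [asset for asset in nearby if asset["kind"] == "hospital"]

for asset in nearby:
    print(asset["name"], asset["distance_km"], "km")
```

A production system would usually rely on a spatial index instead of checking every asset in turn, but the idea of filtering by location first and by attribute second is the same.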
Geospatial data stored within a GIS comes in many different formats and from many different domains. A GIS used in local government may include information on the land parcels of local neighborhoods, the roads that run through those neighborhoods, and the location of public service infrastructure, such as hospitals and fire stations. A GIS servicing a local weather station may include some of these assets, but will likely also include other types of data, such as real-time feeds of storm paths, rainfall totals, and wind speeds at various points around an area at various times. In Chapter 2, What Is Geospatial Data and Where Can I Find It?, we will focus more on the various types of spatial data, their file formats, including shapefiles and GeoJSON, and some of the public sources from which spatial data can be obtained.
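As a brief preview of one of these formats, the following sketch builds a small GeoJSON document using only Python's standard `json` module. The hospital name and coordinates are invented for illustration. Note that GeoJSON stores coordinates in longitude, latitude order.

```python
import json

# A hypothetical public-service asset expressed as a GeoJSON FeatureCollection.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [-75.0, 40.01]},
            "properties": {"name": "Hospital A", "kind": "hospital"},
        }
    ],
}

# Serialize to text (as it would be stored in a .geojson file), then parse it back.
geojson_text = json.dumps(feature_collection, indent=2)
parsed = json.loads(geojson_text)

first_feature = parsed["features"][0]
lon, lat = first_feature["geometry"]["coordinates"]
print(first_feature["properties"]["name"], "is at", lat, lon)
```

Because GeoJSON is plain JSON, the geometry and its descriptive attributes travel together in a single human-readable text file.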
In your day-to-day life, you’ve likely used a GIS platform or application more frequently than you may have realized. Take, for instance, Google Maps, which is arguably the most used GIS application in the world. Google Maps allows you to search for points of interest around you, such as a coffee shop or an auto mechanic, find directions to these points of interest, and understand adverse conditions, such as rush-hour traffic or roadworks, that may impact your commute. There are many other GIS applications out there, including applications that trace the route of an Amazon delivery vehicle as it approaches your home, applications that help you understand where public buses and transit hubs are located, and even applications that help monitor the spread of infectious diseases, as we mentioned in the preface to this book.
In addition to web and mobile GIS applications, there are also desktop-based, point-and-click GIS platforms that allow users to perform more complex spatial operations and analyses. These platforms are typically used by specialized GIS practitioners, who often hold the title of geographer, GIS analyst, GIS engineer, or GIS specialist. They are used across a variety of industries for different purposes. A GIS analyst in local government may use a desktop GIS platform to edit parcel boundaries within a town, while a GIS analyst for a rail operator may use it to monitor the operational status and location of each railcar. The uses of GIS and the industries in which it is used are nearly limitless.
Typically, desktop GIS platforms are provided by vendors, the most dominant of which is Esri. As the dominant player in the GIS space, Esri offers proprietary software that integrates with numerous applications from other vendors, including Microsoft products and Autodesk's AutoCAD. In more recent versions of its software, Esri has also extended its applications to work with open source data science languages, such as Python and R, and with development environments, such as Jupyter Notebook. This book will focus on open source Python packages that do not require licensing. In Chapter 4, Exploring Geospatial Data Science Packages, we will cover packages including GeoPandas, PySAL, and GeoViews, along with many others you’ll leverage in the case studies later in this book.
Now that you have an understanding of GIS, let’s define what data science is. As we do, you’ll hopefully begin to see how GIS and data science interact.