Common vector GIS concepts
In this section, we will discuss the different types of GIS processes that are commonly used in geospatial analysis. This list is not exhaustive; however, it will provide you with the essential operations that all other operations are based on. If you understand these operations, you will quickly understand much more complex processes as they are either derivatives or combinations of these processes.
Data structures
GIS vector data uses coordinates consisting of, at a minimum, an X horizontal value and a Y vertical value to represent a location on Earth. In many cases, a point may also contain a Z value. Other ancillary values are possible, including measurements or timestamps.
These coordinates are used to form points, lines, and polygons to model real-world objects. Points can be a geometric feature in and of themselves or they can connect line segments. Closed areas created by line segments are considered polygons. Polygons model objects such as buildings, terrain, or political boundaries.
A GIS feature can consist of a single point, line, or polygon, or it can consist of more than one shape. For example, in a GIS polygon dataset containing world country boundaries, the Philippines, which is made up of more than 7,000 islands, would be represented as a single country made up of thousands of polygons.
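As a minimal sketch, these structures can be modeled with plain Python tuples, lists, and dictionaries. The coordinates, the feature name, and the dictionary layout are illustrative assumptions rather than a standard format or real data:

```python
# A point is a single (x, y) coordinate pair; a z value is optional.
point = (-90.07, 29.95)
point_z = (-90.07, 29.95, 3.0)

# A line is an ordered sequence of points.
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]

# A polygon is a closed ring: the first and last points are identical.
island_a = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
island_b = [(5, 5), (6, 5), (6, 6), (5, 5)]

# A single feature can hold several shapes that share one set of attributes.
# The keys used here are hypothetical, chosen only for this example.
feature = {
    "attributes": {"name": "Example Archipelago"},
    "geometry": {"type": "MultiPolygon", "polygons": [island_a, island_b]},
}

print(len(feature["geometry"]["polygons"]))  # number of shapes in the feature
```

Real formats and libraries add details such as coordinate reference systems and interior rings for holes, but the underlying idea of nested coordinate sequences is the same.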
Vector data typically represents topographic features better than raster data. Vector data has the potential for greater accuracy and precision. However, collecting vector data on a large scale has also traditionally been more costly than collecting raster data.
Two other important terms related to vector data structures are bounding box and convex hull. The bounding box, or minimum bounding box, is the smallest possible rectangle that contains all of the points in a dataset. The following diagram demonstrates a bounding box for a collection of points:
Figure 1.13 – A bounding box is the smallest possible box that fully contains a group of geospatial features
The convex hull of a dataset is similar to the bounding box, but instead of a rectangle, it is the smallest possible convex polygon that can contain a dataset. The following diagram shows the same point data as the previous example, with the convex hull polygon shown in red:
Figure 1.14 – A convex hull is the smallest possible convex polygon that fully contains a group of geospatial features
As you can see, the bounding box of a dataset always contains its convex hull.
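Both structures are straightforward to compute for a set of planar points. The following sketch uses only the standard library. The hull is built with Andrew's monotone chain algorithm, which is one common choice and not necessarily what any particular GIS package uses. The function names and sample points are illustrative assumptions:

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counterclockwise order (ring left open)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (2, 5)]
print(bounding_box(points))  # (0, 0, 4, 5)
print(convex_hull(points))   # [(0, 0), (4, 0), (4, 3), (2, 5), (0, 3)]
```

Every hull vertex falls within the bounding box, and the interior points (2, 1) and (1, 2) are dropped from the hull.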
Geospatial rules about polygons
In geospatial analysis, there are several general rules of thumb regarding polygons that are different from mathematical descriptions of polygons:
- Polygons must have at least four points – the first and last points must be the same
- A polygon boundary should not overlap itself
- Polygons in a layer shouldn’t overlap
- A polygon in a layer inside another polygon is considered a hole in the underlying polygon
Different geospatial software packages and libraries handle exceptions to these rules differently, which can lead to confusing errors or software behaviors. The safest route is to make sure that your polygons obey these rules. There’s one more important piece of information about polygons that we need to talk about.
A polygon is, by definition, a closed shape, which means that the first and last vertices of a polygon are identical. Some geospatial software will throw an error if you haven’t explicitly duplicated the first point as the last point in the polygon dataset. Other software will automatically close the polygon without complaining. The data format that you use to store your geospatial data may also dictate how polygons are defined. How strictly closure is enforced is a gray area, but knowing this quirk will come in handy someday when you run into an error that you can’t explain easily.
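A small defensive check can catch the closure problem before the data reaches software that is strict about it. This is a minimal sketch: `close_ring` and `is_valid_ring` are hypothetical helper names, and the validity check covers only the minimum-point and closure rules, not self-intersection or overlap:

```python
def close_ring(ring):
    """Return a copy of the ring with the first point repeated at the end."""
    ring = list(ring)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


def is_valid_ring(ring):
    """Check that a ring has at least four points and is explicitly closed."""
    return len(ring) >= 4 and ring[0] == ring[-1]


triangle = [(0, 0), (1, 0), (0, 1)]   # three corners, not yet closed
print(is_valid_ring(triangle))        # False
closed = close_ring(triangle)
print(closed)                         # [(0, 0), (1, 0), (0, 1), (0, 0)]
print(is_valid_ring(closed))          # True
```

Because `close_ring` leaves an already closed ring unchanged, it can be applied to every polygon in a dataset without checking each one first.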
Buffer
A buffer operation can be applied to spatial objects, including points, lines, or polygons. This operation creates a polygon around the object at a specified distance. Buffer operations are used for proximity analysis – for example, establishing a safety zone around a dangerous area. Let’s review the following diagram:
Figure 1.15 – A buffer is a polygon around a geospatial feature at a specified distance
The black shapes represent the original geometry, while the red outlines represent the larger buffer polygons that were generated from the original shape.
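The simplest case, a buffer around a single point, can be approximated by a regular polygon. The sketch below assumes planar coordinates in the same units as the buffer distance. The function name `buffer_point` and the `segments` parameter are illustrative. Buffering lines and polygons requires considerably more geometry and is normally left to a library:

```python
import math


def buffer_point(x, y, distance, segments=32):
    """Approximate a circular buffer around (x, y) as a closed ring."""
    ring = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        ring.append((x + distance * math.cos(angle),
                     y + distance * math.sin(angle)))
    ring.append(ring[0])  # close the ring explicitly
    return ring


safety_zone = buffer_point(10.0, 20.0, 5.0, segments=16)
print(len(safety_zone))  # 17 points: 16 vertices plus the closing point
```

Increasing `segments` gives a smoother outline at the cost of more vertices.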
Dissolve
A dissolve operation creates a single polygon out of adjacent polygons. Dissolves are also used to simplify data that’s been extracted from remote sensing, as shown here:
Figure 1.16 – A polygon dissolve creates a single polygon out of adjacent polygons
A common use for a dissolve operation is to merge two adjacent properties in a tax database that have been purchased by a single owner.
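To illustrate the idea, the following sketch dissolves polygons by discarding the edges they share and chaining the remaining edges into one ring. It is a deliberately simplified approach under strong assumptions: all rings are closed and counterclockwise, shared edges have identical endpoints, and the result is a single ring without holes. General-purpose libraries use more robust geometric algorithms. The function name and parcel coordinates are hypothetical:

```python
def dissolve(polygons):
    """Merge edge-adjacent, counterclockwise rings into one outer ring."""
    edges = set()
    for ring in polygons:
        for a, b in zip(ring, ring[1:]):
            edges.add((a, b))
    # An edge shared by two counterclockwise rings appears once in each
    # direction, so keep only edges whose reverse is absent.
    boundary = {(a, b) for (a, b) in edges if (b, a) not in edges}
    next_point = dict(boundary)  # maps each edge start to its end
    start = min(next_point)
    ring = [start]
    current = next_point[start]
    while current != start:
        ring.append(current)
        current = next_point[current]
    ring.append(start)  # close the ring
    return ring


parcel_a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
parcel_b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
print(dissolve([parcel_a, parcel_b]))
# [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]
```

The shared edge between (1, 0) and (1, 1) disappears, leaving a single outline. The collinear points left on that outline could be removed afterward by a generalization step.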
Generalize
Objects that have more points than necessary for the geospatial model can be generalized to reduce the number of points that are used to represent the shape. This operation usually requires a few attempts to get the optimal number of points without compromising the overall shape. It is a data optimization technique that’s used to simplify data for the efficiency of computing or better visualization. This technique is useful in web mapping applications.
Here is an example of polygon generalization:
Figure 1.17 – Polygon generalization reduces the number of points in a polygon to simplify the geometry, which speeds up computation or the graphical rendering of the feature. The compromise is losing detail in the shape, which may affect the visualization or analysis
Since computer screens have a limited resolution, traditionally cited as 72 dots per inch (dpi), highly detailed point data that would not be visible at that resolution can be reduced so that less bandwidth is used to send a visually equivalent map to the user.
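One widely used generalization method is the Douglas-Peucker algorithm, which keeps only the points that deviate from a straight line by more than a chosen tolerance. The text above does not name a specific algorithm, so treat this as one possible sketch. It works on an open line, and the `tolerance` value and sample coordinates are assumptions:

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, tolerance):
    """Douglas-Peucker simplification of an open line."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the line joining the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify each half recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
print(simplify(line, 0.5))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

Raising the tolerance removes more points, which is why the operation usually takes a few attempts to balance point count against shape fidelity.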
Intersection
An intersection operation is used to see whether any part of a feature intersects with one or more other features. This operation is used for spatial queries in proximity analysis and is often a follow-on operation to buffer analysis:
Figure 1.18 – A shape intersection checks whether one feature crosses the geometry of one or more other features
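At its core, an intersection test between two linear features comes down to checking pairs of line segments. The sketch below uses the standard orientation test for this. It only detects crossing or touching boundaries, so a polygon lying entirely inside another would need an additional containment check. The function names and the `road`, `river`, and `trail` coordinates are illustrative assumptions:

```python
def orientation(a, b, c):
    """Return 1 for a left turn a->b->c, -1 for a right turn, 0 if collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def within_box(a, b, c):
    """True if c lies within the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 crosses or touches segment q1-q2."""
    o1 = orientation(p1, p2, q1)
    o2 = orientation(p1, p2, q2)
    o3 = orientation(q1, q2, p1)
    o4 = orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and within_box(p1, p2, q1)) or
            (o2 == 0 and within_box(p1, p2, q2)) or
            (o3 == 0 and within_box(q1, q2, p1)) or
            (o4 == 0 and within_box(q1, q2, p2)))


def lines_intersect(line_a, line_b):
    """True if any segment of line_a intersects any segment of line_b."""
    return any(segments_intersect(a1, a2, b1, b2)
               for a1, a2 in zip(line_a, line_a[1:])
               for b1, b2 in zip(line_b, line_b[1:]))


road = [(0, 0), (4, 4)]
river = [(0, 4), (4, 0)]
trail = [(5, 0), (6, 1)]
print(lines_intersect(road, river))  # True
print(lines_intersect(road, trail))  # False
```

This compares every segment pair, which is fine for small features. Larger datasets usually filter candidates by bounding box first.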
Merge
A merge operation combines two or more non-overlapping shapes in a single multi-shape object. Multi-shape objects are shapes that maintain separate geometries but are treated as a single feature with a single set of attributes by the GIS:
Figure 1.19 – A shape merge combines multiple non-overlapping features into a single dataset
Point in polygon
A fundamental geospatial operation is checking to see whether a point is inside a polygon. This operation is the atomic building block of many different types of spatial queries. If the point is on the boundary of the polygon, it is typically considered inside, although some software treats boundary points as outside. Very few spatial queries exist that do not rely on this calculation in some way. However, it can be very slow on a large number of points.
The most common and efficient algorithm to detect whether a point is inside a polygon is called the ray casting algorithm. First, a test is performed to see whether the point is on the polygon boundary. Next, the algorithm draws a line from the point in question in a single direction. The program counts the number of times the line crosses the polygon’s boundary until it reaches the bounding box of the polygon. An odd number of crossings means the point is inside the polygon, while an even number means it is outside, as shown here:
Figure 1.20 – The point-in-polygon ray casting algorithm is an efficient way to detect whether a point is inside a polygon
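The steps above can be sketched in a few lines of Python. This version casts the ray toward positive x and treats an odd number of crossings as inside. It assumes a closed ring without holes and counts boundary points as inside. The exact collinearity test suits integer coordinates, while floating-point data would need a small tolerance. The function names are illustrative:

```python
def point_on_segment(p, a, b):
    """True if p lies exactly on the segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def point_in_polygon(point, ring):
    """Ray casting test for a closed ring (first point == last point)."""
    x, y = point
    # Step 1: a point on the boundary is treated as inside.
    for a, b in zip(ring, ring[1:]):
        if point_on_segment(point, a, b):
            return True
    # Step 2: cast a ray toward +x and toggle on every edge crossing.
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
print(point_in_polygon((4, 2), square))  # True (on the boundary)
```

Each test visits every edge of the polygon, which is why the operation becomes slow when it is repeated for a large number of points.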
Union
The union operation is less common but is very useful when you wish to combine two or more overlapping polygons in a single shape. It is similar to dissolve, but in this case, the polygons are overlapping as opposed to being adjacent:
Figure 1.21 – A polygon union merges overlapping polygons into a single shape, similar to a dissolve, in which polygons are only adjacent
Usually, this operation is used to clean up automatically generated feature datasets from remote sensing operations.
Join
A join or SQL join is a database operation that’s used to combine two or more tables of information. Relational databases are designed to avoid storing redundant information for one-to-many relationships. For example, a US state may contain many cities. Rather than creating a table for each state containing all of its cities, a table of states with numeric IDs is created, while a single table holding the cities of every state stores the numeric ID of each city’s state. A join matches the state ID in the cities table against the states table to bring the two sets of attributes together.
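The states and cities example can be tried with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and sample rows are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (name TEXT, state_id INTEGER REFERENCES states(id));
    INSERT INTO states VALUES (1, 'Louisiana'), (2, 'Texas');
    INSERT INTO cities VALUES
        ('New Orleans', 1), ('Baton Rouge', 1), ('Austin', 2);
""")

# Match each city to its state through the shared numeric ID.
rows = conn.execute("""
    SELECT cities.name, states.name
    FROM cities
    JOIN states ON cities.state_id = states.id
    ORDER BY cities.name
""").fetchall()
conn.close()

for city, state in rows:
    print(city, "-", state)
```

Each state name is stored once in the `states` table, yet the query returns it alongside every matching city.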
In a GIS, you can also perform spatial joins, which are typically provided by the spatial extension software for a database. In a spatial join, you combine attributes in the same way that you do in a SQL join. However, the relation is based on the spatial relationship between the two features, such as containment or proximity, rather than on a shared ID.
To follow the previous cities example, we could add the county name that each city resides in using a spatial join. The cities layer could be loaded over a county polygon layer whose attributes contain the county’s name. The spatial join would determine which city is in which county and perform a SQL join to add the county name to each city’s attribute row.
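A spatial join of this kind can be sketched by pairing a point-in-polygon test with an attribute copy. The example below is a simplified illustration: the county rings, city locations, and all names are made up, boundary handling is ignored, and every city is tested against every county, whereas a spatial database would use a spatial index:

```python
def point_in_ring(point, ring):
    """Ray casting test against a closed ring (boundary cases ignored)."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


counties = [
    {"name": "West County", "ring": [(0, 0), (5, 0), (5, 5), (0, 5), (0, 0)]},
    {"name": "East County", "ring": [(5, 0), (10, 0), (10, 5), (5, 5), (5, 0)]},
]
cities = [
    {"name": "Alpha", "location": (2, 3)},
    {"name": "Beta", "location": (7, 1)},
    {"name": "Gamma", "location": (12, 2)},
]


def spatial_join(cities, counties):
    """Return copies of the city rows with a 'county' attribute added."""
    joined = []
    for city in cities:
        county_name = None  # stays None when no county contains the city
        for county in counties:
            if point_in_ring(city["location"], county["ring"]):
                county_name = county["name"]
                break
        joined.append({**city, "county": county_name})
    return joined


for row in spatial_join(cities, counties):
    print(row["name"], "->", row["county"])
```

The city that falls outside both polygons keeps a `county` of `None`, much like an unmatched row in a SQL left join.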