Importing nonspatial tabular data (CSV) using GDAL
As an alternative approach to the previous recipe, you will import a CSV file to PostGIS using the ogr2ogr
GDAL command and the GDAL OGR virtual format
. The Geospatial Abstraction Library (GDAL), is a translator library for raster geospatial data formats. OGR is the related library that provides similar capabilities for vector data formats.
This time, as an extra step, you will import only a part of the features in the file and you will reproject them to a different spatial reference system.
Getting ready
You will import the Global_24h.csv
file to the PostGIS database from NASA's Earth Observing System Data and Information System (EOSDIS).
You can download the file from the EOSDIS website at http://firms.modaps.eosdis.nasa.gov/active_fire/text/Global_24h.csv, or copy it from the dataset directory of the book for this chapter.
This file represents the active hotspots detected by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellites in the world for the last 24 hours. For each row, there are the coordinates of the hotspot (latitude, longitude) in decimal degrees (in the WGS 84 spatial reference system, SRID = 4326), and a series of useful fields such as the acquisition date
, acquisition time
, and satellite type
, just to name a few.
You will import only the active fire data scanned by the satellite type marked as "T" (Terra MODIS), and you will project it using the Spherical Mercator projection coordinate system (EPSG:3857
, sometimes marked as EPSG:900913
, where the number 900913 represents Google in 1337 speak, as it was first widely used by Google Maps).
How to do it...
The steps you need to follow to complete this recipe are as follows:
Analyze the structure of the
Global_24h.csv
CSV file (in Windows, open the CSV file with an editor such as Notepad).$ cd ~/postgis_cookbook/data/chp01/ $ head -n 5 Global_24h.csv latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,confidence,version,bright_t31,frp -23.386,-46.197,307.5,1.1,1,2012-08-20, 0140,T,54,5.0 ,285.7,16.5 -22.952,-47.574,330.1,1.2,1.1,2012-08-20, 0140,T,100,5.0 ,285.2,53.9 -23.726,-56.108,333.3,4.7,2,2012-08-20, 0140,T,100,5.0 ,283.5,404.1 -23.729,-56.155,311.8,4.7,2,2012-08-20, 0140,T,61,5.0 ,272,143.1
Create a GDAL virtual data source composed of just one layer derived from the
Global_24h.csv
file. To do so, create a text file namedglobal_24h.vrt
in the same directory where the CSV file is and edit it as follows:<OGRVRTDataSource> <OGRVRTLayer name="Global_24h"> <SrcDataSource>Global_24h.csv</SrcDataSource> <GeometryType>wkbPoint</GeometryType> <LayerSRS>EPSG:4326</LayerSRS> <GeometryField encoding="PointFromColumns"x="longitude" y="latitude"/> </OGRVRTLayer> </OGRVRTDataSource>
With the
ogrinfo
command, check if the virtual layer is correctly recognized by GDAL. For example, analyze the schema of the layer and the first of its features (fid
=1
):$ ogrinfo global_24h.vrt Global_24h -fid 1 INFO: Open of `global_24h.vrt'using driver `VRT' successful. Layer name: Global_24h Geometry: Point Feature Count: 30326 Extent: (-155.284000, -40.751000) - (177.457000, 70.404000) Layer SRS WKT: GEOGCS["WGS 84", DATUM["WGS_1984", ... latitude: String (0.0) longitude: String (0.0) frp: String (0.0) OGRFeature(Global_24h):1 latitude (String) = -23.386 longitude (String) = -46.197 frp (String) = 16.5 POINT (-46.197 -23.386)
You can also try to open the virtual layer with a Desktop GIS supporting a GDAL/OGR virtual driver such as Quantum GIS (QGIS ). In the following screenshot, the
Global_24h
layer is displayed together with the shapefile of the countries that you can find in the dataset directory of the book:Now, export the virtual layer as a new table in PostGIS using the
ogr2ogr
GDAL/OGR command. You need to use the-f
option to specify the output format, the-t_srs
option to project the points to theEPSG:3857
spatial reference, the-where
option to load only the records from the MODIS Terra satellite type, and the-lco
layer creation option to provide the schema where you want to store the table:$ ogr2ogr -f PostgreSQL -t_srs EPSG:3857 PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 global_24h.vrt -where "satellite='T'" -lco GEOMETRY_NAME=the_geom
Check how the
ogr2ogr
command created the table as shown in the following command:$ pg_dump -t chp01.global_24h --schema-only -U me postgis_cookbook CREATE TABLE global_24h ( ogc_fid integer NOT NULL, the_geom public.geometry(Point,3857), latitude character varying, longitude character varying, brightness character varying, scan character varying, track character varying, acq_date character varying, acq_time character varying, satellite character varying, confidence character varying, version character varying, bright_t31 character varying, frp character varying );
Now, check the record that should appear in the
geometry_columns
metadata view:postgis_cookbook=# SELECT f_geometry_column, coord_dimension, srid, type FROM geometry_columns WHERE f_table_name = 'global_24h'; f_geometry_column | coord_dimension | srid | type -------------------+-----------------+--------+------- the_geom | 2 | 3857 | POINT (1 row)
Check how many records have been imported in the table:
postgis_cookbook=# select count(*) from chp01.global_24h; count ------- 9190 (1 row)
Note how the coordinates have been projected from
EPSG:4326
toEPSG:3857
:postgis_cookbook=# SELECT ST_AsEWKT(the_geom) FROM chp01.global_24h LIMIT 1; st_asewkt ------------------------------------------------------ SRID=3857;POINT(-5142626.51617686 -2678766.03496892) (1 row)
How it works...
As mentioned in the GDAL documentation:
"OGR Virtual Format is a driver that transforms features read from other drivers based on criteria specified in an XML control file."
GDAL supports the reading and writing of nonspatial tabular data stored as a CSV file, but we need to use a virtual format to derive the geometry of the layers from attribute columns in the CSV file (the longitude and latitude coordinates for each point). For this purpose, you need to at least specify in the driver the path to the CSV file (the SrcDataSource
element), the geometry type (the GeometryType
element), the spatial reference definition for the layer (the LayerSRS
element), and the way the driver can derive the geometric information (the GeometryField
element).
There are many other options and reasons for using OGR virtual formats; if you are interested in having a better understanding, please refer to the GDAL documentation available at http://www.gdal.org/ogr/drv_vrt.html.
After a virtual format is correctly created, the original flat nonspatial dataset is spatially supported by GDAL and the software based on GDAL. This is the reason why we can manipulate these files with GDAL commands such as ogrinfo
and ogr2ogr
, and with Desktop GIS software such as QGIS.
Once we have verified that GDAL can correctly read the features from the virtual driver, we can easily import them in PostGIS using the popular ogr2ogr
command-line utility. The ogr2ogr
command has a plethora of options, so refer to its documentation at http://www.gdal.org/ogr2ogr.html for a more in-depth discussion.
In this recipe, you have just seen some of these options, such as:
-where option: Used to export just a selection of the original feature class
-t_srs option: Used to reproject the data to a different spatial reference system
-lco layer creation option: Used to provide the schema where we would want to store the table (without it, the new spatial table would be created in the
public
schema) and the name of the geometry field in the output layer