Handling batch importing and exporting of datasets
In many GIS workflows, there is a typical scenario where subsets of a PostGIS table must be deployed to external users in a filesystem format (most typically, shapefiles or a spatialite database). Often, there is also the reverse process, where datasets received from different users have to be uploaded to the PostGIS database.
In this recipe, we will simulate both of these data flows. You will first create the data flow for processing the shapefiles out of PostGIS, and then the reverse data flow for uploading the shapefiles.
You will do it using the power of bash scripting and the ogr2ogr
command.
Getting ready
If you didn't follow all the other recipes, be sure to import the hotspots and the countries dataset in PostGIS. The following is how to do it with ogr2ogr
(you should import both the datasets in their original SRID, 4326, to make spatial operations faster):
Import in PostGIS the
Global_24h.csv
file using theglobal_24.vrt
virtual driver you created in a previous recipe:$ ogr2ogr -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 global_24h.vrt -lco OVERWRITE=YES -lco GEOMETRY_NAME=the_geom -nln hotspots
Import the countries shapefile using
ogr2ogr
:$ ogr2ogr -f PostgreSQL -sql "SELECT ISO2, NAME AS country_name FROM 'TM_WORLD_BORDERS-0.3'" -nlt MULTIPOLYGON PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -nln countries -lco SCHEMA=chp01 -lco OVERWRITE=YES -lco GEOMETRY_NAME=the_geom TM_WORLD_BORDERS-0.3.shp
Note
In case you already imported the hotspots dataset using the 3857 SRID, you can use the new PostGIS 2.0 method that allows the user to modify the geometry type column of an existing spatial table. You can update the SRID definition for the hotspots table in this way thanks to the support of typmod on geometry objects:
postgis_cookbook=# ALTER TABLE hotspots ALTER COLUMN the_geom SET DATA TYPE geometry(Point, 4326) USING ST_Transform(the_geom, 4326);
How to do it...
The steps you need to follow to complete this recipe are as follows:
Check how many hotspots there are for each distinct country by using the following query:
postgis_cookbook=> SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom) GROUP BY c.country_name ORDER BY c.country_name;
The output of the preceding command is as follows:
country_name | iso2 | hs_count -------------------------------------------- Albania | AL | 66 Algeria | DZ | 361 ... Yemen | YE | 6 Zambia | ZM | 1575 Zimbabwe | ZW | 179 (103 rows)
Using the same query, generate a CSV file using the PostgreSQL
COPY
command or theogr2ogr
command (in the first case, make sure that the Postgre service user has full write permission to the output directory). If you are following theCOPY
approach and using Windows, be sure to replace/tmp/hs_countries.csv
with a different path:$ ogr2ogr -f CSV hs_countries.csv PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 -sql "SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom) GROUP BY c.country_name ORDER BY c.country_name" postgis_cookbook=> COPY (SELECT c.country_name, MIN(c.iso2) as iso2, count(*) as hs_count FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, hs.the_geom) GROUP BY c.country_name ORDER BY c.country_name) TO '/tmp/hs_countries.csv' WITH CSV HEADER;
If you are using Windows, go to step 5. With Linux, create a bash script named
export_shapefiles.sh
that iterates each record (country) in thehs_countries.csv
file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:#!/bin/bash while IFS="," read country iso2 hs_count do echo "Generating shapefile $iso2.shp for country $country ($iso2) containing $hs_count features." ogr2ogr out_shapefiles/$iso2.shp PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 -sql "SELECT ST_Transform(hs.the_geom, 4326), hs.acq_date, hs.acq_time, hs.bright_t31 FROM chp01.hotspots as hs JOIN chp01.countries as c ON ST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326)) WHERE c.iso2 = '$iso2'" done < hs_countries.csv
Give execution permissions to the bash file, and then run it after creating an output directory (
out_shapefiles
) for the shapefiles that will be generated by the script. Then, go to step 7:chmod 775 export_shapefiles.sh mkdir out_shapefiles $ ./export_shapefiles.sh Generating shapefile AL.shp for country Albania (AL) containing 66 features. Generating shapefile DZ.shp for country Algeria (DZ) containing 361 features. ... Generating shapefile ZM.shp for country Zambia (ZM) containing 1575 features. Generating shapefile ZW.shp for country Zimbabwe (ZW) containing 179 features.
Tip
If you get the output as
ERROR: function getsrid(geometry) does not exist LINE 1: SELECT getsrid("the_geom") FROM (SELECT,...
, you will need to load the legacy support in PostGIS, for example, in a Debian Linux box:psql -d postgis_cookbook -f /usr/share/postgresql/9.1/contrib/postgis-2.1/legacy.sql
If you are using Windows, create a batch file named
export_shapefiles.bat
, that iterates each record (country) in thehs_countries.csv
file and generates a shapefile with the corresponding hotspots exported from PostGIS for that country:@echo off for /f "tokens=1-3 delims=, skip=1" %%a in (hs_countries.csv) do ( echo "Generating shapefile %%b.shp for country %%a (%%b) containing %%c features" ogr2ogr out_shapefiles/%%b.shp PG:"dbname='postgis_cookbook' user='me' password='mypassword'" -lco SCHEMA=chp01 -sql "SELECT ST_Transform(hs.the_geom, 4326), hs.acq_date, hs.acq_time, hs.bright_t31 FROM chp01.hotspots as hs JOIN chp01.countries as c ONST_Contains(c.the_geom, ST_Transform(hs.the_geom, 4326)) WHERE c.iso2 = '%%b'" )
Run the batch file after creating an output directory (
out_shapefiles
) for the shapefiles that will be generated by the script:>mkdir out_shapefiles >export_shapefiles.bat "Generating shapefile AL.shp for country Albania (AL) containing 66 features" "Generating shapefile DZ.shp for country Algeria (DZ) containing 361 features" … "Generating shapefile ZW.shp for country Zimbabwe (ZW) containing 179 features"
Try to open a couple of these output shapefiles in your favorite Desktop GIS. The following screenshot shows you how they look in QGIS:
Now, you will do the round trip, uploading all of the generated shapefiles to PostGIS. You will upload all of the features for each shapefile and include the upload datetime and the original shapefile name. First, create the following PostgreSQL table, where you will upload the shapefiles:
postgis_cookbook=# CREATE TABLE chp01.hs_uploaded ( ogc_fid serial NOT NULL, acq_date character varying(80), acq_time character varying(80), bright_t31 character varying(80), iso2 character varying, upload_datetime character varying, shapefile character varying, the_geom geometry(POINT, 4326), CONSTRAINT hs_uploaded_pk PRIMARY KEY (ogc_fid) );
If you are using Windows, go to step 11. With Linux, create another bash script named
import_shapefiles.sh
:#!/bin/bash for f in `find out_shapefiles -name \*.shp -printf "%f\n"` do echo "Importing shapefile $f to chp01.hs_uploaded PostGIStable..." #, ${f%.*}" ogr2ogr -append -update -f PostgreSQLPG:"dbname='postgis_cookbook' user='me'password='mypassword'" out_shapefiles/$f -nlnchp01.hs_uploaded -sql "SELECT acq_date, acq_time,bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime,'out_shapefiles/$f' as shapefile FROM ${f%.*}" done
Assign the execution permission to the bash script and execute it:
$ chmod 775 import_shapefiles.sh $ ./import_shapefiles.sh Importing shapefile DO.shp to chp01.hs_uploaded PostGIS table... Importing shapefile ID.shp to chp01.hs_uploaded PostGIS table... Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table... ...
If you are using Windows, create a batch script named
import_shapefiles.bat
:@echo off for %%I in (out_shapefiles\*.shp) do ( echo Importing shapefile %%~nxI to chp01.hs_uploadedPostGIS table... ogr2ogr -append -update -f PostgreSQLPG:"dbname='postgis_cookbook' user='me'password='mypassword'" out_shapefiles/%%~nxI -nlnchp01.hs_uploaded -sql "SELECT acq_date, acq_time,bright_t31, '%%~nI' AS iso2, '%date%' AS upload_datetime,'out_shapefiles/%%~nxI' as shapefile FROM %%~nI" )
Run the batch script:
>import_shapefiles.bat Importing shapefile AL.shp to chp01.hs_uploaded PostGIS table... Importing shapefile AO.shp to chp01.hs_uploaded PostGIS table... Importing shapefile AR.shp to chp01.hs_uploaded PostGIS table... ...
Check some of the records that have been uploaded to the PostGIS table by using SQL:
postgis_cookbook=# SELECT upload_datetime, shapefile, ST_AsText(the_geom) FROM chp01.hs_uploaded WHERE ISO2='AT'; upload_datetime | shapefile | st_astext-------------------------------+-----------------------+----------------------Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.333 48.279)Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.347 48.277)Sun Aug 26 01:58:44 CEST 2012 | out_shapefiles/AT.shp | POINT(14.327 48.277)... (8 rows)
Check the same query with
ogrinfo
as well:$ ogrinfo PG:"dbname='postgis_cookbook' user='me' password='mypassword'" chp01.hs_uploaded -where "iso2='AT'" INFO: Open of `PG:dbname='postgis_cookbook' user='me' password='mypassword''using driver `PostgreSQL' successful.Layer name: chp01.hs_uploaded Geometry: Point Feature Count: 8 Extent: (-155.284000, -40.751000) - (177.457000, 70.404000) Layer SRS WKT: GEOGCS["WGS 84", ... FID Column = ogc_fid Geometry Column = the_geom acq_date: String (80.0) acq_time: String (80.0) bright_t31: String (80.0) iso2: String (0.0) upload_datetime: String (0.0) shapefile: String (0.0) OGRFeature(chp01.hs_uploaded):6413acq_date (String) = 2012-08-20 acq_time (String) = 0110 bright_t31 (String) = 292.7 iso2 (String) = AT upload_datetime (String) = Sun Aug 26 01:58:44 CEST 2012 shapefile (String) = out_shapefiles/AT.shp POINT (14.333 48.279) ...
How it works...
You could implement both the data flows (processing shapefiles out from PostGIS, and then into it again) thanks to the power of the ogr2ogr
GDAL command.
You have been using this command in different forms and with the most important input parameters in other recipes, so you should now have a good understanding of it.
Here, it is worth mentioning the way OGR lets you export the information related to the current datetime and the original shapefile name to the PostGIS table. Inside the import_shapefiles.sh
(Linux, OS X) or the import_shapefiles.bat
(Windows) scripts, the core is the line with the ogr2ogr
command (here is the Linux version):
ogr2ogr -append -update -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" out_shapefiles/$f -nln chp01.hs_uploaded -sql "SELECT acq_date, acq_time, bright_t31, '${f%.*}' AS iso2, '`date`' AS upload_datetime, 'out_shapefiles/$f' as shapefile FROM ${f%.*}"
Thanks to the -sql
option, you can specify the two additional fields—getting their values from the system date command and the filename that is being iterated from the script.