Importing nonspatial tabular data (CSV) using PostGIS functions
There are a couple of alternative approaches to import a Comma Separated Values (CSV) file, which stores attributes and geometries in PostGIS. In this recipe, we will use the approach of importing such a file using the PostgreSQL COPY
command and a couple of PostGIS functions.
Getting ready
We will import the firenews.csv
file that stores a series of web news collected from the various RSS feeds related to forest fires in Europe in the context of the European Forest Fire Information System (EFFIS
), available at http://effis.jrc.ec.europa.eu/.
For each news feed, there are attributes like place name
, size
of the fire in hectares, URL
, and so on. Most importantly, there are the x
and y
fields that give the position of the geolocalized news in decimal degrees (in the WGS 84 spatial reference system, SRID = 4326).
How to do it...
The steps you need to follow to complete this recipe are as shown:
Inspect the structure of the CSV file,
firenews.csv
, which you can find within the book dataset (if you are on Windows, open the CSV file with an editor such as Notepad).Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
$ cd ~/postgis_cookbook/data/chp01/ $ head -n 5 firenews.csv
The output of the preceding command is as shown:
x,y,place,size,update,startdate,enddate,title,url-8.2499,42.37657,Avión,52,2011/03/07,2011/03/05,2011/03/06,Dos incendios calcinan 74 hectáreas el fin de semana,http://www.laregion.es/noticia/145578/incendios/calcinan/hectareas/semana/ -8.1013,42.13924,Quintela de Leirado,22,2011/03/07,2011/03/06,2011/03/06,Dos incendios calcinan 74 hectáreas el fin de semana,http://www.laregion.es/noticia/145578/incendios/calcinan/hectareas/semana/ 3.48159,43.99156,Arrigas,4,2011/03/06,2011/03/05,2011/03/05,"À Arrigas, la forêt sous la menace d'un feu",http://www.midilibre.com/articles/2011/03/06/NIMES-A-Arrigas-la-foret-sous-la-menace-d-39-un-feu-1557923.php5 6.1672,44.96038,Vénéon,9,2011/03/06,2011/03/06,2011/03/06,Isère Spectaculaire incendie dans la vallée du Vénéon,http://www.ledauphine.com/isere-sud/2011/03/06/isere-spectaculaire-incendie-dans-la-vallee-du-veneon
Connect to PostgreSQL and create the following table:
$ psql -U me -d postgis_cookbook postgis_cookbook=> CREATE TABLE chp01.firenews ( x float8, y float8, place varchar(100), size float8, update date, startdate date, enddate date, title varchar(255), url varchar(255), the_geom geometry(POINT, 4326) );
Note
We are using the
psql
client for connecting to PostgreSQL, but you can use your favorite one, for example,pgAdmin
.Using the
psql
client, we will not show the host and port options as we will assume that you are using a local PostgreSQL installation on the standard port.If that is not the case, please provide those options!
Copy the records from the CSV file to the PostgreSQL table using the
COPY
command (if you are on Windows, use an input directory such asc:\temp
instead of/tmp
) as follows:postgis_cookbook=> COPY chp01.firenews (x, y, place, size, update, startdate, enddate, title, url) FROM '/tmp/firenews.csv' WITH CSV HEADER;
Tip
Make sure that the
firenews.csv
file is in a location accessible from the PostgreSQL process user. For example, in Linux, copy the file to the/tmp
directory.If you are on Windows, you most likely will need to set the encoding to
UTF-8
before copying:postgis_cookbook=# set client_encoding to 'UTF-8';
Check if all of the records have been imported from the CSV file to the PostgreSQL table:
postgis_cookbook=> SELECT COUNT(*) FROM chp01.firenews;
The output of the preceding command is as follows:
count ------- 3006 (1 row)
Check if a record related to this new table is in the PostGIS
geometry_columns
metadata view:postgis_cookbook=# SELECT f_table_name, f_geometry_column, coord_dimension, srid, type FROM geometry_columns where f_table_name = 'firenews';
f_table_name | f_geometry_column | coord_dimension | srid | type --------------+-------------------+-----------------+-------+------- firenews | the_geom | 2 | 4326 | POINT (1 row)
Tip
Before PostGIS 2.0, you had to create a table containing spatial data in two distinct steps; in fact, the
geometry_columns
view was a table that needed to be manually updated. For that purpose, you had to use theAddGeometryColumn
function to create the column. For example, for this recipe:postgis_cookbook=> CREATE TABLE chp01.firenews ( x float8, y float8, place varchar(100), size float8, update date, startdate date, enddate date, title varchar(255), url varchar(255) ) WITHOUT OIDS; postgis_cookbook=> SELECT AddGeometryColumn('chp01', 'firenews', 'the_geom', 4326, 'POINT', 2); chp01.firenews.the_geom SRID:4326 TYPE:POINT DIMS:2
Tip
In PostGIS 2.0, you can still use the
AddGeometryColumn
function if you wish; however, you need to set itsuse_typmod
parameter tofalse
.Now, import the points in the geometric column using the
ST_MakePoint
orST_PointFromText
functions (use one of the following two update commands):postgis_cookbook=> UPDATE chp01.firenews SET the_geom = ST_SetSRID(ST_MakePoint(x,y), 4326); postgis_cookbook=> UPDATE chp01.firenews SET the_geom = ST_PointFromText('POINT(' || x || ' ' || y || ')', 4326);
Check how the geometry field has been updated in some records from the table:
postgis_cookbook=# SELECT place, ST_AsText(the_geom) AS wkt_geom FROM chp01.firenews ORDER BY place LIMIT 5;
The output of the preceding comment is as follows:
place | wkt ---------------------------------------------------------- Abbaslık | POINT(29.95... Abeledos, Montederramo | POINT(-7.48... Abreiro | POINT(-7.28... Abrunheira, Montemor-o-Velho | POINT(-8.72... Achaia | POINT(21.89... (5 rows)
Finally, create a spatial index for the geometric column of the table:
postgis_cookbook=> CREATE INDEX idx_firenews_geom ON chp01.firenews USING GIST (the_geom);
How it works...
This recipe showed you how to load nonspatial tabular data (in CSV format) in PostGIS using the COPY
PostgreSQL command.
After creating the table and copying the CSV file rows to the PostgreSQL table, you updated the geometric column using one of the geometry constructor functions that PostGIS provides (ST_MakePoint
and ST_PointFromText
for bi-dimensional points).
These geometry constructors (in this case, ST_MakePoint
and ST_PointFromText
) must always provide the spatial reference system identifier (SRID) together with the point coordinates to define the point geometry.
Each geometric field added in any table in the database is tracked with a record in the geometry_columns
PostGIS metadata view. In the previous PostGIS version (< 2.0), the geometry_fields
view was a table and needed to be manually updated, possibly with the convenient AddGeometryColumn
function.
For the same reason, to maintain the updated geometry_columns
view, when dropping a geometry column or removing a spatial table in the previous PostGIS versions, there were the DropGeometryColumn
and DropGeometryTable
functions. With PostGIS 2.0, you don't need to use these functions any more, but you can safely remove the column or the table with the standard ALTER TABLE DROP COLUMN
and DROP TABLE
SQL commands.
In the last step of the recipe, you have created a spatial index on the table to improve performances. Please be aware that as in the case of alphanumerical database fields, indexes improve performances only when reading data using the SELECT
command. In this case, you are making a number of updates on the table (INSERT
, UPDATE
, and DELETE
); depending on the scenario, it could be less time consuming to drop and recreate the index after the updates.