There are a couple of alternative approaches to importing a Comma Separated Values (CSV) file, which stores attributes and geometries in PostGIS. In this recipe, we will use the approach of importing such a file using the PostgreSQL COPY command and a couple of PostGIS functions.
Importing nonspatial tabular data (CSV) using PostGIS functions
Getting ready
We will import the firenews.csv file that stores a series of web news collected from various RSS feeds related to forest fires in Europe in the context of the European Forest Fire Information System (EFFIS), available at http://effis.jrc.ec.europa.eu/.
For each news feed, there are attributes such as place name, size of the fire in hectares, URL, and so on. Most importantly, there are the x and y fields that give the position of the geolocalized news in decimal degrees (in the WGS 84 spatial reference system, SRID = 4326).
For Windows machines, it is necessary to install OSGeo4W, a set of open source geographical libraries that will allow the manipulation of the datasets. The link is: https://trac.osgeo.org/osgeo4w/
In addition, include the OSGeo4W and the Postgres binary folders in the Path environment variable to be able to execute the commands from any location in your PC.
How to do it...
The steps you need to follow to complete this recipe are as shown:
- Inspect the structure of the CSV file, firenews.csv, which you can find within the book dataset (if you are on Windows, open the CSV file with an editor such as Notepad).
$ cd ~/postgis_cookbook/data/chp01/
$ head -n 5 firenews.csv
The output of the preceding command is as shown:
- Connect to PostgreSQL, create the chp01 SCHEMA, and create the following table:
$ psql -U me -d postgis_cookbook postgis_cookbook=> CREATE EXTENSION postgis; postgis_cookbook=> CREATE SCHEMA chp01; postgis_cookbook=> CREATE TABLE chp01.firenews ( x float8, y float8, place varchar(100), size float8, update date, startdate date, enddate date, title varchar(255), url varchar(255), the_geom geometry(POINT, 4326) );
Using the psql client, we will not show the host and port options as we will assume that you are using a local PostgreSQL installation on the standard port.
If that is not the case, please provide those options!
- Copy the records from the CSV file to the PostgreSQL table using the COPY command (if you are on Windows, use an input directory such as c:\temp instead of /tmp) as follows:
postgis_cookbook=> COPY chp01.firenews (
x, y, place, size, update, startdate,
enddate, title, url
) FROM '/tmp/firenews.csv' WITH CSV HEADER;
If you are on Windows, you most likely will need to set the encoding to UTF-8 before copying: postgis_cookbook=# set client_encoding to 'UTF-8'; and remember to set the full path, 'c:\\tmp\firenews.csv'.
- Check all of the records have been imported from the CSV file to the PostgreSQL table:
postgis_cookbook=> SELECT COUNT(*) FROM chp01.firenews;
The output of the preceding command is as follows:
- Check a record related to this new table is in the PostGIS geometry_columns metadata view:
postgis_cookbook=# SELECT f_table_name,
f_geometry_column, coord_dimension, srid, type
FROM geometry_columns where f_table_name = 'firenews';
The output of the preceding command is as follows:
postgis_cookbook=> CREATE TABLE chp01.firenews(
x float8,
y float8,
place varchar(100),
size float8,
update date,
startdate date,
enddate date,
title varchar(255),
url varchar(255))
WITHOUT OIDS;postgis_cookbook=> SELECT AddGeometryColumn('chp01', 'firenews', 'the_geom', 4326, 'POINT', 2);
chp01.firenews.the_geom SRID:4326 TYPE:POINT DIMS:2
In PostGIS 2.0, you can still use the AddGeometryColumn function if you wish; however, you need to set its use_typmod parameter to false.
- Now, import the points in the geometric column using the ST_MakePoint or ST_PointFromText functions (use one of the following two update commands):
postgis_cookbook=> UPDATE chp01.firenews
SET the_geom = ST_SetSRID(ST_MakePoint(x,y), 4326); postgis_cookbook=> UPDATE chp01.firenews
SET the_geom = ST_PointFromText('POINT(' || x || ' ' || y || ')',
4326);
- Check how the geometry field has been updated in some records from the table:
postgis_cookbook=# SELECT place, ST_AsText(the_geom) AS wkt_geom
FROM chp01.firenews ORDER BY place LIMIT 5;
The output of the preceding comment is as follows:
- Finally, create a spatial index for the geometric column of the table:
postgis_cookbook=> CREATE INDEX idx_firenews_geom
ON chp01.firenews USING GIST (the_geom);
How it works...
This recipe showed you how to load nonspatial tabular data (in CSV format) in PostGIS using the COPY PostgreSQL command.
After creating the table and copying the CSV file rows to the PostgreSQL table, you updated the geometric column using one of the geometry constructor functions that PostGIS provides (ST_MakePoint and ST_PointFromText for bi-dimensional points).
These geometry constructors (in this case, ST_MakePoint and ST_PointFromText) must always provide the spatial reference system identifier (SRID) together with the point coordinates to define the point geometry.
Each geometric field added in any table in the database is tracked with a record in the geometry_columns PostGIS metadata view. In the previous PostGIS version (< 2.0), the geometry_fields view was a table and needed to be manually updated, possibly with the convenient AddGeometryColumn function.
For the same reason, to maintain the updated geometry_columns view when dropping a geometry column or removing a spatial table in the previous PostGIS versions, there were the DropGeometryColumn and DropGeometryTable functions. With PostGIS 2.0 and newer, you don't need to use these functions any more, but you can safely remove the column or the table with the standard ALTER TABLE, DROP COLUMN, and DROP TABLE SQL commands.
In the last step of the recipe, you have created a spatial index on the table to improve performance. Please be aware that as in the case of alphanumerical database fields, indexes improve performances only when reading data using the SELECT command. In this case, you are making a number of updates on the table (INSERT, UPDATE, and DELETE); depending on the scenario, it could be less time consuming to drop and recreate the index after the updates.