2013-09-15

With the latest release of OpenShift Online (and in the nightlies for OpenShift Origin), you can now use PostGIS 2.0 with PostgreSQL. PostGIS is a spatial extension to PostgreSQL that allows users to carry out high-end spatial operations in their database. For those of us who love using spatial data, this upgrade is as significant as our recent PostgreSQL 9.2 upgrade. This release adds some very important features to PostGIS as well as a lot of performance improvements. In this post I will cover how to enable PostGIS 2.0, migrate your data from PostGIS 1.5, and then show some of the great new features.

Enabling PostGIS in your Database

The latest version of PostGIS now takes advantage of PostgreSQL's extension mechanism. This means enabling PostGIS on your PostgreSQL database is as simple as:

Open the SQL prompt to your database. On OpenShift this just means SSHing into your application and then:
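A minimal sketch, assuming the rhc client tools and the default PostgreSQL cartridge environment (myapp is a placeholder for your application name):

    rhc ssh myapp    # replace 'myapp' with your application name
    psql             # the cartridge environment connects you to the app's default database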

or if you manually created another database:
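Here mydb is a placeholder for whatever you named your database:

    psql -d mydb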

At the SQL prompt, type the following command and expect the following response (capitalization does not matter):
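On a default setup, the exchange looks like this (your prompt will show your own database name rather than postgres):

    postgres=# CREATE EXTENSION postgis;
    CREATE EXTENSION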

And that's it - you now have a PostGIS-enabled database! No more loading SQL scripts and creating languages - getting it going couldn't be easier. One of the other huge benefits of the extension mechanism is that PostgreSQL no longer dumps all the data type and function definitions for PostGIS, so migrating between versions and between databases is much easier.

Migrating Data from PostGIS 1.5 (and PostgreSQL 8.4)

Given the amount of change between 1.5 and 2.0 in PostGIS, you cannot do a soft upgrade. Instead, you need to do what is called a "hard upgrade". The procedure on OpenShift is very similar to the steps in the link.

Here is how I carried them out on my system:
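In outline, the steps look like the following sketch. The database names are placeholders, and the postgis_restore.pl script ships with the PostGIS source (under utils/), so its path will depend on your installation:

    # 1. Dump the old (PostGIS 1.5) database in custom format
    pg_dump -Fc -b -f olddb.backup olddb

    # 2. Create a fresh database and enable PostGIS 2.0 in it
    createdb newdb
    psql -d newdb -c 'CREATE EXTENSION postgis;'

    # 3. Restore the dump through the hard-upgrade script, logging any errors
    perl postgis_restore.pl olddb.backup | psql -d newdb 2> errors.txt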

New Functionality

The 2.0 release was a huge release in terms of the functionality it brought. Go ahead and take a look at the list, but prepare for your jaw to drop. If you want some more flavor on all those changes, Paul Ramsey, one of the lead devs on PostGIS, gave a great set of slides explaining a lot of the new features.
For the functionality you used before, the take-home message is that most things stay the same. There are very few breaking changes in the SQL most developers use with PostGIS. What has changed is the addition of new, highly requested features.

New Features

From my perspective, the biggest features are rasters, topologies, and 3D indexing with the new 3D types like TINs and polyhedral surfaces. Unfortunately, I find the documentation on topologies and 3D surfaces lacking, and there also seems to be a shortage of client applications that can take advantage of them right now. So I am going to spend the rest of this blog post introducing rasters in PostGIS.

Rasters

What are Rasters:

You are probably most familiar with raster images from dealing with digital photographs. A raster spatial data set takes a section of the world and divides it into cells, and each cell gets a value. Rasters can be layered together to contain more information. For example, the digital photo you take with your phone has 3 layers of information - reflectance in the red, green, and blue bands of the electromagnetic spectrum. With raster layers in a spatial system, the values in the cells can represent electromagnetic reflectance (like in a satellite image or an aerial photograph), but they could also represent the average elevation of the cell or perhaps the total number of people living in the cell. Heatmap visualizations, such as crime maps, are another common example of rasters.

Rasters are usually contrasted with vector coverages, which model the world as a series of points, lines, and polygons. The benefits of rasters are that they are efficient to use for computations, it is fairly easy to find software to visualize them, and they have a very simple data structure. The drawbacks are that a world full of objects which are neither square nor regularly placed ends up being averaged into cells, and data sizes can be quite large compared to a vector coverage (especially for sparse data).

Here is a comparison of how the spatial and non-spatial information is handled and visualized for both data types:



Where I got my Examples:

Given the area we were focused on in the previous blog posts, I knew I could obtain imagery from the Cal-Atlas site. I obtained some Landsat 7 satellite imagery from the following folder on the site. Landsat captures multiple bands of the electromagnetic spectrum (way too much to explain here, but here is a link to show you a visualization of how bands map to spectral information). I chose bands 4, 3, and 2 - the best bands for making false-color infrared imagery. These bands do the best job of showing vegetation extent and health, which our normal human vision can't pick up (we can't see infrared).

Here is what vegetation looks like when you put infrared in red, red in green, and green in blue. Notice the healthy vegetation is red, the dried grass is brown, and asphalt and human structures have a bluish tint to them.



Another popular source of raster images for background display is a USGS Digital Raster Graphic. These images are scans of the topo maps produced by the United States Geological Survey and are now also served up through a new site. I grabbed the one for Castle Rock Ridge, CA so I could add it to our database as well.

All of these TIFF images have their spatial information embedded in the header - they are called GeoTIFFs. What this means for us is that GDAL can take advantage of the information in the header and either use it in its processing or return the information back to us.

For example, if I run the following command on one of my Landsat images:
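The tool for this is GDAL's gdalinfo; the filename here is a placeholder for whichever band you downloaded, and the output is abbreviated and representative:

    gdalinfo landsat_band4.tif
    Driver: GTiff/GeoTIFF
    ...
    Coordinate System is:
    PROJCS["NAD_1983_California_Teale_Albers",
        GEOGCS["NAD83",
            DATUM["North_American_Datum_1983",
                AUTHORITY["EPSG","6269"]],
            ...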

This gives us a lot of information from the GeoTIFF header of the file. One of the most important pieces is the projection information. This one is a bit hard to decipher because they don't specify the EPSG code for the overall file; instead, they only give it for the datum. The California Dept. of Resources specifies an equal-area projection that covers the whole state, and it is used often enough that it actually has its own EPSG code: 3310. We need this number later when we import the raster into PostGIS.

A List of Some Cool Analyses You Can Do

A heat map of race from the US census data

Place a line of your run over an elevation grid and show the change in elevation you covered

Place a point on an elevation grid and analyze what is visible from that point

Take a bunch of different input rasters containing different selection criteria and then highlight the best locations given your selection criteria

Given rasters of fuel load, elevation, and soil moisture, along with wind speed, you can calculate the expected spread of a forest fire

Given soil types and elevation, calculate the spread of an oil spill on land

Split the world into 1 degree cells and then visualize the GDP or carbon emission/retention for each cell on Earth

Send up a kite with a digital camera attached, reference the image to points on the ground, and then digitize or do classification analysis on the image.

As you can see, there are many interesting applications of rasters - here is a great slide deck by Juniper GIS with an intro to spatial rasters and a lot of the different analysis and modeling you can do with them.

How you use it

I have made the rasters I downloaded from CASIL available as a zip file, so if you want to follow along, please download them now.

If you don't have PostGIS installed on your local machine, you will need to scp or sftp the files up to your $OPENSHIFT_DATA_DIR (~/app-root/data). Then you will need to ssh into your application that has PostgreSQL and PostGIS enabled and cd into the data dir. From there all the commands will be the same as running them on your local machine.
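For example (the file name, SSH user, and host below are placeholders - use the SSH URL shown by rhc app show):

    # copy the zip up to the gear's persistent data directory
    scp rasters.zip 52xxxxuuid@myapp-mydomain.rhcloud.com:app-root/data/

    # then SSH in and work from that directory
    rhc ssh myapp
    cd $OPENSHIFT_DATA_DIR
    unzip rasters.zip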

So our first step is to convert our raster files into SQL so we can import them into PostGIS.
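For the scanned topo map, the conversion looks something like this (the filename, table name, and SRID are placeholders - check the actual SRID with gdalinfo first). Here -s sets the SRID, -I builds a spatial index, -C applies the standard raster constraints, and -t tiles the image, which is explained below:

    raster2pgsql -s 3310 -I -C -t 256x256 castle_rock_drg.tif public.drg > drg.sql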

Here we are choosing to convert the raster into SQL that will be stored in the database. PostGIS also has the option to "store" a raster as an out-of-db raster (the -R flag to raster2pgsql), where the database keeps the raster on disk and just stores the path.

The tiling size (-t) is how the database chooses to "chunk" the raster when it stores it in the database. If you are going to do analysis on areas that are small relative to the overall size of the raster, you should make your tiles small so as to speed up analysis. In other words, when the database goes to fetch the relevant piece of the raster to analyze, it can fetch a small tile and pull it into memory rather than the entire raster.

If you are going to do analysis on the whole raster (like we will do with the Landsat image), then you should not tile your image at all. In my testing I found that a null boundary was created around each tile when running moving window analysis. This is why there is no -t flag for importing the Landsat image:
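Again, the filename and table name are placeholders:

    raster2pgsql -s 3310 -I -C landsat_band4.tif public.landsat_b4 > landsat_b4.sql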

If you have the SQL files, you can load them into PostGIS with the following commands:
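Assuming the placeholder file names from above (add -d <database> if you are not relying on the gear's default environment):

    psql -f drg.sql
    psql -f landsat_b4.sql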

If you are executing these commands right on your gear, you can replace everything after the > and instead pipe the output straight into the database:
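Using the same placeholder names:

    raster2pgsql -s 3310 -I -C -t 256x256 castle_rock_drg.tif public.drg | psql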

Here is a great blog post talking about how to use PostGIS rasters and look at examples in QGIS.

Example Calls

We are going to take the Landsat image's infrared band, run a filter over it to smooth out some of the noise in the pixels (a moving average filter with a small averaging window), and then reclass anything with a pixel value below 20 as water. As noted here, water tends to absorb infrared radiation. If you really wanted to do a better job detecting water, you would probably look at the ratio of blue to infrared, since water tends to reflect in blue and vegetation tends to absorb in blue. Today I will leave that as an exercise for the reader.

So first, let's see what the infrared band looks like in PostgreSQL. I loaded the PostGIS raster driver into QGIS and took a screenshot of the entire Landsat image of the "Bay Area".



You can see the major water features quite clearly, but you can also see the brighter reflectance from the crops in the Central Valley and down in the valley by Monterey and Watsonville.

Now let's zoom into San Francisco:

Here you can see the brighter spots from the healthy vegetation reflectance of parks and golf courses in the north, as compared to the dark absorption of the water and the large buildings in the Financial District and SOMA. You can also see how much variation there is in pixel intensity over small areas. We are looking at pixel values ranging from 8 out in the deep water to 255 for the really bright spots. If you wanted to classify this into water and non-water, you would want to get rid of some of the pixel variation using a smoothing filter.

Here is the function call to produce the smoothing filter:
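This sketch uses ST_MapAlgebraFctNgb with the built-in st_mean4ma neighborhood function; the table and column names (landsat_b4, rid, rast) are placeholders for whatever you used during the import:

    CREATE TABLE landsat_b4_smooth AS
    SELECT rid,
           ST_MapAlgebraFctNgb(
               rast,          -- input raster
               1,             -- band to operate on
               '8BUI',        -- pixel type of the output band
               1, 1,          -- neighborhood: 1 cell in each direction = a 3x3 window
               'st_mean4ma(float[][], text, text[])'::regprocedure,  -- built-in mean function
               'NULL',        -- how to treat NODATA cells in the neighborhood
               NULL
           ) AS rast
    FROM landsat_b4;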

We are basically passing a 3x3 filter over every raster cell. When we are centered on a cell, we take the neighbors one cell up/down, one cell left/right, and one cell in each diagonal, and average them along with the center cell. PostGIS then replaces the value of the center cell with the average value. This produces a smoothing of our image (removing single wildly different pixels).

Now when we look at San Francisco it looks like this:

Finally, we do a reclassification of the raster image where we say any pixel with a value less than or equal to 20 is water (with a value of 1) and everything else is not (with a value of 0). Since we are reclassifying the image into binary values, rather than using 8 bits of information per pixel (like we do with 8BUI, which stands for 8-bit unsigned integer), we will just use 1 bit of information by using 1BB.

Here is the code to carry out that operation:
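A sketch using ST_Reclass, again with placeholder table names:

    CREATE TABLE landsat_b4_water AS
    SELECT rid,
           ST_Reclass(
               rast,
               1,                        -- band to reclassify
               '[0-20]:1, (20-255]:0',   -- values <= 20 become 1 (water), the rest 0
               '1BB'                     -- store the result as a 1-bit band
           ) AS rast
    FROM landsat_b4_smooth;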

And if I tell QGIS to render 0 as white and 1 as blue, here is the image we end up with:

Rather than spending our time trying to digitize all the boundaries with a vector coverage or picking each pixel out individually, we were able to use some simple techniques to quickly figure out the water areas in the image. If we zoom out to the whole Bay Area, we can see that we would need to do something about cloud cover.

The cloud cover ruins our ability to pick up water in the ocean with just the infrared band.

In the future we could use some of the same techniques to try and find the forested and urban areas. We could then use this to create a land cover map for the entire bay area.

These remote sensing techniques are just one of the ways you can use rasters. Don't forget you can also take vector coverages and use those to make rasters, or even interpolate from point data into a raster coverage. Two of the most popular uses for rasters are elevation models (which can also be used to derive slope and aspect) and interpolating a raster from values sampled in the field.

What's Next?

I hope you are as excited as I am about the new PostGIS capabilities we now have available for you in OpenShift. With the release of 2.0, there is a whole host of new functionality for your spatial application. Please be aware that for most of these new features this is a .0 release, meaning they are in the product but may not be up to production level yet. A new PostGIS 2.1 release came out while we were getting ready to roll out this release. Here is the Bugzilla item to track the building of the new binaries for OpenShift.

Between this post and the previous posts for PostGIS on OpenShift, you should be all set to start building some cool spatial applications. The time has come to flex your coding skills and start showing us the great things you can build. Once you build it, drop us a note or send us a tweet - I would love to see it!

Sign up for OpenShift Online

Get your own private Platform as a Service (PaaS)
by evaluating OpenShift Enterprise

Need Help? Ask the OpenShift Community your questions in the forums

Showcase your awesome app in the OpenShift Developer Spotlight.
Get in the OpenShift Application Gallery today.
