Working with postcodes and spatial data in MySQL

Recently, I have been working on a commercial project involving UK postcode data. Since Ordnance Survey launched their OpenData initiative, they have made their postcode database available for free, which is great news for developers. As I work primarily with MySQL, I have been trying to find the most efficient ways to store and access data.

I knew that Postgres has GIS data types through its PostGIS extension, but never knew that MySQL had similar extensions. While a table could easily store two DECIMAL or FLOAT types, there is a data type that is very useful for storing Latitude and Longitude coordinates: the POINT type. Combined with a SPATIAL index, MySQL is able to perform fast mathematical calculations with coordinates.

SPOILER: If you want to store postcode data in MySQL, you need to learn about the POINT datatype. Read on to find see how it can be used…

Below is an example of how to find the nearest postcodes from any given location. I have hacked together a basic tutorial to show the 10 nearest postcodes when you click on a point on a map. It only took me an hour or so, but you may find the approach interesting and the results useful.

First, I downloaded and imported the Code-Point Open data set. The download contains a collection of CSV files which have the position of each postcode in eastings and northings, which use the OSGB36 grid scheme. The rest of world seems to use WGS84 format, including all of the mapping JavaScript libraries, including Google Maps and LeafletJS. Therefore a couple of conversions are needed. There are plenty of snippets of code available, but the ones belonging to Andrew Brampton and Jonathan Stott did the trick. Thanks guys :)

To store the data in MySQL, we need a table. I’m going to use a basic table with the postcode name as its primary key and a POINT data type to store the latitude and longitude. I’ve added a SPATIAL index on the Point field which will be explained later.

CREATE TABLE `Postcodes` {
  `Name` CHAR(8) NOT NULL DEFAULT "",
  `Point` POINT NOT NULL,
  PRIMARY KEY (`Name`),
  SPATIAL KEY `Point` (`Point`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii;

Running the import script loads the data in the database. There are nearly 1,700,000 postcodes in the database so this took several minutes and consumed 70Mb of disk storage.

In order to perform a search on this table, we need an efficient way of searching the data. MySQL offers a number of geometry functions that we could use, but we are in dangerous territory of writing an inefficient query. An obvious choice is to use GLength(), which calculates the distance between two points. The down side is that it’s impossible to make use of any indexes, as we would have to calculate a value for each postcode stored in the database. Searching against 1,700,000 entries would take far too long to return a value. In order to reduce the collation down, we can restrict the search with a bounding box. Our solution is going to present map that you click on, so we can get the current bounding box that the map covers. As we can get the minimum and maximum coordinates from the current map, we can use this to filter the database table down using MBRWithin() before we start performing GLength() operations. This will use the SPATIAL index and speed up our query significantly. An explanation of how the spatial index works in this scenario can be found in the MySQL documentation.

Here’s my query:

SELECT
  Name,
  X(Point) Latitude,
  Y(Point) Longitude
FROM
  `PostcodePoint`
WHERE
  MBRWITHIN(Point, ENVELOPE(GEOMFROMTEXT('LINESTRING(minLat minLong,maxLat maxLong)')))
ORDER BY
  GLENGTH(GEOMFROMTEXT(CONCAT('LINESTRING(originLat originLong,',X(Point),' ', Y(Point),')')))
LIMIT 10

We still need to apply a limit to the query, otherwise we may have hundreds, if not thousands, of markers showing on our map which would not only be impractical but also slow to render!

Next, we need a way of getting the data out of the database and served to our page. The obvious choice is a simple PHP script that presents the data in JSON that can easily be picked up by an Ajax call from the browser. The heart of this script is our query from above.

<?php

$minLat = $_GET['minLat'];
$minLong = $_GET['minLong'];
$maxLat = $_GET['maxLat'];
$maxLong = $_GET['maxLong'];
$originLat = $_GET['originLat'];
$originLong = $_GET['originLong'];

$success = true;
$error = '';
$points = array();

// Validate the input parameters

if (!isset($minLat) || !is_numeric($minLat)) {
	$error .= 'minLat was not provided or was invalid. ';
	$success = false;
}

if (!isset($minLong) || !is_numeric($minLong)) {
	$error .= 'minLong was not provided or was invalid. ';
	$success = false;
}

if (!isset($maxLat) || !is_numeric($maxLat)) {
	$error .= 'maxLat was not provided or was invalid. ';
	$success = false;
}

if (!isset($maxLong) || !is_numeric($maxLong)) {
	$error .= 'maxLong was not provided or was invalid. ';
	$success = false;
}

if (!isset($originLat) || !is_numeric($originLat)) {
	$error .= 'originLat was not provided or was invalid. ';
	$success = false;
}

if (!isset($originLong) || !is_numeric($originLong)) {
	$error .= 'originLong was not provided or was invalid. ';
	$success = false;
}

// Ensure our bounding box isn't too big
$size = sqrt(pow($maxLat - $minLat, 2) + pow($maxLong - $minLong,2));
if($size &gt; 1) {
	$success = false;
	$error = 'The bounding box is too large, please try a smaller area. ';
}

if ($success) {
	$pdo = new PDO('mysql:host=127.0.0.1;dbname=Postcode', 'user', 'password');

	$stmt = $pdo->prepare(
	&quot;SELECT
		Name,
		X(Point) Latitude,
		Y(Point) Longitude
	FROM
		`PostcodePoint`
	WHERE
		MBRWITHIN(Point, ENVELOPE(GEOMFROMTEXT('LINESTRING(" . $minLat . " "
							. $minLong . ", ". $maxLat . " " . $maxLong
							. &quot;)')))
	ORDER BY
		GLENGTH(GEOMFROMTEXT(CONCAT('LINESTRING(" . $originLat . " "
							. $originLong
							. ",' ,X(Point),' ',Y(Point),')')))
	LIMIT 10
	");

	if (!$stmt->execute(array($postcode))) {
		$error = 'Database failed with error code: '.$stmt->errorCode();
		$success = false;
	} else {
		while ($data = $stmt->fetch(PDO::FETCH_ASSOC)) {
			$points[$data['Name']] = $data;
		}
	}
}
$response = array('success' => $success, 'points' => $points, 'error' => $error);
header('Content-Type: application/json');
echo json_encode($response);

Finally, we need a page to make the call from. I’m using LeafletJS, which has a very simple API and very flexible.

<html>
<head>
<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.4.5/leaflet.css" />
<!--[if lte IE 8]>
<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.4.5/leaflet.ie.css" />
<![endif]-->;
<script src="http://cdn.leafletjs.com/leaflet-0.4.5/leaflet.js"></script>;
<style type="text/css">
#map {
	height: 100%;
	width: 100%
}
</style>
<script>
	var map, markerLayer;

	function findPoint(e) {
		// Prevent the query from scanning too wide an area
		if(map.getZoom() &amp;amp;amp;amp;lt; 14) {
			var popup = L.popup()
		    .setLatLng(e.latlng)
		    .setContent('Please zoom before searching for postcodes')
		    .openOn(map);
			return;
		}
		
		// Find the boundary of the map (to minimise our db search)
		bounds = map.getBounds();

		// Find the clicked position
		pos = e.latlng;

		// Make our call
		jQuery.ajax('http://www.tec20.co.uk/tutorials/nearestPostcodes/nearestPostcodesJson.php', {
			data:{
				minLat: bounds._northEast.lat,
				minLong: bounds._northEast.lng,
				maxLat: bounds._southWest.lat,
				maxLong: bounds._southWest.lng,
				originLat: pos.lat,
				originLong: pos.lng
			},
			dataType:'json'
		}).done(function(data) {
			if(data.success) {
				// If we already have a marker layer, remove it
				if(!!markerLayer)
					map.removeLayer(markerLayer);
	
				// Create a new marker layer
				markerLayer = new L.LayerGroup();
				for(i in data.points) {
					// Add markers for each of the points returned
					new L.marker([data.points[i].Latitude, data.points[i].Longitude])
						.addTo(markerLayer)
						.bindPopup(data.points[i].Name);
				}
	
				// Add the point where we clicked on the map
				new L.marker([e.latlng.lat, e.latlng.lng])
					.addTo(markerLayer)
					.bindPopup('Latitude: '+e.latlng.lat+'&lt;br /&gt;Longitude: '+e.latlng.lng);
	
				// Add the layer to the map
				markerLayer.addTo(map);
			} else {
				// The JSON request failed
				var popup = L.popup()
			    .setLatLng(e.latlng)
			    .setContent(data.error)
			    .openOn(map);
			}
		});
	}

	function initializeMap() {
		// Create a map object (roughly positioned over Birmingham)
		map = L.map('map').setView([52.479,-1.896], 16);

		// Select the tiles
		L.tileLayer('http://{s}.tile.osm.org/{z}/{x}/{y}.png', {
		    attribution: 'Map data &amp;copy; &amp;amp;amp;amp;lt;a href=&amp;amp;amp;amp;quot;http://openstreetmap.org&amp;amp;amp;amp;quot;&amp;amp;amp;amp;gt;OpenStreetMap&amp;amp;amp;amp;lt;/a&amp;amp;amp;amp;gt; contributors, &amp;amp;amp;amp;lt;a href=&amp;amp;amp;amp;quot;http://creativecommons.org/licenses/by-sa/2.0/&amp;amp;amp;amp;quot;&amp;amp;amp;amp;gt;CC-BY-SA&amp;amp;amp;amp;lt;/a&amp;amp;amp;amp;gt;',
		    minZoom: 10,
		    maxZoom: 20
		}).addTo(map);

		// Bind our click function to the map
		map.on('click', findPoint);
	}

	</script>
	<script src="http://code.jquery.com/jquery-1.8.1.min.js" type="text/javascript"></script>
</head>
<body onload="initializeMap()">
<div id="map"></div>
</body>
</html>

So we now have a map that allows us to click anywhere to show the 10 nearest postcodes. The only restriction I’ve put in place is that you must be zoomed in enough to trigger a query, otherwise the resulting bounding box would yield a long result set.

Here’s the working example.