OpenStreetMap data extraction

Many people have asked how they can extract data from OpenStreetMap. Here is an example of how I got the locations I wanted out of it.

I use PHP in my example, but I guess the method is transferable to other languages. This solution is not the ideal one, but it gets the job done. I will update this example as I see fit to make it as good as possible. If you find a better way to do these things, please leave a comment.

The php and osm files used in this example can be found here (osm_example.rar).

There are many differently sized data files available from multiple sources, but the thing is that those files are huge. Handling them requires good knowledge of the file's contents (because you can't just open it in a text editor and look) and also powerful hardware.

But since I know what I'm looking for and where to get it, I can export a custom .osm file from OpenStreetMap.

First, go to OpenStreetMap and click the Export link at the top of the screen. Zoom and pan the map to select the area you want the information from. The exported data can contain at most 50,000 nodes, so zoom in close.

Select OpenStreetMap XML Data as the export format and click the Export button. This should open a dialog where you can save the map.osm file containing the data of the selected area.

So now we have the file. Let’s jump to the .php-file.

I don't want to get all the nodes from the .osm file, just some of the places stored in it. I have gathered the kinds of places I want into the arrays below. We'll come back to them later, so let's skip these arrays for now.

$maincategories = array('aerialway', 'craft', 'emergency', 'historic', 'leisure', 'office', 'shop', 'tourism', 'waterway', 'amenity');

$subcategories = array('station', 'aerodrome', 'helipad', 'beekeeper', 'blacksmith', 'brewery', 'carpenter', 'plumber', 'tailor', 'ambulance_station', 'battlefield', 'boundary_stone', 'castle', 'city_gate', 'fort', 'memorial', 'monument', 'ruins', 'rune_stone', 'wayside_cross', 'wayside_shrine', 'wreck', 'ship', 'bird_hide', 'golf_course', 'ice_rink', 'marina', 'miniature_golf', 'sports_centre', 'stadium', 'water_park', 'wildlife_hide', 'accountant', 'administrative', 'architect', 'camping', 'company', 'educational_institution', 'employment_agency', 'estate_agent', 'foundation', 'government', 'insurance', 'it', 'lawyer', 'newspaper', 'ngo', 'political_party', 'quango', 'research', 'telecommunication', 'travel_agent', 'clothes', 'general', 'alcohol', 'anime', 'appliance', 'art', 'baby_goods', 'bag', 'bakery', 'bathroom_furnishing', 'beauty', 'bed', 'beverages', 'bicycle', 'books', 'boutique', 'butcher', 'car', 'car_repair', 'car_parts', 'carpet', 'charity', 'chemist', 'clothes', 'computer', 'confectionery', 'convenience', 'copyshop', 'curtain', 'deli', 'department_store', 'dive', 'dry_cleaning', 'doityourself', 'electronics', 'erotic', 'fabric', 'farm', 'florist', 'frame', 'furnace', 'funeral_directors', 'furniture', 'garden_centre', 'gas', 'general', 'gift', 'glaziery', 'greengrocer', 'hairdresser', 'hardware', 'hearing_aids', 'herbalist', 'hifi', 'hunting', 'interior_decoration', 'jewelry', 'kiosk', 'kitchen', 'laundry', 'mall', 'massage', 'mobile_phone', 'money_lender', 'motorcycle', 'musical_instrument', 'newsagent', 'optician', 'organic', 'outdoor', 'paint', 'pawnbroker', 'pet', 'radiotechnics', 'seafood', 'fish', 'second_hand', 'shoes', 'sports', 'stationery', 'supermarket', 'tattoo', 'ticket', 'tobacco', 'toys', 'trade', 'vacant', 'vacuum_cleaner', 'variety_store', 'video', 'window_blind', 'kaikkimuut', 'alpine_hut', 'attraction', 'artwork', 'camp_site', 'caravan_site', 'chalet', 'guest_house', 'hostel', 'hotel', 'information', 'motel', 'museum', 
'picnic_site', 'theme_park', 'viewpoint', 'wilderness_hut', 'zoo', 'dock');

$amenitycategories = array('bar', 'bbq', 'biergarten', 'cafe', 'fast_food', 'food_court', 'ice_cream', 'pub', 'restaurant', 'college', 'kindergarten', 'library', 'school', 'university', 'bicycle_rental', 'bus_station', 'car_rental', 'car_sharing', 'car_wash', 'ev_charging', 'ferry_terminal', 'fuel', 'parking', 'parking_entrance', 'parking_space', 'taxi', 'atm', 'bank', 'bureau_de_change', 'clinic', 'dentist', 'doctors', 'hospital', 'nursing_home', 'pharmacy', 'social_facility', 'veterinary', 'arts_centre', 'cinema', 'community_centre', 'fountain', 'nightclub', 'social_centre', 'stripclub', 'studio', 'theatre', 'courthouse', 'crematorium', 'embassy', 'fire_station', 'grave_yard', 'marketplace', 'place_of_worship', 'police', 'post_box', 'post_office', 'prison', 'recycling', 'sauna', 'shelter', 'telephone', 'toilets', 'townhall', 'vending_machine', 'waste_disposal');

Next, define the file to use. In my case the map.osm file is in the same directory as this .php file.

The whole file is read into a single SimpleXML object, so it's good that this file is only 2 MB. I tried the same trick on a file containing all the data for Finland: that file was 2 GB and I ran out of memory very quickly.

There must be a way to read the file one part at a time, but in this case I just needed to get places from a small area of one town.
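One way to read the file a part at a time is PHP's XMLReader, which streams the document instead of building the whole tree in memory. I haven't used it in this example, so treat this as a sketch: it assumes PHP 7.1 or newer with the xmlreader and simplexml extensions, and the function name streamOsmNodes is my own. Each node is handed to a callback as a small SimpleXML object, so the per-node code in the rest of this post could stay almost the same.

```php
<?php
// Sketch: stream an .osm file node by node with XMLReader instead of loading
// the whole document with simplexml_load_file(). streamOsmNodes() is a
// hypothetical helper name, not part of any library.

function streamOsmNodes(string $path, callable $onNode): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    // read() advances one XML token at a time, so memory use stays small.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'node') {
            // readOuterXml() returns only this <node> and its <tag> children,
            // which is cheap to turn into a SimpleXMLElement.
            $node = simplexml_load_string($reader->readOuterXml());
            if ($node !== false) {
                $onNode($node);
                $count++;
            }
        }
    }
    $reader->close();
    return $count;
}
```

Usage would look like `streamOsmNodes("map.osm", function ($child) { /* same checks as the loop body below */ });`.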

$xml = simplexml_load_file("map.osm");

Loop through all the elements in the file.

foreach($xml->children() AS $child){

We want to get some of the node-elements so we check if the current element’s name is node.

if($child->getName() == 'node'){

If the element's name is node, it's time to set up variables for all the information we want to extract.

A node may contain only some of this information, and that is not acceptable. In those cases I want to move on to the next node element and start over. That's why I empty these variables here and again at the end of the loop.

$lat = "";
$lon = "";
$nimi = "";
$category = "";
$subcategory = "";
$phone = "";
$website = "";
$addr = "";

It’s time to save the first data into variables. In this case I save the ‘lat’ and ‘lon’ attributes of the current node-element.

$lat = (string)$child['lat'];
$lon = (string)$child['lon'];

Most of the data I'm looking for is in the node's child elements, so I have to loop through the child elements of each node element.

foreach($child->children() AS $grandchild){

I know the data is stored in tag-elements so let’s look for those.

if($grandchild->getName() == 'tag'){

A single node element can contain more than one tag element, and I only want data from some of them. Tag elements have two attributes: key ('k') and value ('v').

To get the name of a place, look for a tag with a key-attribute ‘name’. When you find this tag the value attribute holds the name of the place.
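A compact alternative to checking each key separately is to collect all of a node's tags into an associative array first. This is a sketch rather than what my script does; the helper name nodeTags and the sample node (with made-up values) are illustrative only, assuming nodes shaped like the ones in map.osm.

```php
<?php
// Sketch: turn the <tag k="..." v="..."/> children of one <node> into an
// associative array (key => value). nodeTags() is a hypothetical helper.

function nodeTags(SimpleXMLElement $node): array
{
    $tags = [];
    foreach ($node->tag as $tag) {
        // Cast to string; otherwise the values stay SimpleXMLElement objects.
        $tags[(string)$tag['k']] = (string)$tag['v'];
    }
    return $tags;
}

// Example node in the same shape as the nodes in map.osm (values are made up).
$node = simplexml_load_string(
    '<node id="42" lat="66.5030" lon="25.7290">'
    . '<tag k="name" v="Example Cafe"/>'
    . '<tag k="amenity" v="cafe"/>'
    . '<tag k="phone" v="+358 00 000 0000"/>'
    . '</node>'
);

$tags = nodeTags($node);
$name = $tags['name'] ?? '';
$phone = $tags['phone'] ?? '';
$website = $tags['website'] ?? ''; // a missing tag falls back to an empty string
$lat = (string)$node['lat'];
```

With the tags in an array, the name, phone and website checks become simple lookups with a default value.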

if($grandchild['k'] == 'name'){
$nimi = (string)$grandchild['v'];
}

I also want the phone number if it’s available.

if($grandchild['k'] == 'phone'){
$phone = (string)$grandchild['v'];
}

A website is also good to have, if available.

if($grandchild['k'] == 'website'){
$website = (string)$grandchild['v'];
}

Now comes the tricky part.

For some places there is more than one tag that holds the information about what kind of place it is.

For example, a place can have both a tag with the amenity key and a tag with some other key. I find the amenity tag always less descriptive than the other one, but if the amenity tag is the only one the place has, I can live with it.

So first I loop through the keys of the tag elements in hand and check whether the key is listed in the $maincategories array at the top of the file. If there's no tag with a listed key, I don't want this place, so I just go to the next node.

If the key is found in the $maincategories array, I use it as the place's main category and save it into a variable.

if(in_array((string)$grandchild['k'], $maincategories)){
$category = (string)$grandchild['k'];
}

If the category happens to be 'amenity', I check whether the tag's value is listed in the $amenitycategories array.

if($grandchild['k'] == 'amenity' && in_array((string)$grandchild['v'], $amenitycategories)){
$subcategory = (string)$grandchild['v'];
}

If the category is not 'amenity', I check whether the tag's value is listed in the $subcategories array.

if($grandchild['k'] != 'amenity' && in_array((string)$grandchild['v'], $subcategories)){
$subcategory = (string)$grandchild['v'];
}
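The "prefer the more descriptive tag" rule can also be written as one function over a node's tags. This is a sketch under my own assumptions: $tags is an associative key => value array of the node's tags, and classifyPlace is a hypothetical name. A non-amenity match is returned immediately, while an amenity match is only remembered as a fallback, so tag order in the file doesn't matter.

```php
<?php
// Sketch: pick [category, subcategory] for a node, preferring a non-amenity
// tag over an amenity tag. Returns null when nothing matches (skip the node).

function classifyPlace(array $tags, array $main, array $sub, array $amenity): ?array
{
    $fallback = null;
    foreach ($tags as $key => $value) {
        $key = (string)$key;
        if (!in_array($key, $main, true)) {
            continue; // the key is not one of the wanted main categories
        }
        if ($key === 'amenity') {
            if (in_array($value, $amenity, true)) {
                $fallback = ['amenity', $value]; // remember it, but keep looking
            }
        } elseif (in_array($value, $sub, true)) {
            return [$key, $value]; // a more descriptive tag: use it right away
        }
    }
    return $fallback;
}

// Usage with the arrays from the top of the file:
// [$category, $subcategory] = classifyPlace($tags, $maincategories, $subcategories, $amenitycategories) ?? ['', ''];
```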

The place's location, name and categories are the things I require. If they are all saved in variables, we're good to proceed; otherwise we just move on to the next node element.

if($lat != "" && $lon != "" && $nimi != "" && $category != "" && $subcategory != ""){

Wouldn’t it be nice to know the actual address of the place, not just the coordinates? Well let’s find out the address then.

I did this by reverse geocoding the location.

You can also try to get as much of the address as possible from the tag elements. I skip that in this example for now; maybe I will add it later.
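If you want to try the tag-based route, OpenStreetMap commonly stores addresses in addr:* tags such as addr:street, addr:housenumber, addr:postcode and addr:city (addr:country usually holds a two-letter code like FI). A rough sketch, assuming the node's tags are already in a key => value array and using the hypothetical helper name addressFromTags; many nodes lack some or all of these tags, so every part may come back empty.

```php
<?php
// Sketch: build address parts from OSM addr:* tags instead of reverse
// geocoding. Missing tags become empty strings.

function addressFromTags(array $tags): array
{
    // "Ruokasenkatu" + "13" -> "Ruokasenkatu 13"; trim() handles missing parts.
    $street = trim(($tags['addr:street'] ?? '') . ' ' . ($tags['addr:housenumber'] ?? ''));
    return [
        'street'  => $street,
        'zip'     => $tags['addr:postcode'] ?? '',
        'town'    => $tags['addr:city'] ?? '',
        'country' => $tags['addr:country'] ?? '',
    ];
}
```

A reasonable combination is to use these parts when they exist and fall back to reverse geocoding only when the street is empty.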

Set your API key. In my example we can manage without a key because I'm running on localhost.

$api_key = "";

Build the request URL with the appropriate latitude and longitude.

$url = ''.$lat.','.$lon.'&output=json&sensor=true_or_false&key='. $api_key;

Make the HTTP request

$data = @file_get_contents($url);

Parse the json response

$jsondata = json_decode($data,true);

If we got a response array and the status code is 200, get the address.

if(is_array($jsondata) && $jsondata['Status']['code'] == 200)

Let's store the address in a variable.

$addr = $jsondata['Placemark'][0]['address'];
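The request, decode and status check steps can be wrapped into one function that also copes with a failed request (file_get_contents() returns false) or a missing field. This sketch only assumes the response fields used above (Status.code and Placemark[0].address); other geocoding services answer in different shapes, and addressFromGeocoderJson is a hypothetical name.

```php
<?php
// Sketch: extract the address string from the geocoder's JSON reply, or
// return '' if the request failed or the status is not 200.

function addressFromGeocoderJson($json): string
{
    if (!is_string($json) || $json === '') {
        return ''; // e.g. file_get_contents() returned false
    }
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['Status']['code'] ?? 0) != 200) {
        return '';
    }
    return (string)($data['Placemark'][0]['address'] ?? '');
}

// Usage: $addr = addressFromGeocoderJson(@file_get_contents($url));
```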

The $addr variable now holds the address in the form “street, zip code town, country”, for example “Ruokasenkatu 13, 96100 rovaniemi, Finland”. Note that the address is a single string.

I want to store different parts of the address separately so I explode it into pieces.

$address = explode(',',$addr);

The street and country parts are easy to get.

$street = trim($address[0]);
$country = trim($address[2]);

The zip code and town are separated only by whitespace, so we need a trick to get them.

First I explode the zip-and-town part on whitespace. Don't be fooled: this reliably gives us only the zip code.

$paikka = explode(' ',trim($address[1]));
$zip = $paikka[0];

Since town names can consist of two or more words, we can't assume that $paikka[1] always holds the whole town name. Therefore I remove the zip code from the original zip-and-town string instead.

$town = trim(str_replace($zip, '', trim($address[1])));
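The same splitting can be done in one place by splitting the middle part only at its first whitespace run, which keeps multi-word town names whole. This is a sketch assuming exactly the three-part “street, zip town, country” layout and a zip code without spaces (as with Finnish postcodes like 96100); other countries' formats would need different handling, and splitAddress is a hypothetical name.

```php
<?php
// Sketch: split "street, zip town, country" into its parts.
// Returns null if the string does not have exactly three comma-separated parts.

function splitAddress(string $addr): ?array
{
    $parts = array_map('trim', explode(',', $addr));
    if (count($parts) !== 3) {
        return null; // not the expected layout
    }
    // Limit 2: split only at the first whitespace run after the zip code.
    $zipTown = preg_split('/\s+/', $parts[1], 2);
    return [
        'street'  => $parts[0],
        'zip'     => $zipTown[0],
        'town'    => $zipTown[1] ?? '',
        'country' => $parts[2],
    ];
}
```

Using a split limit avoids both the leftover whitespace from str_replace() and the risk of replacing the zip digits somewhere else in the string.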

Now we have parts of the address stored in variables and we can play with them.

At this point in my code I insert the gathered data into my database.

Note that the website and phone number were not required, so I add them to the database only if they exist.
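My own database code isn't shown here, but one common way to do the insert is a PDO prepared statement, storing NULL for the optional fields when they are empty. In this sketch the table name places, its columns and the helper names placeRow and insertPlace are all hypothetical; adjust them to your own schema.

```php
<?php
// Sketch: prepare one row of parameters and insert it with PDO.
// Empty phone/website values become NULL instead of empty strings.

function placeRow(string $name, string $category, string $subcategory,
                  string $lat, string $lon, string $phone = '', string $website = ''): array
{
    return [
        ':name'        => $name,
        ':category'    => $category,
        ':subcategory' => $subcategory,
        ':lat'         => (float)$lat,
        ':lon'         => (float)$lon,
        ':phone'       => $phone !== '' ? $phone : null,
        ':website'     => $website !== '' ? $website : null,
    ];
}

function insertPlace(PDO $db, array $row): void
{
    // Placeholders keep names like "O'Hara's Pub" from breaking the SQL.
    $stmt = $db->prepare(
        'INSERT INTO places (name, category, subcategory, lat, lon, phone, website)
         VALUES (:name, :category, :subcategory, :lat, :lon, :phone, :website)'
    );
    $stmt->execute($row);
}
```

Usage: `insertPlace($db, placeRow($nimi, $category, $subcategory, $lat, $lon, $phone, $website));`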

Once the data is safely in the database, empty the variables so you don't get duplicate data.

$lat = "";
$lon = "";
$nimi = "";
$category = "";
$subcategory = "";
$phone = "";
$website = "";
$addr = "";

