OpenStreetMap Interview: Andy Mabbett, Wikidata and OSM

Our interview series with members of the OpenStreetMap community continues today with a discussion with Andy Mabbett about the relationship between OpenStreetMap and Wikidata.

Andy, many thanks for agreeing to speak with us.

1. Who are you and what do you do? What got you into OpenStreetMap?

I’m Andy Mabbett from Birmingham, England. I’ve been creating content online since 1994 (I was the manager of Birmingham City Council’s website; reputedly the first local authority website in the world). I’ve been a Wikipedia editor since 2003 and have worked as a Wikipedian (or Wikimedian - same difference) in Residence with a number of museums, galleries and learned societies.

Being a campaigner for open and crowd-sourced content, I started editing OSM in 2009 (although I did a bit of editing on the OSM wiki in 2007 - I’m now a bureaucrat there). I facilitated the main stream at State of the Map in Birmingham in 2013 and spoke about Wikidata in OSM at Wikimania (Wikipedia’s equivalent of SotM) in 2014 and at SotM US in New York in 2015.

2. You have a lot of experience with Wikipedia and with bringing Wikidata IDs to OpenStreetMap. Tell us about your background, what the Wikidata project is, and how Wikidata is used in OSM.

Wikidata is a linked open data repository from the same community that makes Wikipedia. There are actually around 300 Wikipedias in different languages, and it doesn’t make sense for each of them to store the same data separately, so we built a database to do that, with one record (or “item”) for each Wikipedia article, regardless of the language (so just one for “London” even though there are over two hundred Wikipedias with articles on London). Wikipedias can “transclude” data from it, much like they do images from Wikimedia Commons. Of course it then made sense to make that data available to others, via an API, or SPARQL queries (the same technology as used by Overpass Turbo), or through full data dumps. And then we started to add data about more things that do not have Wikipedia articles. So, for example, we might have one item for each artwork listed on “List of public art in Birmingham”, one for every painting by van Gogh, one for every academic paper cited in a Wikipedia, and so on. (Of course, like OSM, this is all a work in progress.)

I’ve always been interested in adding tags to objects in OSM with identifiers from other systems, like the ID of a plaque in Open Plaques. We also encourage people to use, in their own databases, the IDs of items in Wikidata as unique identifiers for the subject described. So it makes sense to tag buildings, statues, lakes and dozens of other types of feature with the Wikidata item identifying the same subject.
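As a rough illustration of what such a tag looks like in practice, the Python sketch below takes the tags of an OSM object, checks that the value of the wikidata key has the usual Q-number shape, and builds the matching Wikidata URLs. The helper name wikidata_links and the sample tags are made up for this example, and the Q-number is only a placeholder.

```python
import re

# Wikidata item IDs are a capital "Q" followed by a positive integer, e.g. "Q42".
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_links(tags):
    """Return Wikidata URLs for a dict of OSM tags, or None.

    `tags` is a plain dict of OSM key/value pairs (hypothetical input).
    """
    qid = tags.get("wikidata", "").strip()
    if not QID_PATTERN.fullmatch(qid):
        return None  # key missing, or value is not a single Q-number
    return {
        "item": f"https://www.wikidata.org/wiki/{qid}",
        "json": f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
    }


# Illustrative tags only; the Q-number is a placeholder, not a real match.
example = {"tourism": "artwork", "name": "Example statue", "wikidata": "Q42"}
print(wikidata_links(example))
```

The sketch deliberately rejects anything that is not a single Q-number, so values holding several IDs would need extra handling.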

This allows users (whether humans or computer programmes, aka “bots”) to fetch related data, that OSM rightly doesn’t store. So you could do things like (to manufacture an exaggerated example) find all the buildings in England that have an architect educated at Oxford University in the 1870s, and show their birthplace on a map.
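A simplified version of that exaggerated example might be sketched as follows. The Python function only builds a SPARQL query string for a list of Wikidata IDs collected from OSM tags; it does not contact any server. The function name is hypothetical, and the property and item numbers (P84 for architect, P69 for educated at, P19 for place of birth, P625 for coordinates, Q34433 for the University of Oxford) are quoted from memory and should be checked on Wikidata before use. The "in England" and "1870s" conditions are left out to keep the sketch short.

```python
def architect_birthplace_query(building_qids, university_qid="Q34433"):
    """Build a SPARQL query for architects' birthplaces.

    `building_qids` are Wikidata IDs taken from OSM `wikidata` tags.
    All P/Q numbers here are assumptions to verify on Wikidata.
    """
    values = " ".join(f"wd:{qid}" for qid in building_qids)
    return f"""
SELECT ?building ?architect ?birthplace ?coords WHERE {{
  VALUES ?building {{ {values} }}
  ?building wdt:P84 ?architect .
  ?architect wdt:P69 wd:{university_qid} ;
             wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords .
}}
""".strip()


# Placeholder IDs, standing in for values read from OSM objects.
print(architect_birthplace_query(["Q1", "Q2"]))
```

The resulting text could then be pasted into the Wikidata Query Service, with the returned coordinates plotted on a map.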

And not only can you include the Wikidata ID of the building or other feature, but you can give the Wikidata ID of the person or thing it’s named after, the company that occupies it, the material it’s made from, and much more besides.
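Secondary links of this kind are typically written as keys that end in ":wikidata", such as name:etymology:wikidata or operator:wikidata. The short Python sketch below, with a hypothetical helper name and placeholder Q-numbers, separates those secondary links from the main wikidata tag.

```python
def secondary_wikidata_tags(tags):
    """Map each 'something:wikidata' key to its Q-number, keyed by 'something'.

    The plain 'wikidata' key (the feature itself) is not included.
    """
    suffix = ":wikidata"
    return {
        key[: -len(suffix)]: value
        for key, value in tags.items()
        if key.endswith(suffix)
    }


# Placeholder tags and Q-numbers, for illustration only.
example = {
    "building": "yes",
    "wikidata": "Q100",
    "name:etymology:wikidata": "Q200",
    "operator:wikidata": "Q300",
}
print(secondary_wikidata_tags(example))
```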

3. The addition of Wikidata IDs to OSM has at times been a controversial topic. What are the main arguments? How would you like to see the situation evolve?

Arguments against including Wikidata IDs sit on a spectrum, from valid concerns that need more work to ensure everything is as it should be (Wikidata, like OSM, is as yet neither perfect nor finished) to what I can only describe as technical Luddism.

For instance, a few people think there should be no links from OSM to other systems. Some think that OSM should create its own data repository, and in effect, replicate a large part of Wikidata. That makes no more sense than it would for Wikipedia to start drawing its own map of the world, instead of showing tiles derived from (or linking to) OSM. Though such people are in a small minority, they are sadly quite vocal.

My way of looking at this is to think of Wikipedia, Wikidata and OSM as being parts of a single system. Good system management would ensure that each part works on its speciality, and none is redundant to another.

4. What can OSM learn from Wikipedia in how it organises itself?

One of the big differences is in the use of wiki pages for discussion - most Wikipedia discussion happens on talk pages, and that’s the canonical place for reaching consensus. There are mailing lists, Facebook groups, and of course real-life meetups, but the wiki is primary. In OSM, the wiki seems secondary, and underused, and it can be hard to know what consensus has been reached, and where it was reached.

5. As an active member of both, what are the major differences between the OSM and Wikipedia communities?

Size, obviously. And the Wikipedia community are working on 300 different websites (plus the sister projects, of course) whereas there’s only one OSM. That said, it’s common for a single, English-language, Wikipedia article to be worked on by editors in England, the United States, Canada, Australia and elsewhere. In OSM, one is naturally most likely to spend the majority of time working with one’s close geographical neighbours.

6. What is the best way for people to get involved? Is there a mailing list or discussion forum for those interested in this topic?

There is no specific mailing list for using Wikidata IDs in OSM, but there is a wiki page. I’d also urge people to get to know how to add data to Wikidata items, and how to create new ones.

7. In 2014 OSM celebrated its 10th birthday. Where do you think the project will be in 10 years’ time, both globally and with regard to the relationship with Wikimedia specifically?

I’d like to think OSM will be far more integrated with (linked to, and from) both Wikidata and other linked data systems. It would be good to see a solution for the non-permanence of objects in OSM, where a node that represents a building is deleted when someone replaces it with a polygon, for instance. This causes problems for systems that have linked to the URL representing the node.

The WMF has recently launched its own tile server based on OSM data so that slippy maps can be included in Wikipedia articles and on other WMF projects. This is not much used yet, especially in the English Wikipedia, but I’m sure that its use will be commonplace long before a decade is up.

And I have no doubt that in ten years we’ll still be having incredibly nerdy discussions about the best way to tag certain types of pet toilet…

Many thanks, Andy! As someone who has added many Wikidata IDs to OSM relations (mainly mid-tier administrative levels) around the world so as to be able to link OpenStreetMap with other datasets, I really appreciate the work the Wikipedia community has done. Nevertheless, the question of relevancy does of course come up, and probably not every park bench needs its own Wikidata ID. It’s great to see both communities growing, and I hope a happy and productive balance of collaboration can be found.

BTW - as a reminder, in the OpenCage Geocoder we return the Wikidata ID of a location as an annotation.

Please let us know if your community would like to be part of our interview series here on our blog. If you are, or know of, someone we should interview, please get in touch; we’re always looking to promote people doing interesting things with open geo data.