Images from Wikidata

Enter a search term in one of the following six languages, and we'll try to find an image for it using Wikidata, via two search methods. Search terms can be people, places, events, objects, plants, etc. E.g. try "granite" or "Obama" in English, "ordinateur" in French, "δάσος" in Greek, etc. Full explanation and code below.

Search for:

Direct Wikidata search:

Via Wikipedia sitelinks:

Open-domain image retrieval

What's the motivation here?

One motivation: generative projects sometimes need an image for an arbitrary search term, which lets the generator be more open-domain than it would be if you started with a fixed sprite pack. Getting good results is pretty hard, but if you're okay with just "something" for any search term, there have been various solutions over the years. A popular one used to be grabbing the first result from the now-deprecated Google Image Search API; Flickr search is another option. This post shows how to use Wikidata queries for that purpose. I believe using Wikidata for this has a few advantages, such as multilingual search and labeling, an open API run by a nonprofit organization, and at least some degree of human curation.

Wikidata

A quick overview of Wikidata: It's a structured-data project under the same umbrella organization as Wikipedia, and contains millions of "entities" that each represent some person, thing, event, etc. Examples of entities are granite (Q41177), Elvis Presley (Q303), and Great Molasses Flood (Q1129089). Entities are canonically referred to by this Qxxx ID, because no specific language is considered primary (the names I gave in the previous sentence are just the English labels attached to them).

Many of the entities were initially imported from Wikipedia articles, and are linked back to the corresponding Wikipedia articles. Each entity has a label and a short description in one or more languages, plus a series of "claims", which constitute the structured data. Claims are statements of properties attributed to the entity: date of birth (P569) for a person, DOI (P356) for a scientific paper, etc. These come from a mixture of manual curation, scripts importing data from Wikipedia articles (e.g. from infoboxes), and scripts importing data from other open-data sources.
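To make the claim structure a bit more concrete, here is a rough sketch of how a claim sits inside the raw (unsimplified) JSON for an entity. The sample object is hand-written and heavily trimmed: the entity ID and DOI in it are made up for illustration, and `firstClaimValue` is a hypothetical helper, not part of any library.

```javascript
// Hand-written, heavily trimmed sample of an entity in the raw API format.
// The ID and DOI are placeholders, not real Wikidata data.
const sampleEntity = {
    id: "Q0",
    labels: { en: { language: "en", value: "an example paper" } },
    claims: {
        P356: [
            {
                mainsnak: {
                    snaktype: "value",
                    property: "P356",
                    datavalue: { value: "10.0000/example", type: "string" }
                },
                rank: "normal"
            }
        ]
    }
}

// Hypothetical helper: return the first value for a property, or null if none.
function firstClaimValue(entity, propertyId) {
    const statements = (entity.claims && entity.claims[propertyId]) || []
    // Skip statements that carry no datavalue (e.g. "unknown value" claims)
    const withValue = statements.find(s => s.mainsnak && s.mainsnak.datavalue)
    return withValue ? withValue.mainsnak.datavalue.value : null
}

console.log(firstClaimValue(sampleEntity, "P356")) // the DOI string
console.log(firstClaimValue(sampleEntity, "P569")) // null: no such claim here
```

Not every property has a plain string value (dates, for instance, come back as objects), so a helper like this is only a starting point.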

One of the many possible properties that a Wikidata entity can have is image (P18), which, if it exists, links to one or more freely licensed images on the Wikimedia Commons media repository that are claimed to be a good visual representation of the entity. Retrieving that image is what this post is about.
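The value of an image claim is a Commons file name rather than a URL. One way to turn a file name into something you can put in an `<img>` tag is the `Special:FilePath` page on Commons, which redirects to the file and accepts a `width` parameter. The sketch below assumes that route; `commonsImageUrl` is a hypothetical helper name and the file name is a placeholder.

```javascript
// Hypothetical helper: build a URL for a Commons file via Special:FilePath.
// `width` is optional; when given, Commons serves a version resized to that width.
function commonsImageUrl(filename, width) {
    const base = "https://commons.wikimedia.org/wiki/Special:FilePath/"
    // File names can contain spaces and non-ASCII characters, so encode them
    const url = base + encodeURIComponent(filename)
    return width ? `${url}?width=${width}` : url
}

console.log(commonsImageUrl("Example granite.jpg", 300))
```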

Wikibase API

The most direct way of getting things out of Wikidata is to use the Wikibase API. You can use it directly, but this example uses the Wikibase SDK, a small JavaScript library that provides some convenience methods to build query URLs and simplify the returned results.
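For a sense of what using the API directly looks like, here is a minimal sketch that builds a search URL by hand for the `wbsearchentities` action. The function name `buildSearchUrl` is made up for this example, and `origin=*` is included on the assumption that the request will be made cross-origin from a browser.

```javascript
// Hypothetical helper: build a wbsearchentities URL without the SDK.
function buildSearchUrl(term, lang) {
    const params = new URLSearchParams({
        action: "wbsearchentities",
        search: term,      // the text to search for
        language: lang,    // language to search in, e.g. "en" or "el"
        limit: "1",        // only the top result is needed
        format: "json",
        origin: "*"        // assumption: anonymous cross-origin request from a browser
    })
    return "https://www.wikidata.org/w/api.php?" + params.toString()
}

console.log(buildSearchUrl("granite", "en"))
```

The URL can be passed straight to `fetch`; the SDK mostly saves you from remembering the parameter names and from unpacking the response.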

The demo above implements two ways of searching. We can search Wikidata directly in a specified language, which is perhaps the obvious thing to do. Alternatively, we can do a "sitelink" search, which means searching for a Wikipedia article in the specified language, and then grabbing its linked Wikidata entity. These often produce the same results, but not always.
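At the API level, the sitelink route is a `wbgetentities` request that identifies the entity by a Wikipedia site and article title instead of by a Q-ID. Here is a rough sketch of building such a URL by hand; `buildSitelinkUrl` is a hypothetical name, and the `lang + "wiki"` site naming is assumed to hold for the Wikipedia editions you care about.

```javascript
// Hypothetical helper: build a wbgetentities URL that looks an entity up
// by Wikipedia article title rather than by Q-ID.
function buildSitelinkUrl(title, lang) {
    const params = new URLSearchParams({
        action: "wbgetentities",
        sites: lang + "wiki",  // e.g. "enwiki" for the English Wikipedia
        titles: title,         // the Wikipedia article title
        props: "claims",       // only the claims are needed to find an image
        format: "json",
        origin: "*"            // assumption: anonymous cross-origin request from a browser
    })
    return "https://www.wikidata.org/w/api.php?" + params.toString()
}

console.log(buildSitelinkUrl("Bee", "en"))
```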

A brief sketch of the pros and cons of the two search methods: For languages with large Wikipedias, the sitelink method seems to often produce better results for ambiguous searches, because it makes use of manually created redirects to the most common meaning. For example, a direct Wikidata search in English for "bee" turns up the letter B as the first result, while a sitelink search via the English Wikipedia turns up the insect, which is probably what we wanted. On the other hand, for languages with smaller Wikipedias that have fewer manually created redirects, the direct Wikidata search may produce better results. For example, a direct Wikidata search in Greek for "Έλβις" turns up Elvis Presley, while a sitelink search only works if you use his full name. You can try both methods in a few languages at the top of this post.

Code

For the actual implementation of the demo at the top of this post, view-source in your browser. Below is a cut-down example that doesn't do DOM manipulation or retrieve entity labels/descriptions, showing how to retrieve an image URL. It's a pretty quick-and-dirty implementation with minimal error handling, but it hopefully illustrates how to get an image out of the API.

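// WBK is assumed to be the global provided by the Wikibase SDK browser bundle, loaded via a <script> tag
const wdk = WBK({
    instance: "https://www.wikidata.org",
    sparqlEndpoint: "https://query.wikidata.org/sparql"
})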

// Direct Wikidata search
function search(term, lang) {
    const searchUrl = wdk.searchEntities({
        search: term,
        language: lang,
        limit: 1
    })
    return fetch(searchUrl).then(r => r.json())
        // grab ID of the first search result (throws if there are no results)
        .then(r => r.search[0].id)
        // look up the claims for that ID
        .then(id => wdk.getEntities({ids: id, props: 'claims'}))
        .then(entityUrl => fetch(entityUrl))
        .then(r => r.json())
        .then(wdk.simplify.entities)
        // grab the "P18" (image) claims, if they exist
        .then(r => r[Object.keys(r)[0]].claims["P18"])
        // get a URL for the first image if there is one (resized to width=300)
        .then(images => images ? wdk.getImageUrl(images[0], 300) : null)
        .then(imageUrl => alert(imageUrl)) // do something with the URL
        // minimal error handling: log network failures and empty search results
        .catch(err => console.error("Image lookup failed:", err))
}

// Search Wikidata via Wikipedia sitelinks
function searchSitelinks(term, lang) {
    const searchUrl = wdk.getEntitiesFromSitelinks(term, lang + 'wiki')
    return fetch(searchUrl).then(r => r.json())
        .then(wdk.simplify.entities)
        // grab the "P18" (image) claims, if they exist
        .then(r => r[Object.keys(r)[0]].claims["P18"])
        // get a URL for the first image if there is one (resized to width=300)
        .then(images => images ? wdk.getImageUrl(images[0], 300) : null)
        .then(imageUrl => alert(imageUrl)) // do something with the URL
        // minimal error handling: log network failures and missing articles
        .catch(err => console.error("Image lookup failed:", err))
}