Note: A version of this blog post is available as an interactive Jupyter Notebook hosted on Google Colab.
The Wikidata relation doctoral advisor (P184) links researchers to the advisor or advisors who supervised their Ph.D. Coverage is currently a bit spotty, but it's at least a superset of the data in the Mathematics Genealogy Project (because MGP data was imported into Wikidata), plus data from various other sources, such as parsed Wikipedia infoboxes and manual additions (I've added quite a bit myself).
A nice thing about Wikidata is that it's queryable with SPARQL. Here is a query that finds all my academic "ancestors":
SELECT ?ancestorLabel WHERE {
  wd:Q65921654 wdt:P184+ ?ancestor.
  ?ancestor wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
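Brief explanation: wd:Q65921654 is my own Wikidata item; wdt:P184+ follows one or more doctoral advisor links from it; wdt:P31 wd:Q5 keeps only results that are instances of human; and the wikibase:label service fills in ?ancestorLabel with each ancestor's English label.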
You can edit and run queries like this interactively over at Wikidata, or programmatically from any language that can make HTTP requests.
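As one illustration of the programmatic route, here is a minimal Python sketch that sends the query above to the public Wikidata SPARQL endpoint using only the standard library. The function names and the User-Agent string are placeholders chosen for this example, not anything from the scripts mentioned in this post, and the result parsing assumes the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

ANCESTOR_QUERY = """
SELECT ?ancestorLabel WHERE {
  wd:Q65921654 wdt:P184+ ?ancestor.
  ?ancestor wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(query):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent (an assumption); replace it with your own details.
    headers = {"User-Agent": "advisor-tree-example/0.1 (example sketch)"}
    return urllib.request.Request(ENDPOINT + "?" + params, headers=headers)


def extract_column(results, var):
    """Pull one variable's values out of a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    return [row[var]["value"] for row in bindings if var in row]


def fetch_ancestors(query=ANCESTOR_QUERY):
    """Run the query over HTTP and return the ancestor labels (needs network)."""
    with urllib.request.urlopen(build_request(query)) as response:
        results = json.load(response)
    return extract_column(results, "ancestorLabel")
```

Calling `fetch_ancestors()` should return a plain list of label strings, one per ancestor that Wikidata currently knows about.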
That's a flat list of ancestors. But how do they relate? It might be nice to draw out the tree. To do that, we need to save a little more data: who's connected to whom. (The query below uses some techniques adapted from this Stack Overflow answer by Joshua Taylor.) Specifically, we want every link in the ancestry tree. In the query below, ?ancestor2 is a direct advisor of ?ancestor1, and ?ancestor1 is either myself or one of my academic ancestors:
SELECT ?ancestor1Label ?ancestor2Label WHERE {
  wd:Q65921654 wdt:P184* ?ancestor1.
  ?ancestor1 wdt:P184 ?ancestor2.
  ?ancestor1 wdt:P31 wd:Q5.
  ?ancestor2 wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
As before, this can be run interactively. But it's probably more useful to do it programmatically, since we want to graph the results. For examples of doing that in Python, see either the Jupyter notebook version of this post, or the standalone Python script.
Either way, we now have enough information to draw an ancestor tree using Graphviz!
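To make the graphing step concrete, here is a small sketch that turns the results of the link query into Graphviz DOT text. It is an illustration rather than the code from the notebook or script: the function names are made up for this example, the input is assumed to be the standard SPARQL JSON results layout, and drawing arrows from advisor to student is just one possible choice of direction.

```python
def bindings_to_edges(results):
    """Turn SPARQL JSON results into sorted, de-duplicated (student, advisor) pairs."""
    edges = set()
    for row in results["results"]["bindings"]:
        student = row["ancestor1Label"]["value"]
        advisor = row["ancestor2Label"]["value"]
        edges.add((student, advisor))
    return sorted(edges)


def quote(label):
    """Wrap a label in double quotes for DOT, escaping any embedded double quotes."""
    return '"' + label.replace('"', '\\"') + '"'


def edges_to_dot(edges):
    """Render the edges as a DOT digraph, with each arrow going advisor -> student."""
    lines = ["digraph ancestors {", "  node [shape=box];"]
    for student, advisor in edges:
        lines.append(f"  {quote(advisor)} -> {quote(student)};")
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned string to a file such as `tree.dot` and running `dot -Tpng tree.dot -o tree.png` should then produce an image of the tree.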
Mine ends up being pretty giant, so I won't show it here. One of my advisors, Charles Isbell, has an absolutely huge ancestor tree that trails off into long chains of medieval mathematicians that the Mathematics Genealogy Project has meticulously chronicled. But here's the other half of my academic ancestor tree, the one starting at my advisor Michael Mateas:
So far we've been querying ancestors of a specific person. There are of course a lot more ways to slice and dice this big doctoral-advisor graph in Wikidata. Another one I find interesting: do two people have a common ancestor?
Here's one way to pull that out of SPARQL, again borrowing an idea from something Joshua Taylor posted on Stack Overflow. It's a bit hairier than the previous queries.
SELECT ?ancestor1aLabel ?ancestor2aLabel ?ancestor1bLabel ?ancestor2bLabel WHERE {
  # ancestors of the first person leading to a common ancestor (or ancestors)
  wd:Q65921654 wdt:P184* ?ancestor1a.
  ?ancestor1a wdt:P184 ?ancestor2a.
  ?ancestor2a wdt:P184* ?common_ancestor.

  # ancestors of the second person leading to a common ancestor (or ancestors)
  wd:Q105669257 wdt:P184* ?ancestor1b.
  ?ancestor1b wdt:P184 ?ancestor2b.
  ?ancestor2b wdt:P184* ?common_ancestor.

  # stop at the common ancestor(s) rather than retrieving their own ancestors
  FILTER NOT EXISTS {
    wd:Q65921654 wdt:P184* ?intermediate_ancestor.
    wd:Q105669257 wdt:P184* ?intermediate_ancestor.
    ?intermediate_ancestor wdt:P184 ?common_ancestor.
  }

  ?ancestor1a wdt:P31 wd:Q5.
  ?ancestor2a wdt:P31 wd:Q5.
  ?ancestor1b wdt:P31 wd:Q5.
  ?ancestor2b wdt:P31 wd:Q5.
  ?common_ancestor wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This particular query (interactive version) looks for a common ancestor between myself and Amy Hoover.
The result:
We weren't sure we had one, but it turns out that we do have a common academic ancestor, Roger Schank.
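The stopping condition in the common-ancestor query can be easier to follow as ordinary code. The sketch below mirrors my reading of that logic on a small in-memory graph: a candidate must be reachable by at least one advisor link from each person, and it is dropped if it directly advised someone who is already an ancestor-or-self of both. The names `A`, `B`, `P`, `Q`, `R`, and `S` are hypothetical people, and unlike the query, the sketch returns only the common ancestors rather than the links leading to them.

```python
def ancestors_or_self(person, advisors):
    """Everyone reachable by zero or more advisor links (like wdt:P184*)."""
    seen = {person}
    stack = [person]
    while stack:
        current = stack.pop()
        for advisor in advisors.get(current, ()):
            if advisor not in seen:
                seen.add(advisor)
                stack.append(advisor)
    return seen


def strict_ancestors(person, advisors):
    """Everyone reachable by one or more advisor links (like wdt:P184+)."""
    result = set()
    for advisor in advisors.get(person, ()):
        result |= ancestors_or_self(advisor, advisors)
    return result


def nearest_common_ancestors(a, b, advisors):
    """Common ancestors of a and b, stopping at the first ones encountered."""
    candidates = strict_ancestors(a, advisors) & strict_ancestors(b, advisors)
    shared = ancestors_or_self(a, advisors) & ancestors_or_self(b, advisors)
    # Counterpart of FILTER NOT EXISTS: anyone who directly advised a shared
    # person sits above a nearer common ancestor, so remove them.
    advised_shared = {adv for person in shared for adv in advisors.get(person, ())}
    return candidates - advised_shared


# Hypothetical toy data: maps each person to their direct advisors.
toy_advisors = {"A": ["P"], "B": ["Q"], "P": ["R"], "Q": ["R"], "R": ["S"]}
print(nearest_common_ancestors("A", "B", toy_advisors))  # prints {'R'}
```

In the toy graph both `R` and `S` are common ancestors of `A` and `B`, but `S` is filtered out because it advised `R`, which is itself shared.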
Naturally, all these queries depend on the data being in Wikidata. Anyone can add data there, so if you or your advisor are missing (or are in Wikidata but the advisor link isn't there), it's possible to go add it.
I've written some command-line Python scripts implementing the above queries: