Note
You can download this tutorial in the .ipynb format or the .py format.
Entity API Tutorial
In this tutorial you will learn how to use the Entity
API, which offers information
about several types of entities: agent, place, concept and
timespan. These named entities are part of the Europeana Entity
Collection, a collection of entities in the context of Europeana
harvested from and linked to controlled vocabularies, such as GeoNames,
DBpedia and Wikidata.
The Entity API has three methods:
apis.entity.suggest: returns entities of a certain type matching a text query
apis.entity.retrieve: returns information about an individual entity of a certain type
apis.entity.resolve: returns entities that match a query URL
We will use PyEuropeana, a Python client library for Europeana APIs. Read more about how the package works in the Documentation.
Install PyEuropeana with pip:
pip install pyeuropeana
Europeana APIs require a key for authentication. Find more information
on how to get your API key
here. Once you obtain your
key, you can set it as an environment variable using the os library:
import os
os.environ['EUROPEANA_API_KEY'] = 'your_API_key'
import pandas as pd
pd.set_option('display.max_colwidth', 15)
import pyeuropeana.apis as apis
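If the key is missing, the problem only surfaces later, when the first request is made. A minimal sketch of an early check, assuming the key is read from the EUROPEANA_API_KEY environment variable as set above; the helper name require_api_key is hypothetical and not part of PyEuropeana:

```python
import os

def require_api_key(var_name="EUROPEANA_API_KEY"):
    """Return the API key stored in the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before calling the API"
        )
    return key
```

Calling require_api_key() once at the top of a script makes a missing key fail immediately instead of in the middle of a pipeline.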
Agents
In this section we focus on the agent type of entities. We would like to
find out whether there are agents that match some query. In the following
cell we use the apis module imported above and call the
suggest method, which returns a dictionary:
resp = apis.entity.suggest(
text = 'leonardo',
TYPE = 'agent',
)
resp.keys()
dict_keys(['@context', 'type', 'total', 'items'])
The response contains several fields. The field total represents the
number of entities matching our query:
resp['total']
10
The field items contains a list in which each object represents one of
the entities returned by the search:
len(resp['items'])
10
This list can be converted into a pandas DataFrame as follows:
df = pd.json_normalize(resp['items'])
cols = df.columns.tolist()
cols = cols[-2:]+cols[:-2]
df = df[cols]
The resulting DataFrame has several columns. The id column contains
the identifier for the entity. The columns starting with isShownBy
contain information about an illustration for a given entity. We can
discard this information if we want:
rm_cols = [col for col in df.columns if 'isShownBy' in col]
df = df.drop(columns=rm_cols)
df.head()
| | prefLabel.en | altLabel.en | id | type | dateOfBirth | dateOfDeath |
|---|---|---|---|---|---|---|
| 0 | Leonardo da... | [Leonardo d... | http://data... | Agent | 1452-04-15 | 1519-05-02 |
| 1 | Leonardo Leo | [Leo, Leona... | http://data... | Agent | 1694-08-05 | 1744-10-31 |
| 2 | Leonardo Sc... | [Sciascia, ... | http://data... | Agent | 1921-01-08 | 1989-11-20 |
| 3 | Leonardo Pa... | [Padura Fue... | http://data... | Agent | 1955 | NaN |
| 4 | Bruno Leona... | [Gelber, Br... | http://data... | Agent | 1941-03-19 | NaN |
We have some information about several entities matching our query. What other information can we obtain for these entities?
The method retrieve can be used to obtain more information about a
particular entity using its identifier. The id column in the table
above contains the URIs of the different entities, where the identifier
is an integer located at the end of each entity URI.
For example, for the entity Leonardo da Vinci with URI
http://data.europeana.eu/agent/base/146741 we can call retrieve as:
resp = apis.entity.retrieve(
TYPE = 'agent',
IDENTIFIER = 146741,
)
resp.keys()
dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])
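Since both the entity type and the identifier are encoded in the URI, the two arguments of retrieve can be derived from it. The following is a minimal sketch, assuming URIs follow the pattern shown above (type as the first path segment, identifier as the last); split_entity_uri is a hypothetical helper, not part of PyEuropeana:

```python
from urllib.parse import urlparse

def split_entity_uri(uri):
    """Split an entity URI into (entity_type, identifier)."""
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Not an entity URI: {uri}")
    # The first path segment is the type, the last one the identifier
    return parts[0], parts[-1]

entity_type, identifier = split_entity_uri(
    "http://data.europeana.eu/agent/base/146741"
)
print(entity_type, identifier)  # agent 146741
```

The identifier is kept as a string here; the place example later in this tutorial also passes a string to retrieve.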
We observe that the response contains several fields, some of which are not present in the response of the suggest method.
The field prefLabel contains a dictionary that maps language codes to
the name of the entity in each language. We can transform it into a DataFrame:
def get_name_df(resp):
    lang_name_df = None
    if 'prefLabel' in resp.keys():
        lang_name_df = pd.DataFrame([{'language':lang,'name':name} for lang,name in resp['prefLabel'].items()])
    return lang_name_df
lang_name_df = get_name_df(resp)
lang_name_df.head()
| | language | name |
|---|---|---|
| 0 | ar | ليوناردو دا... |
| 1 | az | Leonardo da... |
| 2 | be | Леанарда да... |
| 3 | bg | Леонардо да... |
| 4 | bs | Leonardo da... |
The field biographicalInformation is useful for learning more about
the biography of an agent. This information is also
multilingual, and can be transformed into a pandas DataFrame:
def get_biography_df(resp):
    bio_df = None
    if 'biographicalInformation' in resp.keys():
        bio_df = pd.DataFrame(resp['biographicalInformation'])
    return bio_df
bio_df = get_biography_df(resp)
bio_df.head()
| | @language | @value |
|---|---|---|
| 0 | de | Leonardo da... |
| 1 | no | Leonardo di... |
| 2 | hi | लिओनार्दो द... |
| 3 | fi | Leonardo di... |
| 4 | be | Леана́рда д... |
We can access the biography in English, for instance, in the following way:
bio_df['@value'].loc[bio_df['@language'] == 'en'].values[0]
'Leonardo di ser Piero da Vinci (Italian pronunciation: [leoˈnardo da vˈvintʃi] About this sound pronunciation ; April 15, 1452 – May 2, 1519, Old Style) was an Italian Renaissance polymath: painter, sculptor, architect, musician, mathematician, engineer, inventor, anatomist, geologist, cartographer, botanist, and writer. His genius, perhaps more than that of any other figure, epitomized the Renaissance humanist ideal.Leonardo has often been described as the archetype of the Renaissance Man, a man of "unquenchable curiosity" and "feverishly inventive imagination". He is widely considered to be one of the greatest painters of all time and perhaps the most diversely talented person ever to have lived. According to art historian Helen Gardner, the scope and depth of his interests were without precedent and "his mind and personality seem to us superhuman, the man himself mysterious and remote". Marco Rosci states that while there is much speculation about Leonardo, his vision of the world is essentially logical rather than mysterious, and that the empirical methods he employed were unusual for his time.Born out of wedlock to a notary, Piero da Vinci, and a peasant woman, Caterina, in Vinci in the region of Florence, Leonardo was educated in the studio of the renowned Florentine painter Verrocchio. Much of his earlier working life was spent in the service of Ludovico il Moro in Milan. He later worked in Rome, Bologna and Venice, and he spent his last years in France at the home awarded him by Francis I.Leonardo was, and is, renowned primarily as a painter. Among his works, the Mona Lisa is the most famous and most parodied portrait and The Last Supper the most reproduced religious painting of all time, with their fame approached only by Michelangelo's The Creation of Adam. Leonardo's drawing of the Vitruvian Man is also regarded as a cultural icon, being reproduced on items as varied as the euro coin, textbooks, and T-shirts. 
Perhaps fifteen of his paintings have survived, the small number because of his constant, and frequently disastrous, experimentation with new techniques, and his chronic procrastination. Nevertheless, these few works, together with his notebooks, which contain drawings, scientific diagrams, and his thoughts on the nature of painting, compose a contribution to later generations of artists rivalled only by that of his contemporary, Michelangelo.Leonardo is revered for his technological ingenuity. He conceptualised flying machines, a tank, concentrated solar power, an adding machine, and the double hull, also outlining a rudimentary theory of plate tectonics. Relatively few of his designs were constructed or were even feasible during his lifetime, but some of his smaller inventions, such as an automated bobbin winder and a machine for testing the tensile strength of wire, entered the world of manufacturing unheralded. He made important discoveries in anatomy, civil engineering, optics, and hydrodynamics, but he did not publish his findings and they had no direct influence on later science.'
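The lookup above raises an IndexError when no English biography exists. As a minimal sketch of a more forgiving lookup that works directly on the list of dictionaries, assuming each entry has the '@language' and '@value' keys seen in the table above (the helper name and the sample data are hypothetical):

```python
def value_for_language(entries, lang="en", fallback=None):
    """Return the '@value' of the first entry whose '@language' is lang."""
    for entry in entries or []:
        if entry.get("@language") == lang:
            return entry.get("@value")
    return fallback

# Hypothetical sample shaped like resp['biographicalInformation']
sample = [
    {"@language": "de", "@value": "Leonardo da Vinci war ..."},
    {"@language": "en", "@value": "Leonardo di ser Piero da Vinci ..."},
]
print(value_for_language(sample, "en"))
print(value_for_language(sample, "fr", fallback="(no biography)"))
```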
Now, let’s say that we want to find the biography for all the entities
returned by entity.suggest. We can encapsulate the previous steps
into a function that can be applied to the DataFrame resulting from
entity.suggest:
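def get_bio_uri(uri):
    # The identifier is the integer at the end of the entity URI
    identifier = int(uri.split('/')[-1])
    resp = apis.entity.retrieve(
        TYPE = 'agent',
        IDENTIFIER = identifier,
    )
    bio_df = get_biography_df(resp)
    if bio_df is None:
        return None
    # Keep the English biography, if there is one
    bios = bio_df['@value'].loc[bio_df['@language'] == 'en'].values
    return bios[0] if len(bios) else None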
df['bio'] = df['id'].apply(get_bio_uri)
df.head()
| | prefLabel.en | altLabel.en | id | type | dateOfBirth | dateOfDeath | bio |
|---|---|---|---|---|---|---|---|
| 0 | Leonardo da... | [Leonardo d... | http://data... | Agent | 1452-04-15 | 1519-05-02 | Leonardo di... |
| 1 | Leonardo Leo | [Leo, Leona... | http://data... | Agent | 1694-08-05 | 1744-10-31 | Leonardo Le... |
| 2 | Leonardo Sc... | [Sciascia, ... | http://data... | Agent | 1921-01-08 | 1989-11-20 | Leonardo Sc... |
| 3 | Leonardo Pa... | [Padura Fue... | http://data... | Agent | 1955 | NaN | Leonardo Pa... |
| 4 | Bruno Leona... | [Gelber, Br... | http://data... | Agent | 1941-03-19 | NaN | Bruno Leona... |
The biography in English has been added for each entity. Great!
The places of birth and death of the agents can also be of interest. We can extract them with a function such as:
def get_place_resp(resp, event):
    # Pick the field that corresponds to the requested event
    if event == 'birth':
        if 'placeOfBirth' not in resp.keys():
            return
        place = resp['placeOfBirth']
    elif event == 'death':
        if 'placeOfDeath' not in resp.keys():
            return
        place = resp['placeOfDeath']
    else:
        return
    if not place:
        return
    # Each place is a dictionary; take its first value
    place = list(place[0].values())[0]
    # For URIs, derive a readable name from the last path segment
    if place.startswith('http'):
        place = place.split('/')[-1].replace('_',' ')
    return place
resp = apis.entity.retrieve(
TYPE = 'agent',
IDENTIFIER = 146741,
)
get_place_resp(resp, 'birth')
'Republic of Florence'
Note
The function above parses the URI and extracts the name of the places of
birth and death. In reality we should use either the resolve method
of the Entity API, if the URI is that of an entity in Europeana’s Entity
Collection, or seek to de-reference it using (Linked Data) content
negotiation, if it
is not known in the Entity Collection.
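As a minimal sketch of the distinction drawn in this note, the helper below separates the two cases and makes the fallback label parsing handle percent-encoded characters. It assumes that Europeana entity URIs start with http://data.europeana.eu/, as in the examples of this tutorial; describe_place_uri is a hypothetical helper, and it still only parses the URI rather than de-referencing it:

```python
from urllib.parse import unquote, urlparse

# Assumption: prefix shared by the Europeana entity URIs shown in this tutorial
EUROPEANA_PREFIX = "http://data.europeana.eu/"

def describe_place_uri(uri):
    """Classify a place URI and derive a readable label from its last path segment."""
    in_collection = uri.startswith(EUROPEANA_PREFIX)
    last_segment = urlparse(uri).path.rstrip("/").split("/")[-1]
    # Decode percent-escapes (e.g. %C3%8E) and turn underscores into spaces
    label = unquote(last_segment).replace("_", " ")
    return in_collection, label

print(describe_place_uri("http://dbpedia.org/resource/Republic_of_Florence"))
```

For a Europeana URI the label is just the numeric identifier, which signals that the name has to come from the API instead.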
Now we can add this information to the original DataFrame:
def get_place(uri,event):
    identifier = int(uri.split('/')[-1])
    resp = apis.entity.retrieve(
        TYPE = 'agent',
        IDENTIFIER = identifier,
    )
    return get_place_resp(resp,event)
df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()
| | prefLabel.en | altLabel.en | id | type | dateOfBirth | dateOfDeath | bio | placeOfBirth | placeOfDeath |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Leonardo da... | [Leonardo d... | http://data... | Agent | 1452-04-15 | 1519-05-02 | Leonardo di... | Republic of... | Kingdom of ... |
| 1 | Leonardo Leo | [Leo, Leona... | http://data... | Agent | 1694-08-05 | 1744-10-31 | Leonardo Le... | None | Naples |
| 2 | Leonardo Sc... | [Sciascia, ... | http://data... | Agent | 1921-01-08 | 1989-11-20 | Leonardo Sc... | Racalmuto | Sicily |
| 3 | Leonardo Pa... | [Padura Fue... | http://data... | Agent | 1955 | NaN | Leonardo Pa... | Cuba | None |
| 4 | Bruno Leona... | [Gelber, Br... | http://data... | Agent | 1941-03-19 | NaN | Bruno Leona... | Buenos Aires | None |
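Note that this pipeline calls retrieve several times for the same entity: once for the biography and once for each place. A minimal sketch of caching the responses with functools.lru_cache; the fetch function is passed in as a parameter, and fake_fetch is a hypothetical stand-in so that the sketch runs without network access:

```python
from functools import lru_cache

def make_cached_retrieve(fetch, maxsize=256):
    """Wrap fetch(entity_type, identifier) so repeated lookups reuse the first response."""
    @lru_cache(maxsize=maxsize)
    def cached(entity_type, identifier):
        return fetch(entity_type, identifier)
    return cached

# Stand-in for the real request, so the sketch runs offline
calls = []
def fake_fetch(entity_type, identifier):
    calls.append((entity_type, identifier))
    return {"id": f"http://data.europeana.eu/{entity_type}/base/{identifier}"}

retrieve_once = make_cached_retrieve(fake_fetch)
retrieve_once("agent", 146741)
retrieve_once("agent", 146741)
print(len(calls))  # 1
```

With the real client, fetch could be something like lambda t, i: apis.entity.retrieve(TYPE=t, IDENTIFIER=i). The cached dictionary is shared between callers, so it should be treated as read-only.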
The previous pipeline can be applied to any other agent:
resp = apis.entity.suggest(
text = 'Marguerite Gérard',
TYPE = 'agent',
)
df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df['bio'] = df['id'].apply(get_bio_uri)
df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()
| | id | type | dateOfBirth | dateOfDeath | prefLabel.en | altLabel.en | bio | placeOfBirth | placeOfDeath |
|---|---|---|---|---|---|---|---|---|---|
| 0 | http://data... | Agent | 1761-01-28 | 1837-05-18 | Marguerite ... | [Marguerite... | Marguerite ... | France | Paris |
Finally, we can use the resolve method to obtain the entity
matching an external URI, when it is present as an entity in the Europeana
Entity Collection. Find more information in the documentation of the
Entity API:
resp = apis.entity.resolve('http://dbpedia.org/resource/Leonardo_da_Vinci')
resp.keys()
dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])
Places
Places are another type of entity we can work with. Let’s get the place of death of the previous agent:
place_of_death = df['placeOfDeath'].values[0]
place_of_death
'Paris'
We can now search for the entity corresponding to this place by calling
the suggest method with place as the TYPE argument:
resp = apis.entity.suggest(
text = place_of_death,
TYPE = 'place',
)
place_df = pd.json_normalize(resp['items'])
cols = place_df.columns.tolist()
cols = cols[-1:]+cols[:-1]
place_df = place_df[cols]
place_df.head()
| | prefLabel.en | id | type | isPartOf |
|---|---|---|---|---|
| 0 | Paris | http://data... | Place | [{'id': 'ht... |
| 1 | La Defense | http://data... | Place | [{'id': 'ht... |
| 2 | Jõelähtme P... | http://data... | Place | [{'id': 'ht... |
| 3 | Vihula Parish | http://data... | Place | [{'id': 'ht... |
| 4 | Põlva Parish | http://data... | Place | [{'id': 'ht... |
Let’s use the first URI with the retrieve method:
uri = place_df['id'].values[0]
IDENTIFIER = uri.split('/')[-1]
resp = apis.entity.retrieve(
IDENTIFIER = IDENTIFIER,
TYPE = 'place',
)
resp.keys()
dict_keys(['@context', 'id', 'type', 'prefLabel', 'altLabel', 'lat', 'long', 'isPartOf', 'sameAs'])
We can reuse the function get_name_df for places as well, as the
response has a data structure similar to that of agents:
name_df = get_name_df(resp)
name_df.head()
| | language | name |
|---|---|---|
| 0 | | Paris |
| 1 | de | Paris |
| 2 | en | Paris |
| 3 | es | Paris |
| 4 | fr | Paris |
The response includes the field isPartOf, which indicates an entity
that the current entity belongs to, if any:
is_part_uri = resp['isPartOf'][0]
is_part_uri
'http://data.europeana.eu/place/base/42377'
Let’s see what this mysterious URI refers to using the retrieve method:
is_part_id = is_part_uri.split('/')[-1]
resp = apis.entity.retrieve(
IDENTIFIER = is_part_id,
TYPE = 'place',
)
name_df = get_name_df(resp)
name_df.head()
| | language | name |
|---|---|---|
| 0 | | Île-de-France |
| 1 | de | Île-de-France |
| 2 | en | Île-de-France |
| 3 | es | Isla de Fra... |
| 4 | fr | Région pari... |
It had to be the emblematic Île-de-France, of course! And its coordinates are:
f"lat: {resp['lat']}, long: {resp['long']}"
'lat: 48.7, long: 2.5'
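The step from Paris to Île-de-France can be repeated to walk up the hierarchy. A minimal sketch, assuming isPartOf in a retrieve response is a list of URIs as shown above; part_of_chain is a hypothetical helper, and fetch is a parameter so that the sketch runs on offline sample data. Only the link from Paris to place 42377 comes from this tutorial; whether Île-de-France has a parent of its own is not shown here, so the sample stops there:

```python
def part_of_chain(uri, fetch, max_depth=10):
    """Follow the first 'isPartOf' link of each entity and return the visited URIs."""
    chain = [uri]
    for _ in range(max_depth):
        parents = fetch(chain[-1]).get("isPartOf") or []
        # Stop at the top of the hierarchy or if a link points back into the chain
        if not parents or parents[0] in chain:
            break
        chain.append(parents[0])
    return chain

# Hypothetical offline data standing in for retrieve responses
fake_entities = {
    "http://data.europeana.eu/place/base/41488": {
        "isPartOf": ["http://data.europeana.eu/place/base/42377"]
    },
    "http://data.europeana.eu/place/base/42377": {},
}
print(part_of_chain("http://data.europeana.eu/place/base/41488", fake_entities.__getitem__))
```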
Concepts
Let’s query for concepts matching the text 'war':
resp = apis.entity.suggest(
text = 'war',
TYPE = 'concept',
)
resp['total']
3
We build a table from the field items, where we can see the
name and URI of the different concepts:
df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df.head()
| | id | type | prefLabel.en |
|---|---|---|---|
| 0 | http://data... | Concept | World War I |
| 1 | http://data... | Concept | War photogr... |
| 2 | http://data... | Concept | Raku ware |
To learn more about the first concept in the list, we take its URI and call retrieve:
concept_uri = df['id'].values[0]
concept_uri
'http://data.europeana.eu/concept/base/83'
concept_id = concept_uri.split('/')[-1]
resp = apis.entity.retrieve(
IDENTIFIER = concept_id,
TYPE = 'concept',
)
name_df = get_name_df(resp)
name_df.loc[name_df['language'] == 'en']
| | language | name |
|---|---|---|
| 11 | en | World War I |
The concept is World War I. We can get some related concepts from DBpedia:
resp['related'][:5]
['http://dbpedia.org/resource/Category:Wars_involving_Nicaragua',
'http://dbpedia.org/resource/Category:Wars_involving_the_United_Kingdom',
'http://dbpedia.org/resource/Category:Wars_involving_Greece',
'http://dbpedia.org/resource/Category:Wars_involving_Sri_Lanka',
'http://dbpedia.org/resource/Category:Wars_involving_Czechoslovakia']
The field note contains a multilingual description of the concept
note_df = pd.json_normalize([{'lang':k,'note':v[0]} for k,v in resp['note'].items()])
note_df.head()
| | lang | note |
|---|---|---|
| 0 | ar | الحرب العال... |
| 1 | az | Birinci dün... |
| 2 | be | Першая сусв... |
| 3 | bg | Първата све... |
| 4 | bs | Prvi svjets... |
We can obtain the description for a particular language as
note_df['note'].loc[note_df['lang'] == 'en'].values[0]
"World War I (WWI or WW1), also known as the First World War, was a global war centred in Europe that began on 28 July 1914 and lasted until 11 November 1918. From the time of its occurrence until the approach of World War II, it was called simply the World War or the Great War, and thereafter the First World War or World War I. In America, it was initially called the European War. More than 9 million combatants were killed; a casualty rate exacerbated by the belligerents' technological and industrial sophistication, and tactical stalemate. It was one of the deadliest conflicts in history, paving the way for major political changes, including revolutions in many of the nations involved.The war drew in all the world's economic great powers, which were assembled in two opposing alliances: the Allies (based on the Triple Entente of the United Kingdom, France and the Russian Empire) and the Central Powers of Germany and Austria-Hungary. Although Italy had also been a member of the Triple Alliance alongside Germany and Austria-Hungary, it did not join the Central Powers, as Austria-Hungary had taken the offensive against the terms of the alliance. These alliances were both reorganised and expanded as more nations entered the war: Italy, Japan and the United States joined the Allies, and the Ottoman Empire and Bulgaria the Central Powers. Ultimately, more than 70 million military personnel, including 60 million Europeans, were mobilised in one of the largest wars in history.Although a resurgence of imperialism was an underlying cause, the immediate trigger for war was the 28 June 1914 assassination of Archduke Franz Ferdinand of Austria, heir to the throne of Austria-Hungary, by Yugoslav nationalist Gavrilo Princip in Sarajevo. This set off a diplomatic crisis when Austria-Hungary delivered an ultimatum to the Kingdom of Serbia, and international alliances formed over the previous decades were invoked. 
Within weeks, the major powers were at war and the conflict soon spread around the world.On 28 July, the Austro-Hungarians fired the first shots in preparation for the invasion of Serbia. As Russia mobilised, Germany invaded neutral Belgium and Luxembourg before moving towards France, leading Britain to declare war on Germany. After the German march on Paris was halted, what became known as the Western Front settled into a battle of attrition, with a trench line that would change little until 1917. Meanwhile, on the Eastern Front, the Russian army was successful against the Austro-Hungarians, but was stopped in its invasion of East Prussia by the Germans. In November 1914, the Ottoman Empire joined the war, opening fronts in the Caucasus, Mesopotamia and the Sinai. Italy and Bulgaria went to war in 1915, Romania in 1916, and the United States in 1917.The war approached a resolution after the Russian government collapsed in March, 1917, and a subsequent revolution in November brought the Russians to terms with the Central Powers. On 4 November 1918, the Austro-Hungarian empire agreed to an armistice. After a 1918 German offensive along the western front, the Allies drove back the Germans in a series of successful offensives and began entering the trenches. Germany, which had its own trouble with revolutionaries, agreed to an armistice on 11 November 1918, ending the war in victory for the Allies.By the end of the war, four major imperial powers—the German, Russian, Austro-Hungarian and Ottoman empires—ceased to exist. The successor states of the former two lost substantial territory, while the latter two were dismantled. The map of Europe was redrawn, with several independent nations restored or created. The League of Nations formed with the aim of preventing any repetition of such an appalling conflict. 
This aim failed, with weakened states, renewed European nationalism and the German feeling of humiliation contributing to the rise of fascism and the conditions for World War II."
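Fields such as note and prefLabel are dictionaries keyed by language code, with either a list of texts or a single text as value. A minimal sketch of a lookup with a fallback language, assuming those two shapes; text_for_language and the sample dictionaries are hypothetical:

```python
def text_for_language(field, lang="en", fallback_langs=("en",)):
    """Pick a text from a {language: text or [texts]} dictionary."""
    for code in (lang, *fallback_langs):
        value = (field or {}).get(code)
        # note values are lists of texts; prefLabel values are plain strings
        if isinstance(value, list):
            value = value[0] if value else None
        if value:
            return value
    return None

# Hypothetical samples shaped like resp['note'] and resp['prefLabel']
note = {"en": ["World War I (WWI or WW1) ..."], "bg": ["Първата све..."]}
pref_label = {"en": "World War I"}
print(text_for_language(note, "fr"))      # falls back to English
print(text_for_language(pref_label))
```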
Tips for using entities with the Search API
Once we know the identifier for a certain entity we can use the Search API to obtain objects that reference it.
For instance, we can query objects containing the entity “Painting” using its URI http://data.europeana.eu/concept/base/47:
concept_uri = 'http://data.europeana.eu/concept/base/47'
resp = apis.search(
query = f'"{concept_uri}"'
)
resp['totalResults']
120708
Notice that in order to use a URI as a query we need to wrap it in double quotation marks ("").
We might want to query for objects belonging to more than one entity. We can simply do that by using logical operators in the query. Querying for paintings from the 16th century:
resp = apis.search(
query = '"http://data.europeana.eu/timespan/16" AND "http://data.europeana.eu/concept/base/47"',
media = True,
qf = 'TYPE:IMAGE'
)
resp['totalResults']
300
Querying for paintings with some relation to Paris:
resp = apis.search(
query = '"http://data.europeana.eu/place/base/41488" AND "http://data.europeana.eu/concept/base/47"',
media = True,
qf = 'TYPE:IMAGE'
)
resp['totalResults']
14
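The quoting and joining used in the queries above can be wrapped in a small helper. This is a minimal sketch; entity_query is a hypothetical function that only builds the query string and does not call the Search API:

```python
def entity_query(*uris, operator="AND"):
    """Quote each entity URI and join them with a logical operator."""
    if not uris:
        raise ValueError("At least one URI is required")
    return f" {operator} ".join(f'"{uri}"' for uri in uris)

query = entity_query(
    "http://data.europeana.eu/timespan/16",
    "http://data.europeana.eu/concept/base/47",
)
print(query)
```

The resulting string matches the one written by hand for the 16th-century paintings example and can be passed as the query argument of apis.search.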
When querying for entity URIs, the objects returned are those that have the requested URIs in their metadata.
However, not all objects contain this information; many of them contain the name of the entity instead. It is always a good idea to query for the name of the entities as well, as there might be more objects:
resp = apis.search(
query = 'Paris AND Painting',
media = True,
qf = 'TYPE:IMAGE'
)
resp['totalResults']
3612
Conclusions
In this tutorial we learned:
What types of entities are available in the Europeana Entity API
To use the suggest method for obtaining entities of a certain type matching a text query
To use the retrieve method for obtaining information about an individual entity of a certain type
To use the resolve method for obtaining entities that match a query URL
To process some of the fields contained in the responses of the methods above and convert the responses to pandas DataFrames
To query for entities using the Europeana Search API