Entity API Tutorial

In this tutorial you will learn how to use the Entity API, which offers information about several types of entities: agent, place, concept and timespan. These named entities are part of the Europeana Entity Collection, a collection of entities in the context of Europeana harvested from and linked to controlled vocabularies such as GeoNames, DBpedia and Wikidata.

The Entity API has three methods:

  • apis.entity.suggest: returns entities of a certain type matching a text query

  • apis.entity.retrieve: returns information about an individual entity of a certain type

  • apis.entity.resolve: returns entities that match a query url

We will use PyEuropeana, a Python client library for Europeana APIs. Read more about how the package works in the Documentation.

Install PyEuropeana with pip:

pip install pyeuropeana

Europeana APIs require a key for authentication. Find more information on how to get your API key here. Once you obtain your key, you can set it as an environment variable using the os library:

import os
os.environ['EUROPEANA_API_KEY'] = 'your_API_key'
import pandas as pd
pd.set_option('display.max_colwidth', 15)

import pyeuropeana.apis as apis
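Because every call below depends on the key, it can help to check it before making any request. The following is a minimal sketch and not part of PyEuropeana: require_api_key is a hypothetical helper, and the only names taken from the tutorial are the EUROPEANA_API_KEY variable and the 'your_API_key' placeholder.

```python
import os


def require_api_key(env=None, name='EUROPEANA_API_KEY'):
    """Return the API key stored under `name`, or raise a clear error.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    key = env.get(name, '').strip()
    # Treat the tutorial's placeholder value as "not set"
    if not key or key == 'your_API_key':
        raise RuntimeError(f'{name} is not set; request a key and set it first.')
    return key


# Demonstration with a dict standing in for os.environ
print(require_api_key({'EUROPEANA_API_KEY': 'abc123'}))
```

Calling require_api_key() with no arguments at the top of a notebook would surface a missing key immediately rather than at the first API call.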

Agents

In this section we focus on entities of type agent. We would like to find out whether there are agents that match some query. In the following cell we call the suggest method from the apis module imported above, which returns a dictionary:

resp = apis.entity.suggest(
   text = 'leonardo',
   TYPE = 'agent',
)

resp.keys()
dict_keys(['@context', 'type', 'total', 'items'])

The response contains several fields. The field total holds the number of entities matching our query:

resp['total']
10

The field items contains a list in which each object represents one entity returned by the search:

len(resp['items'])
10

This list can be converted into a pandas DataFrame as follows:

df = pd.json_normalize(resp['items'])
cols = df.columns.tolist()
cols = cols[-2:]+cols[:-2]
df = df[cols]
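The reordering above relies on the label columns being the last two. A position-independent alternative is to name the columns that should come first. This is a minimal sketch: front_columns is a hypothetical helper, and the sample column list is only an illustration of what json_normalize might produce.

```python
def front_columns(columns, first):
    """Return `columns` reordered so the names in `first` lead, in that order.

    Names in `first` that are not present in `columns` are skipped.
    """
    present = [c for c in first if c in columns]
    rest = [c for c in columns if c not in present]
    return present + rest


# Illustrative column names only
cols = ['id', 'type', 'dateOfBirth', 'dateOfDeath', 'prefLabel.en', 'altLabel.en']
print(front_columns(cols, ['prefLabel.en', 'altLabel.en']))
```

With the DataFrame from the cell above, this could be applied as df = df[front_columns(df.columns.tolist(), ['prefLabel.en', 'altLabel.en'])].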

The resulting dataframe has several columns. The id column contains the identifier of the entity. The columns containing isShownBy hold information about an illustration for a given entity. We can discard this information if we want:

rm_cols = [col for col in df.columns if 'isShownBy' in col]
df = df.drop(columns=rm_cols)
df.head()
prefLabel.en altLabel.en id type dateOfBirth dateOfDeath
0 Leonardo da... [Leonardo d... http://data... Agent 1452-04-15 1519-05-02
1 Leonardo Leo [Leo, Leona... http://data... Agent 1694-08-05 1744-10-31
2 Leonardo Sc... [Sciascia, ... http://data... Agent 1921-01-08 1989-11-20
3 Leonardo Pa... [Padura Fue... http://data... Agent 1955 NaN
4 Bruno Leona... [Gelber, Br... http://data... Agent 1941-03-19 NaN

We have some information about several entities matching our query. What other information can we obtain for these entities?

The method retrieve can be used to obtain more information about a particular entity using its identifier. The id column in the table above contains the uris of the different entities, where the identifier is an integer located at the end of each entity uri.
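The entity type and the identifier can both be read from a uri with a small helper. This is a minimal sketch, assuming that the type is the first path segment and the identifier the last; split_entity_uri is a hypothetical name, not a PyEuropeana function.

```python
from urllib.parse import urlparse


def split_entity_uri(uri):
    """Split an entity uri into (entity_type, identifier), both as strings."""
    parts = [p for p in urlparse(uri).path.split('/') if p]
    if len(parts) < 2:
        raise ValueError(f'not an entity uri: {uri!r}')
    # First segment is the type, last segment is the identifier
    return parts[0], parts[-1]


print(split_entity_uri('http://data.europeana.eu/agent/base/146741'))
# → ('agent', '146741')
```

The identifier is returned as a string; wrap it in int() where an integer is preferred.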

For example, for the entity Leonardo da Vinci with uri http://data.europeana.eu/agent/base/146741 we can call retrieve as:

resp = apis.entity.retrieve(
   TYPE = 'agent',
   IDENTIFIER = 146741,
)

resp.keys()
dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])

We observe that the response contains several fields, some of them not present in the response of the suggest method.

The field prefLabel contains a dictionary that maps each language code to the name of the entity in that language. We can transform this dictionary into a dataframe:

def get_name_df(resp):
  lang_name_df = None
  if 'prefLabel' in resp.keys():
    lang_name_df = pd.DataFrame([{'language':lang,'name':name} for lang,name in resp['prefLabel'].items()])
  return lang_name_df

lang_name_df = get_name_df(resp)
lang_name_df.head()
language name
0 ar ليوناردو دا...
1 az Leonardo da...
2 be Леанарда да...
3 bg Леонардо да...
4 bs Leonardo da...

The field biographicalInformation can be useful to learn more about the biography of a particular agent. This information is also multilingual, and can be transformed into a pandas DataFrame:

def get_biography_df(resp):
  bio_df = None
  if 'biographicalInformation' in resp.keys():
    bio_df = pd.DataFrame(resp['biographicalInformation'])
  return bio_df

bio_df = get_biography_df(resp)
bio_df.head()
@language @value
0 de Leonardo da...
1 no Leonardo di...
2 hi लिओनार्दो द...
3 fi Leonardo di...
4 be Леана́рда д...

We can access the biography in English, for instance, in the following way:

bio_df['@value'].loc[bio_df['@language'] == 'en'].values[0]
'Leonardo di ser Piero da Vinci (Italian pronunciation: [leoˈnardo da vˈvintʃi] About this sound pronunciation ; April 15, 1452 – May 2, 1519, Old Style) was an Italian Renaissance polymath: painter, sculptor, architect, musician, mathematician, engineer, inventor, anatomist, geologist, cartographer, botanist, and writer. His genius, perhaps more than that of any other figure, epitomized the Renaissance humanist ideal.Leonardo has often been described as the archetype of the Renaissance Man, a man of "unquenchable curiosity" and "feverishly inventive imagination". He is widely considered to be one of the greatest painters of all time and perhaps the most diversely talented person ever to have lived. According to art historian Helen Gardner, the scope and depth of his interests were without precedent and "his mind and personality seem to us superhuman, the man himself mysterious and remote". Marco Rosci states that while there is much speculation about Leonardo, his vision of the world is essentially logical rather than mysterious, and that the empirical methods he employed were unusual for his time.Born out of wedlock to a notary, Piero da Vinci, and a peasant woman, Caterina, in Vinci in the region of Florence, Leonardo was educated in the studio of the renowned Florentine painter Verrocchio. Much of his earlier working life was spent in the service of Ludovico il Moro in Milan. He later worked in Rome, Bologna and Venice, and he spent his last years in France at the home awarded him by Francis I.Leonardo was, and is, renowned primarily as a painter. Among his works, the Mona Lisa is the most famous and most parodied portrait and The Last Supper the most reproduced religious painting of all time, with their fame approached only by Michelangelo's The Creation of Adam. Leonardo's drawing of the Vitruvian Man is also regarded as a cultural icon, being reproduced on items as varied as the euro coin, textbooks, and T-shirts. 
Perhaps fifteen of his paintings have survived, the small number because of his constant, and frequently disastrous, experimentation with new techniques, and his chronic procrastination. Nevertheless, these few works, together with his notebooks, which contain drawings, scientific diagrams, and his thoughts on the nature of painting, compose a contribution to later generations of artists rivalled only by that of his contemporary, Michelangelo.Leonardo is revered for his technological ingenuity. He conceptualised flying machines, a tank, concentrated solar power, an adding machine, and the double hull, also outlining a rudimentary theory of plate tectonics. Relatively few of his designs were constructed or were even feasible during his lifetime, but some of his smaller inventions, such as an automated bobbin winder and a machine for testing the tensile strength of wire, entered the world of manufacturing unheralded. He made important discoveries in anatomy, civil engineering, optics, and hydrodynamics, but he did not publish his findings and they had no direct influence on later science.'
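The lookup above raises an IndexError when an entity has no English biography. A fallback over several languages can be sketched without pandas, working directly on the list of '@language' / '@value' objects. pick_language is a hypothetical helper, and the sample values are placeholders rather than real API output.

```python
def pick_language(values, preferred=('en',)):
    """Return the first '@value' whose '@language' is in `preferred`.

    Languages are tried in the order given; None is returned if none match.
    """
    by_lang = {v.get('@language'): v.get('@value') for v in values or []}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return None


# Placeholder data shaped like resp['biographicalInformation']
sample = [
    {'@language': 'de', '@value': 'placeholder German text'},
    {'@language': 'fi', '@value': 'placeholder Finnish text'},
]
print(pick_language(sample, preferred=('en', 'de')))
# → placeholder German text
```

With a real response this could be called as pick_language(resp.get('biographicalInformation'), ('en', 'fr')).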

Now, let’s say that we want to find the biography for all the entities returned by entity.suggest. We can encapsulate the previous steps into a function that can be applied to the DataFrame resulting from entity.suggest:

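def get_bio_uri(uri):
  # The identifier is the last path segment of the entity uri
  entity_id = int(uri.split('/')[-1])
  resp = apis.entity.retrieve(
    TYPE = 'agent',
    IDENTIFIER = entity_id,
  )

  bio_df = get_biography_df(resp)
  if bio_df is None:
    return None
  # Keep the English biography, if there is one
  en_bio = bio_df['@value'].loc[bio_df['@language'] == 'en']
  return en_bio.values[0] if len(en_bio) else None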

df['bio'] = df['id'].apply(get_bio_uri)
df.head()
prefLabel.en altLabel.en id type dateOfBirth dateOfDeath bio
0 Leonardo da... [Leonardo d... http://data... Agent 1452-04-15 1519-05-02 Leonardo di...
1 Leonardo Leo [Leo, Leona... http://data... Agent 1694-08-05 1744-10-31 Leonardo Le...
2 Leonardo Sc... [Sciascia, ... http://data... Agent 1921-01-08 1989-11-20 Leonardo Sc...
3 Leonardo Pa... [Padura Fue... http://data... Agent 1955 NaN Leonardo Pa...
4 Bruno Leona... [Gelber, Br... http://data... Agent 1941-03-19 NaN Bruno Leona...

The biography in English has been added for each entity. Great!

The places of birth and death of the agents can also be of interest. We can extract them with a function such as:

def get_place_resp(resp, event):
  # Map the event name to the corresponding response field
  fields = {'birth': 'placeOfBirth', 'death': 'placeOfDeath'}
  if event not in fields:
    raise ValueError("event must be 'birth' or 'death'")

  place = resp.get(fields[event])
  if not place:
    return

  # Take the first value of the first entry
  place = list(place[0].values())[0]

  # For uris, keep the last path segment as a readable name
  if place.startswith('http'):
    place = place.split('/')[-1].replace('_',' ')
  return place



resp = apis.entity.retrieve(
   TYPE = 'agent',
   IDENTIFIER = 146741,
)
get_place_resp(resp, 'birth')
'Republic of Florence'

Note

The function above parses the URI and extracts the name of the places of birth and death. In reality we should use either the resolve method of the Entity API, if the URI is that of an entity in Europeana’s Entity Collection, or seek to de-reference it using (Linked Data) content negotiation, if it is not known in the Entity Collection.

Now we can add this information to the original DataFrame:

def get_place(uri,event):
  id = int(uri.split('/')[-1])
  resp = apis.entity.retrieve(
    TYPE = 'agent',
    IDENTIFIER = id,
  )
  return get_place_resp(resp,event)


df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()
prefLabel.en altLabel.en id type dateOfBirth dateOfDeath bio placeOfBirth placeOfDeath
0 Leonardo da... [Leonardo d... http://data... Agent 1452-04-15 1519-05-02 Leonardo di... Republic of... Kingdom of ...
1 Leonardo Leo [Leo, Leona... http://data... Agent 1694-08-05 1744-10-31 Leonardo Le... None Naples
2 Leonardo Sc... [Sciascia, ... http://data... Agent 1921-01-08 1989-11-20 Leonardo Sc... Racalmuto Sicily
3 Leonardo Pa... [Padura Fue... http://data... Agent 1955 NaN Leonardo Pa... Cuba None
4 Bruno Leona... [Gelber, Br... http://data... Agent 1941-03-19 NaN Bruno Leona... Buenos Aires None
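The pipeline above calls retrieve up to three times for the same entity: once for the biography and once for each place. Caching the responses avoids the repeated requests. The following is a minimal sketch using functools.lru_cache; make_cached_retriever and fake_retrieve are hypothetical names, and fake_retrieve only stands in for apis.entity.retrieve so that the example runs offline.

```python
from functools import lru_cache


def make_cached_retriever(retrieve, maxsize=256):
    """Wrap a retrieve-like callable so repeated lookups hit a cache."""
    @lru_cache(maxsize=maxsize)
    def cached(entity_type, identifier):
        return retrieve(TYPE=entity_type, IDENTIFIER=identifier)
    return cached


calls = []

def fake_retrieve(TYPE, IDENTIFIER):
    # Stand-in for the real API call; records each request it receives
    calls.append((TYPE, IDENTIFIER))
    return {'id': f'http://data.europeana.eu/{TYPE}/base/{IDENTIFIER}'}


cached_retrieve = make_cached_retriever(fake_retrieve)
cached_retrieve('agent', 146741)
cached_retrieve('agent', 146741)  # served from the cache
print(len(calls))
# → 1
```

In the tutorial's setting the wrapper would be built as make_cached_retriever(apis.entity.retrieve). The cached dictionary is shared between callers, so it should be treated as read-only.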

The previous pipeline can be applied to any other agent:

resp = apis.entity.suggest(
   text = 'Marguerite Gérard',
   TYPE = 'agent',
)

df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df['bio'] = df['id'].apply(get_bio_uri)
df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()
id type dateOfBirth dateOfDeath prefLabel.en altLabel.en bio placeOfBirth placeOfDeath
0 http://data... Agent 1761-01-28 1837-05-18 Marguerite ... [Marguerite... Marguerite ... France Paris

Finally, we can use the method resolve to obtain the entity matching an external URI, when it is present as an entity in the Europeana Entity Collection. Find more information in the documentation of the Entity API.

resp = apis.entity.resolve('http://dbpedia.org/resource/Leonardo_da_Vinci')
resp.keys()
dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])

Places

Another type of entity we can work with is place. Let’s get the place of death of the previous agent:

place_of_death = df['placeOfDeath'].values[0]
place_of_death
'Paris'

We can now search for the entity corresponding to this place by calling the suggest method with place as the TYPE argument.

resp = apis.entity.suggest(
   text = place_of_death,
   TYPE = 'place',

)
place_df = pd.json_normalize(resp['items'])
cols = place_df.columns.tolist()
cols = cols[-1:]+cols[:-1]
place_df = place_df[cols]
place_df.head()
prefLabel.en id type isPartOf
0 Paris http://data... Place [{'id': 'ht...
1 La Defense http://data... Place [{'id': 'ht...
2 Jõelähtme P... http://data... Place [{'id': 'ht...
3 Vihula Parish http://data... Place [{'id': 'ht...
4 Põlva Parish http://data... Place [{'id': 'ht...

Let’s use the first uri with the retrieve method:

uri = place_df['id'].values[0]
IDENTIFIER = uri.split('/')[-1]

resp = apis.entity.retrieve(
   IDENTIFIER = IDENTIFIER,
   TYPE = 'place',
)
resp.keys()
dict_keys(['@context', 'id', 'type', 'prefLabel', 'altLabel', 'lat', 'long', 'isPartOf', 'sameAs'])

We can reuse the function get_name_df for places as well, as the response has a data structure similar to that of an agent:

name_df = get_name_df(resp)
name_df.head()
language name
0 Paris
1 de Paris
2 en Paris
3 es Paris
4 fr Paris

The response includes the field isPartOf, which indicates an entity that the current entity belongs to, if any:

is_part_uri = resp['isPartOf'][0]
is_part_uri
'http://data.europeana.eu/place/base/42377'

Let’s see what this mysterious uri refers to using the retrieve method:

is_part_id = is_part_uri.split('/')[-1]
resp = apis.entity.retrieve(
   IDENTIFIER = is_part_id,
   TYPE = 'place',
)

name_df = get_name_df(resp)
name_df.head()
language name
0 Île-de-France
1 de Île-de-France
2 en Île-de-France
3 es Isla de Fra...
4 fr Région pari...

It had to be the emblematic Île-de-France, of course! And its coordinates are:

f"lat: {resp['lat']}, long: {resp['long']}"
'lat: 48.7, long: 2.5'
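The two retrieve calls above can be generalised into a loop that follows isPartOf upwards until no parent is left. This is a minimal sketch: walk_is_part_of is a hypothetical helper, and fake_collection is a hand-written stand-in for API responses that contains only the two uris seen in this tutorial (the real hierarchy may continue further).

```python
def walk_is_part_of(uri, fetch, max_depth=10):
    """Follow the first isPartOf link upwards and return the uris visited.

    `fetch` maps a uri to a response dictionary. The walk stops when there
    is no parent, when a uri repeats, or after `max_depth` steps.
    """
    chain = [uri]
    seen = {uri}
    for _ in range(max_depth):
        parents = fetch(chain[-1]).get('isPartOf') or []
        if not parents or parents[0] in seen:
            break
        chain.append(parents[0])
        seen.add(parents[0])
    return chain


# Stand-in data, truncated after the second record
fake_collection = {
    'http://data.europeana.eu/place/base/41488': {
        'isPartOf': ['http://data.europeana.eu/place/base/42377'],
    },
    'http://data.europeana.eu/place/base/42377': {},
}
print(walk_is_part_of('http://data.europeana.eu/place/base/41488',
                      fake_collection.__getitem__))
```

With the real API, fetch would be a small function that extracts the identifier from the uri and calls apis.entity.retrieve with TYPE = 'place'.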

Concepts

Let’s query for concepts matching the text war:

resp = apis.entity.suggest(
   text = 'war',
   TYPE = 'concept',
)

resp['total']
3

We build a table from the field items, where we can see the name and uri of the different concepts:

df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df.head()
id type prefLabel.en
0 http://data... Concept World War I
1 http://data... Concept War photogr...
2 http://data... Concept Raku ware

To find out more about the first concept in the list, we can pass its identifier to the retrieve method:

concept_uri = df['id'].values[0]
concept_uri
'http://data.europeana.eu/concept/base/83'

concept_id = concept_uri.split('/')[-1]
resp = apis.entity.retrieve(
   IDENTIFIER = concept_id,
   TYPE = 'concept',
)

name_df = get_name_df(resp)
name_df.loc[name_df['language'] == 'en']
language name
11 en World War I

The concept is World War I. We can get some related concepts from DBpedia:

resp['related'][:5]
['http://dbpedia.org/resource/Category:Wars_involving_Nicaragua',
 'http://dbpedia.org/resource/Category:Wars_involving_the_United_Kingdom',
 'http://dbpedia.org/resource/Category:Wars_involving_Greece',
 'http://dbpedia.org/resource/Category:Wars_involving_Sri_Lanka',
 'http://dbpedia.org/resource/Category:Wars_involving_Czechoslovakia']

The field note contains a multilingual description of the concept:

note_df = pd.json_normalize([{'lang':k,'note':v[0]} for k,v in resp['note'].items()])
note_df.head()
lang note
0 ar الحرب العال...
1 az Birinci dün...
2 be Першая сусв...
3 bg Първата све...
4 bs Prvi svjets...

We can obtain the description for a particular language as follows:

note_df['note'].loc[note_df['lang'] == 'en'].values[0]
"World War I (WWI or WW1), also known as the First World War, was a global war centred in Europe that began on 28 July 1914 and lasted until 11 November 1918. From the time of its occurrence until the approach of World War II, it was called simply the World War or the Great War, and thereafter the First World War or World War I. In America, it was initially called the European War. More than 9 million combatants were killed; a casualty rate exacerbated by the belligerents' technological and industrial sophistication, and tactical stalemate. It was one of the deadliest conflicts in history, paving the way for major political changes, including revolutions in many of the nations involved.The war drew in all the world's economic great powers, which were assembled in two opposing alliances: the Allies (based on the Triple Entente of the United Kingdom, France and the Russian Empire) and the Central Powers of Germany and Austria-Hungary. Although Italy had also been a member of the Triple Alliance alongside Germany and Austria-Hungary, it did not join the Central Powers, as Austria-Hungary had taken the offensive against the terms of the alliance. These alliances were both reorganised and expanded as more nations entered the war: Italy, Japan and the United States joined the Allies, and the Ottoman Empire and Bulgaria the Central Powers. Ultimately, more than 70 million military personnel, including 60 million Europeans, were mobilised in one of the largest wars in history.Although a resurgence of imperialism was an underlying cause, the immediate trigger for war was the 28 June 1914 assassination of Archduke Franz Ferdinand of Austria, heir to the throne of Austria-Hungary, by Yugoslav nationalist Gavrilo Princip in Sarajevo. This set off a diplomatic crisis when Austria-Hungary delivered an ultimatum to the Kingdom of Serbia, and international alliances formed over the previous decades were invoked. 
Within weeks, the major powers were at war and the conflict soon spread around the world.On 28 July, the Austro-Hungarians fired the first shots in preparation for the invasion of Serbia. As Russia mobilised, Germany invaded neutral Belgium and Luxembourg before moving towards France, leading Britain to declare war on Germany. After the German march on Paris was halted, what became known as the Western Front settled into a battle of attrition, with a trench line that would change little until 1917. Meanwhile, on the Eastern Front, the Russian army was successful against the Austro-Hungarians, but was stopped in its invasion of East Prussia by the Germans. In November 1914, the Ottoman Empire joined the war, opening fronts in the Caucasus, Mesopotamia and the Sinai. Italy and Bulgaria went to war in 1915, Romania in 1916, and the United States in 1917.The war approached a resolution after the Russian government collapsed in March, 1917, and a subsequent revolution in November brought the Russians to terms with the Central Powers. On 4 November 1918, the Austro-Hungarian empire agreed to an armistice. After a 1918 German offensive along the western front, the Allies drove back the Germans in a series of successful offensives and began entering the trenches. Germany, which had its own trouble with revolutionaries, agreed to an armistice on 11 November 1918, ending the war in victory for the Allies.By the end of the war, four major imperial powers—the German, Russian, Austro-Hungarian and Ottoman empires—ceased to exist. The successor states of the former two lost substantial territory, while the latter two were dismantled. The map of Europe was redrawn, with several independent nations restored or created. The League of Nations formed with the aim of preventing any repetition of such an appalling conflict. 
This aim failed, with weakened states, renewed European nationalism and the German feeling of humiliation contributing to the rise of fascism and the conditions for World War II."

Tips for using entities with the Search API

Once we know the identifier for a certain entity we can use the Search API to obtain objects containing it.

For instance we can query objects containing the entity “Painting” using its uri http://data.europeana.eu/concept/base/47

concept_uri = 'http://data.europeana.eu/concept/base/47'
resp = apis.search(
    query = f'"{concept_uri}"'
)

resp['totalResults']
120708

Notice that in order to use a uri as a query we need to wrap it in double quotation marks (" ").

We might want to query for objects related to more than one entity. We can do that by using logical operators in the query. Querying for paintings from the 16th century:

resp = apis.search(
    query = '"http://data.europeana.eu/timespan/16" AND "http://data.europeana.eu/concept/base/47"',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']
300
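Quoting each uri and joining them by hand is easy to get wrong, so the query string can be built by a helper. This is a minimal sketch; entity_query is a hypothetical name, and it only assembles the string that is then passed as the query argument of apis.search.

```python
def entity_query(*uris, operator='AND'):
    """Quote each uri and join them with a logical operator."""
    if operator not in ('AND', 'OR'):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not uris:
        raise ValueError('at least one uri is required')
    return f' {operator} '.join(f'"{uri}"' for uri in uris)


query = entity_query(
    'http://data.europeana.eu/timespan/16',
    'http://data.europeana.eu/concept/base/47',
)
print(query)
# → "http://data.europeana.eu/timespan/16" AND "http://data.europeana.eu/concept/base/47"
```

The resulting string is identical to the query written by hand in the previous cell.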

Querying for paintings with some relation to Paris:

resp = apis.search(
    query = '"http://data.europeana.eu/place/base/41488" AND "http://data.europeana.eu/concept/base/47"',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']
14

When querying for entity uris, the objects returned are those that have the requested uris in their metadata.

However, not all objects contain this information; many of them contain only the name of the entity. It is always a good idea to query for the name of the entities as well, as there might be more objects:

resp = apis.search(
    query = 'Paris AND Painting',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']
3612
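The two approaches can also be combined so that an object matches if it carries either the uri or the name. This is a minimal sketch that assumes the Search API accepts parenthesised OR groups, as Lucene-style query syntax generally does; uri_or_label is a hypothetical helper, and no result count is shown because this query was not run here.

```python
def uri_or_label(uri, label):
    """Build a clause matching either an entity uri or its name."""
    # Escape double quotes inside the label so the phrase stays well-formed
    safe_label = label.replace('"', '\\"')
    return f'("{uri}" OR "{safe_label}")'


combined_query = ' AND '.join([
    uri_or_label('http://data.europeana.eu/place/base/41488', 'Paris'),
    uri_or_label('http://data.europeana.eu/concept/base/47', 'Painting'),
])
print(combined_query)
```

The string could then be passed as the query argument of apis.search, together with the same media and qf arguments used above.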

Conclusions

In this tutorial we learned:

  • What types of entities are available in the Europeana Entity API

  • To use the suggest method for obtaining entities of a certain type matching a text query

  • To use the retrieve method for obtaining information about an individual entity of a certain type

  • To use the method resolve for obtaining entities that match a query url

  • To process some of the fields contained in the responses of the methods above and convert the responses to Pandas dataframes

  • To query for objects related to entities using the Europeana Search API