This blog post on Polimedia, written by Max Kemman, is the third in our series of blog posts from the Veni Competition submissions. You can vote in the People’s Choice for Polimedia. Polimedia has been shortlisted in the final eight.
How do media cover political debates? A difficult question to investigate.
Answering this question requires cross-media analysis of the minutes of political debates, newspaper articles and radio bulletins. Such an undertaking is not straightforward: it requires collecting material from several archives and scrutinizing a lot of material to find interesting content. Up until now, students have focused on manual, qualitative research, as newspaper articles were only available in analogue format. Other media types, such as radio bulletins, have been neglected even more, since they were hardly available to students.
In recent years, the archives of major Dutch newspapers, radio bulletins and the transcripts of the Dutch parliament have been digitised and made available as open datasets. This is an enormous advantage, as material can now be accessed from the comfort of one’s own desk. However, another challenge arises: the available data is very large, and because every digital archive and dataset has a different interface, analysing media items across datasets remains cumbersome. Via www.polimedia.nl we aimed to link the Dutch Hansard (1945-1995) to the databases of historical newspapers (1945-1995) and ANP radio bulletins (1945-1984), to allow cross-media analysis of coverage in a uniform search interface and make it easier to gain an overview of coverage related to politics.
Linking Open Data to facilitate cross-media analysis
The links were made at the level of speeches in political debates, that is, fragments from a single speaker. For each speech, we extracted relevant information: from the metadata we used the speaker and the date, while via Named Entity Recognition we extracted important terms from its content and from the description of the complete debate. This information was then used to search the archives of newspapers and radio bulletins. When media items were retrieved, a link was created between the speech and the media item using semantic web technologies. We evaluated a set of links and found a recall of 62% and a precision of 75%, where a link was deemed relevant when the media item referred to the speech or to the debate.
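The linking steps described above can be illustrated with a small Python sketch. This is not the PoliMedia implementation: the class names, the `coveredIn` predicate, the seven-day search window and the simple substring matching are all assumptions made for illustration, and the extracted terms are taken as given rather than produced by a real Named Entity Recognition step. The sketch only shows the general shape of the approach: build a query from a speech, search an archive, record a link as a subject-predicate-object triple, and score a set of links with precision and recall.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Speech:
    speech_id: str
    speaker: str   # from the speech metadata
    day: date      # from the speech metadata
    terms: tuple   # important terms, assumed to come from an NER step


@dataclass(frozen=True)
class MediaItem:
    item_id: str
    day: date
    text: str


def build_query(speech, window_days=7):
    """Turn a speech into a search query (window_days is an assumed parameter)."""
    return {
        "speaker": speech.speaker,
        "terms": speech.terms,
        "start": speech.day,
        "end": speech.day + timedelta(days=window_days),
    }


def matches(query, item, min_terms=1):
    """Hypothetical matching rule: date in window, speaker named, enough terms found."""
    if not (query["start"] <= item.day <= query["end"]):
        return False
    text = item.text.lower()
    if query["speaker"].lower() not in text:
        return False
    hits = sum(1 for term in query["terms"] if term.lower() in text)
    return hits >= min_terms


def link_speech(speech, archive):
    """Return one (speech, predicate, media item) triple per retrieved item."""
    query = build_query(speech)
    return [
        (speech.speech_id, "coveredIn", item.item_id)  # predicate name is made up
        for item in archive
        if matches(query, item)
    ]


def precision_recall(found, relevant):
    """Precision = correct links / found links; recall = correct links / relevant links."""
    found, relevant = set(found), set(relevant)
    correct = len(found & relevant)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch, precision drops when the matching rule is too loose (many irrelevant items are linked), while recall drops when it is too strict (relevant items are missed), which is the trade-off behind the scores reported above.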
Helping researchers and students use the data
To navigate these links, a search user interface was developed based on a requirements study with five scholars in history and political communication and a meeting with a UI designer. This search interface allows full-text searching of the Dutch Hansard; the debates and speeches are then presented with links to the original locations of the media items. The searcher can refine the search results by time period, politician and political party. The search interface was evaluated in an eye tracking study with 24 scholars, which showed that it enabled searchers both to search for specific events in debates and to perform exploratory searches to analyse a topic over time.
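The refinement by time period, politician and political party can be pictured as a set of optional filters applied to the search results. The Python sketch below is only an illustration under assumed names (`SpeechHit`, `refine` and its parameters are hypothetical) and says nothing about how the actual PoliMedia interface is built.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SpeechHit:
    speech_id: str
    speaker: str
    party: str
    day: date


def refine(hits, start=None, end=None, politician=None, party=None):
    """Keep only hits that satisfy every filter that was actually set."""
    def keep(hit):
        return (
            (start is None or hit.day >= start)
            and (end is None or hit.day <= end)
            and (politician is None or hit.speaker == politician)
            and (party is None or hit.party == party)
        )
    return [hit for hit in hits if keep(hit)]
```

Leaving a filter unset keeps it out of play, so a searcher can start broad for exploratory work and narrow down step by step when looking for a specific event.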
Although PoliMedia already makes cross-media analysis easier, it is not yet finished. First, the search algorithm could potentially be improved to achieve better recall and precision scores. Second, linking to audiovisual archives to include television material would be an interesting undertaking. Moreover, while we created a user-friendly search interface, the links can also be used by technical researchers to link to even more databases or to perform quantitative analyses. Therefore, a SPARQL endpoint is under development so that the links from the PoliMedia project can be put to further use.