More than 4,200 plenary sessions have been held in the German Parliament since its founding in 1949. All of these sessions have been transcribed, and the transcripts (protocols) are available to the public. However, they had never been systematically evaluated. The goal of this project was to turn the protocols into a machine-readable format in order to perform a detailed speech analysis.
To get an overview of the massive dataset and to find interesting stories for the article, some research and prototyping were necessary. This resulted in a small tool and API, which we later open-sourced: Smart Wordcounter. The tool normalizes the input text, for example by removing stop words, and then counts word frequencies. You can find a more detailed explanation of the wordcounter in this blog post.
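The normalize-then-count idea can be sketched in a few lines of Python. This is only an illustration, not the Smart Wordcounter code: the function name `count_words`, the tiny `STOP_WORDS` set, and the regex-based tokenization are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "zu"}


def count_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, drop stop words, count the rest."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


counts = count_words("Die Rente ist sicher und die Rente bleibt sicher.")
print(counts.most_common(2))
```

Lowercasing before counting means that "Rente" and "rente" are treated as the same word; whether the actual tool does this, or applies further steps such as stemming, is not stated here.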
Using the wordcounter, we were able to build another prototype API, which allowed us to search for a word in the parliament protocols and retrieve that word's frequency over time as a time series. This research tool was used to find interesting words for the article and served as a proof of concept.
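A word-over-time lookup of this kind might look roughly like the sketch below. The input shape (a list of `(year, text)` pairs), the grouping by year, and the name `word_counts_by_year` are assumptions for illustration; the prototype's actual data model and API are not described in this text.

```python
import re
from collections import Counter


def word_counts_by_year(sessions, word):
    """Return {year: occurrences of `word`} for an iterable of (year, text) pairs."""
    word = word.lower()
    counts = Counter()
    for year, text in sessions:
        tokens = re.findall(r"\w+", text.lower())
        # Adding zero still registers the year, so years without a hit appear as 0.
        counts[year] += tokens.count(word)
    return dict(sorted(counts.items()))


# Made-up sample data, not taken from the real protocols.
sessions = [
    (1950, "Wiederaufbau und Wohnungsbau"),
    (1950, "Der Wiederaufbau geht voran"),
    (1990, "Die Einheit ist vollzogen"),
]
print(word_counts_by_year(sessions, "Wiederaufbau"))
```

Absolute counts are used here for simplicity. Dividing by the total number of tokens per year would give relative frequencies, which are easier to compare across years with different amounts of speech.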
To visualize the topics over time, we chose a simple area chart. The same chart type is used throughout the article so that readers can easily understand the data being shown.
In addition, other visual elements were developed, such as video loops showing the moment a particular word was said for the first time.
We supported the Zeit Online team in multiple stages of this project and helped to process, analyze and visualize the data.