Zeit Online

What the Bundestag Is Talking About

An in-depth analysis of all speeches held in the German federal parliament since its foundation in 1949

Natural Language Processing
Data Analysis
API Development
Data Visualization
Kai Biermann
Paul Blickle
Ron Drongowski
Annick Ehmann
and more...
More than 4,200 plenary sessions have been held in the German parliament since its founding in 1949. All of these sessions have been transcribed and are available to the public, but they had never been evaluated in full. The goal of this project was to turn the transcripts into a machine-readable format that allows more extensive study, such as a detailed speech analysis.
The application lets you search for every word ever spoken in the parliament
To get an overview of the massive dataset and to find interesting stories for the article, some research and prototyping were necessary. This resulted in a small tool and API, which we later open-sourced: Smart Wordcounter. The tool normalizes the input text, for example by removing stop words, and then counts word frequencies. You can find a more detailed explanation of the wordcounter in this blog post. Using the wordcounter, we built a second API prototype that lets us search for a word in the parliament transcripts and returns the frequency of that word as a time series. This research tool served both as a proof of concept and as a way to find interesting words for the article.
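The two steps described above, normalizing and counting words and then deriving a per-year frequency for a single word, can be sketched in a few lines of Python. This is a minimal illustration, not the Smart Wordcounter's actual code or API: the function names `count_words` and `word_time_series`, the tiny stop-word list, and the sample speeches are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list. A real pipeline would use a
# full German list, for example the stop-word corpus that ships with NLTK.
STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "wir", "für", "nicht"}

# \w+ is Unicode-aware for str patterns, so umlauts stay inside tokens.
TOKEN_PATTERN = re.compile(r"\w+")


def count_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, drop stop words, and count the remaining tokens."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return Counter(token for token in tokens if token not in stop_words)


def word_time_series(speeches, word):
    """Return {year: relative frequency of `word`} for (year, text) pairs.

    The relative frequency is the word's count divided by the number of
    non-stop-word tokens spoken in that year.
    """
    totals = Counter()
    hits = Counter()
    for year, text in speeches:
        counts = count_words(text)
        totals[year] += sum(counts.values())
        hits[year] += counts[word.lower()]
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical sample data, not taken from the real transcripts.
speeches = [
    (1950, "Der Wiederaufbau ist wichtig. Wir brauchen den Wiederaufbau."),
    (1990, "Die Einheit und der Wiederaufbau."),
]

series = word_time_series(speeches, "Wiederaufbau")
```

Dividing by the yearly token total rather than reporting raw counts keeps years with many sessions comparable to years with few. Whether the original tool normalized this way is not stated, so treat it as a design choice of this sketch.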
Visualization of word frequencies
To visualize topics over time, we chose a simple area chart. The same chart type is used throughout the article, so readers can easily understand the data being shown.
Additionally, some other visual elements were developed, like video loops of the moment a particular word was said for the first time.
Topics that have characterized a decade
We supported the Zeit Online team in multiple stages of this project and helped to process, analyze and visualize the data.
Technology Stack
Python NLTK
webkid GmbH
Kohlfurter Straße 41/43
10999 Berlin
+49 30 232 575 450