Getting an overview of the massive dataset and finding interesting stories for the article required some research and prototyping. This resulted in a small tool and API, which we later open-sourced:
Smart Wordcounter. The tool normalizes the input text, for example by removing stop words, and then counts word frequencies. You can find a more detailed explanation of the wordcounter in this
blog post.
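To make the idea concrete, here is a minimal sketch of that kind of counting in Python. It is not the Smart Wordcounter's actual code: the function name `count_words`, the tiny English stop-word list, and the tokenization rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer
# and would match the language of the input text.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def count_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, drop stop words, and count the rest."""
    # [^\W\d_]+ matches runs of letters only (including non-ASCII letters such as umlauts).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


counts = count_words("The budget and the future of the budget.")
print(counts.most_common(2))  # [('budget', 2), ('future', 1)]
```

Normalization can go much further than this (stemming or merging spelling variants, for example); the sketch only covers lowercasing and stop-word removal.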
Using the wordcounter, we built a second prototype API that let us search for a word in the parliament transcripts and get its frequency as a time series. We used this research tool both to find interesting words for the article and as a proof of concept.
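The core of such a lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype's real interface: the function name `word_time_series`, the `(year, text)` input shape, and the choice of relative frequency (matches divided by all words in that year) are hypothetical.

```python
import re
from collections import Counter


def word_time_series(transcripts, word):
    """Return {year: relative frequency of `word`} for (year, text) pairs.

    Assumption: each transcript comes with the year it belongs to.
    """
    hits = Counter()    # occurrences of the word per year
    totals = Counter()  # total number of words per year
    needle = word.lower()
    for year, text in transcripts:
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(needle)
    # Skip years without any words to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up sample data, only to show the shape of the result.
sample = [
    (1990, "Climate policy needs more work"),
    (1990, "No climate today"),
    (2010, "Climate climate climate change"),
]
print(word_time_series(sample, "climate"))  # {1990: 0.25, 2010: 0.75}
```

Dividing by the yearly word total keeps years with more or longer sessions from dominating the series; absolute counts would work just as well if that is what you want to compare.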