To get an overview of the massive dataset and to find interesting stories for the article, we first had to do some research and prototyping. This resulted in a small tool and API, which we later open-sourced: Smart Wordcounter.
The tool normalizes the input text, for example by removing stop-words, and then counts word frequencies. You can find a more detailed explanation of the wordcounter in this blog post.
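To make the idea concrete, here is a minimal sketch of this kind of normalization and counting. It is not the actual Smart Wordcounter code: the function name `count_words` and the tiny stop-word list are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real tool would use a much larger, language-specific list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "on", "for"}


def count_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, drop stop-words, count the rest."""
    # [^\W\d_]+ matches runs of letters (including non-ASCII ones such as umlauts),
    # so punctuation, digits and underscores act as separators.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


counts = count_words("The parliament debates the budget. The budget is large.")
top_words = counts.most_common(2)
```

Lowercasing before counting means that "Budget" and "budget" end up in the same bucket, which is usually what you want when looking for topics rather than exact spellings.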
Using the wordcounter, we were able to build another prototype, an API that allowed us to search for a word in the parliament transcripts and get the frequency of that word over time as a time series. This research tool helped us find interesting words for the article and served as a proof of concept.
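The core of such a lookup can be sketched in a few lines. This is only an illustration under assumptions, not the prototype's real implementation: the function name `word_time_series`, the representation of transcripts as `(date, text)` pairs, the grouping by year and the sample data are all made up for the example.

```python
import re
from collections import defaultdict
from datetime import date


def word_time_series(transcripts, word):
    """Return {year: number of occurrences of `word`} for (date, text) pairs."""
    word = word.lower()
    per_year = defaultdict(int)
    for day, text in transcripts:
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        # Adding 0 still creates the key, so years without a match show up as 0.
        per_year[day.year] += tokens.count(word)
    return dict(sorted(per_year.items()))


# Hypothetical sample data standing in for parliament transcripts.
transcripts = [
    (date(2015, 3, 1), "Climate policy and the climate targets."),
    (date(2015, 9, 12), "Budget debate."),
    (date(2016, 1, 20), "Climate again."),
    (date(2017, 5, 4), "Budget only."),
]
series = word_time_series(transcripts, "climate")
```

Raw counts are the simplest choice here. If the amount of text varies a lot between years, dividing each count by the total number of tokens in that year gives a relative frequency that is easier to compare over time.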