We recently worked with Zeit Online on a project that analyzes the frequency of words in speeches given in the Bundestag (the German parliament). For research purposes, we built a word-counting tool based on NLP techniques. Before counting, the tool removes stop words and reduces each remaining word to a common base form (lemmatization). It is open source and you can try it here:
Python was the language of choice because it is one of the most widely used languages for NLP, largely thanks to its ecosystem of stable, mature libraries, such as:
We decided to start from the ground up, which is why we chose to explore what is possible with the NLTK library.
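To make the idea concrete, here is a minimal sketch of the pipeline described above: drop stop words, map each word to a base form, then count. It is not the tool's actual code. It uses only the Python standard library, and the names `STOP_WORDS`, `LEMMAS`, and `count_words` are hypothetical; the tiny hand-written stop word set and lemma table stand in for what a library such as NLTK would provide.

```python
import re
from collections import Counter

# Assumption: toy stand-ins for a real stop word list and a real lemmatizer.
STOP_WORDS = {"die", "der", "und", "ist", "sind", "für"}
LEMMAS = {"steuern": "steuer"}


def count_words(text):
    """Count base forms of the words in `text`, ignoring stop words."""
    # Lowercase and split into word tokens (Unicode-aware, so umlauts survive).
    tokens = re.findall(r"\w+", text.lower())
    # Remove stop words before counting.
    content_words = [t for t in tokens if t not in STOP_WORDS]
    # Map each word to its base form; unknown words are kept unchanged.
    base_forms = [LEMMAS.get(t, t) for t in content_words]
    return Counter(base_forms)


counts = count_words("Die Steuer ist hoch und die Steuern sind für alle hoch.")
print(counts.most_common())
```

In this example, "Steuer" and "Steuern" end up under the same key, which is the reason for reducing words to a base form before counting.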
Our word-counting tool performs the following operations: