We built a simple NLP tool for counting word frequency

We recently worked on a project with Zeit Online that analyzes the frequency of words in speeches given in the Bundestag (the German parliament). For research purposes, we built a tool that counts words using NLP techniques. It removes stop words and reduces each word to a common base form (lemmatization) before counting. It is open source and you can try it here:

How does it work?

Python was the language of choice because it is one of the most widely used languages for NLP, largely thanks to its ecosystem of stable and mature libraries, such as:

  • NLTK: a widely adopted toolkit for natural language processing
  • spaCy: a complete, deep-learning-powered library
  • TextBlob: a very simple API for NLP operations

We decided to start from the ground up, which is why we chose to explore what is possible with the NLTK library.

Our word-counting tool performs the following operations: stop-word removal, lemmatization, and counting.
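The pipeline described in the introduction (stop-word removal, lemmatization, counting) can be sketched in plain Python as follows. This is a minimal illustration, not the tool's actual code: the `STOP_WORDS` set, the `LEMMAS` lookup table, and the function names `tokenize`, `lemmatize`, and `count_words` are all hypothetical stand-ins for what a library such as NLTK would provide.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources used only for illustration.
# A real tool would load these from an NLP library instead.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "to", "in", "is", "were"}
LEMMAS = {"voted": "vote", "votes": "vote", "voting": "vote", "laws": "law"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (umlauts included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lemmatize(token):
    """Map a token to its base form; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def count_words(text, stop_words=STOP_WORDS, lemmatizer=lemmatize):
    """Tokenize, drop stop words, lemmatize, then count the remaining words."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in stop_words]
    return Counter(lemmatizer(t) for t in kept)


counts = count_words("The parliament voted on the laws, and the votes were counted.")
print(counts.most_common(1))  # "voted" and "votes" are both counted as "vote"
```

Because the stop-word set and the lemmatizer are passed in as parameters, they could be swapped for library-backed versions, for example a stop-word list from `nltk.corpus.stopwords.words("german")` and a lemmatizer suited to German text.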

The tool can be used over HTTP thanks to a small Flask API server.

View the full source for the frontend here and for the backend here.