Open Source
September 17, 2019

We built a simple NLP tool for counting word frequency

Giuseppe Di Vincenzo
We recently worked on a project with Zeit Online that analyzes how often words appear in speeches given in the Bundestag (the German parliament). For research purposes, we built a tool that counts words using NLP techniques: it removes stop words and reduces each word to its base form (lemmatization) before counting. The tool is open source, and you can try it here:

How does it work?

Python was the language of choice because it is one of the most popular languages for NLP, largely thanks to its ecosystem of stable, full-featured libraries such as:
  • NLTK: a widely adopted toolkit for natural language processing
  • spaCy: a complete library powered by deep learning
  • TextBlob: a very simple API for common NLP operations
We wanted to start from the ground up, which is why we chose to explore what is possible with the NLTK library.
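Our word-counting tool performs the following operations:
  • remove stop words
  • lemmatize the remaining words (reduce each one to its base form)
  • count how often each base form occurs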
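The pipeline described above (stop-word removal, lemmatization, counting) can be sketched in a few lines of standard-library Python. This is not the tool's actual code: the regex tokenizer is an assumption, and `TOY_STOP_WORDS` and `TOY_LEMMAS` are hypothetical stand-ins for a complete stop-word list and a real lemmatizer.

```python
import re
from collections import Counter

# Hypothetical toy stop-word list; a real tool would use a complete list.
TOY_STOP_WORDS = frozenset({"wir", "ein", "das", "ist", "und", "die", "sind"})

# Hypothetical lookup-table "lemmatizer": maps inflected forms to a base form.
# A real lemmatizer would cover the whole vocabulary.
TOY_LEMMAS = {"gesetze": "gesetz", "gesetzes": "gesetz"}


def tokenize(text):
    # Lowercase and split into word tokens. In Python 3 the \w class is
    # Unicode-aware, so umlauts and ß stay inside words.
    return re.findall(r"\w+", text.lower())


def lemmatize(token, lemmas=TOY_LEMMAS):
    # Fall back to the token itself when no base form is known.
    return lemmas.get(token, token)


def count_words(text, stop_words=TOY_STOP_WORDS, lemmas=TOY_LEMMAS, top_n=None):
    """Return (base form, count) pairs, most frequent first."""
    tokens = tokenize(text)
    # 1. Remove stop words.
    content_words = [t for t in tokens if t not in stop_words]
    # 2. Reduce each remaining word to its base form.
    base_forms = [lemmatize(t, lemmas) for t in content_words]
    # 3. Count occurrences; most_common(None) returns every entry.
    return Counter(base_forms).most_common(top_n)


print(count_words("Wir brauchen ein Gesetz. Das Gesetz ist gut, und die Gesetze sind gut."))
# [('gesetz', 3), ('gut', 2), ('brauchen', 1)]
```

Without lemmatization, "gesetz" and "gesetze" would be counted separately. To move closer to a real setup, the toy stop-word set could be replaced with NLTK's German list, `set(stopwords.words("german"))` from `nltk.corpus` (after running `nltk.download("stopwords")`), and the lookup table with a proper lemmatizer.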
The tool can be used over HTTP thanks to a small Flask API server.
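A small Flask endpoint wrapping a word counter might look roughly like the following. The route `/count`, the JSON request and response shapes, and the `STOP_WORDS` set are hypothetical. To keep the example self-contained, the counter is simplified: it skips lemmatization.

```python
import re
from collections import Counter

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical toy stop-word list standing in for a complete one.
STOP_WORDS = frozenset({"der", "die", "das", "und", "ist", "ein"})


def count_words(text, top_n=10):
    # Simplified counter: tokenize, drop stop words, count (no lemmatization).
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common(top_n)


# Hypothetical endpoint: POST /count with a JSON body like {"text": "..."}.
@app.route("/count", methods=["POST"])
def count():
    # silent=True returns None instead of raising on missing or invalid JSON.
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        return jsonify(error="expected a JSON object with a 'text' string"), 400
    pairs = count_words(text)
    return jsonify(words=[{"word": w, "count": c} for w, c in pairs])
```

The app can be started with Flask's `flask run` command and queried with any HTTP client. Returning a list of objects, rather than a single word-to-count mapping, keeps the most-frequent-first order explicit in the JSON.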
View the full source for the frontend here and for the backend here.
webkid GmbH
Kohlfurter Straße 41/43
10999 Berlin
+49 30 983 227 20