PyData Eindhoven 2022

FuzzyTM: a Python package for fuzzy topic models
12-02, 10:55–11:25 (Europe/Amsterdam), Planck

We present FuzzyTM, a Python library for training fuzzy topic models and creating topic embeddings for downstream tasks. Its modular design allows researchers to modify each software component and to add future methods. Meanwhile, user-friendly pipelines with default values allow practitioners to train a topic model with minimal effort.


The volume of data created worldwide is growing exponentially and is forecast to reach 181 zettabytes by 2025. Approximately 80% of today's data is unstructured or semi-structured, and analyzing it is often time-intensive and costly. One technique for systematically analyzing large text corpora is topic modeling, which uncovers the latent topics present in a corpus. Recently, several fuzzy topic modeling algorithms have been proposed and have shown superior results compared with existing algorithms. Although various Python libraries offer topic modeling algorithms, none includes fuzzy topic models. FuzzyTM was developed to fill this gap.


Prior Knowledge Expected

No previous knowledge expected

PhD candidate in Natural Language Processing for mental health care at the Jheronimus Academy of Data Science / Eindhoven University of Technology.