Bayesian Topic Modelling

Conference

Using non-parametric models and variational inference in text analysis

Author

Romy R. Ravines

Published

December 5, 2020

Event

Talk at the III CONFERENCIA INTERNACIONAL DE PROCESOS ESTOCÁSTICOS, FENÓMENOS ALEATORIOS Y SUS APLICACIONES

Location

Lima, Perú


In business settings, topic modeling is proving to be a differentiating factor in tasks such as understanding customer feedback in surveys, spotting complaints among many emails, flagging urgent interventions in maintenance reports, monitoring reputational risk on social networks, and identifying business opportunities.

Text analytics allows us to uncover ideas, concepts, and relationships that are “hidden” within words. Topic modeling is a specific method of text analytics, where the goal is to group documents based on their topics. In other words, it amounts to a cluster analysis of text data and requires working with high-dimensional matrices that represent the probabilities of word occurrences within the text.
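To make the matrix representation concrete, here is a minimal sketch in Python that turns a tiny corpus into a document-term count matrix and then into per-document word-occurrence probabilities. The corpus and the helper name `document_term_matrix` are hypothetical and chosen only for illustration; the talk itself does not specify this preprocessing.

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "late delivery and damaged package",
    "package arrived late",
    "great support and friendly staff",
    "friendly staff and great service",
]


def document_term_matrix(documents):
    """Return (vocabulary, counts), where counts[d][j] is the number of
    times vocabulary[j] occurs in document d."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    counts = []
    for tokens in tokenized:
        freq = Counter(tokens)  # a Counter returns 0 for unseen words
        counts.append([freq[word] for word in vocabulary])
    return vocabulary, counts


vocabulary, counts = document_term_matrix(docs)

# Normalise each row to get empirical word-occurrence probabilities per document.
word_probs = [[n / sum(row) for n in row] for row in counts]
```

Each row describes one document and each column one word. With a real corpus the vocabulary easily reaches thousands of words, which is where the high dimensionality comes from.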

In this talk, we present the application of a nonparametric model with a Bayesian approach to the classification of a document collection. We use variational inference because it allows us to approximate the posterior distribution with lower computational cost than that required by an MCMC algorithm. This approach, presented in Dunson and Xing [2009] and Ahlmann-Eltze and Yau [2018], has been used in the clustering of high-dimensional categorical data in fields other than text analytics. We compare the results with those obtained from machine learning algorithms commonly used in this area, discuss potential improvements to the model, and reflect on the wider application of this type of approach in the business environment.
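As a rough illustration of the variational idea, the sketch below runs coordinate-ascent variational inference for a finite Bayesian mixture of multinomials over word counts. It is a simplification under stated assumptions: the number of clusters is fixed rather than nonparametric, and the function names (`digamma`, `fit_mixture`), the hyperparameters `alpha` and `beta`, and the toy counts are all hypothetical. It is not the MixDir implementation nor the exact model presented in the talk.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def fit_mixture(counts, n_clusters, alpha=1.0, beta=0.1, n_iter=50, seed=0):
    """Return resp[d][k], the approximate posterior probability that
    document d belongs to cluster k.

    Assumed model: pi ~ Dirichlet(alpha), phi_k ~ Dirichlet(beta),
    z_d ~ Categorical(pi), words of document d ~ Multinomial(phi_{z_d}).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random initial responsibilities, each row summing to 1.
    resp = []
    for _ in range(n_docs):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([v / total for v in row])

    for _ in range(n_iter):
        # Update q(pi) = Dirichlet(a) and q(phi_k) = Dirichlet(b[k]).
        a = [alpha + sum(resp[d][k] for d in range(n_docs))
             for k in range(n_clusters)]
        b = [[beta + sum(resp[d][k] * counts[d][w] for d in range(n_docs))
              for w in range(n_words)]
             for k in range(n_clusters)]

        # Expected log parameters under the current variational factors.
        e_log_pi = [digamma(a_k) - digamma(sum(a)) for a_k in a]
        e_log_phi = [[digamma(b_kw) - digamma(sum(b_k)) for b_kw in b_k]
                     for b_k in b]

        # Update q(z_d) = Categorical(resp[d]); subtract the max for stability.
        for d in range(n_docs):
            log_r = [e_log_pi[k]
                     + sum(counts[d][w] * e_log_phi[k][w] for w in range(n_words))
                     for k in range(n_clusters)]
            top = max(log_r)
            weights = [math.exp(v - top) for v in log_r]
            total = sum(weights)
            resp[d] = [v / total for v in weights]
    return resp


# Toy document-term counts (hypothetical): two documents per vocabulary block.
toy_counts = [[3, 3, 0, 0], [4, 2, 0, 0], [0, 0, 3, 3], [0, 0, 2, 4]]
resp = fit_mixture(toy_counts, n_clusters=2)
labels = [max(range(2), key=lambda k: row[k]) for row in resp]
```

Each pass only updates a handful of Dirichlet and categorical parameters in closed form, which is why this kind of scheme tends to be cheaper than drawing MCMC samples. The result depends on the random initialisation, so in practice one would likely run several restarts and keep the best fit.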

Keywords: Bayesian, Clustering, Variational Inference, Text Analytics, Topic Modeling, MixDir, R

Lecture given at the III INTERNATIONAL CONFERENCE ON STOCHASTIC PROCESSES, RANDOM PHENOMENA AND THEIR APPLICATIONS, organized by the National University of Engineering (UNI) of Lima, Peru. The event took place from December 3 to 5, 2020.