Bayesian Topic Modelling
Using non-parametric models and variational inference in text analysis
In business settings, topic modeling is proving to be a differentiating factor in tasks such as understanding customer feedback in surveys, spotting complaints among large volumes of email, flagging urgent interventions in maintenance reports, monitoring reputational risk on social networks, and identifying business opportunities.
Text analytics allows us to uncover ideas, concepts, and relationships that are “hidden” within words. Topic modeling is a specific text analytics method whose goal is to group documents by topic. In other words, it is cluster analysis of text data, and it requires working with high-dimensional matrices that represent the probabilities of word occurrences within each document.
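To make these high-dimensional matrices concrete, here is a minimal Python sketch (Python is used only for illustration) that turns a toy corpus into a document-term count matrix and then normalizes each row into empirical word-occurrence probabilities. The toy documents, the naive lowercase whitespace tokenizer, and the helper names `document_term_matrix` and `word_probabilities` are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Build a sorted vocabulary and a document-term count matrix.

    Assumption for illustration: tokenization is a naive lowercase
    whitespace split (no stop-word removal, stemming, etc.).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


def word_probabilities(matrix):
    """Normalize each row into empirical word-occurrence probabilities."""
    probs = []
    for row in matrix:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Hypothetical toy corpus: two complaints and one maintenance report.
docs = [
    "late delivery late refund",
    "refund request for late delivery",
    "pump leak urgent repair",
]
vocab, dtm = document_term_matrix(docs)
probs = word_probabilities(dtm)
print(vocab[:3])    # → ['delivery', 'for', 'late']
print(dtm[0])       # → [1, 0, 2, 0, 0, 1, 0, 0, 0]
print(probs[0][2])  # → 0.5
```

With a real corpus the vocabulary runs to thousands of columns and most entries are zero, which is why such matrices are usually stored in sparse form.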
In this talk, we apply a Bayesian nonparametric model to the unsupervised classification (clustering) of a document collection. We use variational inference because it approximates the posterior distribution at a lower computational cost than an MCMC algorithm would require. This approach, presented in Dunson and Xing [2009] and Ahlmann-Eltze and Yau [2018], has previously been used to cluster high-dimensional categorical data in fields other than text analytics. We compare the results with those of machine learning algorithms commonly used in this area, discuss potential improvements to the model, and reflect on the wider application of this type of approach in the business environment.
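As a rough illustration of the kind of inference involved, the sketch below runs coordinate-ascent mean-field variational inference (CAVI) for a mixture of independent categorical features, i.e. a latent class model of the type studied by Dunson and Xing [2009]. It is not the MixDir implementation: as an assumption, it replaces the Dirichlet process prior with a finite symmetric Dirichlet(alpha/K, ..., alpha/K) prior over K components, so that with a small alpha surplus components can stay nearly empty. The function names, hyperparameter values, and the toy binary word-presence matrix are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))


def cavi_latent_class(X, levels, K=4, alpha=1.0, beta=0.5, iters=100, seed=0):
    """Mean-field VI for a mixture of independent categorical features.

    X[i][j] is the level (0..levels[j]-1) of feature j in document i.
    Assumed prior: weights ~ Dirichlet(alpha/K, ...), a finite stand-in
    for a Dirichlet process; each theta_kj ~ Dirichlet(beta, ...).
    Returns responsibilities r[i][k] = q(z_i = k).
    """
    rng = random.Random(seed)
    N, J = len(X), len(levels)
    # Random soft initial assignment of documents to components.
    r = []
    for _ in range(N):
        w = [rng.random() for _ in range(K)]
        s = sum(w)
        r.append([v / s for v in w])
    for _ in range(iters):
        # Update Dirichlet parameters of q(weights) and q(theta_kj).
        a = [alpha / K + sum(r[i][k] for i in range(N)) for k in range(K)]
        b = [[[beta] * levels[j] for j in range(J)] for _ in range(K)]
        for i in range(N):
            for k in range(K):
                for j in range(J):
                    b[k][j][X[i][j]] += r[i][k]
        # Expected log weights and expected log category probabilities.
        e_log_pi = [digamma(a_k) - digamma(sum(a)) for a_k in a]
        e_log_theta = [[[digamma(v) - digamma(sum(bkj)) for v in bkj]
                        for bkj in b[k]] for k in range(K)]
        # Update responsibilities, normalizing with log-sum-exp.
        for i in range(N):
            logits = [e_log_pi[k] + sum(e_log_theta[k][j][X[i][j]] for j in range(J))
                      for k in range(K)]
            m = max(logits)
            w = [math.exp(t - m) for t in logits]
            s = sum(w)
            r[i] = [v / s for v in w]
    return r


# Hypothetical binary word-presence matrix; columns: refund, late, leak, urgent.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
r = cavi_latent_class(X, levels=[2, 2, 2, 2], K=4)
labels = [max(range(len(row)), key=row.__getitem__) for row in r]
print(labels)
```

Each iteration is a deterministic pass of sums over documents and digamma evaluations, with no posterior sampling; this optimization-based loop is what usually makes variational inference cheaper than MCMC. The resulting labels depend on the random initialization, so in practice one would run several restarts and keep the one with the highest evidence lower bound (not computed in this sketch).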
Keywords: Bayesian, Clustering, Variational Inference, Text Analytics, Topic Modeling, MixDir, R
Lecture given at the III INTERNATIONAL CONFERENCE ON STOCHASTIC PROCESSES, RANDOM PHENOMENA AND THEIR APPLICATIONS, organized by the National University of Engineering (UNI) in Lima, Peru, from December 3 to 5, 2020.
