

Showing posts from 2023

Bi-Term topic modeling in R

Although large language models (LLMs) have become all the rage recently, small-scale modeling remains a useful tool for researchers whose strictly defined research questions limit language parsing and modeling to the bi-term topic modeling procedure. In this blog post I discuss the procedure for bi-term topic modeling (BTM) in the R programming language. The procedure is a good fit when there are many short texts (a large "n") to be parsed, for example Twitter posts and related social media postings. To be sure, such text is becoming harder to harvest online, but secondary data sources can still yield insightful information, and BTM has uses outside of Twitter, such as analyzing responses to open-ended survey questions. Yan et al. (2013) have suggested that BTM with its Gibbs sampling procedure handles short t

Sentiment mining in Educational Research

One question that has persisted recently is whether to mine public sentiment about current events that affect the education community in Houston and the wider world. Recently I mined sentiment around COVID-19 and the decision to go online versus staying in schools, as well as teaching methodologies (the article can be found at: ), and this proved to be an essential scientific journey, as it found several contentions at play. The sentiment also indicated emotional divisions between groups. But is such an exercise important, and what does it mean to do it? With the recent push in politics to have parental voices push back on the curriculum, there has not been a more important time for vox populi, as it affects what might be included in and excluded from the curriculum. Parental voices have reached a tipping point in what goes on in some states. While this blog and its writer stay neutral on what side of the politics the result h