
Biterm topic modeling in R

As large language models (LLMs) have become all the rage recently, it is worth looking to small-scale modeling again as a useful tool for researchers with strictly defined research questions, ones that limit language parsing and modeling to the biterm topic modeling procedure. In this blog post I discuss the procedure for biterm topic modeling (BTM) in the R programming language. One indication for using the procedure is short text with a large "n" to be parsed. An example is Twitter and related social media postings. To be sure, such text is becoming harder to harvest online, but secondary data sources can still yield insightful information, and there are uses for BTM outside of Twitter that can bring insights into short text, such as open-ended survey questions. Yan et al. (2013) have suggested that BTM, with its Gibbs sampling procedure, handles short texts…
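The excerpt above does not show the full workflow, but the core fitting step can be sketched with the 'BTM' package. This is a minimal sketch under assumptions: the `tweets` data frame and its contents are illustrative placeholders, not real data, and the whitespace tokenizer stands in for whatever preprocessing a real study would use.

```r
# Sketch: fitting a biterm topic model on short texts with the 'BTM' package.
# The 'tweets' data frame and its column contents are illustrative only.
library(BTM)

tweets <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  text   = c("school reopening plans announced",
             "teachers discuss online learning",
             "parents react to reopening plans"),
  stringsAsFactors = FALSE
)

# BTM expects a two-column data frame: document id, then one token per row
tok <- strsplit(setNames(tolower(tweets$text), tweets$doc_id), "\\s+")
tokens <- data.frame(
  doc_id = rep(names(tok), lengths(tok)),
  token  = unlist(tok),
  row.names = NULL
)

set.seed(123)
model <- BTM(tokens, k = 2, iter = 500)  # k topics, Gibbs sampling iterations
terms(model, top_n = 5)                  # inspect top terms per topic
```

With such a small toy corpus the topics are not meaningful; the point is only the shape of the input (one token per row, grouped by document id) that `BTM()` expects.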
Recent posts

Sentiment mining in Educational Research

One of the questions that has persisted recently is whether to mine public sentiment over current events that affect the education community in Houston and the greater world. Recently I mined public affect around COVID-19, the decision to go online versus staying in schools, and teaching methodologies (the article can be found at: ), and this proved to be an essential scientific journey, as it found that there were several contentions at play. The sentiment also indicated emotional divisions between groups. But is such an exercise important, and what does it mean to do it? With the recent push in politics to have parental voices push back on the curriculum, there has not been a more important time for vox populi as it affects what might be included in and excluded from the curriculum. Parental voices have reached a tipping point in what goes on in some states. While this blog and its writer stay neutral on what side of the politics the result…
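The excerpt does not show the pipeline used in the article, but the basic mechanics of lexicon-based sentiment mining can be sketched with 'tidytext'. The `posts` data here are made-up examples, and the Bing lexicon is just one choice among several.

```r
# Sketch of lexicon-based sentiment mining; the 'posts' data are invented
# examples, not the article's corpus.
library(dplyr)
library(tidytext)

posts <- data.frame(
  id   = 1:2,
  text = c("Parents are angry about the new curriculum",
           "Teachers are hopeful about returning to classrooms"),
  stringsAsFactors = FALSE
)

posts %>%
  unnest_tokens(word, text) %>%                       # one word per row
  inner_join(get_sentiments("bing"), by = "word") %>% # attach polarity labels
  count(id, sentiment)                                # tally per document
```

The join keeps only words present in the lexicon, so counts per document reflect how many sentiment-bearing words of each polarity appear, which is the usual starting point before any group comparison.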

Thoughts on publishing in a mega-journal

Results of publishing my latest paper

Just recently I published my first scientific (as opposed to anthropological) article in a mega-journal (see article at ). I had searched for a home for the article in two other journals, but it was rejected, even after one journal editor noted that the article was technically sound. Another editor stated that the topic modeling would be of no interest to readers. The first editor stated that the article's topic was generally too narrow for their readership. Considering that fewer than a handful of existing journals would handle this article, one of the editors suggested I consider its present home, Heliyon.

The mega-journal Heliyon

I found Heliyon by taking the editor's advice, knowing very little about the journal other than that it is run by Elsevier. I could see that it published articles in different…

Getting intersectional with methodologies: Going reactive, getting archival, getting big, with data

Considering Newer Research Methodologies

Presently, most researchers place themselves within one of the preset methodology categories set forth in the mid-2000s. These are usually divided into qualitative, quantitative, or mixed methods. This is fine and good, and allows researchers to fall back on traditions that have been years in the making. This is how we extend precedent and appeal to previous logic to ground the case that our data collection is sound. However, what about making the case that it is time for new methodologies, ones more intersectional than mixed? Can we add richness to research methodologies and take on some of the emerging issues in education when we invite transdisciplinary involvement with research data? One methodology that I have considered recently traverses qualitative research, content analysis, and digital humanities methods. I argued that starting from the archive, staging, and preliminary analysis, borrowing from data science, gives researchers…

Automating GPA and Hours for administrative purposes, University of Houston: the 'coogs' package

In the realm of institutional effectiveness, it is often necessary to batch process the hours earned and GPAs, both in the content area and cumulatively, for undergraduates applying to particular majors in certain programs of study. Such calculations involve many students applying for majors at one time. One can therefore either calculate tens to hundreds of students by hand or automate the process. To ease the process through automation, I have created a function in R called 'bulkgpa' in the 'coogs' package, available to the institutional effectiveness community at the College of Education at the University of Houston. The function is a hard worker. It takes three raw files directly from PeopleSoft queries and cleanses them by eliminating unneeded columns, duplicated rows, and classes that have drop dates associated with them. Argument slots are created for raw data Excel spreadsheets, including transfer classes, transfer hours, UH course…
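The 'coogs' package itself is an internal tool, so its actual interface is not reproduced here. A hedged sketch of the kind of cleaning and GPA summary the excerpt describes might look like the following; every column name (`student_id`, `drop_date`, `grade_points`, `hours`) is a hypothetical stand-in for the real PeopleSoft query fields.

```r
# Hypothetical sketch in the spirit of 'bulkgpa'; the column names are
# illustrative assumptions, not the actual PeopleSoft field names.
library(dplyr)

clean_courses <- function(raw) {
  raw %>%
    distinct() %>%                 # drop duplicated rows
    filter(is.na(drop_date)) %>%   # remove classes with drop dates
    select(student_id, course, grade_points, hours)
}

# Cumulative GPA per student: total grade points over total GPA hours
gpa_summary <- function(courses) {
  courses %>%
    group_by(student_id) %>%
    summarise(hours = sum(hours),
              gpa   = sum(grade_points) / sum(hours),
              .groups = "drop")
}
```

The same summarise step could be grouped by student and subject area to produce the content-area GPA alongside the cumulative one.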

Getting past the two column PDF to extract text into RQDA: Literature reviews await

One of the great promises of working with RQDA is conceiving of it as computer-assisted literature review software. This requires balancing the right amount of coding with text that can be used as warrants and backing in arguments. In theory it is a great idea, using computer-assisted qualitative data analysis software (CAQDAS) for literature reviews, but how do you get article PDFs into R and RQDA in a human-readable format? By this I mean that many empirical articles are written in two-column formats, and text extraction with standard tools produces text on the diagonal. Extracting PDF text under this circumstance can be daunting with some R packages, such as 'pdftools', either with or without the assistance of the 'tesseract' package. If you are working on a Windows-based computer, you can install three packages and Java to do the trick. First, gather the literature articles that you would like to mark up in RQDA. Put them into a folder, and away you go.
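The excerpt does not name the three packages, so the following is an assumption: one common Java-backed route for two-column PDFs is 'tabulizer' (with its 'rJava' and 'tabulizerjars' dependencies). A minimal sketch of the folder-to-.txt step, with an illustrative folder name, might look like this.

```r
# Assumption: using 'tabulizer' (Java-backed) for layout-aware extraction;
# the excerpt does not confirm which three packages the post installs.
library(tabulizer)

pdfs <- list.files("articles", pattern = "\\.pdf$", full.names = TRUE)

# extract_text() tends to keep the two columns in reading order rather
# than interleaving them line by line as naive extraction does
texts <- lapply(pdfs, extract_text)

# write each article out as a .txt file ready for import into RQDA
for (i in seq_along(pdfs)) {
  writeLines(texts[[i]], sub("\\.pdf$", ".txt", pdfs[i]))
}
```

Whatever extractor is used, the end state is the same: a folder of plain .txt files, which is the format RQDA can ingest.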

Importing excel-based data into RQDA for a literature review

One of the great difficulties in batch loading texts into RQDA is that the package requires imports from .txt files (Huang, 2018). I recently ran into this issue as I was starting a literature review by reviewing abstracts in .bib format. I had taken abstracts from the EBSCO search engine and, using the 'rectangulate()' function in the r7283 package, had transformed a BibTeX file into an Excel spreadsheet with a tidy display of one row per abstract, with citation information, for 140 empirical articles. While the Excel file makes for an easy upload into proprietary software, I had never attempted to batch upload this format of text into RQDA. That is, until I came across a how-to artifact from an unknown workshop (click the link to find the original set of instructions) (RStudio Pubs, n.d.). The main pieces of help came from the idea that users could take a spreadsheet, read each of a series of column cells into a list of character strings, and then identify…
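The spreadsheet-to-RQDA idea the excerpt describes can be sketched as follows. This is a sketch under assumptions: the file name and the `title` and `abstract` column names are placeholders for whatever the 'rectangulate()' output actually contains, and the RQDA project file is assumed to already exist.

```r
# Sketch of the batch-import idea: read the spreadsheet, turn each abstract
# into a named character string, and push the set into an open RQDA project.
# The file name and column names ('title', 'abstract') are assumptions.
library(readxl)
library(RQDA)

abstracts <- read_excel("ebsco_abstracts.xlsx")

file_list <- as.list(as.character(abstracts$abstract))
names(file_list) <- abstracts$title   # names become the RQDA file titles

openProject("litreview.rqda")         # project must already exist
write.FileList(file_list)             # batch import the abstracts
closeProject()
```

Once imported this way, each abstract appears as its own file inside RQDA, ready for coding like any other text.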