
Getting intersectional with methodologies: Going reactive, getting archival, getting big, with data

Considering Newer Research Methodologies

Presently, most researchers situate themselves within one of the preset methodologies set forth in the mid 2000s, usually divided into qualitative, quantitative, or mixed methods. This is fine and good, and allows researchers to fall back on traditions that have been years in the making. This is how we extend precedent and appeal to previous logic to ground the case that our data collection is sound. However, what about making the case that it is time for new methodologies, ones more intersectional than mixed? Can we add richness to research methodologies and take on some of the emerging issues in education when we invite transdisciplinary involvement with research data?

One methodology I have considered recently traverses qualitative research, content analysis, and digital humanities methods. I have argued that starting from the archive, with staging and preliminary analysis borrowed from data science, gives researchers control over what happens downstream when content analysis is applied to smaller chunks. We live in a time when the data surrounding a phenomenon can be captured almost in its entirety. Take the case of microblogging, Facebook postings, or other content sourcing: it is possible to collect all of the document traces related to a phenomenon, capture them, and meta-tag them as big data sources. This possibility calls for a change in research approaches.

Build the archive

In a movement of large gestures, the researcher is interested in collecting the archive. This means defining the research questions so that the capture is large enough to ask large, sweeping questions that big data can help answer. For example, a researcher might ask, "What is Facebook user sentiment towards Covid-19?" or, "How is love defined in the eighteenth-century British novel?" The archive becomes the data that helps build the answers to the questions posed. There are three requirements at this stage of building the archive: (1) the archive should include the whole of the phenomenon, (2) the documents must be scalable with some kind of search aid, and (3) the archive must be viewable.
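To make this concrete, here is a minimal sketch of what staging such an archive in R might look like. The folder name, the column names, and the Facebook provenance tag are illustrative assumptions, not part of the post itself.

# A hedged sketch: staging collected documents into one metadata-tagged data
# frame. "archive_raw", the column names, and the source tag are assumptions.
library(dplyr)
library(tibble)
library(readr)

files <- list.files("archive_raw", pattern = "\\.txt$", full.names = TRUE)

archive <- tibble(
  doc_id = seq_along(files),
  path   = files,
  text   = vapply(files, read_file, character(1))
) %>%
  mutate(
    source    = "facebook",                        # assumed provenance tag
    collected = Sys.Date(),                        # capture date as metadata
    n_words   = lengths(strsplit(text, "\\s+"))    # simple searchable attribute
  )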


Display the archive

The fact that the archive is built first is not lost on the procedures that come after. On the contrary, part of its purpose is to display what has been gathered of the totality of the phenomenon, or in the case of microblogging or Facebook posts, the discourse. Here is where the archive takes a dramatic turn. While some would say researchers can physically cut and paste materials from the archive to search it, contour it, and draw conclusions from it, there are more advanced ways of displaying the archive at hand. The dashboard is one such way of presenting advanced views of the data in the archive. Dashboards are a relatively recent arrival, and this is perhaps the first time the dashboard has been suggested as a way to build and display the archive. With the dashboard comes the idea that the archive can be built reactively.
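As one illustration of that idea, the sketch below shows a searchable table view of the archive inside a Shiny page. It assumes the archive data frame staged above and leans on the DT package; it is not the post's own dashboard.

# A hedged sketch: making the archive viewable and searchable in a dashboard.
# Assumes an `archive` data frame like the one staged above.
library(shiny)
library(DT)

ui <- fluidPage(
  titlePanel("The Archive"),
  DTOutput("archive_table")
)

server <- function(input, output, session) {
  # DT adds client-side search and paging, keeping a large archive viewable
  output$archive_table <- renderDT(
    datatable(archive, filter = "top", options = list(pageLength = 25))
  )
}

shinyApp(ui, server)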

Build a Reactive archive 

One way to build the archive is to do it reactively (Wickham, 2021). To assist with this, it is possible to use Shiny apps to ease one's way through the data. First, the Shiny app presents a webpage that holds the data in one or more data frames accessible to the user on the client side of the page. Underneath, on the server side, is the interactive code that allows R to react to selections made from drop-down menus, sliders, and so on, so that distinct views of the data can be produced. The code below shows how one can create the server side and the reactive input as ways to navigate the webpage for a bi-term topic model, fed by data at the beginning of the function.
Figure 1: The Reactive Archive and the Server
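Because the figure itself is an image, what follows is a minimal sketch of what that server-side reactive code might look like, assuming a model already fitted with the BTM package and named btm_model; the object names and slider range are illustrative, not the post's exact code.

# A hedged sketch of the server side and reactive input, assuming a fitted
# bi-term topic model, e.g. btm_model <- BTM::BTM(tokenized_archive, k = 10)
library(shiny)
library(BTM)

ui <- fluidPage(
  sliderInput("topic", "Topic number", min = 1, max = 10, value = 1),
  tableOutput("topic_terms")
)

server <- function(input, output, session) {
  # terms() returns one data frame of tokens and probabilities per topic;
  # the reactive expression re-runs whenever the slider moves
  selected_terms <- reactive({
    terms(btm_model, top_n = 15)[[input$topic]]
  })
  output$topic_terms <- renderTable(selected_terms())
}

shinyApp(ui, server)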

The output is then a series of topics: the user selects a topic number from a slider, and the probabilistic terms from the topic model appear in a data frame on the screen.

Slicing: showing data and encouraging cutting through it

Shiny apps are a way to communicate results or perspectives clearly to uninitiated or novice R users who need quick snapshots of summarized views into the data or models that have been constructed. In the example here, the archive can be sliced, with important views shown in many different ways. For example, perhaps a list of the most widely used action verbs in the texts has been produced, or sentiment analysis has been calculated to gauge positive affect in the archive. What we expect are insights coming from different directions and different calculations, all laying bare significant features of the texts that are not necessarily viewable from the archive itself, but are viewable with summary dashboards. A sketch of one such slice follows.
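Here is a hedged sketch of a sentiment calculation with tidytext, again assuming the archive data frame from above with doc_id and text columns; it is one possible calculation, not the post's own analysis.

# A hedged sketch of one slice: a per-document affect score, assuming an
# `archive` data frame with doc_id and text columns.
library(dplyr)
library(tidyr)
library(tidytext)

archive_sentiment <- archive %>%
  unnest_tokens(word, text) %>%                        # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%  # tag words pos/neg
  count(doc_id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net_positive = positive - negative)           # simple affect score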

Writing Through the Universe by Sampling the Archive

In this methodology, some quantitative skills are at play. For example, because the archive is treated as the universe of the phenomenon in play, a randomizer could be called upon to query it, allowing conclusions to be drawn from the randomized evidence it returns. It is assumed that the archive holds tremendous conclusions and multiple perspectives waiting to be discovered. It is also assumed that some of those conclusions might be at odds with one another, or outright contradict one another. A sampling sketch follows.
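A minimal sketch of such a randomized query is below, once more assuming the archive data frame; the sample sizes and the year stratum are illustrative assumptions.

# A hedged sketch of sampling the archive as a universe; column names,
# sample sizes, and the year stratum are assumptions.
library(dplyr)

set.seed(2024)                                   # reproducible draw

sampled_docs <- archive %>%
  slice_sample(n = 50)                           # simple random sample of 50 documents

# A stratified alternative, if the archive carries a year column:
sampled_by_year <- archive %>%
  group_by(year) %>%
  slice_sample(prop = 0.05) %>%                  # 5% of documents per stratum
  ungroup()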

Does Sampling the Archive make it Quantitative?


The immediacy of falling back on what is already known (the qualitative/quantitative debate) might be jarring to those who are comfortably situated in their research tradition. However, sampling the voicings within an archive would allow themes to come forth that might otherwise remain dormant. This is not to say that the archive is probabilistic (though it might be); rather, it is open to possibilities for what we might find in it through sampling.

Conclusion: Newer Methodologies for newer data assumptions

In the early to late 2000s it was common to thematicize a group of qualitative findings after interviewing a handful of participants and to write about them deeply and with thick description. These few accounts were made reachable to readers who found resonance in them. However, as the information age wears on, it is possible to capture a phenomenon in its totality. In many cases, these social phenomena can be collected and preserved for the study, and for the analysis that comes after, through a combination of digital humanities, content analysis, and qualitative research methodologies. Transecting these methodologies, when appropriate, forges a way forward: an archive that is displayable, discoverable, and able to act as source material for potentially numerous researchers to draw conclusions from, whether with natural language processing methods or with smaller-scale qualitative research.

References


Wickham, H. (2021). Mastering Shiny: Build interactive apps, reports, and dashboards powered by R. O'Reilly Media.

