
Digital Humanities Methods in Educational Research

Digital Humanities-Based Education Research



This is a backpost from 2017. That year, I presented my latest work at the SERA conference in Division II (Instruction, Cognition, and Learning). The title of my paper was "A Return to the Pahl (1978) School Leavers Study: A Distanced Reading Analysis." Several earlier studies motivated this one, including Cheon et al. (2013), from my alma mater.

This paper accomplished two objectives. First, I engaged previous claims made about school leavers on the Isle of Sheppey, UK, in the late 1970s, the rough equivalent of high school graduates in the United States. Second, I used emerging digital methods to arrive at conclusions about the relationships among unemployment, participants' feelings about their (then) current selves, their possible selves, and their educational accomplishments.





In the image to the left, I show a Ward hierarchical cluster model reflecting the stylometrics of 153 essays (red-boxed p-values indicate statistically significant clusters at p < .05). This goes toward answering the question of plagiarism (a question not asked in my 2017 paper). Very little information is available concerning efforts to protect the external validity of the original Pahl (1978) study, a question first posed by Lyon et al. (2012). The cluster model homes in on document similarities that reach statistical significance. Such models are increasingly feasible in the R environment, where statistical learning and automated content analysis are entering a new era.
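To make the clustering idea concrete, here is a minimal sketch of Ward's method written in Python rather than R, so it is not the code behind the figure. It assumes a hypothetical feature set of five function words and four invented sentences standing in for essays. At each step it merges the two clusters whose union adds the least within-cluster variance. The cluster p-values mentioned above are not computed here.

```python
from itertools import combinations

# Hypothetical stylometric features: a real study would use a much longer list.
FUNCTION_WORDS = ["the", "and", "of", "to", "i"]

def profile(text):
    """Relative frequency of each function word in a text."""
    words = text.lower().split()
    return [words.count(w) / len(words) for w in FUNCTION_WORDS]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * dist2

def ward_cluster(vectors, k):
    """Agglomerate until k clusters remain; returns lists of document indices."""
    clusters = [{"members": [i], "centroid": list(v)} for i, v in enumerate(vectors)]
    while len(clusters) > k:
        # Pick the pair whose merge is cheapest under Ward's criterion.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": sorted(a["members"] + b["members"]),
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
    return sorted(c["members"] for c in clusters)

# Invented sentences, not quotations from the Pahl essays.
texts = [
    "the cat and the dog and the bird",
    "the sun and the moon and the sea",
    "i want to go to work",
    "i hope to learn to drive",
]
groups = ward_cluster([profile(t) for t in texts], k=2)
print(groups)
```

Texts 0 and 1 share one function-word profile and texts 2 and 3 share another, so the two pairs end up in separate clusters. With real essays, unusually tight clusters would be the candidates to inspect for copied text.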





In the image to the right, I display the median sentiment scores (measured on a positive/negative scale) in the essays of three age groups (15, 16, 17+) of participants in the Pahl (1978) study. This is, again, a question not explored in the upcoming paper, but rather a snapshot of how the median scores differ across age groups. In interpreting the display of medians, the essay evidence points to those participants who left school at a later age worrying about the availability of future work as highly trained professionals. This reading is supported by cases that describe increased competition in the labor market, which necessitated delaying entry into work, combined with the stress of passing O-levels with high marks for increased opportunity. At issue is whether taking the extra time for studies paid off monetarily. This doubt is seen in the lower sentiment scores among the 17+ group. In contrast, the 16 age group described the prospects of getting out into the workforce and beginning their lives post-schooling. The slightly higher scores reflect some of this meaning contained in the 16 age group essays. Finally, the single case of a 15-year-old student reflects the data as it was collected in 1978; recall that the present study is a descriptive one. This kind of fine-grained microanalysis is possible through the combined use of digital research methods such as concordances and sentiment analysis.
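The comparison of medians can be sketched as follows. This is an illustration in Python, not the analysis behind the figure. The tiny word lists, the scoring rule (positive words minus negative words), and the sample sentences are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical mini-lexicon; a real study would use an established sentiment lexicon.
POSITIVE = {"hope", "good", "happy", "enjoy", "success"}
NEGATIVE = {"worry", "unemployed", "afraid", "fail", "stress"}

def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def median_by_group(essays):
    """Map each age group to the median sentiment score of its essays."""
    scores = defaultdict(list)
    for group, text in essays:
        scores[group].append(sentiment_score(text))
    return {group: median(values) for group, values in scores.items()}

# Invented sentences, not quotations from the Pahl essays.
sample = [
    ("16", "I hope to enjoy a good job."),
    ("16", "I am happy to leave school."),
    ("17+", "I worry I will fail and be unemployed."),
    ("17+", "Exams stress me but I hope for success."),
]
medians = median_by_group(sample)
print(medians)
```

The median is used rather than the mean so that one unusually emotive essay does not dominate a small group, which matters when a group holds only a handful of cases.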




The skills involved in parsing text and arriving at conclusions to educational research questions are complex and interdisciplinary, and they hold the potential for creating novel algorithms that assist the assessment of writing at multiple grade levels. To flesh out this last idea, I am currently working with a machine learning-based classification algorithm. An important component of the work is obtaining a large data set of student essays to work with so that the algorithm "teaches" itself.

References

Cheon, J., Lee, S., Smith, W., Song, J., & Kim, Y. (2013). The determination of children's knowledge of global lunar patterns from online essays using text mining analysis. Research in Science Education, 43(2), 667-686. Retrieved from http://dx.doi.org/10.1007/s11165-012-9282-5

Lyon, D., Morgan Brett, B., & Crow, G. (2012). Working with material from the Sheppey archive. International Journal of Social Research Methodology, 15(4), 301-309. Retrieved from http://dx.doi.org/10.1080/13645579.2012.688314

Pahl, R. (1984). School leavers study, 1978. [data collection]. 2nd edition. UK Data Service. SN 4867.
