Posts

The 'jstor_ocr' Function in the 'r7283' Package for Concatenating OCR and Metadata from JSTOR's Data for Research

Digital Text Investigations

The digital humanities continues to change the ways in which we draw conclusions about social phenomena. This change starts from the understanding that, for the first time in history, researchers can potentially capture the totality of a social phenomenon's appearance. This continuous evolution of study provides new ways to examine data. A key idea in this evolution is the ability to pull together unstructured data and their accompanying metadata as a rejoinder to older forms of content analysis and its related approaches. The JSTOR Data for Research (DfR) arrangement offers one such opportunity to work with unstructured data. Subscribers can request large, carefully delineated corpora for academic investigations. At the time of writing there are two options for data requests. The first option allows the subscriber to create search terms and, without a signed contract, scale down the results and download n-grams (roughly 1-3 combinations are avail...

The Matrix Literature Review and the 'rectangulate' Function from the r7283 Package

Matrices and Literature Reviews

Pulling together a strong literature review remains the foundation for positioning an education researcher's novel contribution to the field. Yet reviewing literature can be daunting at the outset, because organizing the results requires itemizing, tagging, and keeping track of the relevant articles. Organizing 100+ articles takes time and commitment, and can ultimately distract from the task at hand, which is getting a grip on the state of knowledge. To make the task more straightforward, I have created a function that lifts some of the burden of organizing literature. It takes a bibliographic file (.bib) exported from EBSCO and widens it into a matrix. Converting the .bib file into a matrix allows the researcher to jump right into the matrix literature review style of reading articles. A matrix literature function for education ...

Text mining, unruly text, XML, TEI, and R: Go with conventional architecture, or make your own?

Many educational researchers will inevitably work with text as data. It is unavoidable: reflective practice (almost universally required by teacher preparation programs) requires conveying meaning through words and retaining a corpus of reflections throughout a semester, or even a year. Finding patterns in teaching strategies will inevitably require text parsing. Student writing assessment naturally lends itself to text analytics, so educational researchers can gain data on student learning by reading student responses to writing prompts. Further still, professional educational researchers stand to gain much by taking in large amounts of text, searching for patterns, and reporting on their findings. The more skilled educational researchers become at working with text, the more opportunities open to them. Working with text requires effort and copious patience, mostly because text is, relative to numbers, messy.

The Curse of Messiness

Messiness is essentially the observatio...

Design-Based Survey Analysis

[Image: "Case Damascus Barlow Knife" by Michael E. Cumpston, CC-BY-SA 3.0]

One of the persisting problems for researchers doing secondary analysis is generating a statistical model from data that is generalizable only to a fixed population (Lumley, 2010). A key difference between drawing statistical inferences about similar populations and estimating the results of a sample for a fixed population is the use of several preemptive steps to ensure that the design-based sampling is replicated. Bell, Onwuegbuzie, Ferron, Jiao and Kromey (2012) have reported a lack of clarity, among many investigators relying on large survey data covering adolescent health, about remaining faithful to survey designs. However, reporting on international survey data suffers from the same issues: sampling weights are either not included in the analysis or not discussed thoroughly in the methodology sections of reports. While the rationales for incomplete discussions are not d...