
The Matrix Literature Review and the 'rectangulate' Function from the r7283 Package

Matrices and Literature Reviews

Pulling together a strong literature review remains the foundation for positioning an education researcher's novel contribution to the field. Yet reviewing literature can be daunting at the outset, because organizing the results requires itemizing, tagging, and keeping track of the relevant articles. Organizing 100+ articles takes time and commitment, and can ultimately distract from the task at hand, which is getting a grip on the state of knowledge.

To make the task more straightforward, I have created a function that lifts some of the burden of organizing the literature. It takes a bibliographic file (.bib) exported from EBSCO and widens it into a matrix. Transposing the .bib file into a matrix allows the researcher to jump right into the matrix literature review style of reading articles.

A matrix literature function for education researchers

The function is purposefully designed to accompany the sharing options of an EBSCO search result, which can be exported to an email account. EBSCO is one of the premier search engines/databases that education researchers use, as it houses Education Research Complete, Education Abstracts, and ERIC (to name a few), and is therefore a first stop for most education researchers. The idea of building the EBSCO sharing feature into a matrix literature review came from a problem of practice that I encountered when performing my own literature reviews. The function may also work with other databases, and it can certainly be used whenever the researcher has exported a bibliography in the .bib format.

How "rectangulate" is different from similar functions

I was motivated to create the "rectangulate" function after I tried a similar package called "bib2df" (Ottolinger, 2019). The "bib2df" function from the package of the same name produces a "tibble," or truncated data frame, with roughly 38 different fields (columns), to assist the user with "analysis or visualization" (Ottolinger, 2019). Understanding that function follows philosophy, "bib2df" accomplishes its objective of setting up the data for text analytics, bibliometrics, or visualizations. To use "bib2df" for a matrix literature review, the user has to think outside of the function's originally intended philosophy. I started using "bib2df" to create my own literature review matrices, but I found that I had to add some columns and remove others, because the function parsed out far more information than a literature review needs. I also found that "bib2df" was undergoing revision for CRAN and the rOpenSci initiative, and when I downloaded it, its package dependencies were also under revision. This meant that I could not use the function.

Figure 1: Tibble from bib2df

I created "rectangulate" because I realized that too many dependencies can break a useful function. Calling on too many dependencies is something to be guarded against, in my opinion, and where possible, base R (or other languages like C++) should be called upon instead of stacking R libraries atop one another (the only exception being a library that has become stable and well developed). I created "rectangulate" from base R, with for-loops and Perl/POSIX regular expressions, independent of intermediary R packages. As there are no dependencies, "rectangulate" is unaffected when other developers augment and change their code.
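To give a sense of what dependency-free parsing can look like, here is a minimal base R sketch that pulls a single field out of one BibTeX entry with a for-loop and a Perl-style regular expression. It is an illustration only and not the code inside "rectangulate": the helper name get_field, the sample entry, and the assumption that each field sits on one line in the form field = {value} are all hypothetical.

```r
# Hypothetical sample entry; the one-field-per-line layout is an assumption.
entry <- c(
  "@article{smith2018,",
  "  title = {Reading in the Content Areas},",
  "  author = {Smith, Jane},",
  "  year = {2018},",
  "}"
)

# get_field() is an illustrative helper, not a function from r7283.
get_field <- function(lines, field) {
  # Match `field = {value}` or `field = "value"`, allowing a trailing comma.
  pattern <- paste0("^\\s*", field, "\\s*=\\s*[{\"](.*)[}\"],?\\s*$")
  for (line in lines) {
    hit <- regmatches(line, regexec(pattern, line, perl = TRUE, ignore.case = TRUE))[[1]]
    # On a match, `hit` holds the whole match followed by the captured value.
    if (length(hit) == 2) return(hit[2])
  }
  NA_character_  # field not present in this entry
}

get_field(entry, "title")
get_field(entry, "abstract")
```

Everything used here (paste0, regexec, regmatches) ships with base R, so nothing has to be installed. The perl argument of regexec requires R 3.3.0 or later.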

Results

The "rectangulate" function takes a .bib file and parses only the basic information in it that helps a researcher to (1) extract evidence and record the article's claims, (2) relocate a file based on its bibliographic information, and (3) use the abstract as a guide while distilling information from an article. In addition to these benefits, "rectangulate" keeps the matrix in the indexed order provided by EBSCO. This means that the researcher can save the search, return to it, and find that each item in the matrix matches the EBSCO indexed results. As opposed to the roughly 38 fields returned by "bib2df," the "rectangulate" function returns only 15, with 6 of the fields held as placeholders for the researcher to populate with each paper's evidence and conclusions.
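The shape of such a matrix can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the "rectangulate" source: the two sample records, the four parsed fields, and the three placeholder column names (claims, evidence, notes) are hypothetical, and the real function returns 15 columns. The sketch keeps the records in file order, stores that order in an index column, and leaves the placeholder columns empty for the researcher to fill in.

```r
# Hypothetical two-record export; real exports carry more fields.
bib_lines <- c(
  "@article{rec1,",
  "  title = {First Article},",
  "  author = {Doe, A.},",
  "  year = {2017},",
  "  abstract = {A short abstract.},",
  "}",
  "@article{rec2,",
  "  title = {Second Article},",
  "  author = {Roe, B.},",
  "  year = {2019},",
  "}"
)

parsed_fields <- c("title", "author", "year", "abstract")
placeholders  <- c("claims", "evidence", "notes")  # assumed column names

# Each record starts at a line beginning with "@" and runs to the next one.
starts <- grep("^@", bib_lines)
ends   <- c(starts[-1] - 1, length(bib_lines))

# One row per record, in file order, so the index mirrors the search results.
review <- data.frame(index = seq_along(starts), stringsAsFactors = FALSE)
for (f in c(parsed_fields, placeholders)) review[[f]] <- NA_character_

for (i in seq_along(starts)) {
  entry <- bib_lines[starts[i]:ends[i]]
  for (f in parsed_fields) {
    pattern <- paste0("^\\s*", f, "\\s*=\\s*\\{(.*)\\},?\\s*$")
    hit <- grep(pattern, entry, perl = TRUE, value = TRUE)
    # Keep only the text between the braces; missing fields stay NA.
    if (length(hit) > 0) review[i, f] <- sub(pattern, "\\1", hit[1], perl = TRUE)
  }
}

review
```

A data frame like this can then be written out with write.csv(review, "matrix.csv", row.names = FALSE), which produces a file that Excel opens directly. The file name here is arbitrary.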

Figure 2: Tibble from rectangulate

Conclusions

"Rectangulate" is a function from the r7283 package that automates the creation of a matrix from the .bib file exported from an EBSCO shared search result. It transposes the .bib file into a practical data frame that can be exported to an Excel file and that preserves the indexed order of the EBSCO search result for education researchers. It differs from the "bib2df" package in two major ways. First, "rectangulate" simplifies the parsed .bib output to keep only what is needed for the matrix literature review. Second, it is built on base R, which means that it carries no package dependencies.

References

Martinez, M. (2019). r7283: Rectangulate, and other Tools for the Matrix Literature Review. https://github.com/cownr10r/r7283.

Ottolinger, P. (2019). bib2df: Parse a BibTeX File to a Data Frame. R package version 1.1.1. https://github.com/ropensci/bib2df. 
