Showing posts from April, 2020

Getting past the two column PDF to extract text into RQDA: Literature reviews await

One of the great promises of RQDA is that it can serve as computer-assisted literature review software. Doing so means balancing the right amount of coding with text that can be used as warrants and backing in arguments. In theory, using computer assisted qualitative data analysis software (CAQDAS) for literature reviews is a great idea, but how do you get the article PDFs into R and RQDA in a human-readable format? Many empirical articles are typeset in two columns, and text extraction with standard tools often scrambles the reading order, producing text "on the diagonal". Extracting text from such PDFs can be daunting with R packages such as 'pdftools', with or without the assistance of the 'tesseract' package. If you are working on a Windows-based computer, you can install three packages and Java to do the trick. First, gather the articles that you would like to mark up in RQDA. Put them into a folder, and away you go.
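The folder-based workflow described above can be sketched in base R. This is only a minimal sketch: the excerpt does not name the three packages, so the extraction step is passed in as `extract_fun`, a hypothetical placeholder for whatever function turns one PDF path into a character vector of text. The function name `pdfs_to_txt` and the folder names are likewise illustrative assumptions.

```r
# Convert every PDF in a folder to a plain-text file that RQDA can import.
# `extract_fun` is a hypothetical placeholder: any function that takes a
# PDF path and returns the extracted text as a character vector.
pdfs_to_txt <- function(pdf_dir, out_dir, extract_fun) {
  # Create the output folder if it does not exist yet.
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # Find the PDFs (case-insensitive match on the file extension).
  pdf_files <- list.files(pdf_dir, pattern = "\\.pdf$",
                          full.names = TRUE, ignore.case = TRUE)

  # One .txt per PDF, keeping the original base name.
  out_files <- file.path(
    out_dir,
    paste0(tools::file_path_sans_ext(basename(pdf_files)), ".txt")
  )

  for (i in seq_along(pdf_files)) {
    text <- extract_fun(pdf_files[i])
    writeLines(text, out_files[i])
  }

  # Return the paths of the written files without printing them.
  invisible(out_files)
}
```

A call might look like `pdfs_to_txt("articles", "articles_txt", my_extractor)`, where `my_extractor` stands in for the column-aware extraction function of your choice. Keeping the extractor as an argument means the folder handling stays the same if you switch extraction packages.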

Importing Excel-based data into RQDA for a literature review

One of the great difficulties in batch loading texts into RQDA is that the package requires imports from .txt files (Huang, 2018). I recently ran into this issue when starting a literature review by reviewing abstracts in a .bib format. I had taken abstracts from the EBSCO search engine and, with the 'rectangulate()' function in the r7283 package, had transformed a BibTeX file into an Excel spreadsheet with a tidy display of one row per abstract and citation information for 140 empirical articles. While the Excel file makes for an easy upload into proprietary software, I had never attempted to batch upload this format of text into RQDA. That is, until I came across a how-to artifact from an unknown workshop (click the link to find the original set of instructions) (Rstudio pubs, n.d.). The main pieces of help came from the idea that users could take a spreadsheet, convert each cell in a series of columns to character and collect them in a list, and then identif