
Importing Excel-based data into RQDA for a literature review

One of the great difficulties in batch loading texts into RQDA is that the package requires imports from .txt files (Huang, 2018). I recently ran into this issue as I was starting a literature review by way of reviewing abstracts in a .bib format. I had taken abstracts from the EBSCO search engine and, with the 'rectangulate()' function in the r7283 package, had transformed a BibTeX file into an Excel spreadsheet with a tidy display of one row per abstract and citation information for 140 empirical articles. While the Excel file makes for an easy upload into proprietary software, I had never attempted to batch upload this format of text into RQDA.

That is, until I came across a how-to artifact from an unknown workshop (Rstudio pubs, n.d.; see the reference list for the original set of instructions). The main help came from the idea that users can take a spreadsheet, convert each cell in a column to a character string and collect the results in a list, and then name those entries (in this case abstracts) using another column. The named list can then be batch loaded into RQDA.

Fig 1. Code from unnamed source to batch import text from excel, with modifications

Create the new project

While the original code imports from a .csv file, I prefer to import from an .xlsx file, as I find the latter more stable, particularly with text. The user must first create a new project with RQDA, which requires opening RQDA (see lines 8 and 10), clicking on the new project tab, and giving the project a name. As I comment in line 6, the user should save the new RQDA project and remember its path, as it will become important during the process.

Prepare the data

The next step is reading in the data file from the path where I stored it. I used the 'read_excel()' function from Wickham and Bryan's (2019) readxl package to read in the textual cases, as noted on line 12, and then followed with the RQDA function 'openProject()' to open the newly created project, as in the unnamed how-to artifact. These two steps prepare RQDA to receive the data.


Line 16 shows the conversion of the R "data" object into a list of character strings, one for each abstract in the original Excel sheet. I use the "Cite.Key" for each abstract (carried over from the original r7283::rectangulate() conversion to an Excel sheet) as the names for cases. This is for a particular reason. After coding the abstracts, when I find something of interest while drafting the literature review, I can simply use the .bib file and the apa.csl file to take the cite key from the abstract, and cite as I write (for more information, look up citations in R Markdown). The 'options()' function on line 18 makes sure that RQDA and R do not convert the .bib id number into scientific notation (this way, I can cut and paste and get the citation in R Markdown). Finally, the 'names()' function on line 20 assigns the Cite.Key information to the list as its names.
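The preparation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from Fig 1: the small data frame stands in for the spreadsheet, and the column name "Abstract" and the two cite keys are hypothetical. In practice the data frame would come from reading the .xlsx file with readxl::read_excel().

```r
# Hypothetical stand-in for the spreadsheet. In practice this would be
# something like readxl::read_excel("path/to/abstracts.xlsx"), where the
# path and the "Abstract" column name are assumptions for illustration.
data <- data.frame(
  Cite.Key = c("smith2015", "jones2018"),
  Abstract = c("First abstract text.", "Second abstract text."),
  stringsAsFactors = FALSE
)

# Bias R towards fixed notation so numeric-looking ids are less likely
# to be shown in scientific notation.
options(scipen = 999)

# One list element per abstract, each a single character string.
q <- as.list(as.character(data$Abstract))

# Name each element by its cite key so every abstract carries its citation id.
names(q) <- as.character(data$Cite.Key)

# Inspect the resulting named list.
str(q)
```

Naming the list by cite key is the design choice that pays off later: the key shown in RQDA is the same string that goes into an R Markdown citation.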

Upload the data

Line 22 is where the action lies. This deceptively simple command takes the object "q", with all of its abstracts, and writes the files to RQDA, where the process of computer assisted qualitative data analysis (CAQDAS) coding can begin. The closeProject() command closes the gateway into the .rqda file that holds all the case information.
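As a rough sketch of the upload step, assuming the command in question is RQDA's write.FileList() as used in the cited how-to: the helper name upload_abstracts() and the project path in the usage comment are hypothetical, and the code needs the RQDA package plus an already created .rqda project to do any real work.

```r
# Sketch only: upload_abstracts() is a hypothetical helper, not part of RQDA.
upload_abstracts <- function(q, project_path) {
  # Check the input first: a non-empty list with a name for every element.
  stopifnot(
    is.list(q),
    length(q) > 0,
    !is.null(names(q)),
    all(nzchar(names(q)))
  )

  # Fail early with a clear message if RQDA is not available.
  if (!requireNamespace("RQDA", quietly = TRUE)) {
    stop("The RQDA package is not installed.")
  }

  # Open the project that was created and saved beforehand.
  RQDA::openProject(project_path)
  # Make sure the project is closed again, even if the write fails.
  on.exit(RQDA::closeProject())

  # Write the named list of abstracts into the open project.
  RQDA::write.FileList(q)

  invisible(names(q))
}

# Usage (hypothetical path):
# upload_abstracts(q, "path/to/literature_review.rqda")
```

Wrapping the close call in on.exit() is a choice made for this sketch so the project file is not left open after an error; the original instructions call closeProject() directly.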

Fig 2. Abstracts for bib Cite.Key, the final result


Fig 2 shows the final result: the main toolbar of the project, holding over 100 abstracts ready to be coded. The pane to the right is an example abstract.



References

Wickham, Hadley, & Bryan, Jennifer. (2019). readxl: Read Excel Files. R package version 1.3.1. URL https://CRAN.R-project.org/package=readxl

  Huang, Ronggui. (2018). RQDA: R-based Qualitative Data Analysis. R package version 0.3-1. URL http://rqda.r-forge.r-project.org.

______. (n.d.). Qualitative data analysis with RQDA. http://rstudio-pubs-static.s3.amazonaws.com/2910_c6ec7d53cc37473a81924554bf93b154.html

