1. Gather

My first step for the computational text analysis was to gather the digital files from the Seventh-day Adventist Office of Archives, Statistics, and Research. These digitized historic documents are published and maintained by the church and made available to the general public. Although the online interface is not designed for batch downloading, the regular structure of the file URLs and the documents' accessibility on the open web make this collection a good use case for web scraping.
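Because the URLs follow a regular pattern, the full list of download targets can be generated rather than copied by hand. The sketch below assumes a hypothetical base URL and a hypothetical `<TITLE>/<TITLE><YEAR>.pdf` naming scheme (with "ABC" as a placeholder title code); the archive's actual URL structure would need to be checked before adapting it.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL: not the archive's real address.
BASE_URL = "https://example.org/archive/periodicals/"


def build_pdf_urls(base_url, title_code, years):
    """Return one PDF URL per year, assuming a hypothetical
    '<TITLE>/<TITLE><YEAR>.pdf' naming pattern."""
    urls = []
    for year in years:
        # quote() percent-encodes any characters unsafe in a URL path.
        path = f"{quote(title_code)}/{quote(title_code)}{year}.pdf"
        # urljoin appends the relative path because BASE_URL ends in '/'.
        urls.append(urljoin(base_url, path))
    return urls


for url in build_pdf_urls(BASE_URL, "ABC", range(1890, 1893)):
    print(url)
```

Generating the list from a pattern keeps the scraper short and makes it easy to extend the date range later.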

Downloading the files and extracting their textual data were among the first computational tasks I undertook for the dissertation. While such tasks can be done manually, using scripts to automate them greatly decreases the time required and increases the amount of content that can be collected efficiently. Because this was early work, the methods I initially used were very basic. These notebooks offer a revised version of those processes.
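A minimal sketch of the automated download step might look like the following. It is an illustration, not the notebooks' actual code: the function name `download_all`, the one-second default delay, and the choice to name each local file after the last segment of its URL are all assumptions. It skips files already on disk so an interrupted run can be resumed, and it pauses between requests to avoid overloading the archive's server.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def download_all(urls, out_dir, fetch=None, delay=1.0):
    """Download each URL into out_dir, skipping files already on disk.

    fetch: optional callable taking a URL and returning bytes
           (defaults to a plain urllib request).
    delay: seconds to wait after each download (assumed polite default).
    Returns the list of paths written during this run.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=30) as response:
                return response.read()

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        # Name the local file after the last path segment of the URL.
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already downloaded in an earlier run
        target.write_bytes(fetch(url))
        written.append(target)
        time.sleep(delay)
    return written
```

Passing a custom `fetch` function makes the loop easy to test offline, and rerunning it after a failure only requests the files that are still missing.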

Notebook Files

Downloading Corpus Files
Extracting Text from PDFs