3. Describe
Working with textual data requires knowledge of the coverage and quality of the corpus. This context provides the basis for making claims about the literature, revealing the strengths and weaknesses of the dataset and thereby the limits of its representativeness. Knowing whether a title is only sporadically available in the corpus (as is the case with the Youth's Instructor), or whether a particular title has very poor OCR quality overall, is important for determining how best to approach the data, which forms of analysis are likely to be most productive, and how the results should be used.
For the SDA periodicals, three pieces of information were most valuable for evaluating the corpus: the coverage of titles per year in the digital record, the number of words in each document (each document corresponds to a periodical page), and the error rate for each page, determined by comparison to a central wordlist, as documented in the second notebook collection.
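As an illustration only, the three measures described above might be computed along the following lines. This is a minimal sketch and not the code from the project notebooks: the function names, the tokenization rule, and the shape of the page records (dictionaries with "title" and "year" keys) are all assumptions made for the example, and the title label used in the usage note is a placeholder.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens.

    Assumption: a simple regex tokenizer that keeps internal apostrophes;
    the project's own tokenization may differ.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def page_stats(text, wordlist):
    """Return the token count, error count, and error rate for one page.

    A token counts as an error when it is absent from the wordlist.
    """
    tokens = tokenize(text)
    num_errors = sum(1 for token in tokens if token not in wordlist)
    num_tokens = len(tokens)
    # Guard against blank pages, which would otherwise divide by zero.
    error_rate = num_errors / num_tokens if num_tokens else 0.0
    return {
        "num_tokens": num_tokens,
        "num_errors": num_errors,
        "error_rate": error_rate,
    }


def pages_per_title_year(pages):
    """Count the digitized pages available for each (title, year) pair."""
    return Counter((page["title"], page["year"]) for page in pages)
```

For example, with a hypothetical four-word wordlist, `page_stats("The Lord is goocl", {"the", "lord", "is", "good"})` reports four tokens, one error, and an error rate of 0.25. Gaps in coverage show up in the result of `pages_per_title_year` as (title, year) pairs that are missing or have unusually low counts.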