In my last blog post, I discussed case files—often voluminous records that contain sensitive, personal information about individuals. Archivists have several appraisal options to consider when reviewing case files:

Retain all records permanently.
Retain only key documents from the files.
Take a sample or selection of the records.
Take an example of the records.
Refuse to accept the records.
Destroy all records.

Sample vs. Selection

There is a difference between a sample and a selection. Sampling records allows archivists to choose items or files from a series in such a way that the items or files chosen are a reliable representation of the whole. Selecting records helps archivists choose individual items from a series to obtain a qualitative reflection of some predetermined significant characteristic of the whole.

Case files, because they are so voluminous, are often seen as candidates for sampling. Yet they also reflect individual voices and society’s interactions with the individual. Thus, they are valuable for legal rights protection, genealogical research, longitudinal data, and documenting public policy.

Types of Sampling

Statistical sampling allows archivists to save a portion that represents the whole, based on the idea that every file has an equal chance of being chosen. Sampling can be systematic, such as every nth file, assuming that the record population is already randomized. Archivists may use random number tables, with every file assigned a unique number.

Random number tables are awkward to use. Systematic sampling is easier to conduct, so it is chosen more frequently, and many consider it equally valid statistically in most cases. Statistical sampling requires that the files be homogeneous. Because archivists must calculate how much to select relative to the size of the series, open-ended series, whose final size is not yet known, are problematic.
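As a rough illustration of the two approaches described above, the following Python sketch draws a simple random sample and an every-nth-file sample from a numbered series. The function names, the series of 1,000 files, and the interval of 20 are assumptions made up for this example, and a software random number generator stands in for a printed random number table.

```python
import random


def simple_random_sample(file_ids, sample_size, seed=None):
    """Pick sample_size files so that every file has an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(list(file_ids), sample_size))


def every_nth_sample(file_ids, interval, seed=None):
    """Pick every nth file, starting from a random position in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start between 0 and interval - 1
    return list(file_ids)[start::interval]


# Hypothetical series: 1,000 case files numbered 1 to 1000.
file_ids = list(range(1, 1001))

random_pick = simple_random_sample(file_ids, 50, seed=1)
nth_pick = every_nth_sample(file_ids, 20, seed=1)

print(len(random_pick), len(nth_pick))
```

Recording the seed, the interval, and the starting point alongside the retained files is one way to document how the sample was drawn.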

Most archivists are not statisticians, so they may not be comfortable with the criteria used to assess sampling. These include variance (how homogeneous the files are), specified degree of accuracy (the acceptable margin of error), and level of confidence (the degree of certainty that the specified degree of accuracy will be achieved).
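To show how these three criteria interact, here is a minimal Python sketch of a standard textbook sample-size calculation (Cochran's formula for estimating a proportion, with a finite population correction). It is not a method taken from this post. The function name, the default values, and the series size of 10,000 are assumptions for illustration; the proportion of 0.5 is the most cautious choice because it assumes the greatest possible variance.

```python
import math
from statistics import NormalDist


def required_sample_size(series_size, margin_of_error=0.05,
                         confidence=0.95, proportion=0.5):
    """Estimate how many files to keep from a closed series of known size."""
    # z-score for the chosen level of confidence (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # variance term: largest when proportion is 0.5
    variance = proportion * (1 - proportion)
    # sample size for a very large (effectively infinite) series
    n0 = (z ** 2) * variance / (margin_of_error ** 2)
    # finite population correction for the actual size of the series
    n = n0 / (1 + (n0 - 1) / series_size)
    return math.ceil(n)


# Hypothetical closed series of 10,000 case files:
# about 370 files at 95% confidence and a 5% margin of error.
print(required_sample_size(10_000))
```

Because the calculation needs the total size of the series, it only works once the series is closed, which is why open-ended series are hard to sample statistically.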

Congressional constituent correspondence has often been sampled in this fashion. These tend to be huge files with repetitive types of letters. In general, constituent mail consists of service cases in which a Congressional representative has intervened with a federal agency for a constituent, requests for information and publications, and correspondence regarding pending legislation or current issues.

Many archives decide that keeping only a portion of those files preserves the records’ characteristics with minor loss, especially since constituent correspondence files are often quite similar from state to state.

Purposive sampling is a technique of selecting a limited number of typical items to represent the larger group. For example, the records of one regional office may represent them all. The selection of the portion to retain is made on a non-mathematical basis. This approach is appropriate for use in cases where statistical reliability is not an issue, and there are various ways to define what to retain.

Systematic sampling is a technique for selecting items from a group based on some formal characteristic without regard to the content of the items. Examples of systematic sampling include pulling all files of a given size (assuming that larger files contain the most interesting, complex information) and pulling all files in which the surname begins with a given letter. Although easy to implement, this method is not statistically valid.
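The two examples above amount to simple filters on a formal characteristic of each file. The Python sketch below shows both; the CaseFile fields, the sample data, and the 90-page threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseFile:
    surname: str
    page_count: int


def by_surname_initial(files, letter):
    """Keep files whose surname begins with the given letter."""
    return [f for f in files if f.surname.upper().startswith(letter.upper())]


def by_minimum_size(files, min_pages):
    """Keep files that are at least min_pages long."""
    return [f for f in files if f.page_count >= min_pages]


# Hypothetical series of four case files.
files = [
    CaseFile("Baker", 12),
    CaseFile("Moreno", 240),
    CaseFile("Mills", 8),
    CaseFile("Zhou", 95),
]

m_files = by_surname_initial(files, "M")
large_files = by_minimum_size(files, 90)

print([f.surname for f in m_files])
print([f.surname for f in large_files])
```

Neither filter gives every file an equal chance of selection, which is why this method is not statistically valid even though it is easy to carry out.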

Exceptional sampling is a technique of selecting a subset based on unusual or essential qualities. Examples of criteria used for exceptional sampling include controversial subjects, notorious or famous individuals, and “firsts.” Although not statistically valid, exceptional sampling can frequently capture materials commonly requested by patrons.

Illustrative sampling is a method that selects a portion from a series based on the archivist’s judgment, which specific criteria may inform. An archivist might use this method when she knows she needs examples that make certain points, such as for an exhibit.

Appraisal First

It is also important to remember that sampling should come after appraisal. Archivists do not want to appraise the sample; they want to sample based on the appraisal decision. Archives also need to document what they did and the choices they made in order for researchers to know what they can do with the data.

This post was originally published on Lucidea's blog.