The Sampling View covers the methodology and process for drawing a sample from a designated sample frame. It provides the ability to describe sample designs ranging from simple single-stage samples to those with multiple stages and those with more than one frame (necessitating drawing subsets of the sample from each frame). Probability sampling (including model-based sampling) and non-probability sampling can each be described. The view supports descriptive information on the methodologies used, the design of the sample, the algorithms that support drawing the sample, goals, the sample frames used, and guidance on the use of the resulting sample (for defining weights, limitations, etc.). This includes the use of Workflows, which can define the process of creating simple or complex samples on a step-by-step basis. The resulting descriptions can be used as metadata to drive sample selection.

Use Cases:

Defining Sample Frames for prospective use.

Describing the process of simple or complex sampling procedures to inform the automated creation of a sample from a sample frame.

Describing a sample to provide the information necessary to create sampling weights and estimation formulae.

Describing the sampling process to inform the user of the limitations or strengths of a particular sampling procedure, and to provide guidance on the use of the procedure or its result.

Target Audiences:

Organizations defining sample frames and sampling procedures for execution.
Users requiring information on the sampling process and its limitations.

Restricted Classes:

Parameter: valueRepresentation: ValueDomain: some subtypes not supported: SentinelValueDomain.

Precondition: basedOnPriorResult: Result: some subtypes not supported: SegmentDefinitionResult, QualitativeCodingResult.

SamplingDesign: expressesAlgorithm: AlgorithmOverview: some subtypes not supported: SegmentDefinitionAlgorithm, QualitativeCodingAlgorithm, BusinessAlgorithm.

SamplingDesign: implementedBy: WorkflowProcess: all subtypes not supported.

SamplingProcedure: hasDesign: DesignOverview: some subtypes not supported: SegmentDefinitionDesign, QualitativeCodingDesign.

SamplingProcedure: isExpressedBy: AlgorithmOverview: some subtypes not supported: SegmentDefinitionAlgorithm, QualitativeCodingAlgorithm, BusinessAlgorithm.

SamplingProcedure: componentMethodology: MethodologyOverview: some subtypes not supported: SegmentDefinitionMethod, QualitativeCodingMethod.

Split: executeConcurrently: WorkflowStep: some subtypes not supported: Act, ConcurrentControlConstruct, SplitJoin.

SplitJoin: executeConcurrently: WorkflowStep: some subtypes not supported: Act, ConcurrentControlConstruct, Split.

Code is extended by ClassificationItem.
CodeList is extended by GeographicUnitClassification, GeographicUnitTypeClassification and StatisticalClassification.
Concept is extended by ConceptualVariable, RepresentedVariable and InstanceVariable.
ExternalMaterial is extended by ExternalAid.
WorkflowStepSequence is extended by StructuredWorkflowSteps.

General documentation:

In socio-economic statistics, samples are used to conduct surveys within time, accuracy, and expense constraints. When it comes time to conduct a survey, the sample is selected from a larger set, called the frame or sampling frame. A frame is an enumeration of the population about which the survey is designed to produce estimates, but it does not contain the variables a survey or experiment might collect. The sample is a subset of the frame, and it is designed (in theory) to be representative of the population as a whole. In practice, for any particular selected sample, this might not be the case.

Samples can be probability based or not. For a probability-based sample, each element in the frame is assigned a probability of being selected for the sample. Non-probability samples do not have this feature, and their elements are selected in some other manner. The representativeness of a probability-based sample is measured as sampling error, a statistical measure. Every probability sample has a sampling error associated with it, which indicates how close estimates are likely to be to the theoretical population values.
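As an illustration of sampling error, the sketch below computes the standard error of a sample mean under simple random sampling without replacement, using the usual finite population correction. The function name `srs_standard_error` and the example values are hypothetical and are not part of the Sampling View itself.

```python
import math
import statistics

def srs_standard_error(sample_values, population_size):
    """Standard error of the mean for a simple random sample drawn
    without replacement from a frame of `population_size` units."""
    n = len(sample_values)
    s2 = statistics.variance(sample_values)   # sample variance (n - 1 denominator)
    fpc = 1 - n / population_size             # finite population correction
    return math.sqrt(fpc * s2 / n)

# Hypothetical measurements from 8 sampled units out of a frame of 80.
values = [2, 4, 4, 4, 5, 5, 7, 9]
se = srs_standard_error(values, population_size=80)
census_se = srs_standard_error(values, population_size=len(values))
```

When the sample covers the whole frame, the correction factor is zero and so is the sampling error, which matches the intuition that a full enumeration has no sampling variability.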

Several methods can be used to select a sample. In the simple case, elements are selected from the frame in one step. In the probability based case, typical techniques are simple random sampling (with or without replacement), systematic random sampling, sampling proportional to size, and simple stratified random sampling. For non-probability samples, techniques include convenience, purposive, quota, self-selected, and snowball sampling.
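Two of the single-step probability techniques named above can be sketched as follows. This is a minimal illustration that assumes the frame is an in-memory list of unit identifiers; the function names and the `unit-NNN` identifiers are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling without replacement: every subset of
    size n is equally likely to be drawn."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

def systematic_sample(frame, n, seed=None):
    """Systematic random sampling: choose a random start within the
    first interval, then take every k-th unit.
    Assumes len(frame) >= n."""
    units = list(frame)
    k = len(units) // n              # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start in [0, k)
    return [units[start + i * k] for i in range(n)]

# Hypothetical frame of 100 units.
frame = [f"unit-{i:03d}" for i in range(100)]
srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)
```

Passing a seed makes a draw reproducible, which is useful when the description of the sampling procedure is meant to drive an automated selection.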

Sampling in scientific experiments usually differs from sampling for statistical surveys. Non-probability sampling is often used, and self-selected samples (for instance, people who volunteer for a drug trial) are common.

To help reduce sampling errors, improve estimates, reduce costs, and reduce the time to complete a survey, sampling is sometimes broken into parts. The first main way this is done is through the use of strata. Strata are subsets of a population for which estimates are produced independently. For example, the mean height of women differs from the mean height of men, so the two groups can be sampled as separate strata. Stratified sampling can reduce sampling error because units within a stratum tend to be more alike than units across strata.
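A stratified selection can be sketched as below. This example assumes proportional allocation (each stratum receives a share of the sample matching its share of the frame), which is only one of several possible allocation rules; the function name, the `stratum_of` callback, and the frame contents are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, seed=None):
    """Draw a simple random sample within each stratum, allocating the
    total sample size n proportionally to stratum size.
    Rounding means the realized total may differ slightly from n."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    total = len(frame)
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)    # proportional allocation
        n_h = min(max(1, n_h), len(units))     # at least 1, at most the stratum size
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 60 units in stratum "F" and 40 in stratum "M".
frame = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
result = stratified_sample(frame, lambda unit: unit[0], n=10, seed=7)
```

Returning the sample keyed by stratum keeps the stratum membership of each selected unit, which is the information needed later to compute sampling weights.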

The other main way samples are partitioned is through clusters. Clusters are subsets based on some convenience criterion. For example, to make sure interviews can be conducted in a short amount of time for a country-wide survey, selected sample units might need to be geographically close, or clustered. This will greatly reduce the time and cost of collecting data, but it has the side effect of increasing the sampling error. Representativeness is affected if only some geographical areas are used to select the sample.

Sometimes so many clusters are identified that the clusters themselves are sampled. Then, within the selected clusters, the units for collection are selected. This is a two-stage sample, and it is not hard to imagine more than two stages being used to reduce clusters to a manageable size before the ultimate sampling units are selected. Some statistical surveys in the US have four or more stages.
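The two-stage case can be sketched as follows, assuming simple random sampling at both stages and a frame held as a mapping from cluster name to its list of units. Real designs often select clusters with probability proportional to size instead; the names `two_stage_sample`, `district-N`, and `hh-N-M` are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_units, seed=None):
    """Stage 1: select clusters at random.
    Stage 2: select units at random within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)     # stage 1
    return {
        name: rng.sample(clusters[name], min(n_units, len(clusters[name])))
        for name in chosen                                # stage 2
    }

# Hypothetical frame: 8 districts, each listing 20 households.
clusters = {
    f"district-{d}": [f"hh-{d}-{i}" for i in range(20)]
    for d in range(8)
}
selected = two_stage_sample(clusters, n_clusters=3, n_units=5, seed=42)
```

Additional stages would repeat the same pattern, with each selected unit at one stage acting as the frame for the next.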

For some surveys, more than one frame is needed. For instance, in random-digit-dialing telephone surveys, a bank of valid telephone numbers is the frame, and some numbers are selected for contact. However, the bank of valid telephone numbers might come from a telephone company and contain only landlines. To get full coverage of the population, one must also select from a bank of cellphone numbers. Thus two frames are needed, and two samples are selected, to achieve full coverage (representativeness).
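A multiple-frame selection can be sketched as one independent draw per frame, with each selected unit kept under the name of the frame it came from. This is a simplified illustration: it assumes simple random sampling within each frame and does not handle units that are reachable through both frames, which would need to be accounted for when weights are defined. The function name and the placeholder identifiers are hypothetical.

```python
import random

def multi_frame_sample(frames, sizes, seed=None):
    """Draw an independent simple random sample from each frame.
    `frames` maps frame name -> list of units;
    `sizes` maps frame name -> number of units to draw from that frame."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, sizes[name])
        for name, units in frames.items()
    }

# Hypothetical frames using placeholder identifiers rather than real numbers.
frames = {
    "landline": [f"L-{i:04d}" for i in range(500)],
    "cellphone": [f"C-{i:04d}" for i in range(800)],
}
samples = multi_frame_sample(frames, {"landline": 20, "cellphone": 30}, seed=3)
```

Recording the frame of origin alongside each selected unit preserves the information needed to describe which subset of the sample was drawn from which frame.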
