# Where Do Sampling Weights Come From?

Weights make it possible to form inferences based on a sample that does not look exactly like the population from which it was drawn. There are multiple reasons why the sample may not exactly reflect the population. These include:

- The use of stratification and/or cluster sampling causes certain elements to have a higher probability of selection into the sample than others.
- Some respondents are systematically less likely to respond to an invitation to participate in a survey.
- Even after correcting for the first two issues, the weighted sample distribution may still fail to match a known population distribution (obtained from, for example, Census data).

Survey organizations therefore create *sampling weights* to correct for these systematic differences in selection probabilities. The actual steps to weighting will vary from survey to survey, but the following are usually present.

- First, a weight equal to the reciprocal of the selection probability is created. The selection probability will depend on the sampling design, as stratification or clustering can increase the probability that a particular element is chosen. If the selection probability is .05, then the weight would equal 20, which is akin to counting that observation twenty times. An element with a selection probability of .8, meanwhile, would only receive a weight of 1.25.
- Second, the response rates within different subgroups are examined, and an additional weight is created to account for those who were less likely to respond. For example, if only 85% of sampled individuals under the age of 25 participated, a new weight equal to the reciprocal of the response rate would be created for this subgroup. Thus, the new weight for an observed respondent under 25 would be 1/.85 ≈ 1.176, with the extra weighting substituting for the information not available from the non-respondents.
- Third, the weights from the first two steps are multiplied together to create a new weight. The distribution of certain characteristics of the resulting weighted sample is compared to a known population distribution. To the extent there are differences (say there are fewer young Latino males in the sample than in the population), an additional weight is created. This process of comparing the weighted sample to known population characteristics is known as *post-stratification.*
- Finally, the weights from steps one through three are multiplied together to create the final weight used in analysis.
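
The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration, not any particular survey organization's procedure. The function names are hypothetical, and the post-stratification factor is assumed here to be a simple ratio of the population share to the weighted sample share, which is one common choice but is not specified in the text above.

```python
def base_weight(selection_prob: float) -> float:
    """Step 1: reciprocal of the selection probability."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection_prob must be in (0, 1]")
    return 1.0 / selection_prob


def nonresponse_adjustment(response_rate: float) -> float:
    """Step 2: reciprocal of the subgroup's response rate."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return 1.0 / response_rate


def poststratification_factor(population_share: float,
                              weighted_sample_share: float) -> float:
    """Step 3 (assumed form): population share / weighted sample share."""
    if population_share < 0 or weighted_sample_share <= 0:
        raise ValueError("shares must be positive")
    return population_share / weighted_sample_share


def final_weight(selection_prob: float, response_rate: float,
                 population_share: float,
                 weighted_sample_share: float) -> float:
    """Final step: product of the three component weights."""
    return (base_weight(selection_prob)
            * nonresponse_adjustment(response_rate)
            * poststratification_factor(population_share,
                                        weighted_sample_share))


# The figures used in the text above
print(f"{base_weight(0.05):.3f}")             # 20.000
print(f"{base_weight(0.8):.3f}")              # 1.250
print(f"{nonresponse_adjustment(0.85):.3f}")  # 1.176
```

A subgroup that makes up a larger share of the population than of the weighted sample gets a post-stratification factor above 1, so its members count for more in the final weight.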

It is important to use sampling weights when analyzing survey data, *especially* when calculating univariate statistics such as means or proportions. The relevance of weighting for multivariate models (such as multiple regression) is somewhat more ambiguous, particularly when the model contains controls for the variables used in the weight construction. However, weighting is not the only issue that arises in complex survey analysis. Standard errors may also be incorrect unless appropriate adjustments are made, and these adjustments will require taking the weighting into account. Thus, it is always a good idea to take the weights available in a data file seriously.
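
To see why weights matter for a univariate statistic, here is a minimal sketch comparing an unweighted and a weighted proportion. The data are made up for illustration, and `weighted_mean` is a hypothetical helper, not a function from any survey package. It computes only the point estimate and does nothing to correct standard errors.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical yes/no answers (1 = yes) from four respondents.
# The first respondent had a selection probability of .05 (weight 20);
# the others had a selection probability of .8 (weight 1.25).
answers = [1, 0, 0, 0]
weights = [20.0, 1.25, 1.25, 1.25]

unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, weights)

print(f"unweighted proportion: {unweighted:.3f}")  # 0.250
print(f"weighted proportion:   {weighted:.3f}")    # 0.842
```

The single "yes" comes from a respondent who stands in for many more population members than the others, so the weighted proportion is far higher than the unweighted one.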
