Bias in Data Collection - I

by Morgan A Rennie

This is part 1 of a four-part series covering bias in data collection: what bias is, who data bias can affect, why awareness of data bias matters, and how we (as analysts and consultants) can attempt to mitigate bias in the collection and analysis phases.


WHAT ARE BIASES IN DATA COLLECTION?

Bias in data collection refers to systematic error (a non-random effect on the accuracy of the data) that is introduced into a dataset by the way the data was collected. Unlike random error, it does not average out as more data is gathered. It can occur when the data collection method or sampling strategy produces a sample that is not representative of the population being studied: in effect, a discrepancy between what you aim to measure and what you are actually measuring. Bias in data collection can have significant consequences, as it can lead to inaccurate results and flawed decision-making. It is therefore important to take steps to minimize bias when collecting data, to be aware of the potential sources of bias in a dataset, and to take these into account when interpreting the results of analyses or models.
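To make the distinction between systematic and random error concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name simulate_measurements, the true value of 50, the noise level and the constant offset of 3 are hypothetical and are not taken from any real study.

```python
import random
import statistics


def simulate_measurements(true_value, n, noise_sd, offset, seed=0):
    """Return n readings of true_value with random noise plus a constant offset.

    The offset stands in for a systematic error (hypothetical, for illustration).
    """
    rng = random.Random(seed)
    return [true_value + offset + rng.gauss(0, noise_sd) for _ in range(n)]


if __name__ == "__main__":
    TRUE_VALUE = 50.0  # hypothetical quantity we aim to measure

    for n in (100, 10_000):
        unbiased = simulate_measurements(TRUE_VALUE, n, noise_sd=5.0, offset=0.0)
        biased = simulate_measurements(TRUE_VALUE, n, noise_sd=5.0, offset=3.0)
        # Error of the sample mean relative to the true value.
        print(
            n,
            round(statistics.mean(unbiased) - TRUE_VALUE, 2),
            round(statistics.mean(biased) - TRUE_VALUE, 2),
        )
```

As the sample grows, the error of the unbiased readings should shrink towards zero, while the error of the biased readings should settle near the offset. Collecting more data reduces random error but leaves systematic error in place.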

There are several types of bias that can affect data collection, including (but not limited to):

Selection bias: Selection bias occurs when the method used to select the sample does not result in a representative sample of the population. This can happen if the sample is not random or if certain groups are underrepresented in the sample.

Sampling bias: Sampling bias occurs when some members of the population are more likely to be included in the sample than others. This can happen if the sampling method is not random or if certain groups are overrepresented in the sample.
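The effect of unequal inclusion probabilities can be sketched with a small simulation. This is an illustrative example under stated assumptions, not a description of any real dataset: the two groups "A" and "B", their sizes and averages, and the helper names build_population, random_sample and skewed_sample are all hypothetical. The skewed sample draws with replacement purely to keep the code short.

```python
import random
import statistics


def build_population(seed=0):
    """Hypothetical population: 7,000 people in group A, 3,000 in group B."""
    rng = random.Random(seed)
    group_a = [("A", rng.gauss(40, 5)) for _ in range(7000)]
    group_b = [("B", rng.gauss(60, 5)) for _ in range(3000)]
    return group_a + group_b


def random_sample(population, n, seed=1):
    """Simple random sample: every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def skewed_sample(population, n, weight_b, seed=1):
    """Sample in which group B members are only weight_b times as likely
    to be chosen as group A members (drawn with replacement for simplicity)."""
    rng = random.Random(seed)
    weights = [weight_b if group == "B" else 1.0 for group, _ in population]
    return rng.choices(population, weights=weights, k=n)


def mean_value(rows):
    """Average of the measured value across (group, value) rows."""
    return statistics.mean(value for _, value in rows)


if __name__ == "__main__":
    population = build_population()
    print("population mean:", round(mean_value(population), 2))
    print("random sample:  ", round(mean_value(random_sample(population, 1000)), 2))
    print("skewed sample:  ", round(mean_value(skewed_sample(population, 1000, 0.2)), 2))
```

Because group B is underrepresented in the skewed sample, its average should sit noticeably below the population average, while the simple random sample should land close to it.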

Measurement bias: Measurement bias occurs when the way in which data is collected introduces errors or inaccuracies into the data. This can happen if the measurement instrument is flawed or if the data is collected in a way that is not consistent across all participants.

Reporting bias: Reporting bias occurs when the way in which data is reported or recorded introduces errors or inaccuracies into the data. This can happen if the data is self-reported, recorded inconsistently across participants, or recorded in a way that is influenced by the researcher or the participant.

These differing forms of inherent and unmitigated bias can have major effects not only on your analysis, but also on the groups of people that the research is often trying to help. Without mitigation, the analysis of biased data can at best misrepresent results and at worst cause harm to already marginalised groups. Part 2 of this series will further explore why it is important to understand bias in data collection and can be read here.