Sampling: How to Study a Population Without Examining Everyone

Election polls interview 2,000 people to represent 150 million. Learn how sampling makes this possible and where distortions can arise.

Renato Freitas

Updated on May 5, 2026

Population and sample

In statistics, a population is the complete set of elements about which we want to draw conclusions. A sample is a subset of that population, chosen to represent it. When a laboratory tests medicines, it cannot administer the drug to every person on the planet — a sample is used instead.

Studying the entire population (a census) is expensive, slow and sometimes impossible. Some measurements are destructive: a blood test uses up the blood it analyzes, so you cannot test all of a patient's blood. A detailed quality-of-life survey of 200 million people would take years and cost billions. Sampling works around these constraints, and probability theory lets us quantify how far the sample's result is likely to be from the population's.

Types of sampling

In simple random sampling, every element of the population has an equal chance of being selected — like drawing numbers from a hat. It is the purest method theoretically, but may be impractical for very large or geographically dispersed populations.

Systematic sampling selects elements at regular intervals: the first is chosen at random and then every k-th element after it is taken. If the population has 10,000 elements and we want 100, then k = 10,000 / 100 = 100: draw a starting position r between 1 and 100, then take elements r, r + 100, r + 200, and so on. For example, r = 37 gives elements 37, 137, 237, up to 9,937.
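Systematic selection with a random start can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function name `systematic_sample` and the list of IDs are hypothetical, and it assumes the population size is an exact multiple of the sample size.

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick n elements at a fixed interval k, starting from a random offset."""
    rng = rng or random.Random()
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random 0-based start in [0, k)
    return [population[start + i * k] for i in range(n)]

ids = list(range(1, 10_001))      # hypothetical population of 10,000 IDs
sample = systematic_sample(ids, 100, random.Random(42))

print(len(sample))                # 100 elements
print(sample[1] - sample[0])      # consecutive picks are k = 100 apart
```

Only the starting point is random; every later pick follows from it, which is why a list with a periodic pattern matching k can distort the result.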

Stratified sampling divides the population into subgroups (strata) with similar characteristics, such as age group, gender or region, and draws proportionally from each stratum. This ensures all groups are represented.

Cluster sampling selects entire groups (schools, cities, city blocks) instead of individuals, which is useful when there is no complete list of the population.

Simple random: every element has an equal chance of selection
Systematic: selection at regular intervals
Stratified: subgroups represented proportionally
Cluster: entire groups are selected
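Proportional stratified sampling, as described above, might look like the following sketch. The function `stratified_sample`, the region labels and the 600/400 split are all made up for illustration; real surveys also need a rule for rounding so that the stratum quotas add up exactly to the target size.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, rng=None):
    """Draw about n units, allocating to each stratum in proportion to its size."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may shift the total by a unit or two.
        quota = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 600 people in the north, 400 in the south.
people = [("north", i) for i in range(600)] + [("south", i) for i in range(400)]
sample = stratified_sample(people, lambda person: person[0], 50, random.Random(1))

north = sum(1 for region, _ in sample if region == "north")
print(north, len(sample) - north)   # 30 and 20, matching the 60/40 split
```

Within each stratum the draw is still simple random sampling; stratification only fixes how many units come from each group.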

Sampling error and sample size

No sample perfectly represents the population: sampling error always exists. The margin of error describes the range around the survey result within which the true population value probably falls. A survey with a result of 45% and a margin of error of ±3 pp means that, at the stated confidence level (usually 95%), the true value is estimated to lie between 42% and 48%.

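Sample size is the main determinant of the margin of error, but the relationship is not linear: the margin is inversely proportional to the square root of the sample size. Doubling the sample therefore cuts the error by only about 29%; to halve it, you must quadruple the sample. That is why a national survey with 2,000 respondents already has a reasonable margin of error (around ±2 pp at 95% confidence). It is also why a city survey needs roughly as many respondents as a national one to reach the same precision: for large populations, the margin depends on the size of the sample, not on the size of the population.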
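The square-root relationship can be checked with the standard formula for the margin of error of a proportion under simple random sampling, z * sqrt(p * (1 - p) / n). The sketch below assumes the worst case p = 0.5 and z = 1.96 for 95% confidence; the function name and the optional finite-population correction are additions for illustration, not part of the article.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, population_size=None):
    """Margin of error for a proportion under simple random sampling."""
    moe = z * math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite-population correction: matters only when n is a
        # noticeable fraction of the population.
        moe *= math.sqrt((population_size - n) / (population_size - 1))
    return moe

for n in (500, 1000, 2000, 8000):
    print(n, f"{100 * margin_of_error(n):.2f} pp")
# 500 4.38 pp
# 1000 3.10 pp
# 2000 2.19 pp
# 8000 1.10 pp

national = margin_of_error(2000, population_size=150_000_000)
city = margin_of_error(2000, population_size=100_000)
```

Going from 2,000 to 8,000 respondents halves the margin, and the `national` and `city` values differ by only a few hundredths of a percentage point, which is why population size plays almost no role once the population is much larger than the sample.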

Beyond size, the selection method is crucial. A large sample that is poorly selected (with systematic bias) is worse than a small, well-selected one. Bias occurs when certain people have a much higher or lower chance of being chosen than others.
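A small simulation makes the point concrete. Everything here is invented for illustration: a population of 100,000 in which 45% support a proposal, a simple random sample of 500, and a sample of 20,000 in which supporters are twice as likely to be picked as everyone else.

```python
import random

rng = random.Random(7)

# Hypothetical population: 1 = supports the proposal (true share 45%).
population = [1] * 45_000 + [0] * 55_000
true_share = sum(population) / len(population)

# Small but unbiased: every person has the same chance of selection.
small_random = rng.sample(population, 500)

# Large but biased: supporters are twice as likely to be picked
# (drawn with replacement to keep the sketch short).
weights = [2 if person == 1 else 1 for person in population]
large_biased = rng.choices(population, weights=weights, k=20_000)

small_estimate = sum(small_random) / len(small_random)
large_estimate = sum(large_biased) / len(large_biased)

print(f"true share:        {true_share:.3f}")
print(f"small, random:     {small_estimate:.3f}")
print(f"large, biased:     {large_estimate:.3f}")
```

The small sample lands within a few points of 45%, while the large one settles near 62% no matter how many more people are added: a bigger sample reduces random error but does nothing about bias.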

Frequently asked questions

Why do election polls sometimes miss by so much?

Usually because of sampling bias or social desirability bias (respondents do not always reveal their true preferences; in US elections this is known as the 'Bradley effect'). Also, the margin of error applies to each candidate's share separately, so the uncertainty about the gap between two candidates is larger: in a close race it is roughly double the stated margin. Finally, polls are conducted days or weeks before the election, and voting intentions can change.
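To see why the lead is less certain than each individual share, here is a rough calculation under simple random sampling. The poll numbers (45% versus 43%, 2,000 respondents) are hypothetical, and the formula for the difference assumes both shares come from the same sample.

```python
import math

def share_moe(p, n, z=1.96):
    """Margin of error for a single candidate's share."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_moe(p1, p2, n, z=1.96):
    """Margin of error for the gap p1 - p2 between two shares of one sample."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 2000
p1, p2 = 0.45, 0.43              # hypothetical poll result

single = share_moe(p1, n)
gap = lead_moe(p1, p2, n)

print(f"each share: ±{100 * single:.1f} pp")   # about ±2.2 pp
print(f"the lead:   ±{100 * gap:.1f} pp")      # about ±4.1 pp
```

A 2-point lead with a ±4.1 pp margin on the gap is a statistical tie, even though each share on its own looks fairly precise.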

What is the minimum number of respondents for a valid survey?

There is no absolute minimum — it depends on the level of variation in the population and the desired precision. In very homogeneous populations, 30 elements may be enough. For national surveys with multiple subgroups, 1,000 to 2,000 respondents are typically used for a margin of error of 2 to 3 percentage points.

What is a confidence interval?

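It is a range computed from the sample by a procedure that, at a given confidence level (usually 95%), captures the true population parameter in that share of repeated samples. A 95% CI of [42%, 48%] does not mean there is a 95% chance that this particular interval contains the true value; it means that if we repeated the sampling process 100 times and built an interval each time, we would expect about 95 of those intervals to contain it.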
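The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a true share of 45%, samples of 1,000 respondents and the simple normal-approximation (Wald) interval; these choices, like the function name, are illustrative rather than the only way to build a confidence interval.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

rng = random.Random(2026)
true_p, n, repeats = 0.45, 1000, 1000   # hypothetical settings

covered = 0
for _ in range(repeats):
    # Simulate one survey: each respondent says "yes" with probability true_p.
    successes = sum(rng.random() < true_p for _ in range(n))
    low, high = wald_interval(successes, n)
    covered += low <= true_p <= high

coverage = covered / repeats
print(f"intervals containing the true value: {coverage:.1%}")
```

The printed share should come out close to 95%. Each individual interval either contains 45% or it does not; the 95% describes how often the procedure succeeds across many surveys.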

Is a convenience sample a problem?

Yes. A convenience sample — such as surveying only people who walk past a university or only followers of a social media profile — has high bias and rarely represents the target population well. Its conclusions are limited and cannot be generalized with confidence.
