Sampling
The selection of cases to observe is the task of sampling. If you’re going to be collecting data from people, you might be able to talk to every person that you want your research to apply to, that is, your population. If you’re doing a study of state election commissioners, you might be able to talk to all 50 of them. In that case, you’d be conducting a census study. Often, though, we’re only able to collect data from a portion of the population, or a sample. We devise a sampling frame, a list of cases we select our sample from—ideally, a list of all cases in the population—but then which cases do we select for the sample? We select cases for our sample by following a sampling design, which comes in two basic varieties: probability sampling designs and nonprobability sampling designs.
In probability sampling designs, every case in the population has a known, greater-than-zero probability of being selected for the sample. This feature of probability sampling designs, along with the wonder of the central limit theorem and the law of large numbers, allows us to do something incredibly powerful. If we’re collecting quantitative data from our sample, we can use these data to calculate statistics—quantified summaries of characteristics of the sample, like the median of a variable or the correlation between two variables. If we’ve followed a probability sampling design, we can then use statistics to estimate the parameters—the corresponding quantified characteristics of the population—with known levels of confidence and accuracy. This is what’s going on when you read survey results in the newspaper: “± 3 points at 95% confidence.” For example, if 30% of people in our sample say they’d like to work for government, then we’d be confident that if we were to repeat this survey a thousand times, 95% of those surveys (our level of confidence) would produce a result within 3 points of the true population figure (± 3 points being our degree of accuracy). Put another way, we’d be 95% certain that 27 to 33% of the population would like to work for government.
Again, this trick of using sample statistics to estimate population parameters with known levels of confidence and accuracy only works when we’ve followed a probability sampling design. The most basic kind of probability sampling design is a simple random sample. In this design, each case in the population has a known and equal probability of being selected for the sample. When social researchers use the term random, we don’t mean haphazard. (This word has become corrupted since I was in college, when my future sister-in-law started saying stuff like “A boy I knew in kindergarten just called—that was so random!” and “I just saw that guy from ‘Saved by the Bell’ at the mall—pretty random!”) It takes a plan to be random, to give every case in the population an equal chance of being selected for a sample. If we were going to randomly select 20 state capitals, we wouldn’t just select the first 20 working from west to east or the first 20 we could think of—that would introduce sampling bias. (We’ll have more to say about bias later, but you get the gist of it for now.) To ensure all 50 capitals had an equal probability of being selected (a probability of 0.4, in fact), we could list them all out on a spreadsheet, use a random number generator to assign them all random numbers, sort them by those numbers, and select the first 20; or we could write each capital’s name on same-sized pieces of paper, put them in a bag, shake them up, and pull out 20 names. (Some textbooks still have random number tables in the back, which you’re welcome to learn how to use on your own, but they’ve become pretty obsolete.)
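The spreadsheet-and-random-numbers procedure can be sketched in a few lines of Python. The short capital list here stands in for the full 50-name frame, and the sort-by-random-number step is equivalent to what `random.sample` does directly:

```python
import random

# Sampling frame: the state capitals (only 5 shown here for brevity;
# in practice the list would hold all 50).
frame = ["Montgomery", "Juneau", "Phoenix", "Little Rock", "Sacramento"]

random.seed(42)  # fixed seed so the draw can be reproduced

# Assign every case a random number and sort by it (the spreadsheet method)...
shuffled = sorted(frame, key=lambda case: random.random())
sample = shuffled[:2]  # take the first k; with the full frame, k would be 20

# ...which gives every case the same selection probability, k / N
# (20 / 50 = 0.4 for the capitals example).
```

The one-liner `random.sample(frame, 2)` accomplishes the same thing.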
Selecting a simple random sample may be too much of a hassle because you just have a long, written list in front of you as your sampling frame, like a printed phonebook. Or, selecting a simple random sample may be impossible because you’re selecting from a hypothetically infinite number of cases, like the vehicles going through an intersection. In such scenarios, you can approximate a random sample by selecting every 10th or 20th or 200th or whateverth case to reach your desired sample size, which is called systematic sampling. This works fine as long as periodicity isn’t present in your population, meaning that there’s nothing odd about every 10th (or whateverth) case. If you were sampling evenings to observe college life, you wouldn’t want to select every 7th case, or you’d introduce severe sampling bias. Just imagine trying to describe campus nightlife by observing only Sunday evenings or only Thursday evenings. As long as periodicity isn’t a problem, though, systematic sampling approximates simple random sampling.
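Systematic sampling is just as easy to sketch. The frame of 100 numbered cases below is invented for illustration; a random starting point keeps every case eligible for selection:

```python
import random

frame = list(range(1, 101))   # an invented frame of 100 cases, numbered 1-100
desired_n = 10
k = len(frame) // desired_n   # sampling interval: select every 10th case

random.seed(7)
start = random.randrange(k)   # random start within the first interval
sample = frame[start::k]      # then every k-th case thereafter

# Fine as long as there's no periodicity -- nothing odd about every 10th case.
```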
Our goal in selecting a random (or systematic) sample is to construct a sample that is like the population so that we can use what we learn about the sample to generalize to the population. What if we already know something about our population, though? How can we make use of that knowledge when constructing our sample? We can replicate known characteristics of the population in our sample by following another probability sampling design, a proportionate stratified sampling design. Perhaps we’d like to sample students at a particular college, and we already know students’ sex, in-state versus out-of-state residency, and undergraduate versus graduate classification. We can use sex, residency, and classification as our strata and select a sample with the same proportions of male versus female, in-state versus out-of-state, and undergraduate versus graduate students as the population. If we determine that 4% of our population are male graduate students from out-of-state and we want a sample of 300 students, we’d select (using random sampling or systematic sampling) 12 (300*4%) male graduate students from out-of-state to be in our sample. We’d carry on similarly, sampling students with other combinations of these characteristics, until we had a sample proportionally representative of the population in terms of sex, residency, and classification. We probably would have gotten similar results if we had used a simple random sampling strategy, but now we’ve ensured proportionality with regard to these characteristics.
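The quota arithmetic generalizes straightforwardly. In this sketch the stratum shares are hypothetical except for the 4% figure from the example above:

```python
# Each stratum's quota = its share of the population x the total sample size.
# Only two strata are shown; a real design lists every combination of
# sex, residency, and classification, with shares summing to 1.0.
strata_shares = {
    ("male", "out-of-state", "graduate"): 0.04,     # 4%, as in the text
    ("female", "in-state", "undergraduate"): 0.30,  # hypothetical share
}
sample_size = 300
quotas = {stratum: round(share * sample_size)
          for stratum, share in strata_shares.items()}
# quotas[("male", "out-of-state", "graduate")] == 12, i.e., 300 * 4%
```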
Sometimes, though, proportionality is exactly what we don’t want. What if we were interested in comparing the experiences of students who had been homeschooled to students who were not homeschooled? If we followed a simple random sampling design or a proportionate stratified sampling design, we would probably end up with very few former homeschoolers—not enough to provide a basis of comparison to the never homeschooled. We may even want half of our sample to be former homeschoolers, which would require oversampling from this group to have their representation in the sample disproportionately high compared to the population, achieved by following a disproportionate stratified sampling design. Importantly, this is still a probability sampling design. With some careful math, we can still calculate the probability of any one case in the population being selected for the sample; it’s just that for former homeschoolers, that probability would be higher than for the never homeschooled. Knowing these probabilities still permits us to use statistics to estimate parameters for the entire population of students; we just have to remember to make the responses of former homeschoolers count less and the responses of the never homeschooled count more when calculating our parameter estimates. This is done using weights, which are based on those probabilities, in our statistical calculations.
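A minimal sketch of how those weights work, with invented selection probabilities: each respondent is weighted by the inverse of their probability of selection, so the oversampled former homeschoolers count less per person and the never homeschooled count more.

```python
# Invented selection probabilities for the two strata.
p_homeschooled = 0.50   # oversampled: 1 in 2 former homeschoolers selected
p_never = 0.02          # 1 in 50 never-homeschooled students selected

# Design weight = 1 / selection probability.
w_homeschooled = 1 / p_homeschooled   # each respondent stands for 2 students
w_never = 1 / p_never                 # each respondent stands for 50 students

# Weighted estimate of a yes/no parameter from three invented responses:
responses = [(1, w_homeschooled), (0, w_never), (1, w_never)]  # (answer, weight)
estimate = (sum(answer * w for answer, w in responses)
            / sum(w for _, w in responses))  # about 0.51, not the raw 2/3
```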
One final probability sampling design, cluster sampling design, is commonly used to sample cases that are dispersed throughout a broad geographic region. Imagine the daunting task of needing to sample 2,000 parents of kindergarteners from across the United States. There is no master list of kindergarten students or their parents to serve as a sampling frame. Constructing a sampling frame by going school to school across the country would likely consume more resources than the rest of the study itself—the thought of constructing such a sampling frame is ridiculous, really. We could, though, first randomly select, say, 20 states, and then 10 counties within each of those 20 states, and then 1 school from each of those counties, and then 10 kindergartners from each of those schools. At each step, we know the probability of each state, county, school, and kid being selected for the sample, and we can use those probabilities to calculate weights, which means we can still use statistics to estimate parameters. We’ll have to modify our definition for probability sampling designs just a bit, though. We could calculate the probability of any one case in the population being included in the study, but we don’t. Being able to calculate the probabilities of selection for each sampling unit (states, counties, schools, kids), though, does the same job, so we still count cluster sampling designs as one of the probability sampling designs. To modify our definition of probability sampling designs, we might say that every case in the population has a known or knowable, greater-than-zero probability of being selected for the sample.
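Those stage-by-stage probabilities simply multiply. The stage counts below follow the example where given (20 of 50 states, 10 counties, 1 school, 10 kindergartners); the number of counties, schools, and kids per unit are invented:

```python
# A particular kid's probability of selection is the product of the
# selection probabilities at every stage of the cluster design.
p_state = 20 / 50    # 20 states out of 50
p_county = 10 / 100  # 10 counties out of (say) 100 in the selected state
p_school = 1 / 5     # 1 school out of (say) 5 in the selected county
p_kid = 10 / 60      # 10 kindergartners out of (say) 60 in the selected school

p_selection = p_state * p_county * p_school * p_kid  # the "knowable" probability
weight = 1 / p_selection  # and the corresponding design weight
```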
Using a probability sampling design is necessary, but not sufficient, if we want to use statistics to estimate parameters. We still need an adequate sample size. How do we calculate an adequate sample size? Do we, say, select 10% of the population? It would be handy to have such an easy rule of thumb, but as it turns out, the size of the population is only one factor we have to consider when determining the required sample size. (By the way, this is probably the most amazing thing you’ll learn in this text.) In addition to population size, we also have to consider required level of confidence (something you decide yourself), required level of accuracy (something else you decide), and the amount of variance in the parameter (something you don’t get to decide; it is what it is).
As you’d probably guess, the larger the population size, the larger the required sample size. However, the relationship between population size and required sample size is not linear (thus no rule of thumb about selecting 10% or any other percent of the population for your sample). If we have a somewhat small population, we’ll need a large proportion of it in our sample. If we have a very large population, we’ll need a relatively small proportion of it in our sample. In fact, once the population size goes above around 20,000, the sample size requirement hardly increases at all (thanks again to the central limit theorem and the law of large numbers).
We also have to consider how much the parameter varies. Imagine that I’m teaching a class of 40 students, and I know that everyone in the class is the same age, I just don’t know what that age is. How big would my sample size need to be for me to get a very good (even perfect) statistic, the mean age of my students? Think. One! That’s right, just one. My parameter, the mean age of the class, has zero variation (my students are all the same age), so I need a very small sample to calculate a very good statistic. What if, though, my students’ ages were all over the place—from one of those 14-year-old child geniuses to a 90-year-old great-grandmother who decided to finish her degree? I’d be very reluctant to use the mean age of a sample of 3, 4, or even 10 students to estimate the whole class’s mean age. Because the population parameter varies a lot, I’d need a large sample. The rule, then: The more the population parameter varies, the more cases I need in my sample.
The astute reader should, at this point, be thinking “Wait a sec. I’m selecting a sample so I can calculate a statistic so I can estimate a parameter. How am I supposed to know how much something I don’t know varies?” Good question. Usually, we don’t, so we just assume the worst, that is, we assume maximum variation, which places the highest demand on sample size. When we specify the amount of variation (like when using the sample size calculators I’ll say more about below), we express it as the percentage of cases taking one value of a parameter that has only two values, like responses to yes/no questions. If we wanted to play it safe and assume maximum variation in a parameter, then, we’d specify 50%; if 50% of people in a population would answer “yes” to a yes/no question, the parameter would exhibit maximum variation—it can’t vary any more than a 50/50 split. Specifying 0% or 100% would be specifying no variation, and, as it may have occurred to you already, specifying 25% would be the same as specifying 75%.
Very astute readers might have another question: “You’ve been referring to a required sample size, but required for what? What does it mean to have a required sample size? Isn’t that what we’re trying to figure out?” Another good question. Given the size of the population (something you don’t control) and the amount of variance in the parameter (something else you don’t control), the sample must be at least a certain size if we want to achieve a desired level of confidence and a desired level of accuracy, the factors you do control. We saw examples of accuracy and confidence previously. We might say “I am 95% certain [so I have a 95% confidence level] that the average age of my class is in the 19 to 21 range [so I have a ± 1 year level of accuracy].” A clumsier way to say the same thing would be “If I were to repeat this study over and over again, selecting my sample anew each time, 95% of my samples would have average ages in the range of 19 to 21.” Confidence and accuracy go together; it doesn’t make sense to specify one without specifying the other. As I’ve emphasized, you get to decide on your levels of confidence and accuracy, but there are some conventions in social research. The confidence level is most often set at 95%, though sometimes you’ll see 90% or 99%. The level of accuracy, which is usually indicated as the range of percentage point estimates, is often set at ±1%, 3%, or 5%. If you’re doing applied research, you might want to relax these standards a bit. You might decide that a survey giving you ±6% at an 85% confidence level is all you can afford, but it will help you make decisions better than no survey at all.
So far, I’ve just said we need to “consider” these four factors—population size, parameter variation, degree of accuracy, and degree of confidence—but, really, we have to do more than just consider them, we have to plug them into a formula to calculate the required sample size. The formula isn’t all that complicated, but most people take the easy route and use a sample size calculator instead, and so will we. Several good sample size calculators will pop up with a quick internet search. You enter the information and get your required sample size in moments. Playing around with these calculators is a bit mind-boggling. Try it out. What would be a reasonable sample size for surveying all United States citizens? What about for all citizens of Rhode Island? What’s surprising about these sample sizes? Play around with different levels of confidence, accuracy, and parameter variation. How much do small changes affect your required sample sizes?
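For the curious, the formula behind many of those calculators is Cochran's sample size formula with a finite population correction. This sketch assumes that form; a given calculator may differ in details like rounding.

```python
import math

def required_sample_size(N, z=1.96, margin=0.05, p=0.5):
    """Cochran's formula with finite population correction.
    N: population size; z: z-score for the confidence level (1.96 ~ 95%);
    margin: accuracy (0.05 = +/- 5 points); p: assumed parameter variation
    (0.5 = maximum variation, the safe default)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))   # shrink for finite N

# The surprise the calculators reveal: population size barely matters once large.
required_sample_size(330_000_000)  # all U.S. citizens -> 385
required_sample_size(1_100_000)    # all Rhode Islanders -> 385, the same
required_sample_size(500)          # a small population -> 218, a big fraction
```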
And note the interplay of confidence and accuracy. For any given sample size, you can have different combinations of confidence and accuracy, which will have an inverse relationship—as one goes up, the other goes down. With the same sample, I could choose either to be very confident about an imprecise estimate or to be not-so-confident about a precise estimate. I can look over a class of undergraduates and predict with near certainty that their average age is between 17 and 23, or I can predict with 75% confidence that their average age is between 19 and 20.
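With the margin-of-error formula, the tradeoff is easy to see: hold the sample size fixed and vary only the z-score that encodes confidence. The n = 385 here is simply a typical national-survey sample size, and p = 0.5 assumes maximum variation.

```python
import math

n, p = 385, 0.5  # fixed sample size, maximum-variation assumption
for label, z in [("90%", 1.645), ("95%", 1.96), ("99%", 2.576)]:
    margin = z * math.sqrt(p * (1 - p) / n)
    print(f"{label} confidence -> accuracy of +/-{margin:.1%}")
# -> +/-4.2%, +/-5.0%, +/-6.6%: more confidence, less accuracy, same sample.
```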
It’s important to realize what we’re getting from the sample size calculator. This is the minimum sample size if we’re intending to use statistics to estimate single parameters, one by one—that is, we’re calculating univariate statistics. If, however, we’re planning to compare any groups within our sample or conduct any bivariate or multivariate statistical analysis with our data, our sample size requirements will increase accordingly (and necessitate consulting statistics manuals).
Calculating a minimum sample size based on the desired accuracy and confidence only makes sense if we’re following a probability sampling design. Sometimes, though, our goal isn’t to generalize what we learn from a sample to a population; sometimes, we have other purposes for our samples and use nonprobability sampling designs. Maybe we’re doing a trial run of our study. We just want to try out our questionnaire and get a feel for how people will respond to it, so we use a convenience sampling design, which is what it sounds like—sampling whatever cases are convenient. You give your questionnaire to your roommate, your mom, and whoever’s waiting in line with you at the coffee shop. Usually, convenience sampling is used for field testing data collection instruments, but it can also be used for exploratory research—research intended to help orient us to a research problem, to help us figure out what concepts are important to measure, or to help us figure out where to start when we don’t have a lot of previous research to build on. We know that we have to be very cautious in drawing conclusions from exploratory research based on convenience samples, but it can provide a very good starting point for more generalizable research in the future.
In other cases, it would be silly to use a probability sampling design to select your cases. What if you wanted to observe people’s behavior at Green Party rallies? Would you construct a sampling frame listing all the upcoming political rallies and randomly select a few, hoping to get a Green Party rally in your sample? Of course not. Sometimes we choose our sample because we want to study particular cases. We may not even describe our case selection as sampling, but when we do, this is purposive sampling. We can also use purposive sampling if we wish to describe typical cases, atypical cases, or cases that provide insightful contrasts. If I were studying factors associated with nonprofit organizational effectiveness, I might select organizations that seem similar but demonstrate a wide range of effectiveness to look for previously unidentified differences that might explain the variation. Purposive sampling is prominent in studies built around in-depth qualitative data, including case studies, which we’ll look at in a bit.
When purposively selecting cases of interest, we should take care not to draw unwarranted conclusions from cases selected on the dependent variable, the taboo sampling strategy. Imagine we want to know whether local governments’ spending on social media advertising encourages local tourism. Our independent variable is social media advertisement spending, and our dependent variable is the amount of tourism. If we were to adopt this taboo sampling strategy, we would identify localities that have experienced large increases in tourism. We may then, upon further investigation, learn they had all previously increased spending on social media advertising and conclude that more advertising spending leads to more tourism. Can we legitimately draw that conclusion, though? It may be that many other localities had also increased their social media advertising spending but did not see an increase in tourism; the level of spending may not affect tourism at all. It’s even possible that localities that saw no tourism gains spent more on social media advertising than the ones we studied—we do not know, because we fell into the trap of selecting cases on the dependent variable.
We may wish to do probability sampling but lack the resources, potentially making a quota sampling design a good option. This is something of a cross between a convenience sampling design and the stratified sampling designs. Before, when we wanted to include 12 male out-of-state graduate students in our sample, we constructed a sampling frame and randomly selected them. We could, however, select the first 12 male out-of-state graduate students we stumble upon, survey them to meet our quota for that category of student, and then seek out students in our remaining categories. (This is what those iPad-carrying marketing researchers at the mall and in theme parks are doing—and why they’ll ignore you one day and chase you down the next.) We’d still be very tentative about generalizing from this sample to the population, but we’d feel more confident than if our sample had been selected completely as a matter of convenience.
One final nonprobability sampling design is useful when cases are difficult to identify beforehand, like meth users, sex workers, or the behind-the-scenes movers-and-shakers in a city’s independent music scene. What’s a researcher wanting to interview such folks to do? Post signs and ask for volunteers? Probably not. She may be able to get that first interview, though, and, once that respondent trusts her, likes her, and becomes invested in her research, she might get referred to a couple more people in this population, which could lead to a few more, and so on. This is called (regrettably, I think, because I’d hate to have the term snowball in my serious research report) a snowball sampling design or (more acceptably but less popularly) a network sampling design, and it has been employed in a lot of fascinating research about populations we’d otherwise never know much about.