Conceptualizing and operationalizing (and sometimes hypothesizing)
Research questions are an essential starting point, but they tend to be too abstract. If our research ultimately comes down to making observations, we need to know more specifically what to observe. Conceptualization is a step in that direction. In this stage of the research process, we specify what concepts, and what relationships among those concepts, we need to observe. My research question might be How does government funding affect nonprofit organizations? This is fine, but I need to identify much more specifically what I want to observe. Theory (like the crowding-out theory I referred to before) and previous research help me identify a set of concepts I need to consider: different types of government funding, the amount of funding, effects on fundraising, effects on operations management, managerial capacity, donor attitudes, policies of intermediary funding agencies, and so on. It’s helpful at this stage to write what are called nominal definitions of the concepts central to my study. These are definitions like what you’d find in a dictionary, but tailored to your study; a nominal definition of government subsidy would describe what I mean in this study when I use the term.
After identifying and defining concepts, we’re ready to operationalize them. To operationalize a concept is to describe how to measure it. (Some authors refer to this as the operational definition, which I find confuses students since it doesn’t necessarily look like a definition.) Operationalization is where we get quite concrete: To operationalize the concept revenue of a nonprofit organization, we might record the dollar amount entered in line 12 of their most recent Form 990 (a financial statement nonprofit organizations must file with the IRS annually). This dollar amount will be my measure of nonprofit revenue.
Sometimes, the way we operationalize a concept is more indirect. Public support for nonprofit organizations, for example, is more of a challenge to operationalize. We might write a nominal definition for public support that describes it as having something to do with the sum of individuals’ active, tangible support of a nonprofit organization’s mission. We might operationalize this concept by recording the amount of direct charitable contributions, indirect charitable contributions, revenue from fundraising events, and the number of volunteer hours entered in the respective Form 990 lines.
Note that when we operationalized nonprofit revenue, the operationalization yielded a single measure. When we operationalized public support, however, the operationalization yielded multiple measures. Public support is a broader, more complex concept, and it’s hard to think of just one measure that would convincingly represent it. Also, when our measures capture a concept only indirectly, like our measures for public support, we’ll sometimes use the word indicator instead of measure. The term indicator can be more accurate. We know that directly measuring something as abstract as public support would be impossible; it is, after all, a social construct, not something concrete. Our measures, then, indicate the level of public support more than they actually measure it.
I just slipped in that term, social construct, so we should go ahead and face an issue we’ve been sidestepping so far: Many concepts we’re interested in aren’t observable in the sense that they can’t be seen, felt, heard, tasted, or smelled. But aren’t we supposed to be building knowledge based on observations? Are unobservable concepts off limits for empirical social researchers? Let’s hope not! Lots of important concepts (maybe all the most important concepts) are social constructs, terms that don’t have meaning apart from the meaning we, collectively, assign to them. Consider political ideology, racial prejudice, voter intent, employee motivation, issue saliency, self-esteem, managerial capacity, fundraising effectiveness, introversion, and Constitutional ideology. These terms are shorthand for sets of characteristics that we all more or less agree “belong” to the concepts they name. Can we observe political ideology? Not directly, but we can pretty much agree on what observations serve as indicators of political ideology. We can observe behaviors, like putting bumper stickers on cars; we can see how people respond to survey items; and we can hear how people respond to interview questions. We know we’re not directly measuring political ideology (which is impossible, after all, since it’s a social construct), but we can persuade each other that our measures of political ideology make sense (which seems fitting, since, again, it’s a social construct).
Each indicator or measure—each observation we repeat over and over again—yields a variable. The term variable is one of those terms that’s easier to learn by example than by definition. The definition, though, is something like “a logical grouping of attributes.” (Not very helpful!) Think of the various attributes that could be used to describe you and your friends: brown hair, green eyes, 6’2” tall, brown eyes, black hair, 19 years old, 5’8” tall, blue eyes, and so on. Obviously, some of these attributes go together, like green eyes, brown eyes, and blue eyes. We can group these attributes together and give them a label: eye color. Eye color, then, is a variable. In this example, the variable eye color takes on the values green, brown, and blue. In many research designs, our goal in making observations is to assign values to variables for cases. Cases are the things—here, you and your friends—that we’re observing and to which we’re assigning values. In social science research, cases are often individuals (like individual voters or individual respondents to a survey) or groups of people (like families or organizations), but cases can also be court rulings, elections, states, committee meetings, and countless other things that can be observed. The term unit of analysis also describes cases, but it’s a more general term; if your cases are firefighters, then your unit of analysis is the individual.
Getting this terminology—cases, variables, values—straight is essential. Here are some examples of cases, variables, and values:
- Cases: undergraduate college students; variable: classification; values: Freshman, Sophomore, Junior, Senior;
- Cases: states; variable: whether or not citizen referenda are permitted; values: yes, no;
- Cases: counties; variable: type of voting equipment; values: manual mark, punch card, optical scan, electronic;
- Cases: clients; variable: length of time it took them to see a counselor; values: any number of minutes;
- Cases: Supreme Court dissenting opinions; variable: number of signatories; values: a number from 1 to 4;
- Cases: criminology majors; variable: GPA; values: any number from 0 to 4.0.
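The cases/variables/values vocabulary maps neatly onto how quantitative data are usually stored: cases as rows, variables as columns, and values as the cells where they meet. A minimal sketch in Python (the student data here are invented for illustration):

```python
# Each case (here, an undergraduate student) is a row; each variable
# (classification, GPA, number of siblings) is a column; each value
# is a cell. All data are invented for illustration.
cases = [
    {"classification": "Junior", "gpa": 3.4, "siblings": 2},
    {"classification": "Freshman", "gpa": 2.9, "siblings": 0},
    {"classification": "Senior", "gpa": 3.8, "siblings": 3},
]

# The variable "classification" takes on one value per case:
values = [case["classification"] for case in cases]
print(values)  # ['Junior', 'Freshman', 'Senior']
```

Assigning values to variables for cases, in other words, is just filling in this table, one observation at a time.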
Researchers have a language for describing variables. A variable’s level of measurement describes the structure of the values it can take on: nominal, ordinal, interval, or ratio. Nominal and ordinal variables are the categorical variables; their values divide cases into distinct categories. The values of nominal-level variables have no inherent order. The variable sex can take on the values male and female; eye color—brown, blue, and green; major—political science, sociology, biology, etc. Placing these values in one order—brown, blue, green—makes just as much sense as any other—blue, green, brown. The values of ordinal-level variables, though, have an inherent order. Classification—freshman, sophomore, junior, senior; love of research methods—low, medium, high; class rank—first, second, . . . , 998th. These values can be placed in an order that makes sense—first to last (or last to first), least to most, best to worst, and so on. A point of confusion to be avoided: When we collect and record data, we sometimes assign numbers to the values of categorical variables (like brown hair equals 1), but that’s just for convenience. Those numbers are placeholders for the actual values, which remain categorical.
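The warning about placeholder numbers can be made concrete. In the sketch below, the numeric codes for eye color are entirely arbitrary (any one-to-one assignment would work equally well), and doing arithmetic on them produces numbers that describe no actual case:

```python
# Numeric codes assigned to a nominal variable are arbitrary labels,
# not quantities. The code assignments below are invented.
eye_color_codes = {"brown": 1, "blue": 2, "green": 3}

observed = ["blue", "brown", "green", "brown"]
coded = [eye_color_codes[color] for color in observed]
print(coded)  # [2, 1, 3, 1]

# Arithmetic on these codes is meaningless: the "average eye color"
# below is a number, but it doesn't correspond to any eye color.
meaningless = sum(coded) / len(coded)
print(meaningless)  # 1.75
```

Had we coded brown as 3 and green as 1 instead, the "average" would change even though the cases did not, which is exactly why such numbers are placeholders rather than measurements.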
When values take on actual numeric values, the variables they belong to are numeric variables. If a numeric variable takes on the value 28, it means there are actually 28 of something—28 degrees, 28 votes, 28 pounds, 28 percentage points. It makes sense to add and subtract these values. If one state has a 12% unemployment rate, that’s 3 more points than a state with a 9% unemployment rate. Numeric variables can be either interval-level or ratio-level variables. When ratio-level variables take on the value zero, zero means zero—none of whatever we’re measuring. Zero votes means no votes; zero senators means no senators. Most numeric variables we use in social research are ratio-level. (Note that many ratio-level variables, like height, age, or states’ number of senators, would never actually take on the value zero, but if they did, zero would mean zero.) Occasionally, zero means something other than none of something, and variables that take on these odd zeroes are interval-level variables. Zero degrees means—well, not “no degrees,” which doesn’t make sense. Year zero doesn’t mean the year that wasn’t. We can add and subtract the values of interval-level variables, but we cannot meaningfully multiply and divide them. Someone born in the year 996 is not half the age of someone born in 1992, and 90 degrees is not twice as hot as 45.
We can sometimes choose the level of measurement when constructing a variable. We could measure age with a ratio-level variable (the number of times you’ve gone around the sun) or with an ordinal-level variable (check whether you’re 0-10, 11-20, 21-30, or over 30). We should make this choice intentionally because it will determine what kinds of statistical analysis we can do with our data later. If our data are ratio-level, we can do any statistical analysis we want, but our choices are more limited with interval-level data, still more limited with ordinal-level data, and most limited with nominal-level data. (See Appendix E on equity in research for an explanation of how dummy coding can be used to helpfully transform categorical variables to ratio-level variables.)
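The dummy coding mentioned in the parenthetical can be sketched briefly: each value of a categorical variable becomes its own 0/1 indicator variable, and a case gets a 1 on the indicator matching its value and 0 elsewhere. This is a hand-rolled illustration with invented data; statistical packages typically do this coding automatically:

```python
# Dummy-code a nominal variable: one 0/1 indicator per category.
# The majors below are invented example data.
majors = ["sociology", "biology", "political science", "biology"]
categories = sorted(set(majors))
# categories: ['biology', 'political science', 'sociology']

dummies = [
    {cat: int(major == cat) for cat in categories}
    for major in majors
]
print(dummies[0])  # {'biology': 0, 'political science': 0, 'sociology': 1}
```

Because each indicator is a count (0 or 1 of something), the resulting variables can be used where numeric-level data are required.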
Variables can also be described as either continuous or discrete. Just as with level of measurement, we look at a variable’s values to determine whether it’s continuous or discrete. All categorical variables are discrete, meaning they can take on only specific, separated values. This is in contrast to some (but not all!) numeric variables. Take temperature, for example. For any two values of the variable temperature, we can always imagine a case with a value in between them. If Monday’s high is 62.5 degrees and Tuesday’s high is 63.0 degrees, Wednesday’s high could be 62.75 degrees. Temperature, then, measured in degrees, is a continuous variable. Other numeric variables are discrete, though. Any variable that is a count of things is discrete. For the variable number of siblings, Anna has two siblings and Henry has three siblings. We cannot imagine a person with any number of siblings between two and three—nobody could have 2.5 siblings. Number of siblings, then, is a discrete variable. (Note: Some textbooks and websites incorrectly state that all numeric variables are continuous. Do not be misled.)
If we’re engaging in causal research, we can also describe our variables in terms of their role in causal explanation. The “cause” variable is the independent variable. The “effect” variable is the dependent variable. If you’re interested in determining the effect of level of education on political party identification, level of education is the independent variable, and political party identification is the dependent variable.
I’m being a bit loose in using “cause” and “effect” here. Recall the concept of underlying causal mechanism. We may identify independent and dependent variables that really represent a much more complex underlying causal mechanism. Why, for example, do people make charitable contributions? At least four studies have asked whether people are more likely to make a contribution when the person asking for it is dressed nicely. (See the examples cited in Bekkers and Wiepking’s 2011 “A Literature Review of Empirical Studies of Philanthropy,” Nonprofit and Voluntary Sector Quarterly, volume 40, p. 924, which I also recommend for its many examples of how social research explores questions of causality.) Do these researchers believe the quality of stitching might affect altruism? Sort of, but not exactly. More likely, they believe potential donors’ perceptions of charitable solicitors may shape their attitudes toward the requests, which will make them more or less likely to respond positively. It’s a bit reductionist to say charitable solicitors’ clothing “causes” people to make charitable donations, but we still use the language of independent variables and dependent variables as labels for the quality of the solicitors’ clothing and the solicitees’ likelihood of making charitable donations, respectively. Think carefully about how this might apply anytime an independent variable—sometimes more helpfully called an explanatory variable—is a demographic characteristic. Women, on average, make lower salaries than men. Does sex “cause” salary? Not exactly, though we would rightly label sex as an independent variable and salary as a dependent variable. Underlying this simple dyad of variables is a set of complex, interacting causal factors—gender socialization, discrimination, occupational preferences, economic systems’ valuing of different jobs, family leave policies, time in the labor market—that more fully explain this causal relationship.
Identifying independent variables (IVs) and dependent variables (DVs) is often challenging for students at first. If you’re unsure which is which, try plugging your variables into the following phrases to see what makes sense:
- IV causes DV
- Change in IV causes change in DV
- IV affects DV
- DV is partially determined by IV
- A change in IV predicts a change in DV
- DV can be partially explained by IV
- DV depends on IV
In the later section on formal research designs, we’ll learn about control variables, another type of variable in causal studies often used in conjunction with independent and dependent variables.
Sometimes, especially if we’re collecting quantitative data and planning to conduct inferential statistical analysis, we’ll specify hypotheses at this point in the research process as well. A hypothesis is a statement of the expected relationship between two or more variables. Like operationalizing a concept, constructing a hypothesis requires getting specific. A good hypothesis will predict not just that two (or more) variables are related, but how. So, not Political science majors’ amount of volunteer experience will be related to their choice of courses, but Political science majors with more volunteer experience will be more likely to enroll in public policy, public administration, and nonprofit management courses. Note that you may have to infer the actual variables; hypotheses often refer only to specific values of the variables. Here, public policy, public administration, and nonprofit management courses are values of the implied variable, type of course.