Research "Proves" Very Little
Research proves... I wrote that in a discussion post during my first year of doctoral studies. My professor promptly corrected me. He explained that while social science research studies - such as those in education - might illustrate a finding, intimate a relationship, or imply a particular set of conditions, they cannot "prove" anything. When studies involve people, with all of their individual preferences, characteristics, cultures, and idiosyncrasies, there is no way to completely mitigate all of the threats to the validity of a study. Therefore, most research cannot be considered proof of anything.
Carmines and Zeller (1979) describe validity as the relationship between a concept and an indicator. In other words, validity asks whether the concrete measures in a study actually represent the intended abstract concept. They also describe generalizability, or what researchers Shadish, Cook, and Campbell (2002) refer to as external validity - the ability to translate findings from a specific sample to a broader population or from one setting to another. Think about it this way: just because a particular concept or strategy works with one particular group of students or in one particular school does not guarantee that it will work in another.
To make a causal claim that x causes y, or even to say that x proves y, the researcher would need to mitigate all of the threats to both the internal and external validity of a study. They would have to ensure that nothing other than the treatment could have caused the observed results. Educational psychologist Lee Cronbach argues that the challenge lies in the ability to make inferences or assumptions because of UTOS - Units, Treatments, Observations, and Settings.
Who participated in the study and how did they get selected? The sampling method plays a large role in whether or not the findings can be generalized from one group of people to another. When looking at a sample, a critical reader needs to determine how the researcher selected the group; to identify whether or not the group accurately represents a broader population; and to look at whether or not the backgrounds of the individuals in the study might affect the results.
For example, I conducted a poorly designed study as part of an assignment a few years ago. I surveyed participants at the annual Consortium of School Networking (CoSN) conference about their attitudes towards technology. Even though I randomly gathered my sample, it came from within a population of individuals who chose to attend a conference that specifically attracts school and district technology leaders. Therefore, I could not generalize the data that I collected from my survey to a broader population of educators.
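To see why random selection within a self-selected group does not rescue a sample, here is a rough simulation with entirely made-up numbers: a hypothetical population of educators, a conference that only attracts the most technology-enthusiastic among them, and a perfectly random survey of the attendees.

```python
import random

random.seed(1)

# Hypothetical population of 10,000 educators, each with an "attitude
# toward technology" score on a 0-100 scale (invented numbers).
population = [random.gauss(55, 15) for _ in range(10_000)]

# The conference self-selects: only the most enthusiastic educators attend.
attendees = [a for a in population if a > 65]

def mean(xs):
    return sum(xs) / len(xs)

# Even a perfectly random sample *of attendees* stays biased for the
# broader population, because the bias happened before the sampling.
survey = random.sample(attendees, 100)

print(f"population mean: {mean(population):.1f}")  # ~55
print(f"survey mean:     {mean(survey):.1f}")      # well above 55
```

No amount of randomness at the survey stage can undo the self-selection that happened at the door of the conference hall.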
Was the study itself implemented as intended and designed? Just a few weeks into my own dissertation study, I am documenting the ways in which my actual program has deviated from the original plan. Despite my best intentions to provide a standardized program, different groups will end up with slightly different experiences. This does not completely discredit my work, but I have to document each discrepancy and make sure to include it as part of my final analysis. Otherwise, I will not be able to accurately interpret the outcomes of my study.
Beyond what the researcher literally sees, what do the numbers actually say? Is the difference statistically significant? Take my own disastrous study as an example. Though I could assert that 75% of superintendents indicated that they made their technology purchasing decisions based on alignment with curriculum and pedagogy, as compared to 25% of technology directors, you should question that statement upon recognizing that I only surveyed six superintendents and 15 technology directors!
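To make that concrete, here is the kind of check a critical reader could run, sketched in Python. The counts below are hypothetical stand-ins, not my actual survey data: say 4 of 6 superintendents versus 4 of 15 technology directors answered "yes." Even a gap that looks that dramatic fails a Fisher's exact test at these sample sizes.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one.
    """
    n = a + b + c + d
    row1 = a + b          # respondents in group 1 (e.g., superintendents)
    col1 = a + c          # total "yes" answers across both groups

    def prob(k):          # P(k "yes" answers land in group 1) - hypergeometric
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical counts: 4/6 (67%) vs. 4/15 (27%) answered "yes".
p_value = fisher_exact_two_sided(4, 2, 4, 11)
print(round(p_value, 3))  # 0.146 - well above the conventional 0.05 cutoff
```

With groups this small, a 40-point percentage gap is entirely consistent with chance, which is exactly why the 75%-versus-25% claim above deserves skepticism.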
Further, when comparing average scores, what is the significance of the difference? Shadish et al. (2002) describe not only the need for a large enough sample size to measure an effect, but also a research design that increases the likelihood of being able to support a causal claim. Here is a simple way to think about it. In one fictional study, the researchers implement a six-week program with students to boost their achievement. At the end of six weeks, they test those students and then compare the scores to a group that did not participate in the program. We see these studies all of the time. However, this is a naive design. What if the group that participated in the program already performed higher than the control group? Or, what if the group that did not participate in the program had a higher average IQ than the group that did participate? Did the researchers have a pre-test? Did they take factors such as IQ into account? All of these questions need to be addressed before making any statements about how the results could be applied to various populations.
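A quick simulation makes the danger of that naive design visible. Everything below is invented: the "program" group starts with a higher baseline than the control group, and the program's true effect is exactly zero. The post-test-only comparison still reports a large "effect"; comparing gains from a pre-test does not.

```python
import random

random.seed(42)

# Invented scenario: the program has ZERO real effect, but the treatment
# group happens to start out stronger than the control group.
treatment_pre = [random.gauss(80, 5) for _ in range(30)]  # higher baseline
control_pre   = [random.gauss(70, 5) for _ in range(30)]

# Post-test = pre-test plus measurement noise; no treatment effect added.
treatment_post = [s + random.gauss(0, 3) for s in treatment_pre]
control_post   = [s + random.gauss(0, 3) for s in control_pre]

def mean(xs):
    return sum(xs) / len(xs)

# Naive design: compare post-test averages only.
naive_gap = mean(treatment_post) - mean(control_post)

# Stronger design: compare each group's gain (post minus pre).
gain_gap = ((mean(treatment_post) - mean(treatment_pre))
            - (mean(control_post) - mean(control_pre)))

print(f"post-test gap: {naive_gap:.1f}")   # large, driven entirely by baseline
print(f"gain-score gap: {gain_gap:.1f}")   # near zero - the true effect
```

The pre-test does not make the study bulletproof, but without it the researchers cannot even tell a program effect apart from a head start.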
Finally, where did the study take place? Particularly in education, a disconnect often exists between research and real-life classrooms. If a study occurred in a controlled laboratory environment with a small group of students, does that mean the findings will extend to a large urban district or a small rural one?
This post is not intended to discredit research findings or researchers. However, I have read a number of editorials and articles lately that make sweeping generalizations without critically questioning the units, treatments, observations, and settings of the studies they cite. In the coming weeks, I will be collaborating on a series of posts with Joshua Eyler, Adjunct Associate Professor and Director of the Center for Teaching Excellence at Rice University, to address some sweeping claims about technology, note-taking, and learning based on studies that may not have mitigated all of the threats to external validity. Stay tuned!
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Thousand Oaks, CA: SAGE Publications.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.