The Null Hypothesis
Many people assume that science provides clear answers to guide personal and public health policy decisions. In fact, although science is an important tool, it is often a blunt tool, capable of answering only certain narrow questions. Without an understanding of the strengths and the limitations of science, and an awareness of how both can be used and manipulated, we risk making personal and public policy decisions that will not best protect our children and ourselves.
This chapter first examines what science and scientific inquiry can and cannot do, what kinds of answers they can and cannot provide. Following is a look at animal testing and epidemiological research, two forms of scientific inquiry that are central to decision making concerning environmental reproductive toxicants. Finally, we consider quantitative risk assessment, a tool meant to improve our ability to make decisions by incorporating both scientific knowledge and uncertainty.
Science is a way of looking at the world, a system for increasing our knowledge of the world, a tool for making decisions about how to act in the world. Fundamentally, science provides a structure for exploration: the scientific method, a well-defined system that scientists worldwide recognize as the basis for scientific understanding.
The scientific method, a system of hypothesis generation and testing, is the same whether used by laboratory scientists, epidemiologists, wildlife biologists, or others. A scientist begins with a curiosity about some particular topic and a question. Based on the scientist's previous observations and experience, she develops a hypothesis about how, why, or whether a particular phenomenon occurs. Next, she develops a test, or an experiment, to see how her hypothesis holds up in the face of the evidence. In fact, she is testing two hypotheses: the second, called the null hypothesis, is essentially the negative of the hypothesis she has developed. If, for example, her hypothesis is that (1) treating rats with the solvent toluene will result in birth defects in their offspring, the null hypothesis will state that (2) treating rats with toluene will not result in birth defects. Although it may seem counterintuitive, the scientist proceeds from the assumption that the null hypothesis is true.
Many nonscientists are not aware of the role of the null hypothesis in the scientific method, which can lead to misunderstandings between scientists and nonscientists. In the logic of the scientific method, a scientist looking for evidence to support a hypothesis must do so by disproving the corresponding null hypothesis. In other words, if she finds in her experiment that rats treated with toluene do in fact give birth to many more offspring with birth defects than do untreated rats, then she can no longer say that treating rats with toluene will not lead to birth defects. She then rejects the null hypothesis, and her experimental results support, but do not prove, the original hypothesis.
When deciding whether she can reject the null hypothesis, the scientist will use specific statistical techniques to assess the results of her experiment to determine whether the results are statistically significant. Statistical significance is a measure of how unlikely a result would be if chance alone were at work. If the scientist decides that her result is not statistically significant, she fails to reject the null hypothesis. She states that nothing statistically significant happened in her experiment to support her original hypothesis, and she must go on to develop another hypothesis or refine the current one.
If, however, she decides that her result is statistically significant, she rejects the null hypothesis. Now she is in a curious position: she has shown that her original hypothesis has been supported, but she has not proved her hypothesis. She has simply shown that she has not found any result that disagrees with her original hypothesis. This is the scientists' dilemma: they can only support, but never prove, a hypothesis. With more research and further experimentation the original hypothesis can be refined and further supported, elevated to the status of a theory, but in the end, the truth of a hypothesis is a matter of judgment rather than proof.
Nonscientists can be frustrated when scientists are unwilling to give absolute answers. The inability to provide such answers is one of the limitations of the scientific method. The scientist, knowing that her method of analysis cannot absolutely prove a hypothesis, will tend to be conservative in making a statement about the truth of a hypothesis. She knows that further research may reveal that the hypothesis is inadequate in some way, and so will state that the science is incomplete. The scientist may feel fairly confident that her hypothesis is valid if there is a good body of research supporting it and numerous other researchers have confirmed her results, but she still cannot say that she is absolutely sure. Many of the difficulties in using science in the decision-making process thus have their roots in the very nature of the scientific method.
As we have seen, scientific studies start with the assumption that the null hypothesis is true. For example, a scientist could assume, as a null hypothesis, that maternal exposure to toluene is not related to spontaneous abortion. After designing and conducting a study looking at the issue, the scientist would apply a test of statistical significance to determine whether the results justify rejecting the null hypothesis. A test of statistical significance assesses how much the results of a particular study deviate from those predicted by the null hypothesis. These statistical methods say nothing about whether a particular result is likely to be true or false. They simply indicate the probability that a certain result would be seen by chance alone, if there were really no association between the exposure and the effect in question.
One of the most commonly calculated statistics in epidemiology is the p-value, a probability usually written as a decimal. The p-value indicates the probability of obtaining a result at least as extreme as the observed result if the null hypothesis is actually true. For example, a study may show many more spontaneous abortions in mothers exposed to toluene during pregnancy, and the p-value for this result may be less than 0.10. This means that if there is no systematic error in the study, and if there is really no association between miscarriage and toluene exposure, the probability of obtaining a result such as this would be less than 10 percent. To most nonscientists, this would suggest that the result supports an association between toluene exposure and miscarriage. Scientists, however, hold their research to a higher standard. By convention, scientists do not report that their results are "statistically significant" unless the p-value is less than 0.05, or 5 percent.
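As a rough illustration of how such a p-value might be computed, the sketch below applies a one-sided Fisher's exact test to a two-by-two table. The counts are invented for illustration and do not come from any study; the function name `fisher_one_sided` is likewise hypothetical, and other tests (or a two-sided version) could reasonably be used instead.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for a 2x2 table.

    a = exposed with the outcome,   b = exposed without it,
    c = unexposed with the outcome, d = unexposed without it.
    Returns the probability, under the null hypothesis of no association
    (with the table margins held fixed), of seeing `a` or more outcomes
    among the exposed.
    """
    n_exposed = a + b
    n_events = a + c
    total = a + b + c + d
    largest_possible = min(n_exposed, n_events)
    denominator = comb(total, n_exposed)
    tail = sum(
        comb(n_events, k) * comb(total - n_events, n_exposed - k)
        for k in range(a, largest_possible + 1)
    )
    return tail / denominator


# Hypothetical counts (an assumption, not data from the text):
# 12 miscarriages among 50 exposed mothers, 6 among 50 unexposed mothers.
p_value = fisher_one_sided(12, 38, 6, 44)
print(f"one-sided p-value: {p_value:.3f}")
print("below 0.10:", p_value < 0.10)
print("below 0.05 (conventional significance):", p_value < 0.05)
```

The same result can fall on different sides of the line depending on whether the 0.10 or the 0.05 threshold is chosen, which is the point the paragraph above makes.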
The decision to set the standard of statistical significance at less than 5 percent is arbitrary. One could argue that setting the level at less than 50 percent makes sense, since at that point it becomes more likely than not that a result is not due to chance alone. Alternatively, one could set the standard at less than 1 percent, arguing that only at such a high level can we be confident that the result is not due to chance. The scientific community has chosen 5 percent as its convention for a number of reasons--reasons that may not always apply in the context of public policy decision making.
Sometimes the results of a study may be presented with a "95 percent confidence interval." The confidence interval defines a range of values that are reasonably consistent with the study's data. Strictly speaking, this statistical tool tells a scientist that if there is no source of bias in the experiment, and she repeats it one hundred times, calculating a new interval each time, about ninety-five of those intervals would be expected to contain the true value. If a study finds that toluene triples the risk of spontaneous abortion, the investigators may report a 95 percent confidence interval, indicating that toluene may do anything from increasing the risk of spontaneous abortion by 20 percent to increasing it sixfold. In statistical notation, this would be reported as OR = 3.0 [1.2-6.0], where OR stands for odds ratio, a statistical measure of how strongly an exposure is associated with an outcome. An odds ratio of 1 indicates no increased or decreased risk, and represents the null hypothesis in this case. A 95 percent confidence interval that includes the number 1 in the range is therefore considered not statistically significant. If the number 1 is not within the range, such as in the example, the results are reported as statistically significant. The 95 percent confidence interval is more informative about the possible range of results than is the p-value, but it is still tied to the convention of requiring a p-value of less than 0.05 to be considered significant.
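To make the notation concrete, here is a minimal sketch of one common way to compute an odds ratio and an approximate 95 percent confidence interval from a two-by-two table (the log-based, or Woolf, method). The counts are hypothetical, chosen only so that the odds ratio comes out to 3.0; they are not meant to reproduce the interval quoted above, and the function name `odds_ratio_ci` is an assumption of this sketch.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with an approximate confidence interval (Woolf method).

    a = exposed with the outcome,   b = exposed without it,
    c = unexposed with the outcome, d = unexposed without it.
    All four counts must be greater than zero.
    """
    estimate = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio.
    std_err = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # About 1.96 for a 95 percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(estimate) - z * std_err)
    upper = exp(log(estimate) + z * std_err)
    return estimate, lower, upper


# Hypothetical counts (an assumption, not data from the text).
estimate, lower, upper = odds_ratio_ci(30, 70, 10, 70)
# By convention the result is "significant" when the interval excludes 1.
significant = not (lower <= 1.0 <= upper)
print(f"OR = {estimate:.1f} [{lower:.1f}-{upper:.1f}]")
print("statistically significant by convention:", significant)
```

Checking whether the interval contains 1 is the same decision as comparing the corresponding p-value with 0.05, which is why the interval remains tied to that convention.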
When the p-value is greater than 5 percent or when the 95 percent confidence interval includes the null value of 1, a study is usually reported as negative, or as not having found significant evidence of an association. A negative study that fails to reach the conventional criteria for statistical significance is often considered to be a justification for ending research in a particular area. In our example, a study that finds more cases of miscarriage in mothers exposed to toluene but fails to achieve statistical significance might be interpreted to the public as a negative study, showing no association between toluene and miscarriage. This is a misinterpretation of the science and of the statistics. In fact, there may or may not be an association between toluene and miscarriage. The study assumed that there was no association (null hypothesis). It then found a result that might be found by chance alone with a greater than 5 percent likelihood, and on that basis failed to reject the null hypothesis. Because these statistical concepts are confusing (and often confuse even scientists), it is easy to lose track of the fact that an association, even if it is not statistically significant, may still have important public health significance.
When a scientist determines that a result is statistically significant, a number of possible interpretations exist. There is still the possibility, though slight, that the result is due to chance, or that a bias in the study design resulted in a spurious association. Alternatively, the result may indicate a real association between an exposure and a result. The association may be causal, meaning that the exposure actually causes the result observed. However, it is also possible that the relationship between the two factors is not so clear and, in fact, a third factor, unrecognized in the research, comes into play. For example, an association between coffee consumption and risk of heart attack might be assumed to indicate a causal relationship. However, if heavy coffee drinkers also tend to be cigarette smokers, then the connection with heart attacks may actually have to do with the cigarette smoking, not coffee consumption. Such a third, hidden variable is called a confounder.
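The coffee and smoking example can be sketched with invented numbers. In the hypothetical counts below, coffee drinking has no association with heart attack among smokers or among nonsmokers, yet the combined table shows a strong association, simply because coffee drinkers are more often smokers. All counts and names are assumptions made for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = coffee drinkers with a heart attack, b = coffee drinkers without,
    c = non-drinkers with a heart attack,    d = non-drinkers without.
    """
    return (a * d) / (b * c)


# Hypothetical counts (a, b, c, d) within each smoking group.
strata = {
    "smokers": (16, 64, 4, 16),     # 20 percent heart attack rate either way
    "nonsmokers": (1, 19, 4, 76),   # 5 percent heart attack rate either way
}

# Odds ratio within each group, where smoking status is held constant.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Combined ("crude") table that ignores smoking altogether.
crude_cells = tuple(sum(column) for column in zip(*strata.values()))
crude_or = odds_ratio(*crude_cells)

for name, value in stratum_ors.items():
    print(f"odds ratio among {name}: {value:.2f}")
print(f"crude odds ratio, ignoring smoking: {crude_or:.2f}")
```

In this constructed example the crude odds ratio is well above 1 even though the odds ratio is exactly 1 within each smoking group, so the apparent effect of coffee is produced entirely by the confounder.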
If a scientist decides that a result is not statistically significant, again there are a number of possible interpretations. There may truly be no relationship between the exposure and the result. Alternatively, there may be an association, but because of chance or a bias in the study, no association is recognized. An important possibility to consider, however, is that the experiment did not have enough statistical power.
Statistical power refers to the ability of a study or experiment to detect reliably the effects that are being examined. For example, if an outcome is rare, then a scientist needs a large study population to determine whether the rate of occurrence changes. A study conducted on one hundred lab animals is unlikely to detect a doubling of birth defect rates if the defects normally occur only once in a thousand births. Similarly, in epidemiological research, studies must have a large enough study population to recognize small differences in rates of disease. A study on a small population may be able to detect major changes, such as a tripling of the disease rate, but a larger population would be required to detect a 20 percent increase in disease.
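The lab animal example can be checked with a short calculation. The sketch below assumes, for simplicity, one hundred independent births, each with the same chance of a defect; these assumptions and the function name `prob_no_cases` are illustrative and are not taken from the text.

```python
def prob_no_cases(rate, n_births):
    """Probability of observing zero cases in n_births independent births."""
    return (1.0 - rate) ** n_births


baseline_rate = 1 / 1000          # defects normally occur once in a thousand births
doubled_rate = 2 * baseline_rate  # the effect the study hopes to detect
n_births = 100

p_none_baseline = prob_no_cases(baseline_rate, n_births)
p_none_doubled = prob_no_cases(doubled_rate, n_births)

print(f"chance of seeing no defects at the normal rate:  {p_none_baseline:.3f}")
print(f"chance of seeing no defects at the doubled rate: {p_none_doubled:.3f}")
```

Under these assumptions, roughly 90 percent of such experiments would record no defects at the normal rate, and roughly 82 percent would record none even if the rate had doubled. A study this small would usually look the same in both cases, which is what low statistical power means in practice.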
The scientific method is used for two distinct purposes. To many scientists, its primary use is in the quest for truth: the search for a more complete and accurate understanding of the phenomena of our world. In order to avoid false assertions that would lead away from the truth, scientists have rigorous standards for judging their results. In the scientists' view, it is better to avoid drawing a conclusion than to draw a false conclusion. Nonscientists, however, count on this same tool to generate information to help make decisions about how to act, whether on a personal or a societal level. Rather than seeking absolute truth, the decision maker seeks guidance: evidence that will help justify one course of action over another.
Since scientists choose to be extremely cautious in the reporting of their results, decision makers are often in a difficult position. The information they need will almost always be incomplete, but in the interest of protecting health, they must decide whether to take action. The scientist is cautious in the search for knowledge; the decision maker may wish to be cautious in protecting health. A better understanding of the limits of the scientific method will help us all use existing scientific data to make health protective decisions.
Aside from the limitations of the scientific method itself, a number of other factors can limit the effectiveness or relevance of scientific research. Scientific research is a process driven by people's curiosity and interest in certain areas, and that curiosity will shape the resulting research. From its first step, the scientific method is entirely dependent on a number of human factors. The questions that get asked are informed by, for instance, the scientists' previous work, their ability to make new connections between ideas, the established understanding of that area of inquiry, the interest of the department head, and the source of the funding for the lab. Science can attempt to address only questions that are actually being asked; questions that are not asked will not be answered.
Even when certain questions are asked, the interpretation of the answers can be colored by an individual's perspectives and preconceptions. Although the concept of scientific objectivity is a central aspect of the scientific endeavor, all human beings, including scientists, interpret their world based on what they already know and believe. What one scientist perceives as a benign change in organ function, another may perceive as an injury to the animal. A surprising result may be written off as a mistake, contamination, or a fluke instead of being fully evaluated. For this reason, scientists must be extremely explicit about their assumptions and observations so that others may make their own judgments of their work.
In the end, we must recognize that while science can inform us in many ways, science cannot itself make decisions for us. Science is a tool. When it is used appropriately, it can provide some of the information we need in order to make decisions likely to protect health.
Source: Chapter 2 of Generations at Risk: Reproductive Health and the Environment, by Ted Schettler, M.D., M.P.H., Gina Solomon, M.D., M.P.H., Maria Valenti, and Annette Huddle, M.E.S. The MIT Press, Cambridge, Massachusetts, and London, England. Major funding for the original report on which the book was based is from the Jessie B Cox Charitable Trust and the John Merck Fund. Also the Cox Trust, the Alton Jones Foundation, the Pew Charitable Trusts, and the CS Mott Foundation.