RECOMMENDED REVISIONS TO
THE "LONDON PRINCIPLES"
The following suggested revisions to the London Principles were developed by the Denver panel participants and agreed upon except where exceptions or alternative suggested wording are noted. London Principles text recommended to be deleted is crossed through, and recommended new or revised text is in bold italics.
A. I. PRINCIPLES AND QUESTIONS FOR EVALUATING
AN EPIDEMIOLOGIC REPORT FOR
Revisions to Principles A-1 to A-3
Principle I-1. The population studied should be pertinent to the risk assessment at hand, and it should be representative of a well-defined underlying cohort or population at risk.
Principle I-2. Study procedures should be described in sufficient detail, or available from the study's written protocol, to determine whether appropriate methods were used in the design and conduct of the investigation.
Principle I-3. The measures of exposure(s) or exposure surrogates should be: (a) conceptually relevant to the risk assessment being conducted; (b) based on principles that are biologically sound in light of present knowledge; and (c) properly quantitated to assess dose-response relationships.
Revisions to Principles A-4 to A-6
Principle I-4. Study outcomes (endpoints) should be clearly defined, properly measured, and ascertained in an unbiased manner. Definition and measurement of study outcomes:
Principle I-5. The analysis of the study's data should provide both point and interval estimates of the exposure's effect, including adjustment for confounding, assessment of interaction (e.g., effect of multiple exposures or differential susceptibility), and an evaluation of the possible influence of study bias. Data analysis methods and presentation:
Principle A-6. The reporting of the study should clearly identify both its strengths and limitations, and the interpretation of its findings should reflect not only an honest consideration of those factors, but also its relationship to the current state of knowledge in the area. The overall study quality should be sufficiently high that it would be judged publishable in a peer-reviewed scientific journal. Discussion and interpretation of study results:
II. PRINCIPLE FOR EVALUATING A BODY OF STUDIES ("META-ANALYSIS")
Principle B-6. A properly conducted meta-analysis, or preferably an analysis based on the raw data in the original studies, may be used in hazard identification and dose-response evaluation when such combination includes an evaluation of individual studies and an assessment of heterogeneity. The combined results ought to provide, more than any single study, precise risk estimates over a wider range of doses. Before using these tools, the gains should be judged sufficient to justify potential errors in inference resulting from combining studies of dissimilar design and quality.
Discussion of Principle B-6
This is one of the more important principles because meta-analysis of multiple epidemiologic studies has become more common. Although the principle refers to "properly conducted meta-analysis", there are currently no generally accepted standards for employing it (nor is it mentioned in the 1986 EPA risk assessment guidelines that are currently in effect 17), although when used quantitatively it involves application of standard statistical methods. This principle does state several minimal standards for employing meta-analysis, however. Even applying these minimal standards, some felt that meta-analysis was still fraught with potential for error, and it was remarked that if meta-analysis (based on studies as they are often currently conducted) were used as the final determinant in risk assessment, "we are in trouble". Meta-analysis requires consideration of the quality of the various studies, but it is very difficult and controversial to attach weights to studies for quality. Ranking for quality is not a black-and-white exercise; it requires a lot of expert judgment by epidemiologists.
Epidemiologists feel more comfortable in combining results from randomized clinical trials, when they follow similar protocols; but for observational studies, heterogeneity can create a "morass". If in the future epidemiological studies were to become more like randomized clinical trials in the transparency and consistency of their protocols (e.g., what the dose metric will be, how exposure and outcome will be measured), they would become more suitable for a well-conducted meta-analysis.
Principle B-6 incorporates three minimal standards. The first, which applies to both hazard identification and dose-response assessment, is that the reviewer should examine the individual studies carefully for quality and for explanations for study differences rather than combining study results simplistically to get a single point estimate. Within this sub-principle there is a stated preference for analysis by "pooling" of raw data where possible. The second is that, for dose-response assessment, combining the data should produce gains in terms of better risk measures with narrower confidence intervals over a wider range of doses. The third is that one should be mindful of the hazards of meta-analysis, and take into consideration that the body of studies for a particular agent may contain so many unexplainable inconsistencies as a consequence of differences in quality, design, and results that application of a meta-analysis methodology may be counter-productive and may mislead by obscuring significant uncertainties. If there is one very good study, nothing may be gained by combining it with others; in reality, combining may make the evaluation weaker. All of this requires a great deal of expert judgment.
"Meta-analysis" as it is performed currently often means summarizing relative risks weighted according to sample size, while an analysis that combines the raw data extracted from individual studies as if one were conducting a single large epidemiologic study is often referred to as "pooling". Principle B-6 indicates a preference for "pooling" because it allows for a better evaluation of possible systematic flaws or weaknesses, whereas meta-analysis can amplify flaws or weaknesses (for example, by reinforcing biases in different studies); however, it cannot always be done because the raw data may not be available from all of the studies, and it is difficult to apply for dose-response purposes because many studies will not have sufficient exposure detail. Examples given of situations where there was sufficiently good exposure data, in terms of detail and uniformity, to allow pooling were the British analysis of EMF and childhood leukemia, where all the studies used the same method of measuring exposure, and the risk assessment for airborne radon, where the eleven uranium mining studies quantified exposures in a similar way. It is important that the combined analysis cover all studies, whether positive, negative, or inconclusive.
One participant raised the question of whether, after applying meta-analysis or pooling and finding a weak relative risk, the result would be considered stronger than a similar level of relative risk estimated from a single good study. This was considered a difficult question, and some felt that the weak relative risk indicated by the meta-analysis would not be considered more significant unless there was some confirmatory animal data regarding the shape of the dose-response curve and biological plausibility, and even then they would consider the results "equivocal".
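As an illustration of the kind of weighted summary of relative risks described in this discussion, the following minimal Python sketch combines study-level results into a single estimate. It is offered under stated assumptions and is not a method prescribed by the panel: each study is weighted by the inverse variance of its log relative risk (a common stand-in for weighting by sample size), each study is assumed to report a 95% confidence interval, and the function name `pooled_relative_risk` and the input values are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def pooled_relative_risk(studies):
    """Fixed-effect, inverse-variance summary of relative risks.

    studies: list of (rr, ci_low, ci_high) tuples, where the limits are
    assumed to be 95% confidence limits on the relative risk.
    Returns (pooled_rr, pooled_ci_low, pooled_ci_high).
    """
    weights = []
    log_rrs = []
    for rr, ci_low, ci_high in studies:
        # Recover the standard error of log(RR) from the interval width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weights.append(1.0 / se ** 2)
        log_rrs.append(math.log(rr))

    total_weight = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


if __name__ == "__main__":
    # Hypothetical study results, for illustration only.
    example = [(1.4, 0.9, 2.2), (1.1, 0.8, 1.5), (1.8, 1.0, 3.2)]
    print(pooled_relative_risk(example))
```

A summary of this kind presumes that the studies estimate a common effect. It does nothing to evaluate the quality of the individual studies or the heterogeneity among them, which is the concern raised in the discussion above.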
Principle II: Meta-analysis is a method for combining and contrasting results from different studies that can be used as an aid in the review of evidence in a particular area. A properly conducted meta-analysis, or when possible, an analysis based on the raw data of original studies, may be used in evaluating the epidemiologic evidence concerning the exposure-disease relation under investigation. Such a combination of evidence across epidemiologic studies should follow a rigorous protocol that includes a thorough evaluation of the methods and biases of individual studies, an assessment of the degree of conflict among studies, development of explanations for any conflicts, and, if the evidence is consistent, a quantitative summary of that evidence. The outcome of a meta-analysis of epidemiologic studies can provide more precise assessment of the inter-study variation in the association under investigation, as well as the reasons for the differences observed across individual studies. When the evidence is conflicting, the primary goal of a meta-analysis should be identifying and estimating differences among study-specific results, rather than estimating an overall quantitative summary of the association across studies. With this goal, problems in combining evidence from studies that are dissimilar in design and quality can be minimized.
Discussion of Principle II
This is an important principle for epidemiology in regulatory risk assessment, particularly because meta-analyses are commonly used in policy formulation. The principle begins with a short definition of meta-analysis and then refers to a "properly conducted meta-analysis". A large body of literature is accumulating on the methods for meta-analysis that can be referred to for guidelines on how meta-analyses of observational epidemiologic studies should be performed.18 Meta-analysis should be considered a systematic method to aid in evaluating scientific evidence in an area. It allows quantitative evaluation of explanations for differences in results. Indeed, it should be viewed primarily as a comparative (analytic) method, rather than simply as a means for providing an overall quantitative summary or "synthesis" of the study results.
The principle also refers to the preference for obtaining individual study data and combining these for a "pooled" analysis. One should recognize, however, that pooling raw data from all published and unpublished studies is often impossible, and when possible is a time-consuming and labor-intensive process. The principle also refers to "exposure-disease" relations in order to encompass a wider range of health risk scenarios. The principle lists the main components that should be included in a meta-analysis. A protocol for the meta-analysis should include several specific steps. These steps are:
1) clearly identifying the study variables (i.e., disease outcome, exposure, confounders, effect modifiers, intermediate factors on the causal pathway);
2) identifying all studies performed;
3) identifying and extracting the relevant information from those studies;
4) quantifying the effects using the appropriate analytic models;
5) assessing heterogeneity of results across studies and possible sources of heterogeneity, including biases;
6) presenting the individual study results and, where appropriate, summary estimates across studies using descriptive, graphical and statistical analyses;
7) performing sensitivity and influence analyses to evaluate the assumptions made in the meta-analysis; and
8) interpreting the results with consideration of the limitations of the individual studies and the meta-analysis itself.
Two main objectives of meta-analysis are specifically mentioned in this principle. First is the analytic objective, which is the thorough assessment of differences across studies; this objective is essential unless there are absolutely no conflicts among the study results. Second is the synthetic objective, in which a summary estimate of association is constructed; this objective is justifiable only if there is little conflict among the study results used to create the summary.
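The analytic objective, together with steps 5 and 7 of the protocol above, can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions rather than a recommended procedure: it uses Cochran's Q and the I-squared statistic as measures of heterogeneity, a fixed-effect inverse-variance summary, and a simple leave-one-out influence analysis. The function names `heterogeneity` and `leave_one_out` and all input values are hypothetical.

```python
import math


def heterogeneity(log_rrs, std_errors):
    """Return (pooled_log_rr, cochran_q, i_squared_percent).

    log_rrs: study-specific log relative risks.
    std_errors: their standard errors (same order).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    # I-squared: share of variation beyond what chance alone would give.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


def leave_one_out(log_rrs, std_errors):
    """Pooled relative risk recomputed with each study omitted in turn."""
    results = []
    for i in range(len(log_rrs)):
        remaining_y = log_rrs[:i] + log_rrs[i + 1:]
        remaining_se = std_errors[:i] + std_errors[i + 1:]
        pooled, _, _ = heterogeneity(remaining_y, remaining_se)
        results.append(math.exp(pooled))
    return results


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors.
    y = [math.log(1.4), math.log(1.1), math.log(2.5)]
    se = [0.23, 0.16, 0.30]
    print(heterogeneity(y, se))
    print(leave_one_out(y, se))
```

A large Q or I-squared value signals conflict among the study results, in which case the principle directs attention to explaining the differences rather than reporting a single summary. The leave-one-out results show how much any one study drives the summary estimate.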
III.A. RECOMMENDED ADDITIONAL PRINCIPLES
FOR THE USE OF EPIDEMIOLOGIC DATA
IN HEALTH HAZARD IDENTIFICATION
AND RISK ASSESSMENT
1. Data from well-conducted epidemiologic studies should be given more weight than data from animal or in vitro experimentation. Animal and in vitro studies, however, may provide clues regarding the mechanism underlying associations and aid in understanding causality. Epidemiological studies without biological plausibility should be considered less definitive.
2. Epidemiologic data and study reports, and their impact on the overall risk assessment, should be evaluated by epidemiologists. Scientists from other relevant disciplines should be consulted regarding consistency between the epidemiologic data and non-epidemiologic data.
3. The potential for study results being the result of chance, bias, or confounding should be carefully considered and explored in the assessment.
4. Reasons for all significant inconsistent or implausible results in study findings should be explored. The Bradford Hill factors (or the Surgeon General's criteria) may provide a useful framework for assessing whether there is sufficient evidence of a causal relationship.20 A well-conducted statistical meta-analysis (consistent with revised Principle B-6), or a careful qualitative meta-analysis, can provide a useful means for exploring the degree of inconsistency or consistency in the available studies in connection with consideration of Hill's factors.21
5. Statistical methods for adjusting study findings should not be employed unless they have been sufficiently validated and are consistent with available relevant biological and toxicological information on mechanisms.
6. All relevant data and study findings should be considered. If there are significant gaps in the reported data or findings, they should be addressed by first consulting with the investigators, and then by reflecting uncertainties in the assessment as necessary.
7. The suitability and sufficiency of the epidemiologic data for evaluating the specific exposure circumstances of concern to risk managers and the public should be discussed and reflected in the overall assessment.
8. In a quantitative risk assessment, any significant uncertainties in the hazard, exposure, or dose-response assessments should be reflected in the overall results. The overall results should include information on dose-response, consistency or inconsistency between human and animal data, and the ability to estimate a likely threshold for adverse effects in humans.
9. The potential importance of the public health issues under consideration should not be a rationale for giving more weight to epidemiologic findings that are otherwise less certain.22
10. Hazard identification and risk assessment results should be communicated in a manner that addresses the likely concerns of risk managers and exposed populations.
III.B. QUESTIONS FOR SUPERVISING RISK ASSESSORS
AND RISK MANAGEMENT OFFICIALS TO ASK
THE EPIDEMIOLOGISTS AND OTHER
RISK ASSESSMENT TEAM MEMBERS
1.1. The following questions assume that an interdisciplinary team of reviewers, including a number of expert epidemiologists, has examined the relevant epidemiologic evidence (and that scientists from other disciplines have reviewed any relevant animal or other data) and prepared at least a preliminary risk assessment, including a risk characterization, the latter of which has been read by the person(s) asking these questions.
1.2. If there are specific legal criteria that are applicable to a particular risk assessment issue, the questions below should be tailored to those criteria.
1.3. These questions are not necessarily the final step in the risk assessment process, since it is assumed that the questions may expose areas that require additional work or clarifications, resulting in revisions and another similar review.
1.4. It is assumed that a hazard identification or risk assessment document is being prepared in order to inform the general public and/or to determine whether government regulations are needed, and what regulations, if any, should be promulgated.
1.5. Unlike the other two sets of questions (the first for evaluating individual studies, and the second for evaluating a body of studies), these questions are not all worded so that a Yes answer is preferable. This is intentional.
Questions/Checklist Regarding the
Weight of the Human Evidence
2.1. Will the assessment team be regarded as having a high degree of relevant expertise and absence of bias?
2.2. Were any policy or value preferences put aside, and will the assessment be viewed as scientifically objective?
2.3. Are there sufficient data of sufficient quality to draw firm conclusions, or are there significant uncertainties?
2.4. Were there any significant disagreements among the team within their areas of expertise?
2.5. Are there any aspects of the analysis that would not be regarded as meeting accepted scientific norms? Are you confident that all significant aspects of the assessment would be accepted by an expert external peer review panel?
2.6. Is there sufficient epidemiologic evidence presented in the assessment to show convincingly a causal relationship between exposure to the agent and an adverse health effect, or is the evidence of lesser weight?
2.7. What additional work, if any, would need to be done to address and resolve any significant uncertainties? Is such work feasible? How much time and money would be required? Are there studies under way that could resolve significant uncertainties? How long will it be before their findings are available?
2.8. More specifically, with regard to assessing the weight of the evidence --
a. Does the assessment take into consideration all studies and data that should be considered?
b. Were there studies or data that were not taken into consideration, or which should be given limited weight or relevancy, due to uncertainties concerning their possible biases or other limitations?
c. If any arguably significant studies or data were not taken into consideration, what was the rationale? Were they seriously flawed in some way? Were there problems in determining their relevancy to the risk assessment situation under consideration?
d. Were some studies given more weight than others; and if so, is the rationale clearly expressed and scientifically valid?
e. Are there uncertainties in the data or study reports that might be explained by obtaining additional information from the investigators, and has an attempt been made to do so?
f. Has the nature of the exposure been properly and consistently characterized in the assessment, or does the assessment need to differentiate in some manner among different exposure circumstances?
g. Are the study findings within the overall body of evidence consistent?
h. If the study findings are not consistent, can the inconsistencies be explained?
i. Have plausible alternative explanations for associations been explored?
j. Have any statistical adjustments made for potential confounding or bias been validated, and have they taken into account all of the likely potential sources of confounding and bias?
k. Are the study findings relied upon for the conclusions in the assessment sufficiently strong to rule out effectively the possibility that an apparent association is due to chance, bias or confounding that cannot be sufficiently identified and adjusted for?
l. Have any temporality issues been addressed? (i.e., Is it clear that any adverse health effects developed a sufficient length of time after exposure, rather than having existed or been developing prior to exposure?)
m. Are there biological or other data that are not coherent with the epidemiologic findings, and have these issues been addressed?
n. Are there biological data that argue strongly for or against the plausibility of a causal relationship?
o. Should we expect the studies to show a dose-response relationship, and do they?
p. Are there sufficient exposure data of sufficient accuracy to support a reasonable dose-response assessment?
q. Are the findings from the epidemiologic studies consistent with relevant animal data and any quantitative assessment method(s) employed in the assessment, particularly in the area of dose-response? Are there reasons for regarding animal data as not significant or of limited value for supplementing the human evidence?
r. If there are not sufficient data for establishing a dose-response relationship, are there sufficient data to determine a no-observable-adverse-effect level (NOAEL)?
s. Have any significant uncertainties (including in the hazard assessment) been reflected in the quantitative estimate(s) of risk (if any)?
2.9. Is it clear from the risk assessment document how the data were evaluated, particularly with regard to any issues likely to be considered significant or controversial? Does the document describe the analytic methods sufficiently to allow other scientists to replicate and check the analysis?
2.10. If you were given responsibility for challenging the risk assessment, what particular points would you raise? Have those points been clearly and satisfactorily addressed in the assessment document?
2.11. Are there certain caveats, qualifications, or explanations that should be added to the findings to ensure that regulatory officials and the public understand their relevance? For example, do the findings appear to apply only to certain subpopulations or under certain conditions?
2.12. Upon further consideration, are there clarifications or other revisions that should be made to the risk characterization?
2.13. What questions that have not been asked should be asked?
2.14. In the end, what level of confidence do you have in your overall conclusions and the supporting explanations in the assessment?
11Several of the panelists wished to note that these principles and questions, and the ones for meta-analysis, could be used to evaluate whether there was a causal relationship with a beneficial, as well as an adverse, health outcome attributable to exposure.
12Two panelists suggested adding following "disease": "who are sampled from the source population of the cases".
13One panelist suggested that "for potential confounders . . . ." be replaced with "for selection bias. . . ."
14Two panelists were unsure whether this covered sufficiently whether the health outcome measured was the most relevant one.
15Two of the panelists wished to note in connection with this subquestion, and also (c) and (d), that if more than one surrogate were used to assess a given outcome, one should ask whether the surrogates gave roughly the same answer both in terms of direction and effect size.
16One panelist suggested changing "measures" to "estimates".
17Subsequent to the conference, EPA released proposed revisions to the 1986 guidelines on April 21, 1996. The revisions specifically address meta-analysis in subsection 126.96.36.199. ("Criteria for Assessing Adequacy of Epidemiological Studies") under the heading "Combining Statistical Evidence Across Studies". With regard to proposed standards, see the ILSI report referenced in footnote 11, above.
18See Greenland S., Chapter 32 - "Meta-analysis", In: Rothman KJ, Greenland S., Modern Epidemiology (Second Ed. Philadelphia: Lippincott-Raven Publishers, 1998).
19Several of the epidemiologists noted their wariness towards use of Hill's factors due to their many limitations (see, e.g., the discussion at pp. 24-28 in the chapter on "Causation and Causal Inference" by K.J. Rothman and S. Greenland in MODERN EPIDEMIOLOGY (2d ed. 1998, Lippincott-Raven)), and the inappropriately rigid manner in which those factors or criteria had been applied by some; and they indicated their preference for use of the more detailed analysis outlined in their revised version of the London principles. See also the limited discussion of some of these issues in the Discussion Summary above at pages 32-33.
20This factor can be considered appropriately during the risk management phase of a regulatory decisionmaking process.