The following suggested revisions to the London Principles were developed by the Denver panel participants and agreed upon except where exceptions or alternative suggested wording are noted. London Principles text recommended to be deleted is crossed through, and recommended new or revised text is in bold italics.


Revisions to Principles A-1 to A-3

Principle I-1. The population studied should be pertinent to the risk assessment at hand, and it should be representative of a well-defined underlying cohort or population at risk.

a. Were study subjects representative
of exposed and unexposed persons
(cohort study), or of diseased and
non-diseased persons
of persons at risk of getting disease
(case-control study)? If not, were data
collected on major risk factors for
the disease or condition under study
to allow appropriate adjustment in
the analysis for potential confounders
and evaluation of effect modifiers
in the analysis?


b. To minimize bias Were exposed and
unexposed persons comparable "at
baseline" (cohort study), or were
cases similar to controls,
prior to
exposure, with respect to major risk
factors for the disease or condition
under study? If not, were data collected
on major risk factors for the disease or
condition under study to allow
appropriate adjustment in the analysis
for potential confounders and
evaluation of effect modifiers in
the analysis?

Principle I-2. Study procedures should be described in sufficient detail, or available from the study's written protocol, to determine whether appropriate methods were used in the design and conduct of the investigation.

a. To minimize the potential for bias,
Were interviewers and data collectors
blind to the exposure status and
case/control status of study subjects
and to the hypothesis being tested?
b. Were there procedures for quality
control in place for all major aspects
of the study's design and
implementation (e.g., ascertainment
and selection of subjects for study,
methods of data collection and
analysis, follow-up, etc.)?


c. Were the effects of nonparticipation,
a low response rate nonresponse,
or loss to follow-up taken into account in
producing the study results?

Principle I-3. The measures of exposure(s) or exposure surrogates should be: (a) conceptually relevant to the risk assessment being conducted; (b) based on principles that are biologically sound in light of present knowledge; and (c) properly quantitated to assess dose-response relationships.

a. Were well-documented procedures
for quality assurance and quality
control followed in exposure
measurement and assessment (e.g.,
calibrating instruments, repeat
measurements, re-interviews, tape
recordings of interviews, etc.)?
b. Were measures of exposure
consistent with current biological
understanding of dose (e.g., with
respect to averaging time, dose rate,
peak dose, absorption via different
exposure routes)?
c. If there is uncertainty about
appropriate exposure measures, was
a variety of measures used (e.g.,
duration of exposure, intensity of
exposure, latency)?
d. If surrogate respondents were the
source of information about exposure,
was the proportion of the data they
provided given, and were their
relationships to the index subjects
described? Were risk estimates
excluding surrogate respondents


e. To improve study power and enhance
the generalizability of findings,
was there sufficient variation in the
exposure among subjects?
f. Were correlated other exposures
measured and evaluated to assess the
possibility of competing causes,
confounding, and potentiating effects?
g. Were exposures measured directly
rather than estimated? If estimated,
have the systematic and random
errors been characterized, either
in the study at hand or by reference
to the literature?
h. Were measurements of exposure or
human biochemical samples of
exposure made? Was there a
distinction made between exposures
estimated by emission as opposed
to body absorption?
Was there a
distinction made between external
exposure and internal dose, if


i. If exposure was estimated by
questionnaire, interview, or
existing records, was reporting bias
considered, and was it unlikely to
have affected the study outcome?
j. Was there an explanation/understanding
of why exposure occurred, the
context of its occurrence, and the
time period of exposure?

Revisions to Principles A-4 to A-6

Principle I-4. Study outcomes (endpoints) should be clearly defined, properly measured, and ascertained in an unbiased manner. Definition and measurement of study outcomes:

a. Was the outcome variable a disease
entity or pathological finding rather
than a symptom or a physiological
Was the operational
definition of the health outcome
b. Was variability in the possible
outcomes understood and taken into
account -- e.g., various
manifestations of a disease
considering its natural history?

Was the health outcome defined
using objective rather than
subjective criteria?


c. Was the method of recording the
outcome variable(s) reliable -- e.g.,
if the outcome was disease, did the
design of the study provide for
recording of the full spectrum of
disease, such as early and advanced
stage cancer; was a standardized
classification system, such as the
International Classification of
Diseases, followed; were the data
from a primary or a secondary
Was the health outcome
assessment reliable?
d. Has misclassification of the
outcome(s) been minimized in the
design and execution of the study?
Has there been a review of all
diagnoses by qualified medical
personnel, and if so, were they
blinded to study exposure?
the health outcome assessment
e. Was the component information
required for the health outcome
assessment valid?
f. Were the evaluation criteria
for the health outcome applied
equally to the exposed and
unexposed individuals?


Principle I-5. The analysis of the study's data should provide both point and interval estimates of the exposure's effect, including adjustment for confounding, assessment of interaction (e.g., effect of multiple exposures or differential susceptibility), and an evaluation of the possible influence of study bias. Data analysis methods and presentation:

a. Was there a well-formulated and
well-documented plan of analysis?
If so, was it followed?
Were the
analytic methods clearly described?
b. Were the methods of analysis
appropriate? If not, is it reasonable
to believe that better methods would
not have led to substantially
different results?
Were the data
analysis methods appropriate to
address the study question?
c. Were proper analytic approaches,
such as stratification and regression
adjustment, used to account for
well-known major risk factors
(potential confounders such as age,
race, smoking, socio-economic
status) for the disease under study?

If necessary, was confounding
controlled using appropriate
statistical methods?
d. Has a sensitivity analysis been
performed in which quantitative
adjustment was made for the
effect of unmeasured potential
confounders, e.g., any unmeasured,
well-established risk factor(s) for
the disease under study?
completeness of control for confounding
addressed through sensitivity analysis
or other analytic approaches?


e. Did the report avoid selective
reporting of results or inappropriate
use of methods to achieve a stated
or implicit objective? For example,
are both significant and
non-significant results reported in a
balanced fashion?
Was sufficient
consideration given to effect
modification, given the study question?
f. Were confidence intervals provided
in the main and subsidiary analyses?

f. Were point and interval measures16
of effect provided?
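
As a minimal illustration of what "point and interval measures of effect" can look like in practice, the sketch below computes a risk ratio and an approximate 95% confidence interval from cohort counts, using the usual large-sample standard error of the log risk ratio. The function name, its parameters, and the counts are hypothetical and are not taken from the panel's text; adjustment for confounding is not shown.

```python
import math


def risk_ratio_with_ci(exposed_cases, exposed_total,
                       unexposed_cases, unexposed_total, z=1.96):
    """Return (risk ratio, lower limit, upper limit) for a cohort 2x2 table.

    The interval is a large-sample (Wald-type) interval computed on the
    log scale; z=1.96 corresponds to an approximate 95% interval.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Large-sample standard error of log(RR).
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 cases among 1,000 exposed persons and
# 15 cases among 1,000 unexposed persons.
rr, lower, upper = risk_ratio_with_ci(30, 1000, 15, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Reporting the interval alongside the point estimate lets a reviewer see how much of the apparent effect could be attributed to random error alone; it does not address bias or confounding.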

Principle A-6. The reporting of the study should clearly identify both its strengths and limitations, and the interpretation of its findings should reflect not only an honest consideration of those factors, but also its relationship to the current state of knowledge in the area. The overall study quality should be sufficiently high that it would be judged publishable in a peer-reviewed scientific journal. Discussion and interpretation of study results:

a. Were the major results directly related
to the a priori hypothesis under


b. Were the strengths and limitations
of the study design, execution, and
the resulting data adequately
c. Is loss to follow-up and non-response
documented? Was it minimal? Has
any major loss to follow-up or
migration out of study been taken
into account?
d. Did the study's design and analysis
account for competing causes of
mortality or morbidity which might
influence its findings?
e. Were contradictory or implausible
results satisfactorily explained?
f. Were alternative explanations for
the results seriously explored and
g. Were the Bradford Hill criteria (see
Appendix B) for judging the
plausibility of causation (strength of
association, consistency within and
across studies, dose response,
biological plausibility, and
temporality) applied when
interpreting the results?
h. What are the public health
implications of the results? For
example, are estimates of absolute
risk given, and is the size of the
population at risk discussed?


a. Are there major threats to the validity
of the study?
b. Are the exposed and unexposed
groups comparable in baseline
(pre-exposure) risk of disease,
or have any differences been accounted
for in the analysis?
c. [For case-control studies only] Are the
controls likely to be representative of the
source population from which
the cases arose, or have any
differences been accounted for in
the analysis?
d. Is non-response or loss to follow-up
likely to have introduced substantial bias?
e. Are missing data likely to have
introduced substantial bias?
f. Is exposure measurement error likely
to have introduced substantial bias?
g. Is health measurement error likely to
have introduced substantial bias?
h. Is error in measuring confounders
likely to have introduced substantial bias?
i. Is random error likely to have produced
substantial inaccuracy in the measure
of effect?



Principle B-6. A properly conducted meta-analysis, or preferably an analysis based on the raw data in the original studies, may be used in hazard identification and dose-response evaluation when such combination includes an evaluation of individual studies and an assessment of heterogeneity. The combined results ought to provide, more than any single study, precise risk estimates over a wider range of doses. Before using these tools, the gains should be judged sufficient to justify potential errors in inference resulting from combining studies of dissimilar design and quality.

Discussion of Principle B-6

This is one of the more important principles because meta-analysis of multiple epidemiologic studies has become more common. Although the principle refers to "properly conducted meta-analysis", there are currently no generally accepted standards for employing it (nor is it mentioned in the 1986 EPA risk assessment guidelines that are currently in effect 17); when used quantitatively, however, it involves application of standard statistical methods. The principle does state several minimal standards for employing meta-analysis. Even applying these minimal standards, some felt that meta-analysis was still fraught with potential for error, and it was remarked that if meta-analysis (based on studies as they are often currently conducted) were used as the final determinant in risk assessment, "we are in trouble". Meta-analysis requires consideration of the quality of the various studies, but it is very difficult and controversial to attach weights to studies for quality. Ranking for quality is not a black-and-white exercise; it requires a lot of expert judgment by epidemiologists.

Epidemiologists feel more comfortable in combining results from randomized clinical trials, when they follow similar protocols; but for observational studies, heterogeneity can create a "morass". If in the future epidemiological studies were to become more like randomized clinical trials in the transparency and consistency of their protocols (e.g., what the dose metric will be, how exposure and outcome will be measured), they would become more suitable for a well-conducted meta-analysis.

Principle B-6 incorporates three minimal standards. The first, which applies to both hazard identification and dose-response assessment, is that the reviewer should examine the individual studies carefully for quality and for explanations for study differences rather than combining study results simplistically to get a single point estimate. Within this sub-principle there is a stated preference for analysis by "pooling" of raw data where possible. The second is that, for dose-response assessment, combining the data should produce gains in terms of better risk measures with narrower confidence intervals over a wider range of doses. The third is that one should be mindful of the hazards of meta-analysis, and take into consideration that the body of studies for a particular agent may contain so many unexplainable inconsistencies as a consequence of differences in quality, design, and results that application of a meta-analysis methodology may be counter-productive and may mislead by obscuring significant uncertainties. If there is one very good study, nothing may be gained by combining it with others; in reality, combining may make the evaluation weaker. All of this requires a great deal of expert judgment.

"Meta-analysis" as it is performed currently often means summarizing relative risks weighted according to sample size, while an analysis that combines the raw data extracted from individual studies as if one were conducting a single large epidemiologic study is often referred to as "pooling". Principle B-6 indicates a preference for "pooling" because it allows for a better evaluation of possible systematic flaws or weaknesses, whereas meta-analysis can amplify flaws or weaknesses (for example, by reinforcing biases in different studies); however, pooling cannot always be done because the raw data may not be available from all of the studies, and it is difficult to apply for dose-response purposes because many studies will not have sufficient exposure detail. Examples given of situations where there were sufficiently good exposure data, in terms of detail and uniformity, to allow pooling were the British analysis of EMF and childhood leukemia, where all the studies used the same method of measuring exposure, and the risk assessment for airborne radon, where the eleven uranium mining studies quantified exposures in a similar way. It is important that the combined analysis cover all studies, whether positive, negative, or inconclusive.

One participant raised the question of whether, after applying meta-analysis or pooling and finding a weak relative risk, the result would be considered stronger than a similar level of relative risk estimated from a single good study. This was considered a difficult question, and some felt that the weak relative risk indicated by the meta-analysis would not be considered more significant unless there was some confirmatory animal data regarding the shape of the dose-response curve and biological plausibility, and even then they would consider the results "equivocal".
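
The summarizing of relative risks described in this discussion can be sketched as follows. The sketch assumes each study reports a relative risk with a 95% confidence interval and uses inverse-variance weights on the log scale, a common fixed-effect approach that tracks sample size closely but is not necessarily the weighting any panelist had in mind. The function name and the three study results are hypothetical.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Combine study results by inverse-variance weighting on the log scale.

    estimates: list of (relative risk, lower limit, upper limit) tuples,
    where the limits are assumed to be 95% confidence limits.
    Returns (pooled relative risk, lower limit, upper limit).
    """
    weights = []
    log_rrs = []
    for rr, lower, upper in estimates:
        # Recover the standard error of log(RR) from the reported interval.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weights.append(1 / se ** 2)
        log_rrs.append(math.log(rr))

    total_weight = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical results from three studies: (RR, lower, upper).
studies = [(1.5, 1.0, 2.25), (1.2, 0.8, 1.8), (2.0, 1.0, 4.0)]
pooled_rr, pooled_lower, pooled_upper = pool_fixed_effect(studies)
print(f"Pooled RR = {pooled_rr:.2f}, "
      f"95% CI {pooled_lower:.2f} to {pooled_upper:.2f}")
```

The pooled interval is narrower than that of any single contributing study, which is the gain in precision the second minimal standard looks for. The calculation says nothing about study quality or shared biases, which still require expert judgment.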

Principle II: Meta-analysis is a method for combining and contrasting results from different studies that can be used as an aid in the review of evidence in a particular area. A properly conducted meta-analysis, or when possible, an analysis based on the raw data of original studies, may be used in evaluating the epidemiologic evidence concerning the exposure-disease relation under investigation. Such a combination of evidence across epidemiologic studies should follow a rigorous protocol that includes a thorough evaluation of the methods and biases of individual studies, an assessment of the degree of conflict among studies, development of explanations for any conflicts, and, if the evidence is consistent, a quantitative summary of that evidence. The outcome of a meta-analysis of epidemiologic studies can provide more precise assessment of the inter-study variation in the association under investigation, as well as the reasons for the differences observed across individual studies. When the evidence is conflicting, the primary goal of a meta-analysis should be identifying and estimating differences among study-specific results, rather than estimating an overall quantitative summary of the association across studies. With this goal, problems in combining evidence from studies that are dissimilar in design and quality can be minimized.

Discussion of Principle II

This is an important principle for epidemiology in regulatory risk assessment, particularly because meta-analyses are commonly used in policy formulation. The principle begins with a short definition of meta-analysis and then refers to a "properly conducted meta-analysis". A large body of literature is accumulating on the methods for meta-analysis that can be referred to for guidelines on how meta-analyses of observational epidemiologic studies should be performed.18 Meta-analysis should be considered a systematic method to aid in evaluating scientific evidence in an area. It allows quantitative evaluation of explanations for differences in results. Indeed, it should be viewed primarily as a comparative (analytic) method, rather than simply as a means for providing an overall quantitative summary or "synthesis" of the study results.

The principle also refers to the preference of obtaining individual study data and combining these for a "pooled" analysis. One should recognize, however, that pooling raw data from all published and unpublished studies is often impossible, and when possible is a time-consuming and labor-intensive process. The principle also refers to "exposure-disease" relations in order to encompass a wider range of health risk scenarios. The principle lists the main components that should be included in a meta-analysis. A protocol for the meta-analysis should include several specific steps. These steps are:


1) clearly identifying the study variables (i.e., disease outcome, exposure, confounders, effect modifiers, intermediate factors on the causal pathway);
2) identifying all studies performed;
3) identifying and extracting the relevant information from those studies;
4) quantifying the effects using the appropriate analytic models;
5) assessing heterogeneity of results across studies and possible sources of heterogeneity, including biases;
6) presenting the individual study results and, where appropriate, summary estimates across studies using descriptive, graphical and statistical analyses;
7) performing sensitivity and influence analyses to evaluate the assumptions made in the meta-analysis; and
8) interpreting the results with consideration of the limitations of the individual studies and the meta-analysis itself.
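
One simple form of the influence analysis named in step 7 is to recompute the summary estimate with each study left out in turn and see whether any single study drives the result. The sketch below assumes each study contributes a log relative risk and its standard error and uses inverse-variance weights; the function names, study labels, and values are hypothetical.

```python
import math


def pooled_log_rr(log_rrs, standard_errors):
    """Inverse-variance weighted mean of study-specific log relative risks."""
    weights = [1 / se ** 2 for se in standard_errors]
    return sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)


def leave_one_out(labels, log_rrs, standard_errors):
    """Return {omitted study label: pooled RR computed without that study}."""
    results = {}
    for i, label in enumerate(labels):
        kept_log_rrs = log_rrs[:i] + log_rrs[i + 1:]
        kept_ses = standard_errors[:i] + standard_errors[i + 1:]
        results[label] = math.exp(pooled_log_rr(kept_log_rrs, kept_ses))
    return results


# Hypothetical studies: A and B find no association, C finds RR = 4.
labels = ["A", "B", "C"]
log_rrs = [math.log(1.0), math.log(1.0), math.log(4.0)]
standard_errors = [0.2, 0.2, 0.2]

for omitted, rr in leave_one_out(labels, log_rrs, standard_errors).items():
    print(f"Omitting study {omitted}: pooled RR = {rr:.2f}")
```

A large shift when one study is dropped is a prompt to examine that study's design and exposure assessment, not a reason in itself to exclude it.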

Two main objectives of meta-analysis are specifically mentioned in this principle. First is the analytic objective, which is the thorough assessment of differences across studies; this objective is essential unless there are absolutely no conflicts among the study results. Second is the synthetic objective, in which a summary estimate of association is constructed; this objective is justifiable only if there is little conflict among the study results used to create the summary.
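
The analytic objective can be given a first quantitative footing with Cochran's Q statistic, together with the I-squared and DerSimonian-Laird between-study variance summaries that are commonly derived from it. The choice of these particular statistics is an assumption of this sketch; the principle does not name them. The function name and input values are hypothetical.

```python
def heterogeneity(log_rrs, standard_errors):
    """Summarize conflict among study-specific log relative risks.

    Returns (Q, degrees of freedom, I-squared, tau-squared), where
    tau-squared is the DerSimonian-Laird moment estimate of the
    between-study variance.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1

    # I-squared: share of total variation attributed to between-study
    # differences rather than chance, truncated at zero.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = total_weight - sum(w ** 2 for w in weights) / total_weight
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, df, i_squared, tau_squared


# Two hypothetical studies that conflict sharply (RR = 1.0 vs. RR of about 2.7).
q, df, i_squared, tau_squared = heterogeneity([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.1f} on {df} df, I^2 = {i_squared:.2f}, tau^2 = {tau_squared:.2f}")
```

When these summaries indicate substantial conflict, the principle's emphasis falls on explaining the differences among studies; a single summary estimate is justifiable only when little conflict remains.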



1. Data from well-conducted epidemiologic studies should be given more weight than data from animal or in vitro experimentation. Animal and in vitro studies, however, may provide clues regarding the mechanism underlying associations and aid in understanding causality. Epidemiological studies without biological plausibility should be considered less definitive.

2. Epidemiologic data and study reports, and their impact on the overall risk assessment, should be evaluated by epidemiologists. Scientists from other relevant disciplines should be consulted regarding consistency between the epidemiologic data and non-epidemiologic data.

3. The potential for study results being the result of chance, bias, or confounding should be carefully considered and explored in the assessment.

4. Reasons for all significant inconsistent or implausible results in study findings should be explored. The Bradford Hill factors (or the Surgeon General's criteria) may provide a useful framework for assessing whether there is sufficient evidence of a causal relationship.20 A well-conducted statistical meta-analysis (consistent with revised Principle B-6), or a careful qualitative meta-analysis, can provide a useful means for exploring the degree of inconsistency or consistency in the available studies in connection with consideration of Hill's factors.21

5. Statistical methods for adjusting study findings should not be employed unless they have been sufficiently validated and are consistent with available relevant biological and toxicological information on mechanisms.

6. All relevant data and study findings should be considered. If there are significant gaps in the reported data or findings, they should be addressed by first consulting with the investigators, and then by reflecting uncertainties in the assessment as necessary.

7. The suitability and sufficiency of the epidemiologic data for evaluating the specific exposure circumstances of concern to risk managers and the public should be discussed and reflected in the overall assessment.

8. In a quantitative risk assessment, any significant uncertainties in the hazard, exposure, or dose-response assessments should be reflected in the overall results. The overall results should include information on dose-response, consistency or inconsistency between human and animal data, and the ability to estimate a likely threshold for adverse effects in humans.

9. The potential importance of the public health issues under consideration should not be a rationale for giving more weight to epidemiologic findings that are otherwise less certain.22

10. Hazard identification and risk assessment results should be communicated in a manner that addresses the likely concerns of risk managers and exposed populations.



1.1. The following questions assume that an interdisciplinary team of reviewers, including a number of expert epidemiologists, has examined the relevant epidemiologic evidence (and that scientists from other disciplines have reviewed any relevant animal or other data) and prepared at least a preliminary risk assessment, including a risk characterization, the latter of which has been read by the person(s) asking these questions.

1.2. If there are specific legal criteria that are applicable to a particular risk assessment issue, the questions below should be tailored to those criteria.

1.3. These questions are not necessarily the final step in the risk assessment process, since it is assumed that the questions may expose areas that require additional work or clarifications, resulting in revisions and another similar review.


1.4. It is assumed that a hazard identification or risk assessment document is being prepared in order to inform the general public and/or to determine whether government regulations are needed, and what regulations, if any, should be promulgated.

1.5. Unlike the other two sets of questions (the first for evaluating individual studies, and the second for evaluating a body of studies), these questions are not all worded so that a "Yes" answer is preferable. This is intentional.

Questions/Checklist Regarding the Weight of the Human Evidence

2.1. Will the assessment team be regarded as having a high degree of relevant expertise and absence of bias?

2.2. Were any policy or value preferences put aside, and will the assessment be viewed as scientifically objective?

2.3. Are there sufficient data of sufficient quality to draw firm conclusions, or are there significant uncertainties?

2.4. Were there any significant disagreements among the team within their areas of expertise?

2.5. Are there any aspects of the analysis that would not be regarded as meeting accepted scientific norms? Are you confident that all significant aspects of the assessment would be accepted by an expert external peer review panel?

2.6. Is there sufficient epidemiologic evidence presented in the assessment to show convincingly a causal relationship between exposure to the agent and an adverse health effect, or is the evidence of lesser weight?


2.7. What additional work, if any, would need to be done to address and resolve any significant uncertainties? Is such work feasible? How much time and money would be required? Are there studies under way that could resolve significant uncertainties? How long will it be before their findings are available?

2.8. More specifically, with regard to assessing the weight of the evidence --

a. Does the assessment take into consideration all studies and data that should be considered?

b. Were there studies or data that were not taken into consideration, or which should be given limited weight or relevancy, due to uncertainties concerning their possible biases or other limitations?

c. If any arguably significant studies or data were not taken into consideration, what was the rationale? Were they seriously flawed in some way? Were there problems in determining their relevancy to the risk assessment situation under consideration?

d. Were some studies given more weight than others; and if so, is the rationale clearly expressed and scientifically valid?

e. Are there uncertainties in the data or study reports that might be explained by obtaining additional information from the investigators, and has an attempt been made to do so?

f. Has the nature of the exposure been properly and consistently characterized in the assessment, or does the assessment need to differentiate in some manner among different exposure circumstances?


g. Are the study findings within the overall body of evidence consistent?

h. If the study findings are not consistent, can the inconsistencies be explained?

i. Have plausible alternative explanations for associations been explored?

j. Have any statistical adjustments made for potential confounding or bias been validated, and have they taken into account all of the likely potential sources of confounding and bias?

k. Are the study findings relied upon for the conclusions in the assessment sufficiently strong to rule out effectively the possibility that an apparent association is due to chance, bias or confounding that cannot be sufficiently identified and adjusted for?

l. Have any temporality issues been addressed? (i.e., Is it clear that any adverse health effects developed a sufficient length of time after exposure, rather than having existed or been developing prior to exposure?)

m. Are there biological or other data that are not coherent with the epidemiologic findings, and have these issues been addressed?

n. Are there biological data that argue strongly for or against the plausibility of a causal relationship?

o. Should we expect the studies to show a dose-response relationship, and do they?


p. Are there sufficient exposure data of sufficient accuracy to support a reasonable dose-response assessment?

q. Are the findings from the epidemiologic studies consistent with relevant animal data and any quantitative assessment method(s) employed in the assessment, particularly in the area of dose-response? Are there reasons for regarding animal data as not significant or of limited value for supplementing the human evidence?

r. If there are not sufficient data for establishing a dose-response relationship, are there sufficient data to determine a no-observable-adverse-effect level (NOAEL)?

s. Have any significant uncertainties (including in the hazard assessment) been reflected in the quantitative estimate(s) of risk (if any)?

2.9. Is it clear from the risk assessment document how the data were evaluated, particularly with regard to any issues likely to be considered significant or controversial? Does the document describe the analytic methods sufficiently to allow other scientists to replicate and check the analysis?

2.10. If you were given responsibility for challenging the risk assessment, what particular points would you raise? Have those points been clearly and satisfactorily addressed in the assessment document?

2.11. Are there certain caveats, qualifications, or explanations that should be added to the findings to ensure that regulatory officials and the public understand their relevance? For example, do the findings appear to apply only to certain subpopulations or under certain conditions?

2.12. Upon further consideration, are there clarifications or other revisions that should be made to the risk characterization?


2.13. What questions that have not been asked should be asked?

2.14. In the end, what level of confidence do you have in your overall conclusions and the supporting explanations in the assessment?


11 Several of the panelists wished to note that these principles and questions, and the ones for meta-analysis, could be used to evaluate whether there was a causal relationship with a beneficial, as well as an adverse, health outcome attributable to exposure.

12 Two panelists suggested adding following "disease": "who are sampled from the source population of the cases".

13 One panelist suggested that "for potential confounders . . . ." be replaced with "for selection bias. . . ."

14 Two panelists were unsure whether this covered sufficiently whether the health outcome measured was the most relevant one.

15 Two of the panelists wished to note in connection with this subquestion, and also (c) and (d), that if more than one surrogate were used to assess a given outcome, one should ask whether the surrogates gave roughly the same answer both in terms of direction and effect size.

16 One panelist suggested changing "measures" to "estimates".

17 Subsequent to the conference, EPA released proposed revisions to the 1986 guidelines on April 21, 1996. The revisions specifically address meta-analysis in the subsection "Criteria for Assessing Adequacy of Epidemiological Studies" under the heading "Combining Statistical Evidence Across Studies". With regard to proposed standards, see the ILSI report referenced in footnote 11, above.

18 See Greenland S., Chapter 32, "Meta-analysis", in: Rothman KJ, Greenland S., Modern Epidemiology (2d ed., Philadelphia: Lippincott-Raven Publishers, 1998).

19 Several of the epidemiologists noted their wariness towards use of Hill's factors due to their many limitations (see, e.g., the discussion at pp. 24-28 in the chapter on "Causation and Causal Inference" by K.J. Rothman and S. Greenland in Modern Epidemiology (2d ed. 1998, Lippincott-Raven)), and the inappropriately rigid manner in which those factors or criteria had been applied by some; and they indicated their preference for use of the more detailed analysis outlined in their revised version of the London Principles. See also the limited discussion of some of these issues in the Discussion Summary above at pages 32-33.

20 This factor can be considered appropriately during the risk management phase of a regulatory decisionmaking process.