Briefing on Data Access
February 26, 1999
An Industry Perspective on the Proposed Revision
Roger O. McClellan
Chemical Industry Institute of Toxicology
I am pleased to participate in this briefing on public access
to data. Contrary to the published title in the program, I will not provide
an industry perspective on the proposed revisions. To do so would be presumptuous
for a number of reasons, the most significant of which is that it is a misconception
to think there is a monolithic entity called "industry." In my experience,
industry is quite a heterogeneous body, whether it is defined by major product,
by sales, ranging from a few tens of thousands of dollars annually to tens of
billions of dollars, or by ownership, ranging from sole proprietorships to multinational
publicly traded corporations. In any event, this is not a matter of industry, academics,
or government. It is rather a matter of the scientific community and its relationship
to the broader public. I offer here my own personal views, based on my experience
as a research scientist, as a senior executive of a multidisciplinary team
that had been funded by both public and private monies, and, most significantly,
my service on a number of advisory committees reviewing the scientific basis
for major federal policy decisions.
My comments are based on five personal beliefs: One, institutions
and scientists receiving federal funds receive them not as one-way transactions,
but rather as compacts with the public to use the funds to conduct research
and develop information that will serve the public good. Two, science is best
conducted in an open and transparent atmosphere that includes rigorous peer
review and publication of information in scientific journals that are widely
disseminated. Three, after publication, investigators should be willing to
share primary data sets with appropriate attention given to protecting the
rights of individual subjects and potential proprietary interests of the original
investigators and their institutions. Four, the scientific process, as we
have just heard, is an iterative process, with various significant checks
and balances that are applied not only at the level of individual investigators
and their immediate associates, but also through interactions between
those individuals and other individuals and teams. This process places a premium
on the sharing of data to facilitate the validation of key analyses and interpretations,
including the development of alternative analyses. Five, federal policies
and regulations, and especially those concerning public health and the environment,
should be based on the best available scientific information and interpretations.
My personal involvement began nearly a decade ago when I served
as Chair of EPA's Clean Air Scientific Advisory Committee, the committee charged
by Congress with advising the EPA (Environmental Protection Agency) Administrator
on the scientific basis for the National Ambient Air Quality Standards. In the
early 1990s, a number of papers appeared in the literature dealing with the
association between airborne particulate matter and health effects. Many of
the papers used new analytical approaches to analyze very complex data sets
on air quality and multiple health parameters. Controversy soon arose when different
investigators analyzing similar or very closely related data sets reached
very different conclusions. Recognizing that these papers would have a critical
role in EPA's criteria document, the position paper that would ultimately
be used to revise the National Ambient Air Quality Standard for particulate
matter, I, as the past Chairman of the Clean Air Scientific Advisory Committee,
and its then Chairman, George Wolf, wrote to Administrator Browner on May
16, 1994, asking the agency to take a lead role in making key data sets on
air quality and health responses available for analysis by multiple analytic
teams. Certainly there were many, many studies that would appear in the criteria
document, but there were certain studies that loomed as very large and central
to our deliberations. One of those data sets had been collected by investigators
at Harvard and their collaborators. It was those studies that we thought would
be useful in 1994. Unfortunately, the agency did not take the leadership role
that Dr. Wolf and I had envisioned. Fortunately, the Health Effects Institute
(HEI), jointly funded by the EPA and the automotive industry, did step forward
to provide leadership for the conduct of analyses by a single, excellent team
of investigators from Johns Hopkins University led by Professor Jon Samet.
Dr. Wolf and I would have preferred that several teams had been involved in the
exercise. The analyses were accepted for publication, and I think played a
key role in final decisions made and ultimately the promulgation of a revised
particulate matter standard. That action was strengthened by the re-analyses
that were done. Those re-analyses are still in progress, nearly five years
after our request to Administrator Browner. I am confident that the work will
be well done. It will be published, I am confident, in peer-reviewed journals,
and I am also confident it will have a critical impact in the next round of
review of the National Ambient Air Quality Standards for particulate matter,
standards that have wide-ranging potential impacts on public health and on
the economy at large.
Why was the process so protracted? I submit that the major problem
is that the scientific community - and I repeat - the scientific community
has not developed adequate procedures to deal with the issue of critical data
sets and their sharing among others in the scientific community, especially
when those data bear on important public health decisions. In some sense,
we, the scientific community, have abdicated our responsibility. To whom?
To the U.S. Congress and to the Office of Management and Budget.
Consider this: Who is best qualified to develop the guidance
for sharing data bearing on important public health policy decisions when
the data are acquired with public funds, or, for that matter, private funds?
The U.S. Congress? What about OMB? If the Congress and the OMB are not the
institutions to take the lead role, which is? I submit it is the scientific
community. One might ask exactly what is the position of the Congress and
the OMB. They have moved to a familiar vehicle - the Freedom of Information
Act. From there, they have moved to Circular A-110. I personally do not think
that those are the right vehicles for this. However, the Congress and the
OMB have stumbled upon familiar tools. Why? In part, because we, the scientific
community, did not give them a better field of vision. So, I urge the scientific
community to ask the Congress and the OMB to call a "time-out." Give us the
opportunity in the scientific community to examine the appropriate processes,
to do our jobs before the Congress and the OMB become involved. I believe
that we in the scientific community should be willing to accept the responsibility
of dealing with this very complex issue. Why? Because it is critically important
to individual scientists and investigators, to graduate students, and to research
institutions; it is central to the total scientific enterprise. Most significantly,
it is important to the American public, and the relationship of the scientific
community to the American public.
How can we proceed? First, I think the scientific community
must engage in a positive dialogue. There have been some positive suggestions
made; but, quite frankly, this is a reactive approach, not a positive,
proactive approach to the critical issue of how best to share data. We now
are talking about data that will be increasingly important and commonplace
in our community as we engage in larger and larger, multidisciplinary studies.
We will take advantage of the advances in modern molecular biology,
informatics, epidemiology, and all the other scientific fields to create
larger data sets and data sets that have important impacts on our understanding
of public health. We must encourage that kind of active dialogue. We need
to go further; we need to encourage our various professional organizations
to hold meetings to discuss and indeed debate the issues and solutions. And
I emphasize the solutions.
Second, provision must be made for the National Research Council
and the Institute of Medicine to form a joint committee to review the issue
in its broadest context. This committee must develop guidance for access to
data that are developed with public funding, for the sharing and re-evaluation
of data, and for the ultimate publication of the resulting analyses so they
can be used in public policy setting and rulemaking.
This NRC/Institute of Medicine effort could build on previous
activities, such as the 1985 NRC committee on sharing research data. Since
1985, many advances have been made in the field of informatics and in the
individual scientific fields that produce and introduce new dimensions to
the issue of access to data, data sharing, and data re-analysis. In developing
these guidelines, I urge the committee to examine the issue in its broadest
context. This must include exploring the guidance for access to data related
to public health matters, irrespective of whether it was developed with public
or private funds. Some industries have indicated a willingness for open sharing
of data concerned with public health matters. A good example is the recent
announcement by the Chemical Manufacturers Association to make publicly available
on the Internet the results of tests on some 3,000 high-production-volume
chemicals that will be tested over the next five years. I think they have
made a bold statement in promising that kind of public access. It will be
an enormous challenge to develop the means for making that available, and
most importantly, for the public to be able to understand what has been placed
in its hands.
In my opinion, this is an especially critical time for the scientific
community to demonstrate leadership in these matters. Why? In part, because
those of us in the health community are asking the public to double, over
a short period, the investment of public funds in health research.
Q [John Gardenier, National Center for Health Statistics]:
Would it be a concern to you to find out that there are extremely adverse
ethical consequences that were certainly not intended? There is some potential
that the law could force people to violate the contracts under which the data
was collected and force the American public to give data to the federal government
that they gave to researchers only on the explicit condition that they would
not be given to the federal government. Worse than that, once within the federal
government, to the extent that it may fall outside the protections of FOIA,
that data might be available to the general public, which would be a gross
violation of the public's civil rights.
A [Kathy Casey]: I think what you refer to, in part,
is existing confidentiality agreements that the agencies have with particular
grantees, or joint agreements between grantees who are, in part, privately
funded. There are legitimate concerns as to whether there would be some sort
of retroactive effect on existing understandings that have now produced data
and/or publications as a result. The retroactivity or prospectivity of this
provision, and whatever OMB may do in terms of making research data subject
to FOIA and/or beyond FOIA, should be addressed.