Federal Focus, Inc.

Briefing on Data Access

February 26, 1999

The Proposed OMB Revision and the Federal Agencies

Wendy Baldwin
National Institutes of Health

In the spirit of full disclosure, I must say that I come from a field in demography that has grappled with issues of data sharing. I am a big advocate of data sharing for research purposes within the scientific community. If one looks at the complexity of the National Institutes of Health (NIH), one can see fields where data sharing has been very well established, other fields where it is more nascent, and fields where it is very, very difficult. That complexity poses problems when one confronts language that does not reflect the nuances in scientific data and other consequences that might be associated with data sharing. It is very important then that discussions of these issues take place. The NIH is endeavoring to define these issues so that the community can identify solutions that will work and those that will not. This way, unintended consequences can be anticipated.

The law refers to the Freedom of Information Act (FOIA), which has some considerable strengths, but also some gaps. It is important to look at just what FOIA will do and what it will not do. First of all, though, it is important to remember that this application of FOIA is very different from the present application of FOIA. Currently, when someone makes an FOIA request of me, he is asking me to provide him data documents that are currently in my possession. Under the new application, I would have to go get the data from my grantee and then make it available. It is important not to underestimate the difference between those two strategies.

One strength of FOIA is that it does protect confidential information. It protects private information, identifying information, and proprietary data, and there are some statutory protections for CRADAs (cooperative research and development agreements) in particular. On the other hand, concerns have already been raised about the particular aspects the FOIA exemption would protect in those cases. When some of the difficult cases are posed to the FOIA offices, they respond that those questions must be settled in the courts. Bear in mind that there are some simple applications of FOIA that are very, very good and must not be underestimated. There are other areas, however, in which the answers are not clear, and situations arise which are dependent on a legal process to clarify them.

The first area in which the scientific community should weigh in is the exact definition of "data." At the NIH, there is tremendous variability in what constitutes scientific data. There are demographic studies, many of which have already dealt with issues of data sharing. There are x-ray crystallography studies, which have also dealt with the issues of data sharing. And there are studies where the underlying data would be the individual investigator's laboratory notebooks. It is hard to conceive how these would be used in a data-sharing setting. There are also videotapes of family interactions which provide the underlying data for research questions about family interaction and child development. The FOIA protections would probably dictate that those data could not be released because there is no way to protect the privacy of the subjects.

But, as someone has already pointed out, that is not a determination for the investigator to make; it is a determination for the scientific community to make. So, we would require the data to make that determination. When one looks at protection in that case, it is important to understand how that protection would be effected. And it would be very, very cumbersome. We must think about what the nature of data is and what is actually intended.

Certainly, this community is very concerned about when data would have to be released. Some studies are very straightforward. Do the study, publish the results, and release the data. Other studies involve a sequential release of data. Part of the data is released; more data is analyzed; there is another data release; more data may be collected; more data is released. Would this require a sequential data release to go with each publication? Does the publication of any of the data from an underlying data set imply the release of all of the data? It is important to raise these questions because different constituencies and different research communities will have different concerns and needs.

Longitudinal data present very special cases. If data are released early in a longitudinal study, the possibility is raised that people can violate the confidentiality of that data set. I think the FOIA protections would tell us that we can take away identifying information - names, social security numbers, telephone numbers, ID numbers - but, in fact, in a complex data set, or in a data set that refers to a very small geographic area, it may not be possible to truly protect the confidentiality of those individuals. So, I think that in this case the FOIA takes us part of the way there, but it is not clear to me that it takes us all of the way there. The research community must weigh in on how different types of data might be affected by this.

We have already commented on the protection for an individual's privacy under FOIA, but the FOIA protection for privacy is for individuals, not for what I will call entities. In a research project that is doing a study of six clinics, the identity of those clinics is not protected under FOIA. So, we have to remember that, although we may have a very general view of privacy protection, FOIA has a very narrow view of it. It is an individual's privacy. In addition, privacy is only extended to living individuals, which, in some cases, one might argue is not sufficient.

The problem is that making the determination to eliminate certain data because they would identify individuals is not something an investigator would simply do. That is a decision we make; the FOIA office would receive the full document and the redacted document and then make the determination as to whether the redaction was appropriate, bearing in mind how FOIA operates now for records we already hold. We are then obliged to keep those records for six years. I do not know how we would handle the administrative burden of doing this if we had any volume of requests. As I am sure most of you are aware, accessing data without an understanding of the accompanying documentation is not terribly useful. So, there are many areas that must be clarified, and one is certainly what the documentation would be.

There has been some discussion about what would happen if data were totally in the private sector, but the way the legislation is written, it refers to data that is supported by federal funds, regardless of the level of federal funding. We, of course, have many studies where the federal funding is only a portion of the funding. Other funding may come from the private sector, from pharmaceutical manufacturers, or from foundations. It may come from a medical research council in another country. It may come from other entities such as State governments or managed care organizations. Some of those organizations, in fact, have incredibly rich data that can be very valuable for the research community, but they are not willing to make it available to just anyone. They are willing to make it available under certain conditions: terms of reference as to what it will be used for, whether it is used for a peer-reviewed grant, whether the investigator will make an attempt to identify individuals, whether it will be used only for research purposes, not marketing purposes or publicity. FOIA does not allow one to put conditions on what the use of the data will be. In managed care, for example, or even in State agencies where the State medical care data would be very valuable, they are very selective about the circumstances under which they would make data available, and very wary of a process that would make data available to anyone regardless of his or her purpose.

The language so far is not clear about how long this access would remain in effect. Certainly, in areas that have established data sharing policies, these policies are in effect in perpetuity. Circular A-110 requires that data be maintained for three years following the termination of the grant. We must have clarification of the intent before determining how long one could maintain a reach-through to data.

Now, I must address the issues about the cost of compliance. When I think about how long the access would be, it becomes very complicated, because the grantee is the institution, not the individual. If I have a grant, two years after its completion I presumably still have the right to access the data, and I must access them if I receive an FOIA request currently. But if an investigator were no longer at an institution, what would be the obligation of the grantee institution to maintain the data, maintain the documentation, and be able to fulfill an agency request? I am not presenting unsolvable problems. I am presenting issues that the scientific community, the agencies, Congressional committees and the OMB need to grapple with so that we can understand what is on the table, what we are agreeing to and what we will now be doing.

Cost of compliance - there is very supportive language in the bill, in the OMB Circular, and in the Federal Register notice that acknowledges that this could be a costly process, and that, in fact, if it incurs a cost to the agency or the grantee, this is chargeable to the requestor. There is a problem with the current FOIA in that I can charge, for example, if someone makes a request and it costs me $1,000 for photocopying to fulfill the request, but I will not receive the money. The money goes to the Treasury. One thousand dollars' worth of photocopying may not be an issue; however, the administrative burdens of this application of FOIA are qualitatively different from the administrative burdens of the current FOIA, which are already significant. The question is not only whether I can charge, but how I can actually implement a billing strategy so that both the agency and the grantee, each bearing a considerable burden, can be compensated. Again, is this insolvable? Probably not, but it must be solved before we have a very difficult case involving very large expenses and we are unable to compensate people.

I believe there are appropriate ways to share data. There are many different fields in which the scientific community has come together and said, "Yes, it is appropriate to share data. This is how the integrity of the data is ensured. Here is how the documentation of the data is ensured, and here is how it is made available." Currently, that exists to some degree through data archives. These are very valuable. However, archives may place conditions on data sets they receive which I call "value-added." Consider data funded by a private entity willing to share its data, to put it in a public archive, and to make it available for a modest cost. This is very good. They may also require that the person who accesses it guarantee it will only be used for research purposes. Currently, I think there is a potential risk that FOIA does not allow conditions on what that use might be. I would hate to see anything have a perverse effect on the very valuable data archives that I think are one of the most constructive ways we have of sharing data.

The OMB notice, the NPRM (notice of proposed rulemaking), focuses attention on published data and on data used for federal regulations and policies. I think many in the scientific community feel that this is a constructive step toward shaping how this might work. But, it is not enough. For example, I do not know what is covered by the word "published." "Published in a peer-reviewed journal" I understand. I understand that the data has been vetted. I understand what rules have been applied before those data can be used for publication. But "published" by itself I do not understand. Is a PowerPoint demonstration a publication? Is a poster of very preliminary findings a publication? I do not know the answer, and I think it needs to be clarified.

Let me give you a quintessential NIH example: Imagine a clinical trial of two drugs. There is a data safety monitoring board that is going to review interim data, and they discover that Drug A is so spectacularly successful that it becomes unethical to continue the trial. The trial is stopped, and a clinical alert is issued to physicians across the country saying, "If you have patients who present with these circumstances, Drug A is the drug of choice." This becomes a standard of care. Is that a publication? The investigators have not even analyzed their data fully yet. Only the data safety monitoring board has seen it. The NPRM right now puts out published data as well as use in policy and regulation, but these are terms that need to be refined, or there will be an unintended consequence, which must be addressed right now. Do we mean by "policy" or "regulation" those that are published in the Federal Register through a normal policy-making procedure? Or do we mean statements that come out from federal agencies recommending a certain course of action? I do not know. I would suggest that during this sixty-day period we definitely need to understand the answers to those questions. I have focused on the scientific aspects of data sharing, but I suspect that we need to focus on the process of rulemaking, which is really not an NIH activity. However, since it might be our data that are implicated in rules, we are focusing on the scientific aspects of data sharing. These are some of the issues about which we in the NIH community are most concerned and are hoping to clarify during this comment period. Thank you.

Q [April Burke, Association of Independent Research Institutes]: Ms. Casey characterized agencies as having the right to access data. That is actually not correct. What A-110 says is that an agency can obtain data for a federal purpose, not for a private purpose.

A [Kathy Casey]: I do not think I made that distinction. I simply said that the federal agencies have the ability to obtain the data.

Q [April Burke]: When you were asked the question earlier about whether it would be appropriate for private information that was not funded by the federal government to be subject to FOIA, you seemed to think that it would be. I think there is a point here being made about public versus private. Dr. Baldwin, what would have been a federal purpose under A-110 for an agency to obtain data and not to use the waiver that is currently in A-110 which would be eliminated in the NPRM?

A [Wendy Baldwin]: We do not generally obtain data. We have that right. I can imagine circumstances under which we would obtain data for a federal purpose; for example, if we were involved in a fraud investigation. However, we make a very clear distinction between data that are collected internally by us, directly, in our intramural program or by contract as an extension, and those collected from grants. Grants are an assistance mechanism, and there really is quite a distinction there. While we put expectations out there for our grantees, we would not reach through to get those data for our purposes.

A [Kathy Casey]: I did not mean to suggest that it was somehow currently available for a private purpose. What I did say earlier about data that might not be federally funded is that if data were privately funded and used as the basis for a federal rule or policy, then I think our expectation is that there should be access to that data because they are being used as the basis for a federal rule. I hope that clarifies my point.

Q [April Burke]: I do not see how the federal government could reach through that activity and do that.

A [Kathy Casey]: The tangible circumstance that I would cite is a situation in which there were studies used to underlie particular EPA (Environmental Protection Agency) regulations. The viewpoint was that if this data were used as the basis for federal policy, then there should be public access to that information.

Q [April Burke]: Would you feel, for example, that if the Federal Reserve Board were going to make a federal policy with respect to financial issues and they wanted to rely on stock exchange information, those companies that had information about their ownership or their own financial data should make it available to the federal government and the public through FOIA?

A [Kathy Casey]: I do not want to comment on Federal Reserve policy, necessarily, but if the information is being used for a federal rule or a federal policy that affects millions of people, there should be reasonable access to the underlying data that are used to support it.

A [Jean Fruci]: I want to clarify something for a moment. Let's take the example you have raised in the EPA regulation. The data that underlie the EPA regulation were a whole series of studies. But what they relied on for crafting a policy was the peer-reviewed published paper. That is what they had in hand, and that is what they used as the basis of their rulemaking. That, in fact, is publicly available because it is published in a series of journals that anyone can obtain. But EPA did not go through all of the underlying data that went into producing the peer-reviewed published paper. They used the summary data that were provided in the peer-reviewed paper. You raised a second issue that I think is also very important. That is, if there are a couple of studies that are extremely critical to a rulemaking or a federal policymaking, does the public have a right to a higher level of scrutiny over particular studies, something beyond the peer-review process, to ensure that the data are really high quality? Again, I think that is a very legitimate point, and we would agree. But we question whether FOIA is the way that one would go about doing that.

A [Kathy Casey]: I agree with you that in this case we were trying to get to the underlying data, rather than just the peer-reviewed study that was provided. And I think the disagreement here is the mechanism that is being used to provide this information. Again, I would like to make clear that I think the idea behind getting access to the data is to allow people to duplicate it, verify it, and validate it.

Q [Bill Gardner, University of Pittsburgh, member of the National Conference of Lawyers and Scientists, which is a joint AAAS/American Bar Association committee]: Ms. Casey, how do you feel about the following danger: Not only might a vested interest take exception to work that someone is doing and try to harass it, but also there are ideologically motivated groups that resist work in certain fields of science. What would stop anyone from doing this? Dr. Baldwin hinted that there are areas of research where data sharing is abysmally bad. It is not just nascent. When people try to obtain data in some fields of clinical research, it simply is not being made available. If FOIA is not the right way to do this, exactly why is FOIA not the right way to do this? And if it is not FOIA, what is the way to go about this?

A [Kathy Casey]: You raise some very legitimate concerns, which should and can still be addressed. Our view was that FOIA was the only mechanism that we knew of at that time that would necessarily be able to make data publicly accessible while also having sufficient exceptions to address the concerns that we had. It was, at least, a very good starting point from our perspective.

A [Jean Fruci]: You raise some very good points. Having formerly been a researcher who worked on synthesizing different databases from different existing studies, I did not encounter many problems with people sharing their data with me, but I have heard of instances where they do occur. Again, when one uses something like FOIA, he is taking a one-size-fits-all approach. I think what may work very, very well for a data sharing mechanism and maintenance of a database that could be accessible to everyone for physics might be a terrible approach to use in the medical research community. One must look at what are the contexts of the types of data that we have available, and what are some good, cost-effective mechanisms to make the most data possible available to the widest audience. I do not know that I have for you a simple answer, or that I am able to say, for example, "Well, it is not FOIA, but it is this." I do not know that there is one "this," because of the many different ways in which we study science, and the many different ways in which it would be most useful to make data available for sharing. Let us look at what is already out there that works well. Let us look at specific areas where things are working very badly, as you suggest. Let us get some creative solutions, because I agree that if we do not make data available - and in some cases it is definitely needed, whatever goes beyond the peer-reviewed published paper - for sharing among the community, we do not make progress. It is the data that we pass on to the next group of scientists that allow us to make progress in these fields. If we do not do that efficiently and effectively, we are not going to make progress.

A [Wendy Baldwin]: If I could reinforce that, I think different scientific fields must evaluate the risks, problems, and issues specific to their fields. Those different constituencies must come up with strategies as to what actually makes sense, is workable, and can be built in from the beginning. There are many, many issues there, and I encourage the scientific community to step up to that challenge, but it is not simple.
