Author information
- aStanford University School of Medicine, Stanford, California
- bMassachusetts General Hospital, Boston, Massachusetts
- ∗Address for correspondence: Dr. Mark A. Hlatky, Stanford University School of Medicine, HRP Redwood Building, Room T150, 259 Campus Drive, Stanford, California 94305-5405.
Individual patient data from clinical research studies are increasingly becoming available to researchers who were not investigators in the original study that collected the primary data. The trend toward data sharing has been driven in part by the Internet-age ethos that “data want to be free,” and by specific policies proposed by the Institute of Medicine (1) and adopted by the National Institutes of Health (NIH) (2). There has been considerable discussion about many aspects of sharing data from clinical research studies (3–7), particularly how to handle protections of human subjects (e.g., identifiability of individuals, release of sensitive information), the time point at which data should be shared, the costs of data sharing, and the potential benefits and risks of making individual patient data widely available.
The ultimate goal of data sharing is to produce novel research findings that might be published in peer-reviewed journals. As papers based on shared data become more common, journal editors will encounter questions that have not received much attention, but that should now be considered. In particular, how should authors of papers based on shared data present their methods and results? How should journal editors, reviewers, and readers assess the scientific validity of papers derived from shared data? Should investigators from the original study review the manuscript to assess how the data were analyzed and interpreted? How should potential conflicts of interest be assessed and managed?
There are several different models for sharing individual patient data from clinical research studies (3), ranging from full open access to having all analyses performed by data managers, with outside investigators only reviewing the results.
An example of the open access model is the NIH’s BioLINCC, which releases copies of deidentified datasets from NIH-funded studies to researchers who complete a relatively simple data request form. Although a BioLINCC request must include a research plan, that plan is not reviewed for scientific validity by the NIH staff, nor is there any requirement that the investigators of the original study be notified of, involved in, or review the proposed analysis. At the other end of the spectrum, the American College of Cardiology’s National Cardiovascular Disease Registries do not release individual patient data to outside investigators, but instead provide access by allowing individual investigators to submit research proposals. A committee of outside experts reviews these proposals for the validity, feasibility, and importance of the research question. Approved proposals are forwarded to a National Cardiovascular Disease Registry data center, where staff statisticians perform the requested analyses and return the results to the outside investigators. In an intermediate model of data access, NIH-funded studies may establish data-sharing plans that allow study investigators to evaluate data requests for scientific merit and work directly with the outside researchers. Yet another model is for a trusted third party to hold the study datasets and administer outside requests for access (5). Thus, there are a wide variety of ways in which original data from a clinical research study might be shared with outside investigators and lead to scientific papers, with considerable heterogeneity across these data-sharing mechanisms in how rigorously research proposals are vetted.
Manuscripts Derived From Shared Data
The worthwhile goal of data sharing is to increase knowledge, and publication of findings is a key next step. But studies based on shared datasets are likely to differ in many respects from original research studies, in which the authors designed the study, collected the data, and performed the analyses. How should papers based on shared data be presented? We suggest that transparency is the key principle to guide preparation of manuscripts derived from the analysis of shared data. Table 1 summarizes a set of proposals for the preparation and submission of manuscripts based on shared data. These proposals are our own, and do not reflect the official position of JACC. We offer them to stimulate discussion and elicit comments.
We suggest that the Methods section of a paper based on shared data should clearly state that the authors obtained and analyzed a copy of a dataset that was collected by other investigators and then shared. We have repeatedly seen papers in which this disclosure was not made, leaving it uncertain whether the paper came from the original study group or was an outside analysis of previously collected data. In our view, a simple statement that “we obtained a copy of the [study name] dataset from [data source]” should be the first sentence of the Methods section of every paper based on shared data, because it accurately describes what was done and provides key context that editors and readers need to evaluate the validity of the manuscript.
Importantly, reanalysis of a shared study dataset may well lead to different conclusions than those previously published by the original study investigators. We suggest that authors of papers based on shared data should clearly identify any differences between their results and those of the original study investigators, and they should discuss the reasons for any such differences in the paper. They should also make clear that their conclusions are not necessarily those of the original study’s investigators.
Readers of a peer-reviewed journal expect that the editors have thoroughly evaluated the scientific validity of any papers they publish—acceptance in a major journal is rightfully perceived as a signal of a study’s quality and importance. A journal’s editorial staff needs to understand the methods and findings of all submitted manuscripts, which can be more challenging for manuscripts based on analyses of shared data.
Editors can reasonably expect that papers written by the original investigators of a clinical research study (especially a large, multicenter study) have been subject to several quality control checks, such as approval of the analysis plan and a thorough internal peer review of the manuscript by the study’s publication committee or steering committee prior to submission; indeed, most studies that will share data require stringent pre-publication vetting of all manuscripts written by the study group’s own investigators. Editors can also reasonably assume that the investigators who collected the data understand their strengths and limitations, and appreciate the nuances and reliability of each piece of data. Outside investigators analyzing a shared dataset will not be as familiar with the study’s methods, and the documentation provided with a shared dataset is unlikely to provide full information about how the data were collected, checked, cleaned, quality controlled, and analyzed in the original study. As a result, we suggest that editors and reviewers scrutinize the validity of analyses derived from a shared dataset more closely than they might scrutinize analyses conducted by the original study investigators.
Last, to catch misunderstandings of a shared dataset during the peer-review process, we suggest that journals should generally obtain a peer-review evaluation from an investigator of the original study; such reviewers would be in the best position to identify any problems with the analyses. This is a change from standard operating procedure in peer review, in which journals typically avoid having manuscripts reviewed by other members of the original study group; manuscripts based on shared data will need to be handled differently.
Wider access to primary data from clinical research studies has clear advantages, including the ability to test new hypotheses, examine questions that the original investigators did not pursue, or test whether the original findings hold up under alternative forms of analysis. However, there are also possible adverse effects and unintended consequences of data sharing, including misunderstanding of study data by outside investigators, errors in analysis, and confusion in the medical community about what the study found. Because the availability of shared data is growing, medical journals need to be prepared to manage and fairly evaluate the increasing number of papers that will be generated based on analyses of shared data. We have suggested a few modifications of the processes for manuscript preparation and evaluation to help realize the advantages of wider access, while minimizing the potential disadvantages. We welcome comments about these proposals.
Both authors have reported that they have no relationships relevant to the contents of this paper to disclose.
© 2017 American College of Cardiology Foundation