The Data Concierge

August 2009

David A. Fenstermacher, Ph.D., Chair and Executive Director, Department of Biomedical Informatics, H. Lee Moffitt Cancer Center & Research Institute

The transformation of biomedical research into a data-centric enterprise is creating new challenges for managing and disseminating data to the research community. With a new focus on translating innovative findings at the bench into results at the bedside, the integration of patient-based data with research data is critical. This raises questions about how we collect, store, and release data in ways that incorporate best practices for human subject research, data usage, data quality and scientific reproducibility. One of the most critical issues with using patient-based data is adherence to a wide range of Federal and State regulations, such as the Common Law, HIPAA (45 CFR Parts 160 and 164), 21 CFR Part 50 (Food and Drug Administration), 45 CFR Part 46 (Office for Human Research Protections), the Genetic Information Nondiscrimination Act of 2008 (GINA) and the new regulatory provisions of the Economic Stimulus package (ARRA). Other regulatory considerations include contractual obligations, intellectual property protections and data sharing policies that are governed by Institutional Review Boards (IRB) and local scientific review committees (SRC). In addition, the sheer amount of data available is increasing at unprecedented rates while the use of vocabulary and metadata standards for these data is limited, which overwhelms many researchers. To facilitate access to data for research, the Moffitt Cancer Center (MCC) has developed a new role, the Data Concierge.

The Data Concierge was created to provide a "one-stop shopping" resource for access to data that reside in multiple information systems, including the cancer registry, electronic medical records (EMR), biospecimen tracking, molecular data, billing data and many others. Many of these data have been integrated into a centralized data warehouse, but not every data element from every source system is available. For example, a researcher may want the diagnosis of patients who have microarray data from tumor resections. The diagnosis data can be obtained from many data sources: a clinical diagnosis is captured in the EMR, a pathological diagnosis is recorded in the EMR or the biospecimen tracking database, a billing code based on ICD-9 is stored in the financial and billing database, and multiple diagnoses are held in the cancer registry, which uses the ICD-O-3 standard. The data needed will vary greatly depending on whether the researcher is a health economist or a molecular biologist. The Data Concierge provides an interactive service, working with researchers (by phone, email or in person) to understand their data needs and to explain the multiple resources available for a given type of data, how the data was obtained (from the physician or patient self-report), any limitations on the data, and how the data relates to other data or information systems within the cancer center. In a sense, the Data Concierge provides the valuable links between data and metadata that are not captured within primary and data integration resources.
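The diagnosis example above can be pictured as a small catalog that records, for each kind of diagnosis, which source systems hold it and which coding standard applies. The following Python sketch is illustrative only: the class name, field names and helper functions are hypothetical and do not describe any Moffitt system. The entries restate only what the paragraph above says, and a coding standard is left empty where the article does not name one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DiagnosisSource:
    """One catalog entry (hypothetical structure, for illustration only)."""
    kind: str                     # kind of diagnosis, e.g. "clinical"
    systems: Tuple[str, ...]      # source systems that hold it
    coding: Optional[str] = None  # coding standard, if the article names one


# Entries restate the article's example; nothing here is a real schema.
CATALOG = (
    DiagnosisSource("clinical", ("EMR",)),
    DiagnosisSource("pathological", ("EMR", "biospecimen tracking")),
    DiagnosisSource("billing", ("financial and billing",), "ICD-9"),
    DiagnosisSource("registry", ("cancer registry",), "ICD-O-3"),
)


def kinds_available_in(system: str) -> List[str]:
    """Return the kinds of diagnosis recorded in the given source system."""
    return [entry.kind for entry in CATALOG if system in entry.systems]


def kinds_coded_with(coding: str) -> List[str]:
    """Return the kinds of diagnosis that use the given coding standard."""
    return [entry.kind for entry in CATALOG if entry.coding == coding]


if __name__ == "__main__":
    for entry in CATALOG:
        standard = entry.coding or "not stated"
        print(f"{entry.kind}: {', '.join(entry.systems)} (coding: {standard})")
```

A catalog like this captures only part of what the Data Concierge explains in conversation; provenance and known limitations of each source would need their own fields.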

After considering many models for the creation of a data service, the role of the Data Concierge was established in the Department of Biomedical Informatics, within the Data Quality and Curation Team. Although it might seem puzzling to position a data-release service in a data quality group, the concept has proven to be the bridge between many disparate efforts focused on the delivery of research data, including regulatory compliance, data representation, data integrity and accuracy, and data dissemination. The Data Concierge is the primary contact person for researchers seeking the creation of specific data sets for specific research questions. The data release requests range from simple aggregated data with minimal regulatory issues to de-identified, limited or full PHI disclosure data sets. At MCC, the Tissue and Data Release Committee (TDRC) governs the release of data. This committee reviews each request and assures that all review processes, such as IRB or SRC approvals, have been completed and that tissue and data requests align with and adhere to other center-specific guidelines to protect IP or, in the case of tissue, to assure valuable resources are not exhausted unnecessarily. The TDRC was established to facilitate access to tissue and data, not to create another regulatory barrier, and considers data release requests for de-identified, limited and PHI disclosure data sets. The Data Concierge is a non-voting member of the TDRC. Because the Data Concierge often assists the researcher in drafting the data request, the Data Concierge provides valuable information to the committee about each request and takes part in the deliberations on the conditions under which data is released to a researcher. This process assures that data releases are aligned with institutional policies and regulatory compliance.

Another important role of the Data Concierge is to generate the approved data sets for release to the researchers. As a result, the Data Concierge is, in many instances, the first person to notice inconsistencies in the data or metadata. As a member of the Data Quality and Curation Team, the Data Concierge works with the Data Steward, the Data Analysts and the Data Management and Integration Team (an Information Technology (IT) team) to further investigate, document and remediate data quality issues (e.g., broken ETL (extract, transform and load) processes, data inconsistency between source systems or the lack of ISO 11179 data dictionary standards). Therefore, the Data Concierge is a key link between the researchers, the governing committees, the regulatory processes, data quality, data standards, and IT.
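One of the data quality issues named above, inconsistency between source systems, lends itself to a simple automated check. The sketch below is a minimal illustration in Python, not the method used at MCC: the function name is hypothetical, each source extract is assumed to be a mapping from a record identifier to a single value, and the identifiers and values in the demonstration are invented placeholders.

```python
from typing import Dict, List, Tuple


def compare_sources(
    source_a: Dict[str, str], source_b: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare two extracts keyed by record identifier.

    Returns (mismatched, unmatched):
      mismatched - identifiers present in both extracts with different values
      unmatched  - identifiers present in only one of the two extracts
    """
    shared = source_a.keys() & source_b.keys()
    mismatched = sorted(key for key in shared if source_a[key] != source_b[key])
    unmatched = sorted(source_a.keys() ^ source_b.keys())
    return mismatched, unmatched


if __name__ == "__main__":
    # Invented placeholder data, not real records or real codes.
    extract_one = {"R1": "A", "R2": "B", "R3": "C"}
    extract_two = {"R1": "A", "R2": "X", "R4": "D"}
    mismatched, unmatched = compare_sources(extract_one, extract_two)
    print("Values differ for:", mismatched)
    print("Present in only one extract:", unmatched)
```

A check like this only flags candidates; deciding which source is correct still requires the investigation and documentation described above.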

There is one more link that completes the Data Concierge service: data dissemination. Researchers have varied needs regarding data usage and visualization. Although commercial tools can satisfy these requirements at times, customized solutions are often critical to delivering data in useful ways for each project. Examples include the need to integrate laboratory data with Center data using project-specific databases, process and visualize data using bioinformatics tools, integrate data sets with data sharing networks such as caBIG® (Cancer Biomedical Informatics Grid), and integrate data with external annotation services that provide valuable metadata. With the Data Concierge service being housed in the Department of Biomedical Informatics, there is a direct link to the Biomedical Informatics Shared Resource Facility (BISRF) that provides informatics services and solutions for the Moffitt research community. Based on user requirements, the Data Concierge is able to bring together bioinformaticists, research software developers, and data management experts from the Department to assist with data dissemination. And, with careful coordination between BISRF and IT, data transfers and updates can be automated, commercial and open-source tools can be integrated, and the appropriate security and access controls can be put in place so that researchers have access to real-time data in a user-friendly environment that fulfills the project requirements and preserves regulatory compliance.

Providing a comprehensive data release service requires having access to the appropriate tools and data sets across the organization. This requires that the person assuming the role of Data Concierge be highly trained to understand the data sources, how the data is coded, and the standards applied to each data element (if available). This person must also have strong technical skills and a collaborative and friendly personality. Finding the person who possesses this comprehensive skill set is the most challenging aspect of establishing a Data Concierge service, and the most important. The Data Concierge at MCC is a cancer registrar who abstracted data and fulfilled data requests for the Moffitt Cancer Registry. She is the ideal person for this role and quickly established the service as an extension of her work at the Cancer Registry. She is responsible for the overwhelming successes of the Data Concierge services as described above. Without an experienced internal candidate, extensive planning and training will be required for a new hire before "opening the doors" to such a service. As the Data Concierge service at the Moffitt Cancer Center continues to evolve in this data-centric age of biomedical research, the ability to refine processes that facilitate streamlined and seamless access to data will be critical to future successes. Even more critical, however, is the alignment of these processes and services with institutional priorities and regulatory compliance that creates a culture of responsible data use in an era where human data is the foundation of translational research and personalized medicine.

Member Viewpoints

Featured in issues of the GIR Newsletter and the GIR website, these articles are contributed by GIR representatives on current IT-related issues, challenge solutions, and technological innovations in academic medical institutions.