Title: Metadata Definitions / Working Group: Emerging Technologies
Version: 0.2 / Date: 17th May 2013
PhUSE
Table of Contents
1 INTRODUCTION: purpose of this document
2 SCOPE
3 DEFINITIONS
3.1 Metadata management
3.1.1 Metadata
3.1.1 Structural metadata
3.1.2 Descriptive metadata
3.1.3 Process metadata
3.1.4 Structural metadata: standards metadata
3.1.5 Study-Instance Metadata or Study specific metadata
3.1.6 Semantic Metadata
3.1.1 Metadata repository
3.1.2 Metadata registry
3.1.3 Data element
3.1.4 Attribute
3.1.5 Class
3.1.6 Data type
3.2 Master data management
3.2.1 Master Data
3.2.2 Master Data Management
3.2.3 Master Reference Data
3.2.4 Master Data Source System
3.2.5 Reference Data
3.2.6 Reference Data Management
3.3 Controlled Terminology, code systems & value sets
3.3.1 Concept
3.3.2 Code
3.3.3 Code system
3.3.4 Concept definition
3.3.5 Concept designation
3.3.6 Concept domain
3.3.7 Concept identifier
3.3.8 Concept representation
3.3.9 Value set
3.4 Interoperability
3.4.1 Interoperability
3.4.2 Technical interoperability (“machine interoperability”)
3.4.3 Semantic interoperability
3.4.4 Process Interoperability
3.5 Data aggregation, integration
3.5.1 Data pooling
3.5.2 Data aggregation
3.5.3 Data integration
4 INPUT (draft material that can be used – to be deleted in final document)
4.1 Metadata management
4.2 Master data management
4.3 Controlled terminology
4.4 Interoperability
4.5 Data aggregation
5 REFERENCES & RELATED DOCUMENTS
6 Appendices
6.1 CDISC glossary
1 INTRODUCTION: purpose of this document
This document provides agreed definitions for metadata management and related topics across the industry. It is expected that these definitions will be reused in FDA guidance as agreed cross-industry definitions.
To be of operational value, the document contains not only definitions but also a short description and an example of use for each term. Whenever possible, the definitions are built from existing definitions in FDA guidance documents, the CDISC glossary and other cross-industry sources (e.g. Gartner). A reference to the source definition is provided.
This document is not intended to be exhaustive or complete. Its aim is to clarify the most commonly used (and misused!) terms in our industry related to metadata and master data management.
The CDISC glossary [CDISC1] (also provided as an appendix) is used extensively as a reference in this document. The reader is expected to be familiar with the abbreviations and synonyms contained in the CDISC glossary; these are not repeated here.
2 SCOPE
The following topic areas are in the scope of this document:
• Metadata management: metadata (structural & operational), data elements, attributes, classes
• Master data management: master data, reference data, master reference data
• Controlled terminology, code systems, value sets, permissible values
• Data pooling, data integration, data aggregation
• Interoperability, semantic interoperability
Definitions are grouped by topic area to make the document easier to read.
3 DEFINITIONS
3.1 Metadata management
3.1.1 Metadata
Synonym /
Definition & source /
- Wikipedia. The term metadata refers to "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types).
- Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data";
- Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
- ISO 11179. “Descriptive data about an object [ISO/IEC 20944-1]”. Thus, metadata is a kind of data.
- Adrienne Tannenbaum, Metadata Solutions: "Metadata: the detailed description of the instance data; the format and characteristics of populated instance data; instances and values depending on the role of the metadata recipient." and "Instance data: That which is input into a receiving tool, application, database, or simple processing engine".
Description / Metadata describe instance data. Instance data are data stored in a computer as the result of data entry by a person or data processing by an application.
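Metadata can itself become instance data, described in turn by level-2 metadata (meta-metadata). For example, the definition of a data element (its name, definition and data type) is metadata for the study data, but within a metadata repository it is itself instance data, described by the set of attributes that every data element definition must have.
There are two types of metadata (see below for more detailed descriptions and examples):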
- Structural metadata
- Descriptive metadata
Example / See structural metadata and descriptive metadata
Recommended definition
3.1.1 Structural metadata
Synonym / Standard metadata; Data Standard
Definition & source /
- The design and specification of data structures (e.g. format, semantics) cannot be “data about data”, because at design time the application contains no data. In this case the correct description would be "data/information about the containers of data". [FDA1]
- Structural metadata is structured information that describes, explains, or otherwise makes it easier to retrieve, use, or manage data. [Octagon]
- Standards metadata is the metadata that is defined, maintained and governed as the standard description of the data, to facilitate the reuse of clinical software and thus process efficiency. It is metadata that describes the standard, not a study built per the standard. Both industry standards such as CDISC and sponsor-defined standards are commonly thought of as standards metadata.
Description / Structural metadata is what most people mean by metadata. Structural metadata is said to “give meaning to data” or to put data “in context”.
Structural metadata, or standards metadata, is the source from which study-specific metadata (see below) is built. Key components of standards metadata often include data domains, data elements, terminology, data mappings and transformations, and data derivations.
Successful use of standards metadata requires adequate standards governance, which should include:
- workflows to address the creation and/or revision of the standards
- version control of standards metadata and study-specific metadata
- access control to the metadata, by user role
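The three governance elements listed above (workflow, version control, role-based access) can be sketched together. This is a minimal Python sketch under stated assumptions: the role names, the permitted actions, the item identifier "AE.AECAT" and the draft/approved versioning rule are hypothetical choices for illustration, not requirements taken from any standard or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical role model: the actions each user role may perform on standards metadata
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"read"},
    "author": {"read", "revise"},
    "approver": {"read", "approve"},
}


@dataclass
class StandardItem:
    """A governed standards-metadata item; every revision is kept as a new version."""
    name: str
    versions: List[dict] = field(default_factory=list)

    def _check(self, role: str, action: str) -> None:
        """Access control by user role."""
        if action not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action} {self.name}")

    def revise(self, role: str, content: str) -> int:
        """Workflow step 1: an author creates a new draft version (old versions are kept)."""
        self._check(role, "revise")
        number = len(self.versions) + 1
        self.versions.append({"version": number, "content": content, "status": "draft"})
        return number

    def approve(self, role: str, number: int) -> None:
        """Workflow step 2: an approver releases a draft version."""
        self._check(role, "approve")
        self.versions[number - 1]["status"] = "approved"

    def current(self, role: str) -> dict:
        """Latest approved version; drafts are not visible as the current standard."""
        self._check(role, "read")
        approved = [v for v in self.versions if v["status"] == "approved"]
        if not approved:
            raise LookupError(f"{self.name} has no approved version")
        return approved[-1]


item = StandardItem("AE.AECAT")
v1 = item.revise("author", "Category for Adverse Event")
item.approve("approver", v1)
item.revise("author", "Category for Adverse Event (revised wording)")  # draft v2
print(item.current("viewer")["version"])  # 1: v2 is still a draft
```

Keeping every version rather than overwriting in place is what allows study-specific metadata to keep pointing at the exact version of the standard it was built from.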
Example / The number 120 by itself is meaningless without structural metadata such as:
- The name of the variable (e.g. Systolic Blood Pressure) with its definition
- The unit of this physical quantity (e.g. Systolic Blood Pressure Unit = mmHg)
- As another example, the variable “Sex” is described by a set of structural metadata such as its label, data type (char), associated value set (male, female, …) and role in SDTM.
- The metadata for the AE (Adverse Event) SDTM domain that is compliant with the CDISC SDTM Implementation Guide (Version 3.1.3) consists of attributes such as Variable Name, Variable Label, Type, Controlled Terms, Role, etc.
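To illustrate how structural metadata “gives meaning” to a bare value such as 120, the following is a minimal Python sketch. The `VariableMetadata` class, the `describe` function and the variable names `SYSBP` and `SEX` are hypothetical illustrations rather than CDISC-defined structures, and the value set shown for Sex is abbreviated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VariableMetadata:
    """Structural metadata: describes the 'container' of a value, not the value itself."""
    name: str                                    # hypothetical variable name
    label: str                                   # human-readable meaning
    data_type: str                               # "num" or "char"
    unit: Optional[str] = None                   # unit of the physical quantity, if any
    value_set: Optional[Tuple[str, ...]] = None  # permissible values, if enumerated


# Hypothetical, simplified metadata for the two examples in the text
SYSBP = VariableMetadata("SYSBP", "Systolic Blood Pressure", "num", unit="mmHg")
SEX = VariableMetadata("SEX", "Sex", "char", value_set=("M", "F", "U"))


def describe(value, meta: VariableMetadata) -> str:
    """Put a raw instance value 'in context' using its structural metadata."""
    if meta.data_type == "num" and not isinstance(value, (int, float)):
        raise TypeError(f"{meta.name} expects a numeric value")
    if meta.data_type == "char" and not isinstance(value, str):
        raise TypeError(f"{meta.name} expects a character value")
    if meta.value_set is not None and value not in meta.value_set:
        raise ValueError(f"{value!r} is not in the value set of {meta.name}")
    unit = f" {meta.unit}" if meta.unit else ""
    return f"{meta.label} ({meta.name}) = {value}{unit}"


print(describe(120, SYSBP))  # Systolic Blood Pressure (SYSBP) = 120 mmHg
print(describe("F", SEX))    # Sex (SEX) = F
```

The value 120 is the instance data; everything held in `SYSBP` is metadata, which is why the same number would mean something entirely different if it were attached to another variable.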
Recommended definition
3.1.2 Descriptive metadata
Synonym / Process metadata; Semantic metadata
Definition & source /
- Metadata about the individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
- Ralph Kimball: "Process metadata describes the results of various operations in a data warehouse."
- Metadata that describes relevant or domain-specific information about content. It provides conceptual, contextual, and processing information for data elements. It can also provide greater depth and more insight about the "container" of the data, whether it is a file, document, or representation.
Description / Descriptive metadata is used in different contexts:
- Data operations and statistical analysis: additional content about the data that supports its further analysis. For instance, the patient population of a clinical trial is operational metadata.
- Software implementation (process metadata): describes the results of the operations performed in an application, whether a data warehouse or any other application. This includes:
- processes used to reformat (convert) or transcode content
- all information needed to support data lineage & traceability
- details of origin and usage (including start and end times for creation, updates and access)
- “How” - how the data is used within the information flow
- “Where” - the source of the data element
- “Who” - who created, modified and approved the data element
- “When” - versioning information for the data element
Example /
- Study related metadata: patient population, indication, therapeutic area
- Process metadata:
- metadata needed for the effective management of version control for standards metadata: the UserID that executed the last modification, the date of the last modification, and the UserID who approved the last modification.
- The source of the data and the system in which it is authored
- Who can use a piece of information: the roles with access and the actions they can perform (who can edit it and in which system, who has read access to it)
- Which transformations are applied to the data, how and when
- Audit trail: who accessed which information, and when
Recommended definition
3.1.3 Process metadata
(Suggestion: combine with descriptive metadata.)
Synonym /
Definition & source /
- Ralph Kimball's "Process metadata describes the results of various operations in a data warehouse."
Description / Process metadata describes the results of the operations performed in an application, whether a data warehouse or any other application. This includes:
- processes used to reformat (convert) or transcode content.
- all information needed to support data lineage & traceability
- details of origin and usage (including start and end times for creation, updates and access).
Example /
- The source of the data and the system in which it is authored
- Who can use a piece of information: the roles with access and the actions they can perform (who can edit it and in which system, who has read access to it)
- Which transformations are applied to the data, how and when
- Audit trail: who accessed which information, and when
- Version control
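The process metadata listed above (who changed what, when, in which system, and who approved it) is often captured as an audit trail. Below is a minimal Python sketch assuming a simple in-memory, append-only log; the `AuditEntry` and `AuditTrail` classes, the action names, the item identifier and the user IDs are all hypothetical and not taken from any specific metadata repository product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditEntry:
    """One process-metadata record: who did what to which item, where and when."""
    user_id: str
    action: str       # hypothetical vocabulary: "create", "update", "approve", "read"
    item: str         # identifier of the data element or metadata item
    system: str       # system in which the action was performed
    timestamp: datetime


@dataclass
class AuditTrail:
    entries: List[AuditEntry] = field(default_factory=list)

    def record(self, user_id: str, action: str, item: str, system: str,
               when: Optional[datetime] = None) -> AuditEntry:
        """Append an entry; entries are immutable and never edited afterwards."""
        stamp = when if when is not None else datetime.now(timezone.utc)
        entry = AuditEntry(user_id, action, item, system, stamp)
        self.entries.append(entry)
        return entry

    def latest(self, item: str, *actions: str) -> Optional[AuditEntry]:
        """Most recent entry for `item` whose action is one of `actions` (or None)."""
        matches = [e for e in self.entries if e.item == item and e.action in actions]
        return max(matches, key=lambda e: e.timestamp, default=None)

    def history(self, item: str) -> List[AuditEntry]:
        """Audit-trail view: every recorded action on `item`, oldest first."""
        return sorted((e for e in self.entries if e.item == item), key=lambda e: e.timestamp)


trail = AuditTrail()
t0 = datetime(2013, 5, 1, tzinfo=timezone.utc)
trail.record("jdoe", "create", "AE.AECAT", "MDR", t0)
trail.record("asmith", "update", "AE.AECAT", "MDR", t0.replace(day=10))
trail.record("mlead", "approve", "AE.AECAT", "MDR", t0.replace(day=15))

last_mod = trail.latest("AE.AECAT", "create", "update")
last_approval = trail.latest("AE.AECAT", "approve")
print(last_mod.user_id, last_mod.timestamp.date())  # asmith 2013-05-10
print(last_approval.user_id)                        # mlead
```

The two `latest` queries return exactly the version-control metadata described for standards metadata: the user who made the last modification, its date, and the user who approved it.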
Recommended definition
3.1.4 Structural metadata: standards metadata
Synonym / OUT – included in structural metadata
Definition & source
Description
Example
Recommended definition
3.1.5 Study-Instance Metadata or Study specific metadata
Synonym / Study Data Standards; Study Specific Structural metadata
Definition & source / [No source]
- Study-Instance metadata is a defined grouping of metadata that serves as the most complete representation of the metadata that defines an individual study.
- It is commonly thought of as the set of metadata that is actually consumed by the clinical technology platform to facilitate processes that are more automated and consistent.
- It consists of structural and descriptive metadata.
Description / Within a metadata store, Study-Instance Metadata is stored separately from the Standards Metadata, either as a set of relationships back to the Standards Metadata or as a copy of the Standards Metadata, depending on the metadata store tool in use.
The Study-Instance Metadata (the most complete representation of the metadata that defines an individual study) is exported to and consumed by the clinical data platform to ensure maximal automation and consistency of the processes for trial design, execution, storage, analysis, and submission.
Because Study-Instance Metadata can consist of structural, standards and operational metadata, it can serve a wide range of purposes, such as:
- Trial-definition metadata per the PRM
- Trial-definition metadata per SDTM Trial Design
- Study CRFs metadata
- Data-definition metadata
- Submission Define.xml
Example / During the set-up of a clinical trial collection database, the Oncology project team decides to use the AECAT variable so that the many adverse events can be grouped at the time of analysis. The standards governance group of the sponsor organization has granted this project team the option to select AECAT from a subset of the Permissible data elements of the SDTM standard. This project choice is stored within the Study-Instance Metadata and used by the CDMS tool to construct the collection database accurately.
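The relationship-based storage option and the AECAT example can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the AE domain is reduced to a four-variable excerpt (labels and core designations should be checked against the SDTMIG version in use), and the `StudyInstanceMetadata` class and the study identifier "ONC-001" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class StandardVariable:
    """Standards metadata for one SDTM variable (greatly simplified)."""
    name: str
    label: str
    core: str  # SDTM core designation: "Req", "Exp" or "Perm"


# Standards metadata: a small, illustrative excerpt of the SDTM AE domain
AE_STANDARD: Dict[str, StandardVariable] = {
    v.name: v for v in (
        StandardVariable("AETERM", "Reported Term for the Adverse Event", "Req"),
        StandardVariable("AEDECOD", "Dictionary-Derived Term", "Req"),
        StandardVariable("AECAT", "Category for Adverse Event", "Perm"),
        StandardVariable("AESCAT", "Subcategory for Adverse Event", "Perm"),
    )
}


@dataclass
class StudyInstanceMetadata:
    """Study-Instance Metadata stored as references (variable names) back to the standard."""
    study_id: str
    granted_permissible: FrozenSet[str]  # subset granted by standards governance
    selected: List[str] = field(default_factory=list)

    def select(self, name: str) -> None:
        var = AE_STANDARD[name]  # KeyError if the variable is not in the standard
        if var.core == "Perm" and name not in self.granted_permissible:
            raise PermissionError(f"{name} has not been granted to study {self.study_id}")
        if name not in self.selected:
            self.selected.append(name)

    def collection_variables(self) -> List[StandardVariable]:
        """What a CDMS would consume: required variables plus the study's selections."""
        names = [n for n, v in AE_STANDARD.items() if v.core == "Req"]
        names += [n for n in self.selected if n not in names]
        return [AE_STANDARD[n] for n in names]  # resolved through the relationship


study = StudyInstanceMetadata("ONC-001", granted_permissible=frozenset({"AECAT"}))
study.select("AECAT")
print([v.name for v in study.collection_variables()])  # ['AETERM', 'AEDECOD', 'AECAT']
```

Because the study stores only names and resolves them against `AE_STANDARD`, this corresponds to the "set of relationships" option; the "copy" option would instead store the `StandardVariable` objects themselves in the study record.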
Recommended definition
3.1.6 Semantic Metadata
Synonym / OUT – included in descriptive metadata
Definition & source
Description
Example
Recommended definition
3.1.1 Metadata repository
Synonym
Definition & source
Description
Example
Recommended definition
3.1.2 Metadata registry
Synonym
Definition & source / Based on the ISO 11179 standard and this web page, the definition of "MDR" should be discussed: is it a Metadata Repository or a Metadata Registry? An interesting point from that website is that a "Registry is a protected back room where human-centric workflow processes are used [to] ensure that metadata items are non-duplicates, precise, consistent, concise, distinct, approved and unencumbered with business rules that prevent reuse across an enterprise". This is a good point.
Description
Example
Recommended definition
3.1.3 Data element
Synonym / DE
Definition & source / [FDA1]
A data element is the smallest (or atomic) piece of information that is useful for analysis (e.g., a systolic blood pressure measurement, a lab test result, a response to a question on a questionnaire).
[CDISC1]
1. For XML, an item of data provided in a mark-up mode to allow machine processing. [FDA - GL/IEEE]
2. Smallest unit of information in a transaction. [Center for Advancement of Clinical Research]
3. A structured item characterized by a stem and response options together with a history of usage that can be standardized for research purposes across studies conducted by and for NIH. [NCI, caBIG]
NOTE: The mark up or tagging facilitates document indexing, search and retrieval, and provides standard conventions for insertion of codes.
[ISO1]
unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes
Description / A data element is the most elementary unit of data: it cannot be further subdivided from a semantic point of view, as it is linked to a precise meaning.
A data element has:
- An identification such as a data element name
- A clear definition/ semantic description
- A data type
- Optional enumerated values (value sets)
- One or more representation terms (synonyms)
- In the context of SDTM a variable is equivalent to a Data Element
- In the context of BRIDG, an attribute is equivalent to a Data Element
Example / Birth Date is a Data Element
- DE name: BirthDate
- Definition: date and time at which the subject was born
- Data type: date/time (mm/dd/yyyy – hh:mm:ss – time zone)
- Value sets: not applicable
- Synonyms: BRTHDTC in CDISC SDTM, birthdate in BRIDG
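The components of a data element listed above map naturally onto a record type. The following minimal Python sketch models the Birth Date example; the `DataElement` class and its `local_name` and `accepts` methods are hypothetical, and representing the data type as a Python `datetime` is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class DataElement:
    """A data element: identification, definition, data type, optional value set, synonyms."""
    name: str
    definition: str
    data_type: type
    value_set: Optional[Tuple[str, ...]] = None
    synonyms: Dict[str, str] = field(default_factory=dict)  # context -> name in that context

    def local_name(self, context: str) -> str:
        """Name of this data element in a given standard (falls back to the DE name)."""
        return self.synonyms.get(context, self.name)

    def accepts(self, value) -> bool:
        """True if the value has the right data type and, if enumerated, is in the value set."""
        if not isinstance(value, self.data_type):
            return False
        return self.value_set is None or value in self.value_set


BIRTH_DATE = DataElement(
    name="BirthDate",
    definition="Date and time at which the subject was born",
    data_type=datetime,
    value_set=None,  # not applicable: birth dates are not enumerated
    synonyms={"CDISC SDTM": "BRTHDTC", "BRIDG": "birthdate"},
)

print(BIRTH_DATE.local_name("CDISC SDTM"))              # BRTHDTC
print(BIRTH_DATE.accepts(datetime(1970, 1, 1, 8, 30)))  # True
print(BIRTH_DATE.accepts("01/01/1970"))                 # False: a string is not a datetime
```

Note that in SDTM, BRTHDTC is represented as an ISO 8601 character string, so in practice a mapping step sits between the abstract data element and each local representation.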
Recommended definition
3.1.4 Attribute
3.1.5 Class
3.1.6 Data type
3.2 Master data management
3.2.1 Master Data
Synonym /
Definition & source / [Gartner – Magic Quadrant for Master Data Management of Customer Data Solutions]
Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise, such as customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.
Description / Master data is business data that has a consistent meaning and definition and is shared across systems. It is created in a “master system” as part of a transaction and is used for reference and validation in transactions within other systems.
- Master data, like any other data, is defined by structural metadata.
Example /
- Site identification information such as: Site ID, Site Name, Site Address, …
- Investigator identification attributes
- Study Identification attributes
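The description above (master data created in a master system by a transaction, then used for reference and validation by other systems) can be sketched as follows. This is a minimal sketch under stated assumptions: `SiteMaster` and `EdcSystem` are hypothetical stand-ins for a site master system and a consuming system, and the site ID format and sample values are invented for illustration.

```python
import itertools
from typing import Dict


class SiteMaster:
    """Hypothetical master system: the single source of truth for site master data."""

    def __init__(self) -> None:
        self._sites: Dict[str, Dict[str, str]] = {}
        self._counter = itertools.count(1)

    def create_site(self, name: str, address: str) -> str:
        """Transaction that creates a site record and assigns its identifier."""
        site_id = f"SITE-{next(self._counter):04d}"
        self._sites[site_id] = {"site_id": site_id, "name": name, "address": address}
        return site_id

    def lookup(self, site_id: str) -> Dict[str, str]:
        """Return a copy, so consumers cannot modify the master record."""
        return dict(self._sites[site_id])

    def exists(self, site_id: str) -> bool:
        return site_id in self._sites


class EdcSystem:
    """Hypothetical consuming system: references site master data, never redefines it."""

    def __init__(self, master: SiteMaster) -> None:
        self.master = master
        self.enrolments: Dict[str, str] = {}  # subject ID -> site ID

    def enrol(self, subject_id: str, site_id: str) -> None:
        if not self.master.exists(site_id):  # validation against the master system
            raise ValueError(f"Unknown site {site_id}")
        self.enrolments[subject_id] = site_id


master = SiteMaster()
site = master.create_site("Example Hospital", "1 Example Street")
edc = EdcSystem(master)
edc.enrol("SUBJ-001", site)
print(site, master.lookup(site)["name"])  # SITE-0001 Example Hospital
```

The consuming system stores only the site identifier; name and address are always read from the master system, which keeps a single source of truth.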
Recommended definition
3.2.2 Master Data Management
Synonym /
Definition & source / [Gartner – Magic Quadrant for Master Data Management of Customer Data Solutions]
MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official, shared master data assets.
Description
Example
Recommended definition
3.2.3 Master Reference Data
Synonym
Definition & source
Description / A combination of Master Data and Reference Data. The governance of these two components is quite different:
- reference data are often defined by external organizations at design time; they are generally managed within a terminology server (or a metadata repository) together with the other code systems
- master data are created at application run time through a transaction and are stored in the source system that is considered the source of truth.
Example
Recommended definition
3.2.4 Master Data Source System
3.2.5 Reference Data
Synonym
Definition & source
Description /
- In the context of Master Reference Data Management, reference data corresponds to the set of code systems that are commonly used across many different systems and attributes.
Example /
- List of Country codes
- List of Therapeutic areas
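Reference data such as the lists above is typically defined once, at design time, and shared read-only across systems. A minimal Python sketch, assuming an in-memory registry: the country entries are a three-code excerpt of ISO 3166-1 alpha-2, while the therapeutic-area codes and the `REFERENCE_DATA` and `decode` names are hypothetical.

```python
from types import MappingProxyType
from typing import Mapping

# Code systems defined at design time and exposed read-only via MappingProxyType
COUNTRY: Mapping[str, str] = MappingProxyType({"FR": "France", "DE": "Germany", "JP": "Japan"})
THERAPEUTIC_AREA: Mapping[str, str] = MappingProxyType({"ONC": "Oncology", "CARD": "Cardiology"})

REFERENCE_DATA: Mapping[str, Mapping[str, str]] = MappingProxyType({
    "COUNTRY": COUNTRY,
    "THERAPEUTIC_AREA": THERAPEUTIC_AREA,
})


def decode(code_system: str, code: str) -> str:
    """Validate a code against a shared code system and return its display name."""
    codes = REFERENCE_DATA[code_system]
    if code not in codes:
        raise ValueError(f"{code!r} is not a valid {code_system} code")
    return codes[code]


print(decode("COUNTRY", "FR"))            # France
print(decode("THERAPEUTIC_AREA", "ONC"))  # Oncology
```

Consuming applications can read but not change the lists (`COUNTRY["XX"] = "..."` raises `TypeError`), which mirrors the governance difference described for master reference data: reference data is maintained centrally, not edited by the systems that use it.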
Recommended definition
3.2.6 Reference Data Management
3.3 Controlled Terminology, code systems & value sets
3.3.1 Concept
Synonym
Definition & source
Description
Example
Recommended definition
3.3.2 Code
3.3.3 Code system
3.3.4 Concept definition
3.3.5 Concept designation
3.3.6 Concept domain
3.3.7 Concept identifier
3.3.8 Concept representation
3.3.9 Value set
3.4 Interoperability
3.4.1 Interoperability
Synonym /
Definition & source /
- ISO 11179: interoperability concerning the creation, meaning, computation, use, transfer, and exchange of data [ISO/IEC 20944-1]
- ISO 11179: capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units [ISO/IEC 2382-1]
- IEEE: ability of two or more systems or components to exchange information and to use the information that has been exchanged.
Description
Example
Recommended definition
3.4.2 Technical interoperability (“machine interoperability”)
Synonym / Functional, syntactic or exchange interoperability
Definition & source / Technical interoperability: the focus of technical interoperability is on the conveyance of data, not on its meaning. Technical interoperability encompasses the transmission and reception of information that can be used by a person but which cannot be further processed into semantic equivalents by software. Note that mathematical operations can be, and frequently are, performed at the level of technical interoperability. A good example is the use of a “check digit” to determine the integrity of a specific unit of transmitted or keyed-in data. The same mathematical formula is applied at each end of a transaction and the results are compared to ensure that the data was successfully transmitted.
Technical interoperability moves data from system A to system B.
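The check-digit example can be made concrete. The source does not name a particular scheme; the sketch below uses the Luhn (mod 10) algorithm, a widely used check-digit scheme, purely as an illustration. Neither side needs to know what the digits mean: this is integrity checking at the technical level only.

```python
DIGITS = "0123456789"


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Process digits from the right. Because the check digit will be appended on the
    # right, the rightmost payload digit (i == 0) is the first one to be doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return (10 - total % 10) % 10


def sender_encode(payload: str) -> str:
    """Sending side: append the check digit before transmission."""
    return payload + str(luhn_check_digit(payload))


def receiver_verify(received: str) -> bool:
    """Receiving side: re-apply the same formula and compare with the transmitted digit."""
    if len(received) < 2 or any(c not in DIGITS for c in received):
        return False
    return luhn_check_digit(received[:-1]) == int(received[-1])


sent = sender_encode("7992739871")
print(sent)                            # 79927398713
print(receiver_verify(sent))           # True
print(receiver_verify("79927398813"))  # False: one digit altered in transit
```

Both functions apply the same formula, as the definition describes; a mismatch tells the receiver that the data was damaged, but nothing about what the data means.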
Description
Example
Recommended definition
3.4.3 Semantic interoperability
Synonym /
Definition & source / Semantic interoperability: to maximize the usefulness of shared information and to support applications such as intelligent decision support systems, a higher level of interoperability is required. This is called semantic interoperability, which has been defined as the ability of information shared by systems to be understood… so that non-numeric data can be processed by the receiving system. Semantic interoperability is a multi-level concept, with the degree of semantic interoperability dependent on the level of agreement on data content terminology and on the content of archetypes and templates used by the sending and receiving systems.
Semantic interoperability ensures that system A and system B understand the data in the same way.
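To contrast with the technical level, here is a minimal Python sketch of semantic interoperability, assuming two hypothetical systems that record sex with different local codes and a shared value set that both map onto. The code values and mapping tables are invented for illustration; in practice the shared value set would come from agreed controlled terminology.

```python
from typing import Dict, List

# Shared (agreed) value set: both systems map their local codes onto these concepts
SHARED_SEX: Dict[str, str] = {"M": "Male", "F": "Female"}

# Hypothetical local codes used by each system
SYSTEM_A_TO_SHARED: Dict[str, str] = {"1": "M", "2": "F"}
SYSTEM_B_TO_SHARED: Dict[str, str] = {"Male": "M", "Female": "F"}


def to_shared(local_code: str, mapping: Dict[str, str]) -> str:
    """Translate a local code into the shared value set, rejecting unmapped codes."""
    shared = mapping.get(local_code)
    if shared not in SHARED_SEX:
        raise ValueError(f"No agreed meaning for local code {local_code!r}")
    return shared


records_a: List[str] = ["1", "2", "2"]    # as transmitted by system A
records_b: List[str] = ["Female", "Male"]  # as transmitted by system B

# Technical interoperability alone: the values arrive, but cannot be compared
print((records_a + records_b).count("F"))  # 0

# Semantic interoperability: both sources are read through the same value set
pooled = ([to_shared(c, SYSTEM_A_TO_SHARED) for c in records_a]
          + [to_shared(c, SYSTEM_B_TO_SHARED) for c in records_b])
print(pooled.count("F"))  # 3
```

The first count is wrong not because any data was lost in transit but because the receiver has no agreed meaning for "2" or "Female"; the mapping to a shared value set is what lets system B "understand" system A's data.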
Description
Example
Recommended definition
3.4.4 Process Interoperability