Genetic Variation Model

Description

The Genetic Variation model described here specifies the structure and semantics for transmitting information created during single- or multiple-gene testing and analysis of a single subject with chromosomal-based DNA. This model is not meant to be a biological model; rather, it is aimed at the needs of healthcare, with the vision of personalized medicine in mind. It also facilitates genetic analysis in clinical research conducted within healthcare enterprises. Human medical patients, human clinical research subjects and animal clinical research subjects can all have genetic test results transmitted in an HL7 message that uses the Genetic Variation model as its payload. It is expected that a constrained genetic variation model used as a result payload will be plugged into other fully functional HL7 V3 messages used in genetic diagnostic testing and clinical research trials.

Genetic Variation Scope and Modularity

The model presented in this ballot is further constrained to genetic variation analyses based upon sequence variation and derived from a set of scientific laboratory methods (such as SNP probes, sequencing and genotype chip arrays) that focus on small-scale genetic changes, usually in the coding region(s) of one or a small number of genes. Gene expression analysis, non-DNA-based methods and viral genotyping are not suitable for the Genetic Variation model and are (or will be) addressed by different models within the HL7 Clinical Genomics Work Group.

The model presented here is based upon the predecessor R-MIMs (Genetic Locus [POCG_MT000010] and Genetic Loci [POCG_MT000050]) and their respective CMETs, which have now been combined into one unified CMET. During the Draft Standard for Trial Use (DSTU) period of those predecessor models, various members of the HL7 Clinical Genomics Work Group worked with data derived from Sequence, Probe and Genotype Chip methods of single subject genetic analysis. These use cases derive from genetic testing messaging in both clinical practice and clinical research, and the input from both environments is consolidated to provide a single, interoperable view in the unified Genetic Variation CMET.

Main Model Characteristics

·  The entry point to this model is a GeneticLoci act which represents the ordered “test” or planned “assay”, whether that involves a single gene/genetic locus or multiple genes/genetic loci

·  The Genetic Locus specifies each gene or locus involved in the analysis, and the locus can be associated with a pair of alleles on paternal and maternal homologous chromosomes.

·  Core observations associated directly with the allele are Sequence and Sequence Variation. These core classes are also the ones which encapsulate raw genetic data. Reference values should be specified in definitional mood (“DEF”), while observed values are specified in event mood (“EVN”).

·  Alternatively, sequence and sequence variation data could be associated directly with the Genetic Locus observation if data is not available at the allelic level of granularity.

·  In addition to the core classes, the Phenotype CMET is provided as a model associated with all core classes, to allow for complex interpretations that cannot fit within each class’s interpretation code.
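
The containment described in the list above can be pictured with a minimal sketch. It is an illustration only, not part of the model definition: the Python class and field names are assumptions chosen to mirror the class names in the text, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """A core observation; kind is 'Sequence' or 'SequenceVariation'."""
    kind: str
    mood: str   # 'DEF' for reference values, 'EVN' for observed values
    value: str


@dataclass
class Allele:
    observations: List[Observation] = field(default_factory=list)


@dataclass
class GeneticLocus:
    code: str
    alleles: List[Allele] = field(default_factory=list)
    # Locus-level observations, used when allelic granularity is unavailable.
    observations: List[Observation] = field(default_factory=list)


@dataclass
class GeneticLoci:
    """Entry point: groups one or more loci under a single test or assay."""
    loci: List[GeneticLocus] = field(default_factory=list)


def observed_values(panel: GeneticLoci) -> List[str]:
    """Collect EVN-mood values from allele-level and locus-level observations."""
    found = []
    for locus in panel.loci:
        for allele in locus.alleles:
            found += [o.value for o in allele.observations if o.mood == "EVN"]
        found += [o.value for o in locus.observations if o.mood == "EVN"]
    return found
```

In this sketch a locus may carry observations through its alleles, directly, or both, which reflects the alternative attachment point allowed when allelic data is not available.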

The following figure shows a bird's eye view of the Genetic Variation model:

Model Walkthrough

Notes on the use of the 'id', 'code' and 'value' attributes

The use of these attributes in the various Genetic Variation model classes depends on the extent to which the data has been personalized and on how much the results differ from known, published parts of the human (or animal) genome. Their use also differs in those classes that encapsulate raw genomic data.

For example, consider the Individual Allele class. If the patient's allele was fully sequenced and found to be identical to a reference sequence registered in a reference database, then the ID of the reference sequence is the ID (accession number) from that database, the allele value is the database's code for the normal human (wildtype) allele and will contribute to the interpretation, and the observed sequence value is the only raw data observation (there being no variation). If variation is present and the variation pattern does not match that of an allele defined within the reference database, then a temporary identifier (possibly using the LSID format) could be placed in the id attribute, a code should be used that indicates a novel allele pattern, and the full sequence and variation point values should be reported, since they are not derivable from the reference database’s accession number.

In the 'encapsulating' Sequence class the 'value' attribute should hold the bioinformatics markup itself. In this case, the code should indicate the exact bioinformatics format used to populate the 'value' attribute.
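
The two Individual Allele cases described above can be sketched as a small decision function. This is a hedged illustration under stated assumptions: the function name, the `AlleleFields` structure, the `NOVEL-ALLELE` placeholder code and the `example.org` LSID authority are all hypothetical, and the sketch covers only the two cases spelled out in the text (an exact match to a registered reference, and a novel variation pattern).

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional

NOVEL_ALLELE_CODE = "NOVEL-ALLELE"  # hypothetical placeholder, not a real code


@dataclass
class AlleleFields:
    id: str                      # accession number or temporary identifier
    code: str                    # reference allele code or novel-allele marker
    raw_observations: List[str]  # which raw data observations to report


def allele_fields(reference_accession: Optional[str],
                  reference_allele_code: Optional[str]) -> AlleleFields:
    """Pass both arguments when the observed allele matches a registered
    reference sequence; pass None for both when the pattern is novel."""
    if reference_accession and reference_allele_code:
        # Identical to the reference: only the observed sequence is raw data.
        return AlleleFields(reference_accession, reference_allele_code,
                            ["Sequence"])
    # Novel pattern: temporary LSID-style id (authority is an assumption),
    # and both the full sequence and the variation points are reported.
    temp_id = f"urn:lsid:example.org:temp-allele:{uuid.uuid4()}"
    return AlleleFields(temp_id, NOVEL_ALLELE_CODE,
                        ["Sequence", "SequenceVariation"])
```

The temporary identifier here is generated locally; a real sender would choose its own identifier scheme and novel-allele code.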

See the Genetic Variation Implementation Guide for a detailed walkthrough of the model, its classes, elements and attributes. The following overview presents a high level walkthrough of each act, and its relationships to acts and participations that link to that act.

Entry Point and Core Classes

Genetic Loci

The Genetic Loci class is the entry point of the Genetic Variation model. It allows several genes (or several locus regions) to be grouped together in a “test order”, and for an interpretation to be made that considers information from the combined gene/locus analyses. It also allows a report to be attached at a level that covers the analysis as a whole. If the reported analysis is based on only a single gene or locus, then the GeneticLoci class has a code and value identical to that of the Genetic Locus below.

The Genetic Loci act allows relationships with the following acts:

·  Genetic Locus – The core backbone path to the model. Each genetic loci analysis must contain one or more individual genetic locus acts.

·  Genetic Document – A genetic loci analysis may be linked to one or more genetic documents that explicate the test/analysis methodology or report the results of the testing.

·  Associated Property - A genetic loci analysis may be linked to one or more associated properties, which on an individual basis further define inherent attributes of the analysis as a whole

·  Associated Observation - A genetic loci analysis may be linked to one or more associated observations, which on an individual basis report ancillary observations about the analysis as a whole

·  Phenotype - A genetic loci analysis may be linked to one or more phenotypic interpretations, which specify a complex interpretation of the results of the genetic loci analysis as a whole

The GeneticLoci act allows the following participations:

·  Subject – A genetic loci analysis may be linked to one or more subjects. The genetic subject participation should be used only if the Genetic Variation model is the payload in a message that does not allow specification of the subject of the test.

·  Performer - A genetic loci analysis may be linked to one or more performers. The genetic loci performer participation should be used only if the same organization performed all laboratory and analytic acts within that instance of the model. If multiple performers were involved, specify the performer on each lower level act.

·  Author - A genetic loci analysis may be linked to one or more authors. The genetic loci author participation should be used when a specific individual can be identified as the author of the analysis.

·  Verifier - A genetic loci analysis may be linked to one or more verifiers. The genetic loci verifier participation should be used when a specific individual can be identified as the reviewer and approver of the observational results and analysis.
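
The performer rule above (state the performer once on Genetic Loci only when a single organization performed every act, otherwise state it on each lower level act) can be sketched as follows. The function name, the act labels and the organization names are illustrative assumptions, not part of the model.

```python
from typing import Dict, Optional, Tuple


def place_performers(
    act_performers: Dict[str, str],
) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (top_level_performer, per_act_performers).

    act_performers maps a lower-level act label to the organization that
    performed it. Labels and organization names are illustrative.
    """
    organizations = set(act_performers.values())
    if len(organizations) == 1:
        # One organization did everything: state it once on Genetic Loci.
        return next(iter(organizations)), {}
    # Several organizations (or none known): keep performers on each act.
    return None, dict(act_performers)
```

For example, `place_performers({"locus-1": "Lab A", "locus-2": "Lab B"})` leaves the top-level performer empty and keeps both per-act entries.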

Genetic Loci Attributes

·  ID – The ID attribute is a globally unique instance identifier for the Genetic Loci act. The Genetic Loci ID should remain constant across all genetic variation analysis interpretation revisions that derive from a common original genetic test observation or set of observations.

·  Code – The Genetic Loci code is required, and identifies the act type as a Genetic Loci.

·  Negation Indicator – The Negation Indicator is an optional attribute that allows the sender to indicate that a Genetic Loci act is NOT to be performed for a specific reason. The Negation Indicator is a Boolean that is set to “True” when, for example, a subject in a clinical trial has not yet provided consent to genotype, and the sender wishes to place a positive instance of this negation in the transmission.

·  Title - The Genetic Loci Title is an optional attribute used to provide a short print label when the genetic test group/panel cannot be defined by a coded reference (see the value field) to a gene testing database.

·  Text – The Genetic Loci text field is used when the suite of genes or genetic loci that make up the loci group needs further explication. A free text description of the loci for visual display or reporting can be included in this text field.

·  Status Code - This required code indicates if a Genetic Loci test is cancelled, completed, or in process (in the latter case, partial results are being reported). The status code can also be used to nullify a prior set of results, when, for example, the prior set were associated with the wrong subject.

·  Effective Time – This required attribute identifies the biologically relevant time of this observation, typically the collection time of the specimen used in the assay or testing. The GeneticLoci.effectiveTime attribute should thus be equal to or earlier than the time of any analysis or interpretation of the data under the Genetic Loci instance.

·  Confidentiality Code – This required attribute allows the sender to define the confidentiality status for all information about the Genetic Loci analysis. This confidentiality value is conducted to all acts under the Genetic Loci unless it is overridden by a different value at a lower level.

·  Reason Code – This optional attribute defines one or more reasons (clinical practice or research) that a Genetic Loci instance is being created. If the interpretation code attribute (see below) is populated, then the reason code should be populated as well, in order to provide the semantic context for the interpretation.

·  Value - The Genetic Loci value defines the type of genetic loci being analyzed. For example, a multi-gene Alzheimer’s Disease propensity panel would have a code (e.g. from LOINC) defining it as such, and would then contain several calls for the Genetic Locus act to define the variation within each gene that contributed to the analysis. For multi-gene analyses that cannot be easily linked to a code, the value field may be a text description of the loci involved.

·  Interpretation Code – This optional field defines one or more interpretations derived from a genetic variation test whose “raw” data may or may not be included in the message. The interpretation code should carry a phenotypic interpretation when the interpretation can be expressed as a single (or small number of) short, concise statements that are easily coded. If a more elaborate discussion of the phenotype is required, the Phenotype CMET model should be used.

·  Method Code – This optional field defines one or more methods used to perform the genetic variation analysis. The values for the Genetic Loci Method Code are intended to be very high level and indicate the type(s) of test methods used in the analysis (e.g. a genotype chip for 8 genes may be combined with sequence results for two additional genes to produce the overall suite of genes combined in the Genetic Loci analysis).
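
Several of the attribute rules listed above lend themselves to a simple consistency check. The sketch below is a minimal illustration, assuming a hypothetical `GeneticLociAttributes` structure with Python-style field names. It checks only the rules stated in the list (the four required attributes, reason code accompanying interpretation code, effective time not later than any analysis time) and shows the confidentiality override. It does not validate code system vocabularies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class GeneticLociAttributes:
    code: Optional[str] = None
    status_code: Optional[str] = None
    effective_time: Optional[datetime] = None
    confidentiality_code: Optional[str] = None
    reason_codes: List[str] = field(default_factory=list)
    interpretation_codes: List[str] = field(default_factory=list)


def check_attributes(attrs: GeneticLociAttributes,
                     analysis_times: List[datetime]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name in ("code", "status_code", "effective_time",
                 "confidentiality_code"):
        if getattr(attrs, name) is None:
            problems.append(f"{name} is required")
    if attrs.interpretation_codes and not attrs.reason_codes:
        problems.append("reason_codes should accompany interpretation_codes")
    if attrs.effective_time is not None:
        if any(t < attrs.effective_time for t in analysis_times):
            problems.append("effective_time is later than an analysis time")
    return problems


def effective_confidentiality(parent_code: str,
                              override: Optional[str] = None) -> str:
    """A lower-level act uses its own value if set, else the parent's."""
    return override if override is not None else parent_code
```

Returning a list of problems rather than raising on the first one lets a sender report every inconsistency in an instance at once.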

Genetic Locus

The Genetic Locus class is the critical child act of a Genetic Loci analysis. Each Genetic Locus instance identifies one gene (or unnamed genetic region) and associates it with a suite of observations and statements about that gene (or region). A Genetic Locus instance is thus composed of one or more gene (genetic locus) result sets.

The Genetic Locus act allows relationships with the following acts:

·  Allele – The normal core backbone path to the model. Each genetic locus will usually contain two (maternal and paternal) allele acts. For technologies that do not result in an allelic analysis, direct linkage to the genetic variation may be used.

·  Sequence – A genetic locus may be linked to one or more sequence acts. In most sequence-based analyses of known genes, the reference sequence is defined by this relationship. An observed sequence (or set of sequences) may also be linked if an allelic interpretation is not being used.

·  Sequence Variation - A genetic locus may be linked to one or more sequence variation acts. In sequence-based analyses of known genes, adjustments to the reference sequence used can be defined by this relationship. An observed sequence variation (or set of sequence variation observations) may also be linked if an allelic interpretation is not being used.

·  Associated Property - A genetic locus may be linked to one or more associated properties, which on an individual basis further define the locus. When a named gene cannot be used in the Genetic Locus code attribute, one or more Associated Property acts should be used to define the genetic location (e.g. the chromosome, start position, and end position).