Project Editor's Draft For: ISO/IEC Xxxxx-Xxx Information Technology Xxxx Part Xxx: Xx

Reference number of working document: ISO/IEC JTC1 SC32Nnnnn

Date: 2007-11-08

Reference number of document: ISO/IEC WD1 11179-4

Committee identification: ISO/IEC JTC1 SC32 WG2

SC32 Secretariat: US

Information technology—
Metadata registries —
Part 4: Terminological principles for data

Document type: International standard

Document subtype: if applicable

Document stage: (20) Preparatory

Document language: E

Warning

This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.

ISO/IEC WD1 11179-4

Copyright notice

This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards development process is permitted without prior permission from ISO, neither this document nor any extract from it may be reproduced, stored or transmitted in any form for any other purpose without prior written permission from ISO.

Requests for permission to reproduce this document for the purpose of selling it should be addressed as shown below or to ISO’s member body in the country of the requester:

ISO copyright office

Case postale 56

CH-1211 Geneva 20

Tel. +41 22 749 01 11

Fax +41 22 749 09 47

E-mail

Web

Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.

Violators may be prosecuted.

Contents

Foreword

Introduction

1 Scope

2 Normative references

3 Terms and definitions

4 Terminology theory

4.1 Objects

4.2 Properties and characteristics

4.3 Concepts

4.4 Relations

4.5 Signifiers

4.5.1 Introduction

4.5.2 Designations

4.5.3 Labels, Identifiers, and Locators

4.5.4 Namespaces

4.6 Definitions

4.7 Prototypes

5 Data (in theory)

5.1 Values

5.2 Datum

5.3 Mapping terminology to ISO/IEC 11179-3

5.4 Data quality and measurement error

6 Describing Data

6.1 Data element concept

6.2 Conceptual domain

6.3 Value domain

6.4 Data element

7 Data (in practice)

7.1 Microdata versus macrodata

7.2 Tables and time series

8 Summary of data definition requirements and recommendations

8.1 Requirements

8.2 Recommendations

8.3 Provisions

8.3.1 Premises

8.3.2 Requirements

8.3.3 Recommendations

9 Conformance

Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75% of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO/IEC 11179-4 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 32, Data management and interchange.

ISO/IEC 11179 consists of the following parts, under the general title Information technology – Metadata registries (MDR):

Part 1: Framework

Part 2: Concept systems

Part 3: Data semantics

Part 4: Terminological principles for data

Part 5: Naming conventions and namespaces

Part 6: Registration and administration

Introduction

This third edition of ISO/IEC 11179-4 is a significant expansion of the previous editions. Editions 1 and 2, which were essentially the same, dealt only with rules and guidelines for forming definitions for data.

Edition 3 extends the notions in the previous editions by treating data as terminology for special languages. Because definitions are central to terminology, definitions for data remain important, so the rules and guidelines from the previous editions are retained in this edition.

This Part of ISO/IEC 11179 borrows heavily from the basic standards for terminology: ISO 704 (Terminology work – Principles and methods) and ISO 1087-1 (Terminology work – Vocabulary – Part 1: Theory and application). Through the language of terminology, we show that data are terminological in nature; that the main constructs of ISO/IEC 11179-3 (Metadata registries – Part 3: Data semantics) have a natural terminological interpretation; and that classification is deeply connected to constructing data, using data, and understanding the error associated with data.

ISO/IEC 11179 is essentially about the semantics of data and about managing the description of those semantics. The fundamental framework for the semantics of data is the data element description. This Part therefore describes the connection between how data are described and the fundamentals of terminology and terminology management.

© ISO 2007 – All rights reserved

Information technology—
Metadata registries —
Part 4: Terminological principles for data

1 Scope

ISO/IEC 11179-4 (Edition 3) describes terminological principles for data. A datum is a designation in the terminological sense, so the theory of terminology for special languages applies. This Part provides the underlying theory for understanding the semantics and representation of data.

All the major data constructs described in ISO/IEC 11179-3 are derived from the principles described here.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 704:2000, Terminology work – Principles and methods

ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and application

ISO/IEC Guide 2, Standardization and related activities – General vocabulary

ISO/IEC 11179 (all parts), Information technology – Metadata registries (MDR)

3 Terms and definitions

For the purposes of this document, the following terms, abbreviations, and definitions apply.

3.1

xxx

definition of term

3.2

xxx

definition of term

3.3

xxx

definition of term

4 Terminology theory

The theory of terminology for special languages is described in ISO 704, and the terms used in the theory are defined in ISO 1087-1. The most relevant ideas are briefly discussed here; for more details, consult those standards.

4.1 Objects

An object is anything perceivable or conceivable. For the purposes of this International Standard, anything is an object; no more abstract notion of object is considered. No claim is made either for or against the idea of using the constructs defined in this clause to structure or build an ontology.

4.2 Properties and characteristics

In ISO 1087-1, the term property is not defined. For the purposes of this International Standard, a property is the result of a determination, made either directly or indirectly, about some object. One form of determination is observation: something humans perceive through their senses. Noticing the color of a person’s eyes is an observation, or direct determination, of the eye color of that person. Another form of determination is detection by an instrument. An oral thermometer is an instrument that detects the internal body temperature of a person; observing a reading on the thermometer is an indirect determination of that temperature. The specific observed eye color and internal body temperature are properties of the person.

It is through properties that we are able to make distinctions between objects. For instance, one person may be 185 cm tall, have brown colored eyes and hair, and have medium brown colored skin. Another may be 170 cm tall, have blue colored eyes and blond hair, and have very light brown colored skin. These properties of each person serve to distinguish between the two.

The idea that a property is the result of a determination seemingly requires all objects to be perceivable. In information technology, it is not possible to compute with a merely conceivable object; it is possible, however, to compute with a description of that object. The descriptors are characteristics of the object, and the values the descriptors take are properties.

ISO 1087-1 defines a characteristic, but that definition is not used here. Instead, a characteristic is a determinable. A determinable is something capable of being determined, definitely ascertained, or decided upon. Eye color, for instance, is a determinable: it can be ascertained by looking into a person’s eyes to determine their color. A property, on the other hand, is what gets determined; this is called a determinant. A determinant is an element that determines or identifies the nature of something. Blue is a determinant for eye color. So a characteristic has the capacity for being determined (determinable), whereas a property is the result of a determination (determinant). Some characteristics of a person are height, eye color, hair color, and skin tone. Examples of corresponding properties, taken from the example earlier in this clause, are: height has the properties 185 cm and 170 cm; eye color has the properties brown and blue; hair color has the properties brown and blond; and skin tone has the properties medium brown and very light brown.
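The distinction between a characteristic (determinable) and a property (determinant) can be sketched in code. The following Python fragment is illustrative only: it assumes that each characteristic is modelled by its name together with the set of properties it admits, and the names Characteristic, determine, and describe are hypothetical, not constructs defined by this International Standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    """A determinable: something capable of being determined."""
    name: str
    properties: frozenset  # the determinants (properties) it admits


def determine(characteristic: Characteristic, observed: str) -> str:
    """Return the property that results from determining a characteristic."""
    if observed not in characteristic.properties:
        raise ValueError(f"{observed!r} is not a property of {characteristic.name}")
    return observed


# Characteristics of a person and their properties, from the example in this clause.
CHARACTERISTICS = {
    c.name: c
    for c in (
        Characteristic("height", frozenset({"185 cm", "170 cm"})),
        Characteristic("eye color", frozenset({"brown", "blue"})),
        Characteristic("hair color", frozenset({"brown", "blond"})),
        Characteristic("skin tone", frozenset({"medium brown", "very light brown"})),
    )
}


def describe(observations: dict) -> dict:
    """Describe an object by the property determined for each characteristic."""
    return {name: determine(CHARACTERISTICS[name], value) for name, value in observations.items()}


person_1 = describe({"height": "185 cm", "eye color": "brown",
                     "hair color": "brown", "skin tone": "medium brown"})
person_2 = describe({"height": "170 cm", "eye color": "blue",
                     "hair color": "blond", "skin tone": "very light brown"})

# The characteristics whose properties differ are what distinguish the two persons.
differences = sorted(name for name in CHARACTERISTICS if person_1[name] != person_2[name])
```

In this sketch, determining "green" for eye color raises ValueError only because the assumed set of properties for that characteristic is limited to brown and blue; which properties correspond to a characteristic depends on needs.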

A set of properties corresponds to a characteristic. As examples 11 and 12 in clause 5.2 show, different sets of properties may correspond to the same characteristic, depending on needs. In addition, the same property may correspond to two characteristics, as the following example illustrates.

EXAMPLE 1: A property may correspond to two characteristics. First, consider the characteristic mined metals for the mines in some country. Typical properties are gold, silver, lead, iron, copper, and tin. Second, consider the characteristic medal type for the results of a swimmer in international competition events. Here, the properties are gold, silver, bronze, and none. Gold and silver have the same meaning in the two sets of properties. Therefore, gold and silver each correspond to two characteristics.
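EXAMPLE 1 can be restated by representing each characteristic as the set of properties it admits. In this illustrative Python sketch, treating equal strings as properties with the same meaning is an assumption that mirrors the example, and the helper characteristics_of is hypothetical.

```python
# Each characteristic is represented by the set of properties it admits.
CHARACTERISTICS = {
    "mined metals": {"gold", "silver", "lead", "iron", "copper", "tin"},
    "medal type": {"gold", "silver", "bronze", "none"},
}


def characteristics_of(prop: str) -> list:
    """Names of the characteristics to which a property corresponds."""
    return sorted(name for name, props in CHARACTERISTICS.items() if prop in props)


# Properties corresponding to both characteristics (equal strings are assumed
# to carry the same meaning, as in EXAMPLE 1).
shared = CHARACTERISTICS["mined metals"] & CHARACTERISTICS["medal type"]
```

Here shared contains gold and silver, and characteristics_of("gold") names both characteristics, while characteristics_of("tin") names only mined metals.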

4.3 Concepts

A concept is a unit of thought created by a unique set of characteristics. Consider the concept “person”. The characteristics of a person include height, eye color, hair color, and skin tone. There are many others.

Some characteristics are indispensable for understanding a concept. These are the essential characteristics. A delimiting characteristic is an essential characteristic used for distinguishing a concept from related concepts. For example, an essential characteristic of people is that they are designed to stand and walk upright. This is also a delimiting characteristic, since it distinguishes people from gorillas.

The intension of a concept is the set of characteristics which makes up the concept. The extension of a concept is the totality of objects to which a concept corresponds.

A defining characteristic (a notion outside the scope of ISO 704 and ISO 1087-1) is a characteristic that is like an essential characteristic, except that it does not always correspond to properties of the objects in the extension. A defining characteristic of people is that they stand and walk upright. Not every person is capable of standing and walking upright, but all people are designed that way; paralyzed or injured people, for instance, may not be able to stand.

Characteristics and properties are concepts in their own right. As concepts, each plays a distinct role, and that role is what distinguishes them.

Establishing the essential characteristics of a concept is important, because adding a single characteristic may profoundly influence which objects are in the extension of the concept. Adding or removing characteristics affects the meaning of a given concept, changing the concept itself; thus, its extension would be expected to change as well.

A general concept is a concept which corresponds to two or more objects which form a group by reason of common properties. An example is the concept “planets in our solar system”. An individual concept is a concept which corresponds to only one object. An example is the concept “Saturn”. In other words, a general concept may have more than one object in its extension, and an individual concept must have exactly one object in its extension.

NOTE: A concept might be so defined that there exists only one object in its extension even though the possibility for more exists. This is still a general concept. For example, the notion “all planets with one moon” is a general concept. There is one known planet with one moon – Earth – but there are undoubtedly more.
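The effect of adding a characteristic on the extension of a concept can be sketched by modelling an intension as a collection of tests, one per characteristic, and computing the extension as the objects that pass every test. This Python fragment is illustrative only; the universe of discourse (the eight planets of our solar system), the class Planet, and the function extension are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Planet:
    name: str
    has_one_moon: bool  # among the planets of our solar system, true only of Earth


# Assumed universe of discourse: the eight planets of our solar system.
UNIVERSE = [
    Planet("Mercury", False), Planet("Venus", False), Planet("Earth", True),
    Planet("Mars", False), Planet("Jupiter", False), Planet("Saturn", False),
    Planet("Uranus", False), Planet("Neptune", False),
]

# An intension is modelled as named tests, one per characteristic.
Intension = Dict[str, Callable[[Planet], bool]]


def extension(intension: Intension, universe: List[Planet]) -> List[str]:
    """Names of the objects having a corresponding property for every characteristic."""
    return [p.name for p in universe if all(test(p) for test in intension.values())]


# Membership in UNIVERSE already makes each object a planet in our solar system.
planets: Intension = {"planet in our solar system": lambda p: True}

# Adding one characteristic yields a different concept with a different extension.
planets_with_one_moon: Intension = {**planets, "has exactly one moon": lambda p: p.has_one_moon}

# An individual concept, such as "Saturn", corresponds to exactly one object.
saturn: Intension = {"is Saturn": lambda p: p.name == "Saturn"}
```

As the NOTE above points out, an extension containing a single object, as computed for planets_with_one_moon, does not by itself make a concept individual; whether a concept is general or individual depends on how it is defined, which this sketch does not capture.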

4.4 Relations

A relation is one of the following kinds: generic, partitive, hierarchical, associative, sequential, temporal, causal, antonymy, equivalence, mononymy, monosemy, polysemy, and homonymy. These kinds are defined in ISO 1087-1 and explained in ISO 704. The first seven kinds in the list are relations between concepts; the others involve designations, either among themselves or with concepts. A concept system is a set of concepts structured according to the relations among them. The International Standard Industrial Classification of All Economic Activities (ISIC Rev. 3.1)[1] is a concept system. So is the Linnaean taxonomy of living things in biology[2].

NOTE: The definition of a concept system does not require that any relations be defined on the concepts; it requires only that the concepts be structured according to whatever relations exist among them. If there are no relations, then the structure is flat.
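A concept system can be sketched as a set of concepts together with typed relations between them. This illustrative Python fragment assumes that a relation is stored as a (source, kind, target) triple and, for generic and partitive relations, that the source is the superordinate concept; the names RelationKind and ConceptSystem are hypothetical. It accepts only the first seven kinds listed above as relations between concepts, and it reports a flat structure when no relations have been added, as described in the NOTE.

```python
from enum import Enum
from typing import Set, Tuple


class RelationKind(Enum):
    # Relations between concepts: the first seven kinds listed in this clause.
    GENERIC = "generic"
    PARTITIVE = "partitive"
    HIERARCHICAL = "hierarchical"
    ASSOCIATIVE = "associative"
    SEQUENTIAL = "sequential"
    TEMPORAL = "temporal"
    CAUSAL = "causal"
    # Relations involving designations, among themselves or with concepts.
    ANTONYMY = "antonymy"
    EQUIVALENCE = "equivalence"
    MONONYMY = "mononymy"
    MONOSEMY = "monosemy"
    POLYSEMY = "polysemy"
    HOMONYMY = "homonymy"


# Enum members iterate in definition order, so the first seven are the concept relations.
CONCEPT_RELATIONS = frozenset(list(RelationKind)[:7])


class ConceptSystem:
    """A set of concepts structured according to the relations among them."""

    def __init__(self) -> None:
        self.concepts: Set[str] = set()
        self.relations: Set[Tuple[str, RelationKind, str]] = set()

    def add_concept(self, concept: str) -> None:
        self.concepts.add(concept)

    def relate(self, source: str, kind: RelationKind, target: str) -> None:
        """Record a relation; for generic and partitive kinds, source is superordinate."""
        if kind not in CONCEPT_RELATIONS:
            raise ValueError(f"{kind.value} involves designations, not concepts alone")
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must belong to the concept system")
        self.relations.add((source, kind, target))

    def is_flat(self) -> bool:
        """A concept system without relations has a flat structure."""
        return not self.relations


system = ConceptSystem()
for concept in ("solar system", "planet", "planet with rings"):
    system.add_concept(concept)
flat_before = system.is_flat()
system.relate("solar system", RelationKind.PARTITIVE, "planet")
system.relate("planet", RelationKind.GENERIC, "planet with rings")
```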

A terminological system is a concept system along with a designation for each concept. An ontology is a concept system along with a computational model. Describing computational models is outside the scope of this International Standard; such models are used to support automated reasoning, logical inference, and semantic computing in general.

A generic concept is a concept in a generic relation having the narrower intension. A specific concept is a concept in a generic relation having the broader intension. A comprehensive concept is a concept in a partitive relation viewed as the whole. A partitive concept is a concept in a partitive relation viewed as one of the parts making up the whole. A superordinate concept is a concept which is either a generic concept or a comprehensive concept. A subordinate concept is a concept which is either a specific concept or a partitive concept.

Two concepts having the generic relation with each other are “planets in our solar system” (generic) and “planets with rings in our solar system” (specific). The specific concept has more characteristics in its intension (the intension includes the existence of rings around the planet) and therefore has the broader intension.
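The generic relation in this example can be sketched by treating an intension as a set of characteristics and the extension as the objects whose properties cover every characteristic. In this illustrative Python fragment, using the same strings for characteristics and their corresponding properties is a simplifying assumption, and the function names are hypothetical. The sketch shows that the specific concept has the broader intension and, as a consequence, an extension contained in that of the generic concept.

```python
# Simplification: each object is described by the characteristics it exhibits.
BASE = {"planet", "in our solar system"}
OBJECTS = {
    "Mercury": BASE, "Venus": BASE, "Earth": BASE, "Mars": BASE,
    "Jupiter": BASE | {"has rings"}, "Saturn": BASE | {"has rings"},
    "Uranus": BASE | {"has rings"}, "Neptune": BASE | {"has rings"},
}


def extension(intension: frozenset) -> set:
    """Objects having a corresponding property for every characteristic."""
    return {name for name, props in OBJECTS.items() if intension <= props}


def in_generic_relation(generic: frozenset, specific: frozenset) -> bool:
    """The specific concept adds characteristics, so its intension is broader."""
    return generic < specific  # proper subset


planets = frozenset(BASE)
planets_with_rings = planets | {"has rings"}
```

Here in_generic_relation(planets, planets_with_rings) holds, and the extension of planets_with_rings (Jupiter, Saturn, Uranus, and Neptune) is a subset of the extension of planets.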

Two concepts having the partitive relation with each other are “solar system” (comprehensive) and “planet” (partitive). A solar system is made up in part of planets. Other parts of a solar system may include dwarf planets, moons, asteroids, and comets.

4.5 Signifiers

4.5.1 Introduction

The term signifier is not defined in ISO 1087-1; for the purposes of this International Standard, a signifier is a concept whose extension contains only perceivable objects. An object in the extension of a signifier is, again for the purposes of this International Standard, a token. For instance, two separately written instances of the numeral 5 are both tokens of “the numeral five”, a signifier.
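The relation between a signifier and its tokens resembles the relation between a character sequence and its occurrences in a text. In the following illustrative Python sketch, a token is taken to be any occurrence of the signifier's character sequence, identified by its position; this is a simplifying assumption (it also finds the 5 inside 15), and tokens_of is a hypothetical helper.

```python
import re
from typing import List


def tokens_of(signifier: str, text: str) -> List[int]:
    """Start positions of the tokens (occurrences) of a signifier in a text."""
    # Simplification: every non-overlapping occurrence of the character sequence
    # counts as a token, including occurrences inside longer numerals.
    return [m.start() for m in re.finditer(re.escape(signifier), text)]


# Two distinct perceivable objects (tokens) of the one signifier "the numeral five".
positions = tokens_of("5", "5 + 5 = 10")
```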

A signifier has the potential to designate a concept or to refer to an object. In each case described below, the signifier takes on a particular role (designation, label, identifier, or locator), and that role is what differentiates these kinds.

4.5.2 Designations

In ISO 704 and ISO 1087-1, a designation is a representation of a concept by a sign (signifier in this International Standard) which denotes it. For instance, the token “apple” is an English word designating the concept “the fleshy usually rounded red, yellow, or green edible pome fruit of a usually cultivated tree (genus Malus) of the rose family”[3]. The letter “M” might designate that a person is married, as recorded in some database.

4.5.3 Labels, Identifiers, and Locators

A label is a signifier associated with an object. It simply refers to the object.

An identifier is a label produced under some naming convention (see ISO/IEC 11179-5). For instance, some database management systems automatically produce a key for each record added to a table in the database. Usually, the next unused integer is the one assigned. This constitutes a naming convention for the key, which is an identifier.
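The naming convention described above, in which the next unused integer becomes the key of each new record, can be sketched as follows. The class KeyedTable and its methods are hypothetical and only illustrative; real database management systems differ in how they generate keys.

```python
from itertools import count
from typing import Dict


class KeyedTable:
    """A table whose keys are identifiers produced by a simple naming convention."""

    def __init__(self) -> None:
        self._keys = count(start=1)  # naming convention: 1, 2, 3, ...
        self.records: Dict[int, dict] = {}

    def add(self, record: dict) -> int:
        """Store a record under the next unused integer and return that key."""
        key = next(self._keys)
        self.records[key] = record
        return key


table = KeyedTable()
first = table.add({"surname": "Smith"})
second = table.add({"surname": "Jones"})
```

Because the keys are produced under a stated convention rather than chosen arbitrarily, each key is an identifier and not merely a label.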

A locator is an identifier with an associated de-referencing mechanism. For instance, a URL (Uniform Resource Locator) using the http scheme has the HTTP (Hypertext Transfer Protocol) mechanism associated with it.
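The difference between an identifier and a locator can be sketched by pairing an identifier with a de-referencing mechanism chosen by its URL scheme. This Python fragment is illustrative only: the mem scheme, the in-memory STORE, and the Locator class are hypothetical stand-ins, used so that the sketch runs without network access; a real http entry would retrieve the resource with an HTTP request instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical in-memory resources standing in for a remote server.
STORE = {"/records/42": "record 42"}

# De-referencing mechanisms keyed by URL scheme. The "mem" scheme is hypothetical.
RESOLVERS: Dict[str, Callable[[str], str]] = {
    "mem": lambda path: STORE[path],
}


@dataclass(frozen=True)
class Locator:
    """An identifier together with a de-referencing mechanism."""
    identifier: str

    def dereference(self) -> str:
        parts = urlparse(self.identifier)
        resolver = RESOLVERS.get(parts.scheme)
        if resolver is None:
            # Without a mechanism, the identifier cannot serve as a locator.
            raise LookupError(f"no de-referencing mechanism for scheme {parts.scheme!r}")
        return resolver(parts.path)


resource = Locator("mem:///records/42").dereference()
```

An identifier such as urn:example:42 has no mechanism registered in this sketch, so attempting to dereference it raises LookupError: it identifies, but does not locate.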

4.5.4 Namespaces

A namespace is a set of signifiers. Namespaces exist for some business purpose, but that purpose is not part of the namespace itself. For instance, a namespace used for controlling the tags in a set of XML elements may have been created for some purpose, but nothing prevents another application from using that namespace for a completely different purpose. Rules about how namespaces relate to each other, for instance in resolving names of variables in programming languages, are likewise outside the scope of a namespace.

EXAMPLE 2: Suppose X is the name of a variable defined globally (say as an integer) and in some function (say as an array) in a computer program. If the compiler encounters the name X, how does it know what datatype and memory location to assign it? The compiler must search the namespaces in some order of preference. The preference is enforced by the compiler; it is not part of either namespace.
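The name resolution described in EXAMPLE 2 can be sketched with Python's collections.ChainMap, which looks a key up in a sequence of mappings in order and returns the first match. In this illustrative fragment, each namespace is a dictionary from names to an assumed (datatype, memory address) pair, and the addresses are hypothetical. The order of preference belongs to the ChainMap built by the resolver, not to either namespace, which mirrors the point of the example.

```python
from collections import ChainMap

# Each namespace maps a name to a (datatype, hypothetical memory address) pair.
global_namespace = {"X": ("integer", 0x1000)}
function_namespace = {"X": ("array", 0x2000)}

# Inside the function, the function namespace is preferred over the global one.
inside_function = ChainMap(function_namespace, global_namespace)
# Outside the function, only the global namespace is searched.
outside_function = ChainMap(global_namespace)

resolved_inside = inside_function["X"]
resolved_outside = outside_function["X"]
```

Neither dictionary is modified by the lookups; reversing the order of the mappings passed to ChainMap would change the result without changing either namespace.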

A fundamental problem in data management, programming languages, XML, and many other situations is the use of the same name (signifier) for different elements. Typically, namespaces serve to differentiate multiple applications of a signifier. They usually enforce a uniqueness condition on the signifiers in the set.
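A namespace that enforces a uniqueness condition, and the differentiation of the same signifier across namespaces, can be sketched as follows. The class Namespace and the prefix:signifier form of the qualified name are hypothetical, chosen only to resemble qualified names in XML; they are not constructs defined by this document.

```python
from typing import Set


class Namespace:
    """A set of signifiers that enforces a uniqueness condition."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._signifiers: Set[str] = set()

    def register(self, signifier: str) -> str:
        """Add a signifier, rejecting duplicates, and return its qualified form."""
        if signifier in self._signifiers:
            raise ValueError(f"{signifier!r} is already registered in namespace {self.name!r}")
        self._signifiers.add(signifier)
        # Qualifying the signifier with the namespace name differentiates
        # applications of the same signifier in different namespaces.
        return f"{self.name}:{signifier}"


orders = Namespace("orders")
customers = Namespace("customers")
order_id = orders.register("id")
customer_id = customers.register("id")
```

The same signifier id is accepted once in each namespace, and the two qualified names differ; registering id a second time in orders raises ValueError.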

4.6 Definitions

A definition is a representation of a concept by a descriptive statement which serves to differentiate it from related concepts. There are two kinds of definitions. An intensional definition is a definition which describes the intension of a concept by stating the superordinate concept and the delimiting characteristics. The definition of delimiting characteristic in clause 4.3 is an example of an intensional definition. An extensional definition is a description of a concept by enumerating all of its subordinate concepts under one criterion of subdivision. The definition of relation in clause 4.4 is an example of an extensional definition.

NOTE: Both kinds of definitions usually depend on knowing the definitions of other concepts in order to fully understand the concept under study.

4.7 Prototypes

This sub-clause is outside the scope of ISO 704 and ISO 1087-1.

A prototype is an object in the extension of a concept that fits the characteristics of the concept well. Every concept has prototypes, and every object in the extension of a concept fits the characteristics to some degree. The degree can vary, but there are no hierarchies and no levels implied.
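One way to make degree of fit concrete is to measure the fraction of a concept's characteristics for which an object has a corresponding property. This Python sketch is illustrative only: the simplified intension of person, the three objects, the measure itself, and the choice of the best-fitting objects as prototypes are all assumptions. The measure expresses a degree of fit and implies no hierarchy or levels among the objects.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Set

# Hypothetical, simplified intension of the concept "person".
PERSON: FrozenSet[str] = frozenset({"human", "stands upright", "walks upright"})


def degree_of_fit(properties: Set[str], intension: FrozenSet[str] = PERSON) -> Fraction:
    """Fraction of the characteristics for which the object has a corresponding property."""
    return Fraction(len(properties & intension), len(intension))


# Hypothetical objects in the extension of "person", described by their properties.
EXTENSION: Dict[str, Set[str]] = {
    "person A": {"human", "stands upright", "walks upright"},
    "person B": {"human", "stands upright"},  # injured: cannot walk
    "person C": {"human"},                    # paralyzed: cannot stand or walk
}

fits = {name: degree_of_fit(props) for name, props in EXTENSION.items()}
best = max(fits.values())
prototypes = sorted(name for name, fit in fits.items() if fit == best)
```

All three objects remain in the extension of person; they differ only in how well they fit its characteristics, which echoes the discussion of defining characteristics in clause 4.3.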