Résumé of Proforma building procedure:

Résumé of Proforma scoring procedure:

Mapping DJ Middleton's Character Descriptions to Prometheus Description Elements:

i. Quantitative Scores

ii. Qualitative Scores

The problem of semantics and context

The problem of Property/stategroup composition

The requirement for modifier and qualifier terms

The problem of relative states

Mapping to Prometheus

(a) Simple and Moderately Complex DELTA Characters

(b) Complex DELTA Characters

iii. Outstanding Issues

Experiences scoring legacy data

Results

Analysing Alyxia description data captured in the Prometheus Database

CONCLUSIONS

Note: hyperlinks are relative to this file; if the linked files do not display on your machine, they can be downloaded from the same root folder.
Creating a Prometheus Proforma to re-enter DJ Middleton's Alyxia specimen descriptions (from DELTA format)

Résumé of Proforma building procedure:

The PFVis Tool displays the full details of the input angiosperm description ontology, Ontology.xml.

Navigation through the ontology is primarily mediated by a collapsible tree hierarchy representing all of the anatomical structures that may be present on an angiosperm specimen and their possible compositional relationships. The structures that a user wishes to include in their proforma description template are selected (enabled), which also enables their parent structure in the hierarchy. This process establishes the structural context of each structure that is to be described. Additional structures can be added to these enabled structures, including regions (apex, inside, margin etc.) and 'generic' structures (hair, pore, vein).
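The enabling rule above can be illustrated with a small tree model. This is a minimal sketch, assuming that enabling a structure propagates up through every ancestor so that its full structural context is selected; `StructureNode`, `enable` and `enabled_paths` are hypothetical names, not the PFVis Tool's own code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class StructureNode:
    """A node in the structural hierarchy (hypothetical; not the PFVis data model)."""

    name: str
    parent: StructureNode | None = field(default=None, repr=False)
    children: list[StructureNode] = field(default_factory=list, repr=False)
    enabled: bool = False

    def add_child(self, name: str) -> StructureNode:
        child = StructureNode(name, parent=self)
        self.children.append(child)
        return child

    def enable(self) -> None:
        """Enable this structure and all of its ancestors (its structural context)."""
        node: StructureNode | None = self
        while node is not None:
            node.enabled = True
            node = node.parent

    def path(self) -> str:
        """Dotted path from the root, e.g. 'ENTIRE PLANT.Inflorescence.Flower'."""
        names: list[str] = []
        node: StructureNode | None = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return ".".join(reversed(names))


def enabled_paths(root: StructureNode) -> list[str]:
    """Depth-first list of enabled structures, in sibling order.

    Disabled nodes are not descended into: because enable() always enables
    ancestors, a disabled node cannot have enabled descendants.
    """
    result: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.enabled:
            result.append(node.path())
            # Push children in reverse so they are popped in their original order.
            stack.extend(reversed(node.children))
    return result


if __name__ == "__main__":
    root = StructureNode("ENTIRE PLANT")
    root.add_child("Leaf")
    flower = root.add_child("Inflorescence").add_child("Flower")
    tube = flower.add_child("Perianth").add_child("Corolla").add_child("Tube")
    tube.enable()
    for p in enabled_paths(root):
        print(p)
```

Enabling only the Tube selects its whole context (ENTIRE PLANT, Inflorescence, Flower, Perianth, Corolla), while the sibling Leaf stays disabled.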

The properties that can be described for an enabled structure are displayed as a list of both structure-specific and universally applicable qualitative properties, and universally applicable quantitative properties. Again, the user enables properties that are to be included for scoring in the proforma.

In the current ontology prototype the qualitative properties are in fact sets of states that tend to be used in a similar description context (i.e. 'stategroups'). The member states of each stategroup are displayed under the group, and the user can select some or all of the available states to be available in the proforma for specimen scoring.

Thus the structures to be described, the properties to be described, and the possible states (scores) are selected from the ontology. At this stage of Proforma specification, properties can be modified by applying spatial modifiers or relating them to other scores (e.g. leaf width relative to leaf length).

New scoring properties can be added (as duplicates of existing stategroups, with editable names and potentially a different subset of scorable states selected), and any structure in the structural hierarchy can be duplicated so that distinguishable 'types' of a given structure can be scored independently. Some scores can also be prescored (fixed), and for each structure it can be specified whether its scores are to be collected in the abstract (i.e. a representative, average value) or whether an actual, concrete score must be entered. Finally, the order of structures (sibling nodes) in the structural hierarchy can be altered, which can be used to control the order in which a description template is displayed to a user.

Completed Proformas (i.e. filtered/edited views of the base ontology) can be saved and reloaded as a Proforma.xml document.
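To make the idea of a Proforma as a saved, filtered view more concrete, the sketch below round-trips a list of selected entries (structure, property, chosen states, prescored value, abstract/concrete flag) through Python's `xml.etree.ElementTree`. The element and attribute names (`Proforma`, `Entry`, `State`, `prescored`, `abstract`) are assumptions for illustration and do not describe the actual Proforma.xml schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProformaEntry:
    """One selected structure/property pair (hypothetical model, not the real schema)."""

    structure_path: str                               # e.g. "ENTIRE PLANT.Leaf"
    property_name: str                                # a stategroup or quantitative property
    states: list[str] = field(default_factory=list)  # selected states; empty if quantitative
    prescored: str | None = None                      # fixed value, if the score is prescored
    abstract: bool = True                             # False: a concrete score must be entered


def proforma_to_xml(name: str, entries: list[ProformaEntry]) -> str:
    """Serialise the selected entries as a single XML string."""
    root = ET.Element("Proforma", {"name": name})
    for e in entries:
        item = ET.SubElement(root, "Entry", {
            "structure": e.structure_path,
            "property": e.property_name,
            "abstract": "true" if e.abstract else "false",
        })
        if e.prescored is not None:
            item.set("prescored", e.prescored)
        for state in e.states:
            ET.SubElement(item, "State").text = state
    return ET.tostring(root, encoding="unicode")


def proforma_from_xml(text: str) -> tuple[str, list[ProformaEntry]]:
    """Reload a Proforma name and its entries from the XML produced above."""
    root = ET.fromstring(text)
    entries = [
        ProformaEntry(
            structure_path=item.get("structure"),
            property_name=item.get("property"),
            states=[s.text for s in item.findall("State")],
            prescored=item.get("prescored"),
            abstract=item.get("abstract") == "true",
        )
        for item in root.findall("Entry")
    ]
    return root.get("name"), entries
```

Keeping the selection as plain data makes save and reload symmetric: whatever is written by `proforma_to_xml` can be read back unchanged by `proforma_from_xml`.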

The process of creating a Proforma is shown in the video:

Making&ScoringASimpleProforma.exe

Résumé of Proforma scoring procedure:

The edited Proforma view of the ontology is automatically displayed as a scorable description template, in which a new page for each describable structure displays the properties to be scored and the states that may be chosen to score each property. (For quantitative properties, data entry boxes are provided for numeric values.) Scores can be modified as they are recorded by selecting from a few simple modifier terms. Multiple states may be selected for the score of a qualitative property (e.g. brown AND yellow).

Each scoring unit (i.e. a property for a single structure, represented as a Description Object in the application) can be replicated, to allow multiple instance scores to be collected for concrete data, or to allow alternative ('or'-ed) values to be collected for abstract data.
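The difference between replicating a Description Object for concrete and for abstract data can be sketched as follows. This assumes a hypothetical `DescriptionObject` class in which states inside one replicate are AND-ed (as in brown AND yellow), while replicates are read as separate instances for concrete data and as 'or'-ed alternatives for abstract data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DescriptionObject:
    """One scoring unit: a single property of a single structure (hypothetical model)."""

    structure: str
    prop: str
    abstract: bool  # True: representative value(s); False: concrete, observed instances
    replicates: list[list[str]] = field(default_factory=list)

    def add_replicate(self, *states: str) -> None:
        """Add a replicate; several states within one replicate are AND-ed."""
        self.replicates.append(list(states))

    def summary(self) -> str:
        parts = [" and ".join(r) for r in self.replicates]
        if self.abstract:
            # Abstract data: replicates are alternative ('or'-ed) values.
            body = " or ".join(parts)
        else:
            # Concrete data: replicates are separate instance scores.
            body = "; ".join(f"instance {i}: {p}" for i, p in enumerate(parts, start=1))
        return f"{self.structure} <{self.prop}>: {body}"
```

For example, an abstract corolla colour scored as "brown and yellow" plus a second replicate "white" summarises as alternatives, whereas two concrete leaf-blade lengths summarise as two separate instances.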

A new score sheet is created and completed for each specimen/taxon to be described. The structures and properties can be scored in any order, or in the order predetermined by the proforma. No score is compulsory, and it is possible to record the absence of scorable structures, negative scores and the deliberate omission of some structures.

The scored specimen details are saved as an XML representation of the data (i.e. detailing all the completed Description Objects for that specimen). Completed XML scores can be reloaded into the interface, and are parsed by a separate application to be stored in the Prometheus II relational database (each Description Object in the specimen description XML being represented by one or more Description Elements in the database). Each Description in the database records the identity of the specimen described, its author and the identity of the Proforma used as the description template, and consists of the set of Description Elements and Modifiers for that description.
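As an illustration of how one Description Object can yield one or more Description Elements, the sketch below parses a scored-specimen document with `xml.etree.ElementTree` and emits one element per recorded score. The XML layout, the attribute names and the sample values are all hypothetical; the real specimen description XML and the Prometheus II database schema are not reproduced here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DescriptionElement:
    structure: str
    prop: str
    state: str
    modifier: str | None = None


@dataclass
class Description:
    specimen: str
    author: str
    proforma: str
    elements: list[DescriptionElement] = field(default_factory=list)


def parse_description(xml_text: str) -> Description:
    """Turn a scored-specimen document into a Description with one element per Score."""
    root = ET.fromstring(xml_text.strip())
    desc = Description(root.get("specimen"), root.get("author"), root.get("proforma"))
    for obj in root.iter("DescriptionObject"):
        # One Description Object may yield several Description Elements.
        for score in obj.findall("Score"):
            desc.elements.append(DescriptionElement(
                structure=obj.get("structure"),
                prop=obj.get("property"),
                state=score.get("state"),
                modifier=score.get("modifier"),
            ))
    return desc


# Hypothetical sample document; not real Alyxia data.
SAMPLE = """
<Description specimen="SPECIMEN-001" author="example author" proforma="Alyxia">
  <DescriptionObject structure="ENTIRE PLANT" property="Habit">
    <Score state="shrub"/>
  </DescriptionObject>
  <DescriptionObject structure="ENTIRE PLANT" property="Architecture">
    <Score state="erect"/>
  </DescriptionObject>
  <DescriptionObject structure="ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla" property="Colour">
    <Score state="brown"/>
    <Score state="yellow" modifier="Sometimes"/>
  </DescriptionObject>
</Description>
"""

if __name__ == "__main__":
    for element in parse_description(SAMPLE).elements:
        print(element)
```

Here three Description Objects produce four Description Elements, because the colour object carries two scores.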

The process of creating and scoring a simple Proforma is shown in the video:

Making&ScoringASimpleProforma.exe

Whilst a more complex Proforma is demonstrated in the video:

LoadingAComplexProforma.exe

Mapping DJ Middleton's Character Descriptions to Prometheus Description Elements:

David Middleton recorded the descriptions of nearly 1400 specimens in the Pandora taxonomic database for his revision of the genus Alyxia (Middleton 2000, 2002).

These descriptions are recorded in DELTA format, using 133 DELTA characters.

This data is contained in the following data files:

  • The Alyxia description Characters:

ALYXIACharacterDefinitions.txt

  • The DELTA matrix format of the descriptions

1400AlyxiaSpecimenDELTADescriptionMatrices.txt

  • Text conversion of these descriptions:

1400AlyxiaDELTADescriptionsInEnglish.xls

  • Specimen details:

Alyxia&KopsiaNames&Specimens.xls

  • Conversion of Middleton's DELTA characters to Prometheus format:

MappingDELTACharactersToPrometheus.xls

MappingDELTACharactersToPrometheus.pdf

i.Quantitative Scores

Of the 133 characters used by Middleton, 49 are quantitative scores, which lend themselves readily to storage as Prometheus Quantitative Description Elements, each composed of a defined structure (with structural context), a defined property, and a score (which may be a range) with an appropriate defined unit. The full mappings between the DELTA Characters and atomized Prometheus description elements are given in the MappingDELTACharactersToPrometheus files listed above.

In some cases complicated spatial modifiers are required to accurately represent exactly what part of the plant is described. For example,

Character #49: Stamens inserted at/ mm from corolla base

requires use of a spatial modifier to capture exactly the distance being measured:

Structure: Tube

Path: ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla.Tube
Property: Length (renamed Length: base to stamen insertion)

Modifier:

RelMod:Between

Path1: ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla.Tube.Base

Path2: ENTIRE PLANT.Inflorescence.Flower.Androecium.Stamen.Base

Units: mm
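The Character #49 mapping above can be written down as a quantitative element carrying a 'Between' spatial modifier. This is a minimal sketch: the class names are hypothetical, and the 1.5-3 mm range is a placeholder, not a value from Middleton's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialModifier:
    relation: str  # e.g. "Between"
    path1: str
    path2: str


@dataclass(frozen=True)
class QuantitativeElement:
    path: str
    prop: str
    low: float
    high: float
    unit: str
    modifier: SpatialModifier | None = None

    def __post_init__(self) -> None:
        # A score may be a single value (low == high) or a range.
        if self.low > self.high:
            raise ValueError("range must satisfy low <= high")

    def describe(self) -> str:
        value = f"{self.low:g}" if self.low == self.high else f"{self.low:g}-{self.high:g}"
        text = f"{self.path} <{self.prop}> = {value} {self.unit}"
        if self.modifier is not None:
            m = self.modifier
            text += f" [{m.relation} {m.path1} and {m.path2}]"
        return text


COROLLA = "ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla"
STAMEN = "ENTIRE PLANT.Inflorescence.Flower.Androecium.Stamen"

# Character #49; the numeric range is a placeholder, not Middleton's data.
char49 = QuantitativeElement(
    path=f"{COROLLA}.Tube",
    prop="Length: base to stamen insertion",
    low=1.5,
    high=3.0,
    unit="mm",
    modifier=SpatialModifier("Between", f"{COROLLA}.Tube.Base", f"{STAMEN}.Base"),
)

if __name__ == "__main__":
    print(char49.describe())
```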

The representation of

Character #50: Stamen insertion <ratio in tube>/ of tube length

is even more complex, as it is in fact Character #49 as a ratio to:

Structure: Tube

Path: ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla.Tube
Property: Length

Units: mm

Although Character #50 is represented in the description template as a single Description Object, when parsed into the database it is recorded as a ratio of one Description Element to another, and representing the spatial modifier for the first of these requires a further two Description Elements.
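The four-element decomposition of Character #50 described above might be flattened into database rows along these lines. The row layout (`element_id`, `role`, `refers_to`) is an assumption made for illustration; it shows only the count and linkage of the elements, not the actual Prometheus II schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRow:
    """One flattened Description Element (hypothetical layout)."""

    element_id: int
    path: str
    prop: str
    role: str                     # numerator, denominator or a modifier boundary
    refers_to: int | None = None  # id of the element this row is linked to


def ratio_rows(numerator: tuple[str, str], denominator: tuple[str, str],
               between: tuple[str, str], first_id: int = 1) -> list[ElementRow]:
    """Flatten 'numerator / denominator', with a Between modifier on the numerator."""
    num_id = first_id
    rows = [
        ElementRow(num_id, numerator[0], numerator[1], "numerator"),
        # The denominator is linked to the numerator to express the ratio.
        ElementRow(num_id + 1, denominator[0], denominator[1], "denominator", num_id),
    ]
    # The spatial modifier on the numerator needs one further element per boundary path.
    for offset, (role, path) in enumerate(zip(("between-start", "between-end"), between), start=2):
        rows.append(ElementRow(num_id + offset, path, "", role, num_id))
    return rows


TUBE = "ENTIRE PLANT.Inflorescence.Flower.Perianth.Corolla.Tube"
STAMEN_BASE = "ENTIRE PLANT.Inflorescence.Flower.Androecium.Stamen.Base"

char50_rows = ratio_rows(
    numerator=(TUBE, "Length: base to stamen insertion"),
    denominator=(TUBE, "Length"),
    between=(f"{TUBE}.Base", STAMEN_BASE),
)
```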

ii. Qualitative Scores

The problem of semantics and context

To remove ambiguity, Prometheus only uses strictly defined terms for the composition of descriptions. However, David Middleton's character descriptions and character states are composed of English-language phrases, and the terminology he uses is not explicitly defined (although aspects of it are discussed in his published Revision). It is therefore impossible to translate his descriptions into Prometheus statements with 100% accuracy; we have had to interpret his descriptions to the best of our ability and map his terminology to our defined terminology. This is a major problem for the representation of 'legacy' data in the Prometheus system.

Most of the semantic ambiguity in the DELTA character descriptions lies in the character/state terminology, where we cannot know exactly what is meant by the individual undefined words used to describe the character or the observed state. However, there are also a number of instances where the structural terminology is ambiguous, through either omission or the use of non-standard terms. For example, in Character #101 'pistil head' <pubescence> it is not clear whether 'pistil head' is equivalent to a 'stigma' in the Prometheus Ontology, or possibly just the 'apex of a pistil', or whether there is an undefined substructure 'head' on the pistil. There is similar confusion about the description of 'Corolla bud head' (Characters #67 and #82). Further structural ambiguities concern the structural context of described structures. For example, in the descriptions bracts can be localized to a number of places, but some characters do not distinguish which position of bract is being described (this is 'solved' in Prometheus by always describing a structure in an explicit context). In another ambiguous Character (#69: blade <coriaceous>) we cannot be certain whether it is a leaf blade that is being described, or the blade of a petal, sepal, bract etc. (In common usage 'Blade' refers to the Leaf Blade, but a number of other characters here explicitly refer to 'Leaf Blade', making the use of 'Blade' anomalous.)

The problem of Property/stategroup composition

The Prometheus Description model breaks qualitative character descriptions into atomized description element statements, each composed of the structure and property being described and the scored state being recorded. In the Proforma scoring template, Description Objects for qualitative 'Characters' list the possible states to be scored for a single property of a chosen structure, and this model does not allow states from different properties to be grouped as alternatives in the same Description Object. (One DELTA character can, however, map to more than one property, so representing such a character requires more than one Description Object in the Proforma template.)

As discussed elsewhere (Paterson et al., 2004), when creating our angiosperm description ontology it proved difficult to recognize and organize the states used in character descriptions into 'Properties'. For this reason we initially grouped states into sets representing their contextualized usage, with these 'stategroups' representing de facto properties. These stategroups were used as the 'Qualitative Properties' for construction of our Alyxia description proforma, with each Qualitative Description Object only presenting alternative states drawn from a single stategroup.

However, Proforma specification using such inflexible groups was problematic: it required either some reorganisation of the stategroups in the ontology to cope with this data, or the unnecessary splitting of a single character into multiple Description Objects because the states required had been classified into different usage groups. A solution that we favour would use a more flexible organisation of states into hierarchical properties. We propose creating a hierarchy of properties, with states attached to a property at a given level in the hierarchy, but in which states would also 'belong' to any parent properties of their specific property group. A Description Object would use the most specific property that contained all the necessary states. For example, 'Outline Shapes' might be a subproperty of '2D Shapes', and that of 'Shapes', 'Appearance' and finally of the root property itself: 'Qualitative Property'. Such a hierarchical arrangement would allow states from 'different' property groups to be used together in one Description Object, by using a property higher up the hierarchy. (Such a hierarchical organisation of states and their properties is demonstrated in DemoProperties.pps.) Properties themselves could still be contextualized to specific structures, as for stategroups in the present ontology, or subsets of states from a given property could be contextualized to applicable structures.
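The proposed lookup, using the most specific property that contains all the required states, amounts to finding the lowest common ancestor in the property hierarchy. The sketch below assumes a simple child-to-parent table; the hierarchy fragment and state sets are illustrative only, combining the shape chain given above with the suggestion (made for Character #1 below) that <Habit> could be treated as a type of <Architecture>.

```python
from __future__ import annotations

# Hypothetical fragment of a property hierarchy (child -> parent).
PARENT = {
    "Habit": "Architecture",
    "Architecture": "Qualitative Property",
    "Outline Shapes": "2D Shapes",
    "2D Shapes": "Shapes",
    "Shapes": "Appearance",
    "Appearance": "Qualitative Property",
}

# States attached directly to a property (illustrative subsets only).
DIRECT_STATES = {
    "Habit": {"shrub", "creeper", "climber", "treelet"},
    "Architecture": {"erect", "arching"},
    "Outline Shapes": {"ovate", "elliptic"},
}


def ancestors(prop: str) -> list[str]:
    """prop followed by its parents, from most specific to the root."""
    chain = [prop]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def states_of(prop: str) -> set[str]:
    """States attached to prop or to any of its descendant properties."""
    states = set(DIRECT_STATES.get(prop, set()))
    for child, parent in PARENT.items():
        if parent == prop:
            states |= states_of(child)
    return states


def owner_of(state: str) -> str:
    """The property a state is directly attached to."""
    for prop, states in DIRECT_STATES.items():
        if state in states:
            return prop
    raise KeyError(f"unknown state: {state}")


def most_specific_property(states: set[str]) -> str:
    """Lowest property in the hierarchy whose state set contains all the given states."""
    if not states:
        raise ValueError("no states given")
    first = owner_of(next(iter(states)))
    for candidate in ancestors(first):
        if states <= states_of(candidate):
            return candidate
    raise ValueError("states share no common property")
```

With these tables, {shrub, climber} resolves to Habit, {shrub, erect} to Architecture, and states with no closer shared parent fall back to the root, Qualitative Property.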

The requirement for modifier and qualifier terms

A central tenet of the Prometheus approach to recording taxonomic descriptions is to encourage quantitative data acquisition where possible, and to discourage the use of 'poorly'-defined relative states for recording quantitative data. However, it is recognized that the working taxonomist often cannot, or does not need to, record accurate quantitative data, but still wishes to record some approximate information. This is particularly a problem for 'legacy' data coded in natural language or using DELTA characters, where the only distinction between alternative states is a relative modifier. For example,

DELTA / PROMETHEUS

Character #91.
midrib <sunken type> / leaf.midrib <shape>
1. slightly sunken / sunken (slightly)
2. very clearly sunken / sunken
3. deeply sunken / sunken (strongly)

However, we would still discourage modifiers for de novo descriptions as they may be of little value for interpreting and comparing data at a later stage.

The Prometheus modifiers and qualifiers are scored at data entry time and include:

  • frequency modifiers: Always, Mostly, Sometimes, Usually, Rarely.
  • Densely, Sparsely
  • Slightly, Strongly
  • and the special modifier NOT used to record negative scores

The precise meaning of these modifiers is undefined, and what they are relative to cannot be captured; they are probably only of real use when regenerating natural-language descriptions. Prometheus has quantitative measures to record densities, and can, for example, explicitly relate density in one location to density in another location, or one size measurement to another, by using relative scores ( =, <, >, >=, <=, != ).
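Relative scores of this kind can be checked against, or derived from, two measurements in the same unit. In the minimal sketch below the operator table mirrors the list above; the `tolerance` parameter for treating near-equal values as '=' is an assumption, not part of the Prometheus model.

```python
import operator

# The relative-score operators listed above, mapped to Python comparisons.
RELATIVE_OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def holds(a: float, op: str, b: float) -> bool:
    """True if the relative score 'a op b' is consistent with the two measurements."""
    return RELATIVE_OPERATORS[op](a, b)


def relative_score(a: float, b: float, tolerance: float = 0.0) -> str:
    """Derive '=', '<' or '>' from two measurements; values within tolerance count as equal."""
    if abs(a - b) <= tolerance:
        return "="
    return "<" if a < b else ">"
```

For example, a leaf width of 12 mm against a leaf length of 45 mm gives the relative score '<', which keeps the comparison explicit rather than hiding it in a term such as 'narrow'.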

Some possible modifiers were considered too indefinable to be of any use, for example the shape modifiers broadly and narrowly, and colour modifiers pale and dark.

The problem of relative states

Legacy data, not collected according to the Prometheus model, frequently includes relative states such as large, small, short, long, narrow, wide. Typically these are used without explicitly recording what other structures and score values they relate to. For example, where a hair can be recorded as 'short' or 'long', does that mean 'in relation to other hairs on the same specimen', or 'in relation to similar hairs on other specimens'? Sometimes this difference can be inferred from the available states in the DELTA Character definition, as in the example below, but it is not explicitly captured in the data. In Prometheus terms it would be better to record an actual quantitative measurement in the data, and post-analysis can evaluate the relative lengths of hairs on different structures or specimens. However, in order to allow the representation of legacy data we have defined a number of 'comparator' states, explicitly either in the context of (a) the specimen being described OR (b) the range of specimens being described in the entire Project.

DELTA / PROMETHEUS

Character #94.
Hair type <on inflorescence> / inflorescence.hair <shape-general> / inflorescence.hair <comparator>
1. short straight / straight / short (vs other spp)
2. short curved / curved / short (vs other spp)
3. long straight / straight / long (vs other spp)
4. long curved / curved / long (vs other spp)

The states in the stategroup <comparators> include:

Average (relative to Dataset/Project) / Average (relative to Specimen)
Dense (relative to Dataset/Project) / Dense (relative to Specimen)
Equal (relative to Dataset/Project) / Equal (relative to Specimen)
Large (relative to Dataset/Project) / Large (relative to Specimen)
Long (relative to Dataset/Project) / Long (relative to Specimen)
Narrow (relative to Dataset/Project) / Narrow (relative to Specimen)
Short (relative to Dataset/Project) / Short (relative to Specimen)
Small (relative to Dataset/Project) / Small (relative to Specimen)
Sparse (relative to Dataset/Project) / Sparse (relative to Specimen)
Wide (relative to Dataset/Project) / Wide (relative to Specimen)

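The point that the same structure can be 'short' in one context and 'long' in another can be illustrated by deriving a comparator state from measurements. The threshold rule used here (lower, middle and upper thirds of the reference range) is purely an assumption for illustration, as are the hair lengths; the comparator states are defined above, but no rule for computing them is.

```python
from __future__ import annotations


def comparator_state(value: float, reference: list[float], context: str,
                     low: str = "Short", high: str = "Long") -> str:
    """Label value against a reference set using an assumed 'range thirds' rule."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        label = "Equal"
    else:
        third = (hi - lo) / 3
        if value < lo + third:
            label = low
        elif value > hi - third:
            label = high
        else:
            label = "Average"
    return f"{label} (relative to {context})"


if __name__ == "__main__":
    # Placeholder hair lengths in micrometres, not real Alyxia measurements.
    project = [100, 200, 300, 900, 1000]
    specimen = [200, 300]
    print(comparator_state(300, project, "Dataset/Project"))
    print(comparator_state(300, specimen, "Specimen"))
```

Against the whole placeholder project set, a 300 µm hair is Short; against the other hairs on its own specimen it is Long, which is exactly the distinction the two groups of comparator states are meant to preserve.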
Mapping to Prometheus
(a) Simple and Moderately Complex DELTA Characters

Of the 84 Qualitative Characters used by Middleton, 48 can be represented by a single Description Object, each presenting a group of alternative states selected from a single stategroup for specimen scoring. However, achieving this required some reorganisation of our original stategroups, even duplicating the occurrence of a state in more than one group. (Such states should probably have different definitions if they are used in clearly different contexts.)

The remaining qualitative Characters comprise more complex statements, which record two or more observations about different properties of the structure or structures described by the 'Character'. For these characters it was necessary to map some or all of the DELTA 'Character States' to two or more Description Objects (and hence Description Elements). 22 Characters mapped to two Description Objects, 7 to three and 4 to four Description Objects in order to capture the full details of the Character. A further 4 Characters (discussed below) were extremely complex and would require mapping to multiple Description Objects.

Examples of how it is necessary to represent DELTA characters in multiple Prometheus statements are found in the first few characters:

(i)

Character #1. Plant <Habit>

1. Erect shrubs

2. Ground creepers

3. Climbers

4. Treelet

5. Shrub with arching stems

The angiosperm ontology has defined state terms for shrub, creeper, climber and treelet, all grouped together under the Stategroup <Habit>. If we wished to represent the DELTA character with a single stategroup/property Description Object, we could define new terms in the ontology for 'erect shrubs' and 'shrub with arching stems', or we might create and use modifiers for terms, such as Erect (however, the number of possible modifiers would then be effectively unlimited). We have decided to interpret this data as recording something about both the habit of a specimen and the architecture of its stems, as represented below. If we represented properties hierarchically, we could consider <Habit> to be a type of <Architecture> and could group Erect and Arching with the other Architecture states for plant, or might choose to describe the stem <Architecture> separately. Our current mapping is illustrated:

DELTA / PROMETHEUS

#1. Plant <Habit> / Plant <Habit> / Plant <Architecture>
1. Erect shrubs / shrub / erect
2. Ground creepers / creeper
3. Climbers / climber
4. Treelet / treelet
5. Shrub with arching stems / shrub / arching
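The mapping tables for Characters #1 and #91 above can be expressed directly as data, so that one scored DELTA state expands into one or more Prometheus elements. This is a minimal sketch with hypothetical names; it does not attempt to distinguish DELTA 'or' from 'and' combinations of states, which would need the Description Object replication described earlier.

```python
from __future__ import annotations

# DELTA character -> DELTA state number -> Prometheus elements.
# Each element is (structure, property, state, modifier); modifier may be None.
MAPPINGS = {
    1: {  # Plant <Habit>
        1: [("Plant", "Habit", "shrub", None), ("Plant", "Architecture", "erect", None)],
        2: [("Plant", "Habit", "creeper", None)],
        3: [("Plant", "Habit", "climber", None)],
        4: [("Plant", "Habit", "treelet", None)],
        5: [("Plant", "Habit", "shrub", None), ("Plant", "Architecture", "arching", None)],
    },
    91: {  # midrib <sunken type>
        1: [("leaf.midrib", "shape", "sunken", "Slightly")],
        2: [("leaf.midrib", "shape", "sunken", None)],
        3: [("leaf.midrib", "shape", "sunken", "Strongly")],
    },
}


def map_delta_score(character: int, states: list[int]) -> list[tuple]:
    """Expand the scored DELTA states of one character into Prometheus elements."""
    try:
        table = MAPPINGS[character]
    except KeyError:
        raise KeyError(f"no mapping for character #{character}") from None
    return [element for state in states for element in table[state]]
```

Keeping the mapping as a table means that reinterpreting a character (for example, moving Erect and Arching under a stem <Architecture>) only changes data, not code.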

(ii)