Data coding 1
2010.04.20
4a. Data coding
NB. Reading this document is no longer a priority for most users because Python code has expanded the densely coded records or 'histories' into extensive spreadsheets. However, the records are the definitive data, not the expansions, and their conciseness makes them very useful. An understanding of the records will be needed, e.g., to treat missed values in a different manner to the expansions. This understanding may come intuitively from comparing records with their expanded values and column titles. If so, it may be sufficient to go directly to the 04b_WORKED_EXAMPLES.
1. INTRODUCTION
During the years 2001-6, a locality had 2 hive stands of 3 hive boxes each at a separation of 20-100 m. After 2007 the amount of fielded equipment was reduced, e.g., to 2 boxes or to 1 stand. Hives (four Amer Bee J articles, 2000) are boxes that contain 5 stacks, each of 7-10 nest blocks. A nest block contains 4-8 bores (nest cavities or potential nests). The fewer the bores, the larger the borewidth (8 bores of 3.2 mm in a Z block, 7 of 4.8 mm in an A block, 6 of 6.4 mm in a B block, 5 of 8.0 mm in a C block or 4 of 9.6 mm in a D block). Actually the bores (term in the lit.) are not drilled holes but U-section grooves stopped at one end and covered with a lid; however, the cross-sectional area and radius equal those of a drilled hole with the specified borewidth as diameter. Except for the bee Hylaeus, or the wasp Auplopus, in large borewidths, a nest is a linear sequence of provisioned cells or cocoons. A viable nest is defined as at least 1 viable immature or unemerged adult. Krombein’s vestibular and intercalated cells (=empty spacers) are not counted. The cell count for a viable nest is therefore >=1, and a 0 indicates nest death for the species concerned.
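The block types listed above can be held in a small lookup table. The following is a minimal sketch, not part of the data set's own Python code; the names BLOCK_TYPES and bore_cross_section_mm2 are hypothetical, and the area is that of the equivalent drilled hole described in the text.

```python
import math

# Block letter -> (bores per block, borewidth in mm), as listed in the text.
BLOCK_TYPES = {
    "Z": (8, 3.2),
    "A": (7, 4.8),
    "B": (6, 6.4),
    "C": (5, 8.0),
    "D": (4, 9.6),
}


def bore_cross_section_mm2(block_letter: str) -> float:
    """Area of a drilled hole whose diameter equals the block's borewidth."""
    _, width_mm = BLOCK_TYPES[block_letter]
    return math.pi * (width_mm / 2.0) ** 2
```

For example, BLOCK_TYPES["B"] gives (6, 6.4), i.e., 6 bores of 6.4 mm.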
Spreadsheets (initially called KEYS+MACROS, later LABELS_TAXA) in 11RECORD_SOURCES_and_SUMMARYanalyses, and Tables 4a.1 to 4a.3 below, list the many species abbreviations that follow. Section 4b works through examples.
2. TABLES OF SINGLE ENTRIES (mainly of historical interest)
Tables of single entries are provided to the right in the t-spreadsheets for Generation 2004 only (in the “Part1” workbooks), and are explained below, as a means of introducing the user to the data set. More usually tables contain bore histories (see 3 later) which are field data that I can continue to check and edit. The user must use “find and replace” editing, the Lotus/Excel macros, or Unix filters etc. to break the history character strings down to the data forms required for analysis.
A t-spreadsheet for years 2001-5 consists of tables in which each row represents a nest block. The first few columns describe the position of the nest block in the hive stand and its physical nature, the next 4-8 the contents if any of its bores, and finally the Z to D borewidth characters pad the row to a constant width for editing convenience. Contents of bores at some particular stage of nest development are described by letters or counts. For example,
347B5_ 44_ _BBB
might record Spring counts of the viable cocoons in the first, third and fourth bores of the nest block in the seventh row of the fourth stack of the third hive, which has 6 bores of B borewidth. All names are abbreviated, and shown in keys on the sheet. The capital letters to the right are padding. The 3-letter abbreviations for the scientific names of host species begin with lower case letters and end in upper case ‘tie-breaking’ letters. For example,
347BotB _otBotB__B B B
shows nests of Osmia texana rather than O. tersula (which would be coded as otA).
Other host name abbreviations include nest descriptors of 2-3 lower case letters that describe some aspect of nest construction or provisioning, e.g.,
347Bma _mama__B B B,
(where ma keys to construction from a paste of masticated leaves). Abbreviations for any sort of loss, e.g., due to parasitoids, pests or harvesting, are shown by 2-3 capital letters. For example,
347B-ME5 _-ME4-ME4__B B B,
where the losses are due to the polyphagous eulophid pest Melittobia chalybii.
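The position characters that begin each table row in the examples above can be decoded mechanically. This is a minimal sketch under the assumption that hive, stack and row are each a single digit followed by the borewidth letter, as in 347B; a row numbered 10 would need a different rule. The names BlockPosition and parse_block_prefix are hypothetical.

```python
from typing import NamedTuple

# Number of bores per block for each borewidth letter (from the Introduction).
BORES_PER_BLOCK = {"Z": 8, "A": 7, "B": 6, "C": 5, "D": 4}


class BlockPosition(NamedTuple):
    hive: int
    stack: int
    row: int
    borewidth: str
    n_bores: int


def parse_block_prefix(row_text: str) -> BlockPosition:
    """Decode the first four characters of a table row, e.g. '347B'."""
    # Assumption: one character each for hive, stack, row and borewidth.
    hive, stack, row, width = row_text[:4]
    return BlockPosition(int(hive), int(stack), int(row), width,
                         BORES_PER_BLOCK[width])
```

Applied to the prefix 347B, this returns hive 3, stack 4, row 7 and a B block with 6 bores. The bore contents that follow the prefix are not parsed here.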
Although this form of presentation is well suited to analysis, the number of tables can be very large, and it is more practical for me to retain the field data format and bundle all the information for a bore into a single alphanumeric string or ‘history’ that can be archived and checked. For the third and fourth bores of the present brief example, this record would be
)otB_ma/4-ME4$0:
Records comparable to this are encountered for year 2001. As the project has progressed, however, counts have been made at a number of stages of nest development, and dates for female behaviours added, so histories have grown to be quite long. However, this is very practical in the field, as each bore’s entire history can be developed, reviewed and repeatedly checked over a period of up to 18 months or so without having to switch to other cells in the large spreadsheet. As formatting rules are strict, histories are easily cut by spreadsheet macros and edited to single numbers or letters as input to analysis, as shown by the examples in the spreadsheets for year 2005 (generation 2004).
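As an illustration of how strictly formatted histories can be cut, the start of a record can be separated into its parts with a regular expression. This is a sketch only: it assumes the species abbreviation is lower case letters ending in one capital, followed by an underscore and a 2-3 letter lower case nest descriptor, as in the examples in this document. The function name split_record_head is hypothetical.

```python
import re

# start symbol, species abbreviation, underscore, nest descriptor, remainder
HEAD_RE = re.compile(r"(.)([a-z]+[A-Z])_([a-z]{2,3})(.*)")


def split_record_head(record: str):
    """Return (start symbol, species, descriptor, rest) or None if no match."""
    match = HEAD_RE.match(record)
    return match.groups() if match else None
```

For the brief record above, split_record_head(")otB_ma/4-ME4$0:") separates the symbol ), the species otB, the descriptor ma and the remaining stage data.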
3. TABLES AND LISTS OF BORE HISTORIES
3.1 STAGE SURVIVAL RECORDS
The history for a single bore in a nest block consists of a series of nest segment records, each beginning with a characteristic segment-starting nonalphanumeric symbol and terminated by a colon as the universal segment terminator. Note that the terminator colon is usually qualified as #: , where the # stands for nest debris examined and removed by me, or as !: for a “lost” nest (! = assumed removed more or less cleanly without loss of life by a new nester).
The segment-starting symbols
) > ^ | ---symbols of the final data set,
and ] } ^ * ---additional symbols used in two-generation working records,
which to my fancy are either right-facing or sharp, begin the host segments in a bore record. The symbols in the first group begin segments for the overwintered nest or old generation, and we will see later that the segments mainly report survival counts and causes of loss at different stages of nest development. Of this group ) is reserved for the first (rearmost in the nest block) overwintered host nest, > for the second overwintered host nest, and ^ begins the list of any other failed or overwintering host nests in the same bore. The single generation records of the data set use only these symbols. The second group of segment-starting symbols above plays a similar role for the new generation but is only used in two-generation field records. The first and second new nest segments begin with ] and } respectively, while ^ begins the list of any other failed or surviving new season nests. See Table 4a.1. To allow ad hoc supplementary records for old and new generations there are two other segment-starting symbols, | and *, which are currently used for non-nest structures or shortform harvest records.
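The roles of the host segment-starting symbols can be summarised as a lookup table. This is a minimal sketch; the dictionary and function names are hypothetical, and ^ is marked "either" because the text uses it for the remaining-nest list of both generations.

```python
# Start symbol -> (generation, role), following the two groups listed above.
HOST_START = {
    ")": ("old", "first host nest"),
    ">": ("old", "second host nest"),
    "^": ("either", "remaining host nests"),
    "|": ("old", "non-nest or ad hoc record"),
    "]": ("new", "first host nest"),
    "}": ("new", "second host nest"),
    "*": ("new", "non-nest or ad hoc record"),
}


def classify_host_segment(segment: str):
    """Return (generation, role) for a host segment, or None otherwise."""
    return HOST_START.get(segment[:1])
```

A segment beginning with any other character, such as a comment or a parasitoid segment, returns None.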
After the segment symbol comes the 3-letter abbreviation for the species or higher taxon, joined to the nest descriptor (by an underscore, as in scB_mb). Then a chain of up to 6 symbols (& % @ \ / $) in that fixed order denotes the 6 pragmatic stages of nest development (see Table 4a.2). Each number in the symbol chain is the cell count (=count of viable insects) at the beginning of that stage. Thus in
)scB_mb&4%4@4\4/4$4#:?satisfactory#:
)scB indicates a nest of the Potter Wasp Symmorphus cristatus that is the first in the bore, while the arcane nest descriptor mb stands for mud construction and larval chrysomelid beetle prey. All 4 provisioned cells were fertile, surviving to the Spring, and I was able to examine sufficient nest debris for corpses to estimate the emergence as complete. The corresponding comment is flagged by the ? symbol and the same terminator as the segment receiving comment. Whitespace is avoided in all records and comments so as to suit a wide variety of editing and filtering procedures. Comments are mainly temporary field notes and should always be ignored unless clear and objective (e.g., species name or abbreviation, prepupal colours, sex numbers, reference to the museum collection). (In this example the comment has no obvious value.) Note that the emergence stage ($-stage) value is always an estimate, and never a true count, because some corpses may have been lost during emergence or removed by a nest visitor. If an inspection was missed, uncertain values are indicated by the value Q, as in
)scB_mb&5%Q@Q\Q/4$4#:
where the missed values must be 5s or 4s. In this instance the estimated emergence ($-stage) is likely good (because there were no corpses in the nest debris). But debris cannot be inspected when the nest is lost to reconstruction. So in the record
)scB_mb&4%4@4\Q/2$2q!:
(where Q should be 4, 3 or 2) the emergence estimate is ‘doubly uncertain’ because it is conceivable that the new nester might have killed some immatures when reconstructing the old nest (!:). However, these uncertainties are speculative, and in practice I believe that the records are nearly always correct. Inspection of the debris on the hive louvers, or of rare nests in which a half cocoon precedes a change in species, suggests that forcible eviction is very uncommon unless there is crowding; though never heavy, it does occur. A count followed by a lower case q is also an estimate where there is doubt.
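The range of values a Q can take follows from the neighbouring counts. The sketch below reads the stage counts from a segment and brackets each Q between the nearest known later count and the nearest known earlier count. It assumes that counts never increase from one stage to the next, which is how the two examples above are interpreted; the function names are hypothetical, and loss entries such as -CH1 are ignored.

```python
import re

STAGES = "&%@\\/$"  # the six stage symbols in their fixed order

# A stage symbol, a count (digits or Q), and an optional q marking doubt.
STAGE_RE = re.compile(r"([&%@\\/$])(\d+|Q)(q?)")


def stage_counts(segment: str) -> dict:
    """Map each stage symbol present to its count (None where the count is Q)."""
    counts = {}
    for symbol, value, _doubt in STAGE_RE.findall(segment):
        counts[symbol] = None if value == "Q" else int(value)
    return counts


def q_bounds(segment: str) -> dict:
    """Return {stage symbol: (lowest, highest)} for every Q in the segment."""
    counts = stage_counts(segment)
    present = [s for s in STAGES if s in counts]
    bounds = {}
    for i, symbol in enumerate(present):
        if counts[symbol] is not None:
            continue
        earlier = [counts[s] for s in present[:i] if counts[s] is not None]
        later = [counts[s] for s in present[i + 1:] if counts[s] is not None]
        highest = earlier[-1] if earlier else None  # unknown if nothing precedes
        lowest = later[0] if later else 0
        bounds[symbol] = (lowest, highest)
    return bounds
```

For )scB_mb&5%Q@Q\Q/4$4#: each Q is bracketed as (4, 5), and for )scB_mb&4%4@4\Q/2$2q!: the single Q is bracketed as (2, 4), matching the ranges stated in the text.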
When there are cell losses prior to Spring emergence due to the Cuckoo Wasp Chrysis (CH) or summer emergence (SE), these are shown as negative entries, e.g., for the Potter Wasp Ancistrocerus antilope (aaA), with its mud nest and large caterpillar provisions, the record segment
)aaA_mc&3-1%2@2-CH1-BO0-ME0-SE1\0#:
shows that the nest had ‘failed’ by the late Fall (0 count at the beginning of the Fall or \-survival stage). Here the &3-1 indicates that the provisions of one cell (or egg-stage) were untouched, i.e., either no egg was laid there or much more likely the egg or small larva died, while @2-CH1-BO0-ME0-SE1 indicates that one of two summer cocoons was replaced by Chrysis CH, that there was a bombyliid superparasite BO (no host cells killed), that Melittobia ME was also apparent in or near the bore, but not as an obvious cause of host death, and that the second host cocoon emerged that summer SE without overwintering. For simplicity, losses due to large parasitoids (large also means 1 per cell) are always deemed to occur during the summer cocoon stage (@-stage), as this is typically the earliest stage at which parasitoids are discovered. Summer emergence SE is coded as a special type of loss purely for convenience. It may or may not be a biological loss as some species have a prolonged nesting period that goes into late September. Another example is
)aaA_mc&3-1%2@2-CH1-BO0-SF1\0#:
Here, summer emergence failed SF and the eclosed adult died within the nest.
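The negative entries can be read stage by stage. The following sketch extracts, for each stage, the starting count and its list of losses, and checks that each count equals the previous count minus the recorded losses, which holds for the examples in this document but is an inference rather than a stated rule. It assumes numeric counts only (no Q values) and loss codes of up to 3 capital letters; the function names are hypothetical.

```python
import re

# A stage symbol, its count, and any run of loss entries such as -CH1 or -1.
STAGE_BLOCK_RE = re.compile(r"([&%@\\/$])(\d+)((?:-[A-Z]{0,3}\d+)*)")
LOSS_RE = re.compile(r"-([A-Z]{0,3})(\d+)")


def parse_stage_losses(segment: str) -> list:
    """Return [(stage symbol, starting count, [(cause, cells lost), ...]), ...]."""
    stages = []
    for symbol, count, loss_text in STAGE_BLOCK_RE.findall(segment):
        losses = [(cause, int(n)) for cause, n in LOSS_RE.findall(loss_text)]
        stages.append((symbol, int(count), losses))
    return stages


def counts_balance(stages: list) -> bool:
    """True if every count equals the preceding count minus its losses."""
    for (_, count, losses), (_, next_count, _) in zip(stages, stages[1:]):
        if count - sum(n for _, n in losses) != next_count:
            return False
    return True
```

For the Ancistrocerus segment above, the &-stage yields a count of 3 with one unattributed loss, and the @-stage a count of 2 with entries for CH, BO, ME and SE; an unattributed loss is returned with an empty cause string.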
Such records can become quite long if there are multiple nests, multiple parasitoids and several causes of death. Parasitoid record segments begin with mostly left pointing symbols that mostly match the host segment symbols. Thus the host segment symbols
), >, ], }, ^
have
(, <, [, {, ~
as their matching parasitoid segments. As an example the parasitoid segment following the Ancistrocerus segment just above might be
(CH@1-BO1\0;BO@1\1-ME1/0#:
indicating that the Chrysis was killed by the bombyliid after spinning its summer cocoon, and the hyperparasite was in turn killed by the pest Melittobia during late Fall or a winter thaw (the overwintering \-stage). So no parasitoids emerged. Whether Melittobia emerged or not is not recorded as it is tiny and too numerous to count, i.e., it is a pest.
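A history holding a host segment followed by its parasitoid segment can be cut at the colon terminators, and the parasitoid list split at its semicolons. This is a minimal sketch, assuming that colons occur only as segment terminators; the names PARASITOID_OF, split_segments and parasitoid_records are hypothetical.

```python
import re

# Parasitoid start symbol -> matching host start symbol.
PARASITOID_OF = {"(": ")", "<": ">", "[": "]", "{": "}", "~": "^"}


def split_segments(history: str) -> list:
    """Cut a bore history into segments, each ending at its colon."""
    return re.findall(r"[^:]+:", history)


def parasitoid_records(segment: str) -> list:
    """Split one parasitoid segment into its semicolon-separated records."""
    # Drop the start symbol, the colon, and any # or ! qualifier before it.
    body = segment[1:].rstrip(":").rstrip("#!")
    return body.split(";")
```

Joining the Ancistrocerus host segment and the parasitoid segment above into one string and passing it to split_segments returns the two segments; parasitoid_records then separates the Chrysis record from the bombyliid record.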
3.2 DATED EVENTS (‘dot extensions’)
The female behaviour record, developed in 2005, is a ‘dot extension’ of the 2-3 letter nest descriptor that adds dates of simple behaviours like the founding of the nest and its stoppering. The dot extension begins at the dot and ends at the & that begins the stage survival part of the record segment (if there are no stage survival data the dot extension has the usual : terminator). An individual bore and its nests are currently examined only 1-3 times in a summer, but recording a different part of the hive stand each week or so should give some idea of the variation of the intensity of nesting for the stand as a whole during the season. Some of these behaviours differ with the species or within the species. Symmorphus cristatus often blocks the entrance to the bore with a protruding mud plug. As an example of a behavioural record
)scB_mb.n060721s060820&7-DI7-ME0%0#:
shows that a nest (n) first observed on July 21 2006 was found stoppered (s) on August 20. This one record by itself gives limited information about the actual dates of the events but has some use when there are several comparable records for different dates. When the female is seen on the nest, as in
)scB_mb.fp060721s060724&7-DI7-ME0%0#:
the timing information is more precise. In this case the female was seen working on the provisions of the unfinished first cell of the nest (fp), and a stopper was found only a few days later, so the entire nest was built quickly (only to be killed by dipteran maggots DI). See Table 4a.3.
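The dated events can be pulled out of a record with a regular expression. This is a sketch, not the code that produces the dotSOURCE sheets: it assumes each event is a 1-2 letter lower case code followed by a yymmdd date, with an optional c followed by a cell count (Table 4a.3), and it does not handle the .q flag for questionable records. The function name parse_dot_extension is hypothetical.

```python
import re
from datetime import date, datetime

# Either an event code with a yymmdd date, or c followed by a cell count.
EVENT_RE = re.compile(r"(?P<code>[a-z]{1,2})(?P<date>\d{6})|c(?P<cells>\d+)")


def parse_dot_extension(segment: str):
    """Return ([(event code, date), ...], cell count or None) for one segment."""
    if "." not in segment:
        return [], None
    # The extension runs from the dot to the & (or to the end if there is none).
    extension = segment.split(".", 1)[1].split("&", 1)[0]
    events, cells = [], None
    for match in EVENT_RE.finditer(extension):
        if match.group("code"):
            when = datetime.strptime(match.group("date"), "%y%m%d").date()
            events.append((match.group("code"), when))
        else:
            cells = int(match.group("cells"))
    return events, cells
```

For the second record above, the result is the event fp on 21 July 2006 and the event s on 24 July 2006, with no cell count.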
The dot extensions are not expanded in the SOURCE spreadsheets that define the maintained data set, but are expanded in the dotSOURCE sheets.
4. HISTORIES WHEN THE EQUIPMENT CHANGES
When there are no equipment changes the annual spreadsheet shows only the histories of the overwintered and emerged generation. Generations are numbered by the year in which the nests are made, and any summer emergence without overwintering is recorded as a special type of loss, so generation i-1 always emerges in year i (except the single parsivoltine species Osmia texana which emerges in year i+1). To compare the histories for generation i-1 with generation i compare the spreadsheet tables for years i and i+1.
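The year arithmetic in this paragraph can be written out as follows. This is a small sketch with hypothetical function names; it treats otB (Osmia texana) as the only exception, as stated above.

```python
def emergence_year(generation: int, species: str) -> int:
    """Year in which a generation emerges (generation = year the nests were made)."""
    # otB emerges a year later than every other species.
    return generation + (2 if species == "otB" else 1)


def sheet_year(generation: int) -> int:
    """Annual spreadsheet that holds the histories of a given generation."""
    return generation + 1
```

So generation 2004 is found in the sheet for year 2005, and comparing generations 2004 and 2005 means comparing the sheets for 2005 and 2006.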
Changes in the equipment configuration, if there are any, are made in the early Spring before emergence. When there is such a change in the Spring of year i, comparison of generations i-1 and i is still possible because the sheet for year i-1 lists the deletions, switches, localities and block positions of the blocks for year i. The sheet for generation i begins with the new nest block arrangements properly ordered.
5. MAPS, TABLES AND LISTS
The data set contains photographs of the equipment. Maps are spreadsheets where the cells map to the field equipment. There are a few of these showing where the species are positioned within the equipment. In Tables, considered in 2. above, each row of cells represents a nest block, and some columns represent bores in the nest blocks while others contain physical descriptors of the nest blocks or other information. In Lists, each row represents a single bore.
Table 4a.1: Segment start symbols
NEST RECORDS IN THE DATA SET / Generation i / It is simplest if only one nest is allowed to overwinter in a bore, but current practice is to allow emergence competition.
1st host nest / ) / Start of 1st nest's record giving host cell counts at different survival stages, with losses due to pests or parasitoids, etc.
1st host’s parasitoids / ( / Similar, but cell counts and losses for 1st host’s large parasitoids.
2nd host nest / > / Etc.
2nd host’s parasitoids / < / Etc.
remaining host list / ^ / The survival records of the various hosts are concatenated by semicolons, with the usual colon as the final terminator of the segment.
parasitoids of remaining hosts / ~ / The survival records for the various large parasitoids are concatenated..... etc.
non-nest / | / | is used for non-nest and ad hoc records, e.g., (i) an unprovisioned structure (e.g. a nest stopper on an empty bore) or a bore containing a single provisioned but unsealed cell; (ii) a needed duplicate record for an earlier generation of the biennial species otB; (iii) a shortform record that is part of a large harvest.
The following section is for my reference. These symbols are only used in the two-generation field records, not in the single generation SOURCE spreadsheets.
NEW GENERATION NESTS / Nests of the new generation i+1, new season's nests, "crop" nests. / Sometimes there is a 2nd or even a 3rd nest in the same bore.
1st host nest / ] / Start of 1st nest's record of host counts and losses due to pests or large parasitoids, etc.
1st host’s parasitoids / [ / stage counts and losses for the 1st new nest’s large parasitoids
2nd host nest / } / Etc.
2nd host’s parasitoids / { / Etc.
remaining host list / ^ / survival counts and losses for the remaining new season’s hosts
list of parasitoids of the remaining hosts / ~ / similar but for the parasitoids.
non-nest / * or = / Field sheets carry both old and new generation data. * (Excel) or = (lotus123) for the new.
Table 4a.2: NEST STAGE SYMBOLS
& / provisioned cells = eggs or small larvae with abundant provisions. Count is 0 if there is a multicelled completely empty nest (no provisions, unusual).
% / large larvae, precocoon, with provisions appreciably consumed.
@ / summer cocoons depending on spp and date contain large larvae, prepupae, pupae or adults (the @ character looks a bit like a cocoon).
\ / Oct to March overwintering cocoons (the \ character suggests falling temperatures).
/ / Seed nest: April to early Aug unemerged cocoons (the / character suggests rising temperatures).
$ / Estimated emergence from the nest after overwintering. (The $ character represents success.) Any summer emergence is recorded in the stage survival counts as a nominal loss at the @ stage, e.g., …@4-SE1\3….
OTHER SYMBOLS
# / As an isolated symbol is inconsistently used to mean an empty bore.
: / Segment terminator.
#: / Qualified termination. The nest or part of nest described in the segment has been well examined and scraped away by me
!: / Qualified termination meaning a lost nest. More precisely the nest described in the data segment has been scraped clean or almost clean by a new nester ---it is believed without loss of life in almost all cases (see ESO 2008 talk on 'wars').
; / Punctuation for successive host or parasitoid records in a list.
? / NOT a question BUT begins a comment. Comments may contain commas but no spaces. Though the value of a comment is often short lived, it often helps my checking.
q / With a preceding numeral indicates a good estimate, i.e., one deemed only slightly uncertain; q also ends an uncertain species name or a question within a comment.
Q / An unknown or missing count where an accurate value is not possible. However, a fair estimate can often be interpolated by the user. See Glossary.
, / Used sparingly in comments. Use within records to count females has been abandoned, e.g., ../12-PH4,f2$:..., where 2 of the 4 harvested were found to be female and 8 have yet to be examined.
Table 4a.3: DATED EVENTS (dot extensions) Dates are given as yymmdd, e.g. 040711
event / code when female not seen / code when female seen
female seen / f
only end wall, or far wall, of first cell of nest built / w / fw
provisioning of first cell begun / p / fp
nest with at least one completed cell / n / fn
nest left incomplete (last naked or cocooned immature exposed) / i
nest stoppered or being stoppered / s / fs
female dead in entrance to nest / fd
a posterior nest that has been superseded (Krombein’s usage, =not completely destroyed) by a more anterior nest / k / kf
A questionable record, e.g., one based on a student’s report. Note the q immediately follows the dot. / .q / .q
Eggs / e
Small larva / l
Prepupa / r
Pupa / a
Emerging during inspection / g
Emerged just before inspection (use caution) / h
# of provisioned cells in the nest when it was first found / c / c
Example: scB_mb.fw040712n040731s040830c1 {{history segments follow}}