Implementation Notes of the OBIS Schema
Date Last Modified
it is acceptable to enter a date only if the time is unknown
if the DiGIR data table contains elements with different
modification times, enter the most recent time
if the modification date-time is unknown, enter the date-time of first
"publication"
Collection Code and Institution Code
The Collection Code and the Institution Code
can be the same in the case of Institutions that serve only one Collection
These can be full names instead of codes/abbreviations,
if preferred.
The Collection Code (and/or the Field Number)
can hold concatenated Station and Expedition names/codes.
Catalog Number
The Catalog Number should be stable through time. So if a record is deleted,
do not re-use the Catalog Number for a new record.
Scientific Name
If the record is identified to genus and species level, this
field should hold the genus and species epithets with a space between (for
a total of 2 words). If a subspecific epithet is known, this should be included
in the string (for a total of 3 words). If the identification was only to
a higher rank than genus, then the name of the lowest known rank should be entered
(1 word).
do not include the authority for the name here
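The word-count rule above can be sketched as a small helper. This is only an illustration in Python: the function name `build_scientific_name` and its parameter names are hypothetical and not part of the OBIS schema, the names "Aus", "bus" and "cus" are placeholders, and returning the genus alone for a genus-only identification is an assumption that follows from the "lowest known rank" rule.

```python
from typing import Optional


def build_scientific_name(
    genus: Optional[str] = None,
    species: Optional[str] = None,
    subspecies: Optional[str] = None,
    higher_taxon: Optional[str] = None,
) -> str:
    """Return the Scientific Name string, without any authority."""
    if genus and species:
        # Two words, or three when a subspecific epithet is known.
        words = [genus, species]
        if subspecies:
            words.append(subspecies)
        return " ".join(words)
    # Otherwise fall back to the lowest known rank as a single word.
    return genus or higher_taxon or ""


two_words = build_scientific_name(genus="Aus", species="bus")
three_words = build_scientific_name(genus="Aus", species="bus", subspecies="cus")
one_word = build_scientific_name(higher_taxon="Aidae")
```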
Scientific Name Author
The year of original publication should be included if known, separated from
the author name by a comma and space. If the name has undergone a genus revision,
the authority and year should be in parentheses. Valid Examples:
Smith
Jones, 1973
(Hastings, 1986)
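The formatting rule can also be written as a short function that reproduces the three valid examples. This is a minimal sketch; `format_authority` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Optional


def format_authority(
    author: str, year: Optional[int] = None, genus_revised: bool = False
) -> str:
    """Build a Scientific Name Author string from its parts."""
    text = author.strip()
    if year is not None:
        # Year follows the author, separated by a comma and a space.
        text = f"{text}, {year}"
    if genus_revised:
        # A genus revision puts the whole authority in parentheses.
        text = f"({text})"
    return text


examples = [
    format_authority("Smith"),
    format_authority("Jones", 1973),
    format_authority("Hastings", 1986, genus_revised=True),
]
```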
Use of the Start/End fields
There are several fields, such as latitude, longitude, day collected, month
collected, etc., that have a start and end version. For example, the OBIS
schema has "latitude", "Start_Latitude" and "End_latitude."
How to fill in these fields is perhaps the most confusing part of the OBIS
schema.
Why are all these fields there? They seem redundant?
Yes, they are redundant, but there is a reason for that. The Darwin Core
represents all the location and time fields as single fields. But OBIS
members thought it was important to be able to express a range of location
or time. For example, a trawl might have been taken over a line transect
that is better expressed as a start and end latitude and longitude than
as a single point. Or an old specimen might only be labeled with the dates
of the cruise, and not the day it was sampled, so that all we know is that
the sample was taken sometime within a span of several months. For these
reasons, OBIS added the start and end fields to the location and time information.
However, the OBIS schema needs to be compliant with the Darwin Core, so
we still need to keep the original single-field options. So we end up with
a field for, e.g., latitude, one for Start Latitude and one for End Latitude.
Implementing the Start/End Fields.
How you implement the Start/End fields will depend on the kind of data that
you have. But regardless of your data structure, you should never have to
type the same value into more than one field - you can make the database do
this automatically.
Throughout the following directions, we will use latitude as an example.
But the same rationale applies to all of the Start/End fields: Year Collected,
Month Collected, Day Collected, Time of Day, and Longitude.
Case 1: All of your latitudes are point latitudes; none of them
have separate start and end latitudes.
In this case, you should have a "Latitude" field in your database
into which you enter this information. When you install DiGIR and map
your fields to the OBIS Schema, your "Latitude" field will get
mapped to the OBIS Schema fields for "Latitude", "Start
Latitude" and "End Latitude."
Case 2: You have samples that were taken over space and want to
record a start and end latitude for all of them.
You should have "Start Latitude" and "End Latitude"
fields in your database. These map to the same fields in the OBIS Schema.
You then have a decision. "Latitude" is a required field in
the OBIS schema, and a Darwin Core field, so you must map it to something
in your database.
Solution A: The best option is to make an OBIS view of your database
and create a "Latitude" field that is the average of your "Start
Latitude" and your "End Latitude" fields (i.e. sum the
fields and divide by 2)
Solution B: If the space covered by the sample is relatively small,
you may feel that just using the "Start Latitude" field is good
enough.
In either case, though, you must take care that the location precision
is accurate (see below).
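Solution A can be sketched as a database view that computes the midpoint, so the value is never typed by hand. The example below uses an in-memory SQLite database purely as a stand-in for whatever database you run; the table name `samples`, the view name `obis_view`, the column names, and the sample row are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    "catalog_number TEXT, start_latitude REAL, end_latitude REAL)"
)
conn.execute("INSERT INTO samples VALUES ('T-001', 34.10, 34.20)")

# The view derives "latitude" as the average of the start and end values.
conn.execute(
    """
    CREATE VIEW obis_view AS
    SELECT catalog_number,
           (start_latitude + end_latitude) / 2.0 AS latitude,
           start_latitude,
           end_latitude
    FROM samples
    """
)

rows = conn.execute(
    "SELECT catalog_number, latitude FROM obis_view"
).fetchall()
```

The same expression can be used for longitude, except for a transect that crosses the 180 degree meridian, where a plain average gives a point on the wrong side of the globe.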
Case 3: Some of your samples were taken at a point and some were
taken over a distance.
In this case, you can use the same method as Case 2 above. For those
samples that were taken as a point, the simplest approach is to have the
"Start Latitude" and "End Latitude" fields be equal.
You can fill them in individually by copying and pasting, or by a small
script/routine. Alternatively, you can leave the "End Latitude"
blank, but remember you'll have to have a way to get out the appropriate
precision fields later (see below).
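A small script of the kind mentioned above might look like the following. It is a sketch under assumptions: records are represented as dictionaries, and the key names `start_latitude` and `end_latitude` are hypothetical.

```python
def fill_end_latitude(record: dict) -> dict:
    """Copy the start latitude into a missing end latitude (point sample)."""
    filled = dict(record)  # leave the caller's record untouched
    if filled.get("end_latitude") is None:
        filled["end_latitude"] = filled["start_latitude"]
    return filled


point_sample = fill_end_latitude({"start_latitude": 12.5, "end_latitude": None})
transect = fill_end_latitude({"start_latitude": 12.5, "end_latitude": 12.7})
```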
Filling in the Coordinate Precision fields - general
comments
These fields (see below) indicate the precision with which the latitude/longitude
location is given. This is generally a function of the method used (GPS,
etc.). While this is not a required field, it is a very important one and
we highly recommend that you include it if at all possible. Note that the
unit is meters, while the latitude and longitude fields are reported in decimal
degrees. When in doubt,
it is always better to err on the side of indicating a larger value in this
field - it is better to indicate a little too much uncertainty than to report
false precision. As a rough guide, the number of significant digits in the latitude
and longitude may indicate the precision. The precision should never
be smaller than the uncertainty created by the number of significant figures
in the latitude and longitude (i.e. it doesn't make sense to report that a
location is precise to 1 m if the latitude and longitude are only given to
the tenth of a degree).
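The lower bound implied by the number of decimal places can be estimated roughly as follows. This is an order-of-magnitude sketch, not an OBIS rule: it assumes one degree of latitude is about 111,320 m, treats the size of the last reported decimal place as the uncertainty, and uses the hypothetical function name `precision_floor_m`. A degree of longitude is shorter away from the equator, so applying the latitude figure to both coordinates errs on the side of a larger value.

```python
from decimal import Decimal

# Approximate length of one degree of latitude, in meters (assumption).
METERS_PER_DEGREE_LATITUDE = 111_320


def precision_floor_m(coordinate_text: str) -> float:
    """Smallest sensible precision, in meters, for a coordinate as written."""
    exponent = Decimal(coordinate_text).as_tuple().exponent
    decimal_places = max(0, -exponent)
    step_in_degrees = 10.0 ** -decimal_places
    return step_in_degrees * METERS_PER_DEGREE_LATITUDE


tenth_of_degree = precision_floor_m("34.1")    # roughly 11 km
five_decimals = precision_floor_m("34.12345")  # roughly 1 m
```

The coordinate is passed as text because a floating-point number does not remember how many decimal places were originally recorded.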
Coordinate Precision versus
Start/End Coordinate Precision
The OBIS schema has two location precision fields: "Coordinate Precision"
and "Start/End Coordinate Precision." Following the case examples
from the "Use of the Start/End fields" notes, this is how they should
be filled out.
Case 1: All of your latitudes are point latitudes; none of them have separate
start and end latitudes. You should have one precision field in your database
and use this to estimate the precision with which each sample is measured
- this will be dependent on the method used (GPS, etc.). When you map to
the OBIS Schema, this field will be mapped to both the "Coordinate Precision"
and the "Start/End Coordinate Precision" fields.
Case 2: You have samples that were taken over space and want to record a
start and end latitude for all of them. You should have two precision fields
in your database. "Start/End Coordinate Precision" should refer
to the precision with which the start and end location points are known.
"Coordinate Precision" should be a value that is large enough to
span the Start and End points from the "Latitude" and "Longitude"
fields. An example: say you are recording a 1 km-long trawl and used a GPS
to get your start and end points so that you think your lat/lon measurement
error is about 10 meters. In this case, your "Start/End Coordinate Precision"
is 10. Your "Coordinate Precision" will depend on whether you use
solution A or solution B above. If you use solution A and report the midpoint
of the line for "Latitude" and "Longitude," then the "Coordinate
Precision" is 500m. If you use Solution B and report the "Start
Latitude" and "Start Longitude" in the "Latitude"
and "Longitude" fields, then the "Coordinate Precision"
is 1000m.
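The arithmetic in this example can be sketched as a helper that takes the transect length and the choice of reported point. The function name `coordinate_precision_m` and the labels "midpoint" and "start" are hypothetical.

```python
def coordinate_precision_m(transect_length_m: float, reported_point: str) -> float:
    """Coordinate Precision large enough to span the start and end points."""
    if reported_point == "midpoint":  # Solution A
        return transect_length_m / 2
    if reported_point == "start":     # Solution B
        return transect_length_m
    raise ValueError("reported_point must be 'midpoint' or 'start'")


solution_a = coordinate_precision_m(1000, "midpoint")
solution_b = coordinate_precision_m(1000, "start")
```

For the 1 km trawl this gives 500 m for Solution A and 1000 m for Solution B, while "Start/End Coordinate Precision" stays at the 10 m GPS error.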
Minimum and Maximum Elevation
versus Depth
Minimum and maximum elevation are included because they are part of the Darwin
Core, but for samples below sea level they are synonymous with Depth (except
with the opposite sign). OBIS does not query on the elevation fields - it
only uses the depth fields.
If all of your data are marine, then you can use just depth in your database.
If you want to serve elevation then it can be automatically calculated as
-depth. Or vice-versa. Just don't enter the numbers twice!
If you do hold non-marine data, such as data from lakes, then you may need
to fill in both fields. In this case, the depth indicates the distance below
the water level, while the elevation indicates the height above sea level.
So a sample taken 10 meters below the surface of a lake on the top of a mountain
that is 3000m high would have a depth = 10 and an elevation = 2990.
Elevation should not be used to indicate height above seafloor for marine
samples.
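Both the marine case and the lake case reduce to one subtraction, which can be sketched as below. The function name `elevation_from_depth` and the parameter `surface_elevation_m` (height of the water surface above sea level, zero for the sea) are hypothetical.

```python
def elevation_from_depth(depth_m: float, surface_elevation_m: float = 0.0) -> float:
    """Elevation of a sample taken depth_m below a water surface."""
    return surface_elevation_m - depth_m


marine = elevation_from_depth(250)                          # sea surface at 0 m
lake = elevation_from_depth(10, surface_elevation_m=3000)   # mountain lake
```

With this derivation only depth (and, for lakes, the surface elevation) needs to be entered; elevation is computed when the data are served.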
Depth Range
The preferred method is to use the "Minimum Depth" and "Maximum
Depth" fields, with both fields being equal when a collection was made
at a single depth point, and not to use the Depth Range field. All new data
entry projects should follow this format. However, we recognize that there
are some legacy databases that have a single depth range field and where the
data contributors can't take the time to individually split them up. If the
depth is always recorded in meters using numerals (i.e. "10" not
"ten"), then one of the OBIS contributors, SEAMAP, has developed
a nice routine for pulling out the minimum and maximum automatically - you
can contact Ben Best for details: bbest@duke.edu. But for those of you with
fields that look like "from one to 10 fathoms" and don't have the
time to convert them one by one, you can use the "Depth Range" field
for free text information on depth. Note that there should be no cases in
which all three are filled out for an individual record: if you have the
Minimum and Maximum, then the range can be calculated and it should not be
entered.
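For legacy fields that always hold numerals in meters, splitting a range into minimum and maximum can be automated. The sketch below is not the SEAMAP routine mentioned above, which is not described here; it is a hypothetical illustration that accepts only simple forms such as "10", "10-20" or "10 to 20 m" and returns None for anything else, so that free-text values stay in the "Depth Range" field.

```python
import re
from typing import Optional, Tuple

# One number, optionally followed by "-" or "to" and a second number,
# optionally followed by a meters unit. Anything else is rejected.
_DEPTH_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(?:(?:-|to)\s*(\d+(?:\.\d+)?))?\s*(?:m|meters|metres)?\s*$",
    re.IGNORECASE,
)


def split_depth_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (minimum, maximum) depth in meters, or None if not parseable."""
    match = _DEPTH_PATTERN.match(text)
    if match is None:
        return None
    first = float(match.group(1))
    second = float(match.group(2)) if match.group(2) else first
    return (min(first, second), max(first, second))


simple_range = split_depth_range("10-20")
single_depth = split_depth_range("15 m")
free_text = split_depth_range("from one to 10 fathoms")
```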
Individual Count versus Observed
Individual Count
The Darwin Core developed from the museum community, so "Individual
Count" refers to the number of specimens that were saved, not the number
of individuals that were caught. OBIS has added the "Observed Individual
Count" to indicate the total number per species that were caught. So
if a fisheries survey caught 100 squid of a certain species and preserved
10 for a museum collection, then Individual Count = 10 and Observed Individual
Count = 100. Most databases will only have one or the other of these pieces
of information saved.
Related Catalog Item
The Relationship Type and Related Catalog Item can be used to express tagging
data following an individual through time (i.e. a later sighting is related
to an earlier sighting). A special "relationship type" term should
be defined for this.