
eml's Introduction

EML: Ecological Metadata Language


Cite as:

Matthew B. Jones, Margaret O'Brien, Bryce Mecum, Carl Boettiger, Mark Schildhauer, Mitchell Maier, Timothy Whiteaker, Stevan Earl, Steven Chong. 2019. Ecological Metadata Language version 2.2.0. KNB Data Repository. doi:10.5063/F11834T2

@article{EML_2019,
  title={Ecological Metadata Language version 2.2.0},
  url={https://eml.ecoinformatics.org},
  DOI={10.5063/f11834t2},
  publisher={KNB Data Repository},
  author={Jones, Matthew and O’Brien, Margaret and Mecum, Bryce and Boettiger, Carl and Schildhauer, Mark and Maier, Mitchell and Whiteaker, Timothy and Earl, Stevan and Chong, Steven},
  year={2019}
}

The Ecological Metadata Language (EML) defines a comprehensive vocabulary and a readable XML markup syntax for documenting research data. It is in widespread use in the earth and environmental sciences, and increasingly in other research disciplines as well. EML is a community-maintained specification, and evolves to meet the data documentation needs of researchers who want to openly document, preserve, and share data and outputs. EML includes modules for identifying and citing data packages, for describing the spatial, temporal, taxonomic, and thematic extent of data, for describing research methods and protocols, for describing the structure and content of data within sometimes complex packages of data, and for precisely annotating data with semantic vocabularies. EML includes metadata fields to fully detail data papers that are published in journals specializing in scientific data sharing and preservation.

Getting Started

Composing an EML document can be done in a simple text editor (e.g., Atom), via scripting languages like R and Python (e.g., the R EML package), in general-purpose XML authoring tools (e.g., Oxygen), and in custom web-based metadata editing tools (e.g., MetacatUI). While these tools expand and shift over time, the core metadata language has been consistent and backwards compatible, allowing for decades of seamless interoperability of data sets in many repositories.

EML documents can be started simply, and then additional detail added over time. On the simple end, an EML document that provides basic bibliographic information would be sufficient for citing a data set and for simple discovery in catalogs:

<?xml version="1.0"?>
<eml:eml
    packageId="doi:10.xxxx/eml.1.1" system="https://doi.org"
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1"
    xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 xsd/eml.xsd">
    
    <dataset>
        <title>Primary production of algal species from Southeast Alaska, 1990-2002</title>
        <creator id="https://orcid.org/0000-0003-0077-4738">
            <individualName>
                <givenName>Matthew</givenName>
                <givenName>B.</givenName>
                <surName>Jones</surName>
            </individualName>
            <electronicMailAddress>[email protected]</electronicMailAddress>
            <userId directory="https://orcid.org">https://orcid.org/0000-0003-0077-4738</userId>
        </creator>
        <keywordSet>
            <keyword>biomass</keyword>
            <keyword>productivity</keyword>
        </keywordSet>
        <contact>
            <references>https://orcid.org/0000-0003-0077-4738</references>
        </contact>
    </dataset>
</eml:eml>

This document can then be supplemented with additional metadata describing research projects and methods, structural information about the data, and much more.
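
As a rough illustration of how little is needed for discovery, the sketch below reads the bibliographic fields back out of a document like the one above using Python's standard `xml.etree.ElementTree`. The function name `summarize_eml` and the trimmed inline copy of the document are assumptions made for this example; they are not part of EML or of any EML tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the minimal document above (illustrative only).
MINIMAL_EML = """<?xml version="1.0"?>
<eml:eml
    packageId="doi:10.xxxx/eml.1.1" system="https://doi.org"
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0">
    <dataset>
        <title>Primary production of algal species from Southeast Alaska, 1990-2002</title>
        <creator id="https://orcid.org/0000-0003-0077-4738">
            <individualName>
                <givenName>Matthew</givenName>
                <givenName>B.</givenName>
                <surName>Jones</surName>
            </individualName>
        </creator>
        <keywordSet>
            <keyword>biomass</keyword>
            <keyword>productivity</keyword>
        </keywordSet>
    </dataset>
</eml:eml>
"""


def summarize_eml(eml_text):
    """Return the citation-level fields of an EML document as a dict."""
    root = ET.fromstring(eml_text)
    # Only the root element is namespace-qualified; children are unqualified.
    dataset = root.find("dataset")
    creators = []
    for creator in dataset.findall("creator"):
        name = creator.find("individualName")
        if name is None:
            continue  # e.g., a creator given only as an organization
        parts = [given.text for given in name.findall("givenName")]
        parts.append(name.findtext("surName"))
        creators.append(" ".join(parts))
    return {
        "package_id": root.get("packageId"),
        "title": dataset.findtext("title"),
        "creators": creators,
        "keywords": [k.text for k in dataset.findall("keywordSet/keyword")],
    }


summary = summarize_eml(MINIMAL_EML)
print(summary["title"])
print(summary["creators"], summary["keywords"])
```

A catalog would typically index fields like these; richer modules (methods, attributes, coverage) can be read the same way once they are present.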

About the EML Project

The EML project is an open source, community-oriented project dedicated to providing a high-quality metadata specification for describing data relevant to diverse disciplines that involve observational research like ecology, earth, and environmental science. The specification is maintained by voluntary project members who donate their time and experience in order to advance information management for ecology. Project decisions are made by consensus of the current maintainers on the project.

We welcome contributions to this work in any form. Individuals who invest substantial amounts of time and make valuable contributions to the development and maintenance of EML (in the opinion of current project maintainers) will be invited to become EML project maintainers. Contributions can take many forms, including the development of the EML schemas, writing documentation, and helping with maintenance, among others.

Contributing

Developers may be interested in browsing the source code repository that we use in developing EML. Starting with EML 2.1.1, the master branch reflects the current stable release of EML. Development occurs in development branches (e.g., BRANCH_EML_2_2), which allow experimental additions as they are proposed by the community. A development branch always contains the most recent development version of EML, and therefore may be in flux or otherwise broken. It is unlikely to contain the same files as the current release. Development branches are virtually guaranteed to change before they are released, so they should not be used in production environments. Use development branches at your own risk for testing.

Write access to this repository is reserved for current project maintainers. Non-project members can contribute feedback, revisions, fixes, code, or any other contribution through pull requests at GitHub.

Discussion of issues occurs on the Slack channel or through the EML issue tracking system. The preferred way to report problems with EML or request features is the issue tracking system.

History

EML was originally developed by Matthew Jones at NCEAS based on a report by the ESA Committee on the Future of Long-Term Ecological Data (FLED) and on a related paper on ecological metadata by Michener et al. (Michener, William K., et al. 1997. "Nongeospatial metadata for the ecological sciences." Ecological Applications 7(1): 330-342). Version 1.0 was released at NCEAS in 1997 and used internally, with further internal releases of versions 1.2, 1.3, and 1.4, all of which followed the FLED recommendations closely in their content implementation. Version 2 became a community-maintained, open specification. Substantial modifications for EML 2.x came from experience using the earlier specification at NCEAS and from feedback from the ecological community, particularly information managers from the Long Term Ecological Research Network. Versions 2.1 and 2.2 introduced significant new features like internationalization, semantic annotations, and support for data papers.

Older versions (deprecated)

The following versions are still available for reference purposes, although they have been superseded by the current version (2.2.0). Please make every effort to use the current version.

Copyright and License

Copyright: 1997-2019 Regents of the University of California

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Funding and Acknowledgements

EML was developed and is maintained with support from the National Center for Ecological Analysis and Synthesis (NCEAS), a Center funded by the University of California Santa Barbara and the state of California.

This material is based upon work supported by the US National Science Foundation under Grant No. DEB-9980154, DBI-9904777, 0225676, DEB-0072909, DBI-9983132, and DEB-9634135. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

This product includes software developed by the Apache Software Foundation (http://www.apache.org/). See the LICENSE file in lib/apache for details.

The source code, object code, and documentation in the com.oreilly.servlet package is copyright and owned by Jason Hunter. See the cos-license.html file for details of the license. Licensor retains title to and ownership of the Software and all enhancements, modifications, and updates to the Software.

This product includes software developed by the JDOM Project (http://www.jdom.org/). See jdom-LICENSE.txt for details.

eml's People

Contributors

amoeba, artntek, cgries, csjx, dependabot[bot], duanecosta, kf8a, laurenwalker, leinfelder, maier-m, mbjones, mobb, mpsaloha, srearl, stevenchong, taojing2002, twhiteaker, yvanlebras


eml's Issues

eml-dataset changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 482, https://projects.ecoinformatics.org/ecoinfo/issues/482
Original Date: 2002-05-01
Original Assignee: Peter McCartney


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Matt

  1. Add contact+ to dataset -- I also have a note that this should be added to
    resource as contact*. Need to figure out which.
  2. Add "publisher" *
  3. Add "purpose"
  4. Do NOT add geoform -- decided it was farcical and we could provide a
    reasonable value (e.g., document) when converting to FGDC
  5. add "maintenance" description -- para that describes freq of update and
    completion status
  6. my notes say to add distribution* element, but others say it goes in
    resource, which is it? ditto with "contact*"

Incorrect Citation reference (citeinfo) in eml-coverage, temporalCov and taxonomicCov


Author Name: Chris Jones (Chris Jones)
Original Redmine Issue: 373, https://projects.ecoinformatics.org/ecoinfo/issues/373
Original Date: 2001-12-11
Original Assignee: Chris Jones


The geolcit, classcit, and idref elements in eml-coverage.xsd use a complex type
consisting of one element reference in a sequence. The reference is to the
citeinfo element, which is a single element defined in eml-coverage.xsd (with no
documentation). This element ref needs to be changed to point to an eml
literature citation field. Most likely, we would have to import citation into
eml-coverage for this to work appropriately.

ResourceVariation type not needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 43, https://projects.ecoinformatics.org/ecoinfo/issues/43
Original Date: 2000-07-26
Original Assignee: Chad Berkley


In the resource.xsd XML Schema document, the ResourceVariation type is not
needed. Instead, it would be better to just have a set of top-level elements
defined (like dataset and literature) that can be used as the docroot for
particular resource documents. This would eliminate the need for the whole file
"resourceExample.xsd".

eml-constraint use of identifiers


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 427, https://projects.ecoinformatics.org/ecoinfo/issues/427
Original Date: 2002-02-14
Original Assignee: Matt Jones


The current eml-constraint module is designed to reference table and attribute
identifiers so that the relationships between two particular entities can be
established. However, we do not currently indicate how the values for these
identifiers should be obtained or constrained. Are they the eml-identifiers
(which doesn't work for attributes), or are they names (entityName,
attributeName) which might run into many problems with uniqueness issues? We
need an easy, consistent, approach that we recommend or require as part of the
semantics of this module.

In addition, constraints will always apply to one or more entities, so it is
reasonable to consider merging the entire eml-constraint module onto eml-entity.
However, doing this means that constraints that affect a table may be only
described in the description of a different table, which could definitely cause
some problems in locating the information. By maintaining the independence of
the eml-constraint module, we create a single, identifiable location where both
participants in a constraint can be enumerated. This will be far easier for
applications to use to identify both sides of a constraint, at the cost of
having to specify both sides in the constraint description. Of course, this
does not apply to constraints that apply to only a single entity such as UNIQUE
constraints.

semantic metadata module/extensions


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 277, https://projects.ecoinformatics.org/ecoinfo/issues/277
Original Date: 2001-08-31
Original Assignee: Matt Jones


Need to extend EML, either by adding a new module or extending the current
entity/attribute system, so that semantic metadata can be accommodated.
Basically, this means being able to enter terms from an ontology (see bug 274)
so that a particular data table attribute can be tied into the ontology. See
the KDI proposal on canonical variables for more information.

need DTDs to correspond to XSD files


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 430, https://projects.ecoinformatics.org/ecoinfo/issues/430
Original Date: 2002-02-14
Original Assignee: Chad Berkley


The current set of DTD files checked into the eml module do not correspond in a
1:1 way with the XSD files. In particular, 1) parameter entities were resolved
(e.g., eml-dataset includes eml-resource) and should not be; and 2) multiple
global elements in the schema should be represented as possible root elements in
the DTD but in fact were eliminated. For example, in eml-entity, both
"table-entity" and "other-entity" should be root elements in the eml-entity.dtd,
but in fact only "table-entity" is present because it caused some problems
with the software we were using to parse DTDs. This needs to be fixed so that
all appropriate elements are available.

establish consistent namespaces for schemas


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 472, https://projects.ecoinformatics.org/ecoinfo/issues/472
Original Date: 2002-04-16
Original Assignee: Matt Jones


EML currently uses namespaces of the form "eml:modulename" for each of the eml
modules (e.g., eml:dataset). In contrast, we use version specific public
identifiers for the EML dtds (e.g.,
"-//ecoinformatics.org//eml-dataset-2.0.0beta6//EN"). The formal public
identifiers will need to be updated with each revision of the standard, but
have the benefit that they allow one to specifically state which version of the
module a document uses. This is important in systems where we need to be able
to reliably validate documents.

So, I think we need to change the public namespaces for eml to be versioned like
the public identifiers are. A format like this would do:
"eml:eml-dataset-2.0.0beta7"

Note that I specifically did not choose to use an http URI for this namespace
because of the intense controversy over resolvability of namespace URIs, and the
later development of specs like RDDL. The namespace spec explicitly states that
processors should NOT expect that a schema will reside at the namespace URI, nor
even that the namespace URI is resolvable as an address. Thus, the "eml" scheme
in the URI makes it clear that it is not a resolvable URL. We should rely on
schemaLocation, or handle it in each schema processor.

This will need to be changed throughout the schema docs.

Also, need to add documentation in the DTDs describing the proper public
identifier that should be used with the DTDs so that it is clear.
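
To make the proposal concrete, here is a minimal Python sketch of how a schema processor might treat such a versioned, non-resolvable namespace: it splits the string into module and version and looks the schema up locally instead of fetching anything from the namespace itself. The regular expression, the function name `parse_versioned_namespace`, and the `LOCAL_SCHEMAS` catalog with its file path are all hypothetical and exist only for this example.

```python
import re

# Matches the format proposed above, e.g. "eml:eml-dataset-2.0.0beta7".
VERSIONED_NAMESPACE = re.compile(
    r"eml:(eml-[a-z]+)-(\d+\.\d+\.\d+(?:[a-z]+\d+)?)"
)

# Hypothetical local catalog: the processor decides where schemas live,
# because the namespace string is not expected to be resolvable.
LOCAL_SCHEMAS = {
    ("eml-dataset", "2.0.0beta7"): "schemas/2.0.0beta7/eml-dataset.xsd",
}


def parse_versioned_namespace(namespace):
    """Return (module, version), or None if the namespace is unversioned."""
    match = VERSIONED_NAMESPACE.fullmatch(namespace)
    if match is None:
        return None
    return match.group(1), match.group(2)


def locate_schema(namespace):
    """Look up a local schema path for a versioned namespace, if known."""
    key = parse_versioned_namespace(namespace)
    return LOCAL_SCHEMAS.get(key) if key else None


print(parse_versioned_namespace("eml:eml-dataset-2.0.0beta7"))
print(parse_versioned_namespace("eml:dataset"))
```

The unversioned form "eml:dataset" yields no version at all, which is the validation problem the issue describes.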

taxonomic coverage limited to one kingdom


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 265, https://projects.ecoinformatics.org/ecoinfo/issues/265
Original Date: 2001-08-15
Original Assignee: Matt Jones


Tim Bergsma reported this issue:

  1. Taxonomic coverage. I noticed that only a single instance of
    uppermost rank is allowed, whereas sub ranks may be repeated. This
    means that cross-kingdom studies cannot be properly documented, since
    there is no single rank (that I know of) which subsumes 'kingdom'.

revise duration elements in eml-access


Author Name: Dan Higgins (Dan Higgins)
Original Redmine Issue: 443, https://projects.ecoinformatics.org/ecoinfo/issues/443
Original Date: 2002-03-14
Original Assignee: Chad Berkley


eml-access has a duration element that is defined in terms of temporalCoverage.
This introduces a number of temporal concepts that are nonsensical when applied
to the duration of 'tickets' (e.g. geological time scales). It is suggested that
simple start and stop times/dates be used here rather than temporal coverage
concepts.

add additional entity types to EML


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 429, https://projects.ecoinformatics.org/ecoinfo/issues/429
Original Date: 2002-02-14
Original Assignee: Peter McCartney


The current eml-entity module describes two types of entities: table-entities
and other-entities. Ultimately I think we need to be able to describe several
other specific types of entities, particularly spatial images and various GIS
objects.

General image support may also be useful (e.g., for jpg, gif, etc) so that photo
quadrats and other types of images used as data and metadata can easily be
included. We may be able to easily accommodate many of these generic entity
types by utilizing a MIME-type label (e.g., image/gif) in the entityType field,
although there may also be need for additional metadata for these entity types.

revise attribute domain


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 266, https://projects.ecoinformatics.org/ecoinfo/issues/266
Original Date: 2001-08-15
Original Assignee: Matt Jones


attribute metadata describes the domain for the attributes using enumerated and
range domains, but does not currently allow for free-text domains. This could
be fixed using FGDC's unrepresentable domain.

Also, there has been a request to add 'paragraph' and 'citation' elements to the
'source' element to be more specific about the source for a domain.

eml-resource changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 480, https://projects.ecoinformatics.org/ecoinfo/issues/480
Original Date: 2002-05-01
Original Assignee: Matt Jones


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Matt

  1. Change originator to "creator" +, remove role from RespParty but keep simple type
  2. Add "associatedParty" *, extends Resp Party with role
  3. move "pubPlace" to individual modules so it can be optional or required
  4. make sure pubDate can be a year only -- check Date type model
  5. add "language" element, use string content model (not enumerated domain)
  6. move "coverage" to resource, make "Coverage" ComplexType (repeatable, opt
    elements of each cov subtype)
  7. make "keywordThesaurus" not repeatable (0..1)
  8. rename "rights" to "intellectualRights"
  9. add "metadataProvider" * element with type of Resp Party
  10. move "distribution" * to resource
    |-- distribution*
    | |-- connection* -- connectionName
    | | |-- URL+
    | |-offlineResource*
    |-- contact* (type RespParty)
  11. Add "taxonomic" to enum list for KeywordType

need lineage and version metadata standard


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 144, https://projects.ecoinformatics.org/ecoinfo/issues/144
Original Date: 2000-09-22
Original Assignee: Matt Jones


We need a lineage and version control metadata standard that allows us to
specify precisely the versioning information among metadata files, data files,
and other objects in the system. This will likely be related to changes in the
current specification of eml-package, which involves showing links among
objects.

resolve packaging issues


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 269, https://projects.ecoinformatics.org/ecoinfo/issues/269
Original Date: 2001-08-31
Original Assignee: Matt Jones


There are some contentious issues surrounding the use of packaging (ie, the
triple element) in EML. Some would prefer inclusion via namespaces directly to
make the schema more explicit. But using triples to associate data and metadata
files is more flexible and allows new types of metadata to be added over time
without changes to the original structure.

One complaint is that the current structure requires multiple files to deliver
all of the metadata. One possible solution is to include an element 'metadata'
with content model 'ANY' as the root element, which can contain all of the other
modules, and they in turn can use namespaces to indicate how validation can be
performed.

revise eml-resource


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 229, https://projects.ecoinformatics.org/ecoinfo/issues/229
Original Date: 2001-04-12
Original Assignee: Matt Jones


Incorporate revisions to eml-resource.xsd that were suggested in the EML 2
workshop. Includes separating dataset, literature, and software out as separate
modules that extend ResourceBase, and adding triplets to ResourceBase.

entityType and format


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 450, https://projects.ecoinformatics.org/ecoinfo/issues/450
Original Date: 2002-03-27
Original Assignee: Chad Berkley


The eml-entity module has a field called "entityType" that is supposed to
contain the type of the entity for "other" entities. The eml-physical file has
a field called "format" that is supposed to contain the name of the data format
for the physical file. We need to clarify the difference between these fields.

If one is using a mime-type to indicate the format (e.g., image/gif), where
should that go? My guess is eml-physical/format.

eml-physical changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 485, https://projects.ecoinformatics.org/ecoinfo/issues/485
Original Date: 2002-05-01
Original Assignee: Dan Higgins


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Dan

  1. add version and citation of format definition
  2. add ability to describe BIP and BIL formats for binary raster data -- see the
    IPW header format for the info needed
  3. rearrange for better control of required elements when using fixed vs.
    variable formats. Do this by creating "fixed" and "delimited" elements with
    proper content models.
  4. add "objectName" element to contain the filename or other name of the
    physical object
  5. add field for pointer for which connection to use to get this physical object
    (using "objectName"). Question as to how the semantics of that combo work --
    how does one add an object name together with connection info for arbitrary
    connection types?

eml-party changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 481, https://projects.ecoinformatics.org/ecoinfo/issues/481
Original Date: 2002-05-01
Original Assignee: Chad Berkley


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad

  1. Remove role from ResponsibleParty, but keep the simple type for use elsewhere
  2. Review and define the values in the role type -- we may have decided to
    eliminate these enumerated values and just use a free-text element when needed,
    but I can't tell from my notes.

bounding box vs point data in eml-coverage


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 339, https://projects.ecoinformatics.org/ecoinfo/issues/339
Original Date: 2001-11-29
Original Assignee: Matt Jones


The current eml-coverage requires a bounding box described by two points. Many
ecological data sets are collected at a site with a point location but no known
bounding box. How can we accommodate point coverage? Two possibilities: 1)
change the content model to make one of the points in the bounding box optional,
or 2) change the documentation to tell the user to fill in identical points in
both bounding box coordinates if it is a point.
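
Option 2 can be sketched in a few lines of Python: a point is written as a bounding box whose opposite edges coincide, and a reader detects a point by checking for that. The element names below follow the boundingCoordinates naming used in later EML 2.x releases and are an assumption relative to the schema this issue was filed against; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def point_as_bounding_box(latitude, longitude):
    """Encode a single point as a degenerate bounding box."""
    box = ET.Element("boundingCoordinates")
    edges = (
        ("westBoundingCoordinate", longitude),
        ("eastBoundingCoordinate", longitude),
        ("northBoundingCoordinate", latitude),
        ("southBoundingCoordinate", latitude),
    )
    for tag, value in edges:
        ET.SubElement(box, tag).text = str(value)
    return box


def is_point(box):
    """True when west == east and north == south."""
    west = float(box.findtext("westBoundingCoordinate"))
    east = float(box.findtext("eastBoundingCoordinate"))
    north = float(box.findtext("northBoundingCoordinate"))
    south = float(box.findtext("southBoundingCoordinate"))
    return west == east and north == south


box = point_as_bounding_box(58.3, -134.4)
print(ET.tostring(box, encoding="unicode"))
print(is_point(box))
```

This keeps the content model unchanged, at the cost of relying on documentation to tell authors and consumers about the convention.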

mixed content not appropriate in resource.xsd


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 42, https://projects.ecoinformatics.org/ecoinfo/issues/42
Original Date: 2000-07-26
Original Assignee: Chad Berkley


A content model of "mixed" was used for several of the complex types in the
resource.xsd XML Schema documents. In general, because mixed content models
cannot be validated, I think they should not be used. In all of the cases here
the model should be changed to "elementOnly".

referencing complex types is not done consistently


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 477, https://projects.ecoinformatics.org/ecoinfo/issues/477
Original Date: 2002-04-20
Original Assignee: Matt Jones


In all modules we import other schemas and use components. Sometimes we define
elements that use a ComplexType, other times we import and use an element that
uses the complex type. We need to go through the modules systematically and
make sure that all inter-namespace references are done in the same way. This
most often involves the ways we use ResponsibleParty, the various coverage
types, and citations, but there are several others too. Check them all and fix
systematically.

eml-attribute changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 484, https://projects.ecoinformatics.org/ecoinfo/issues/484
Original Date: 2002-05-01
Original Assignee: Chad Berkley


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad, Dan, David

  1. rename dataType to storageType -- minimally suggest use of XML Schema DT as
    base for storageType, add attribute "typeSystem" for referencing the system used
  2. add "unitSystem" attribute with default of STMML, make "unit" required with
    default of "dimensionless"
  3. add measurementScale element for documenting scale (ordinal, ratio, interval,
    nominal). We discussed whether this was implied by dataType/unit, but decided to
    add it, even though it probably is somehow implied by the dimension of the
    measurement
  4. add "accuracy" element, use FGDC "dataQuality" model for it
  5. how do we express precision & accuracy -- need to be explicit in our field
    documentation
  6. generally resolve the storageType/dimension/unit/measurementScale morass
  7. add explanation/reason field to the "missingValueCode" so that people can
    explain what "-9999" means in their data set
  8. move "textDomain" and "enumeratedDomain" up so that they are siblings of
    numeric domain, remove the choice
  9. add "enforced" attribute to enumeratedDomain element with allowable values
    "yes", "no", defaults to "yes"
  10. enumeratedDomain: need externally referenced codeset, reference to dataTable
    entity that contains codes (2 columns). below is a content model from my notes
    that doesn't make a lot of sense to me right now. Corinna says that the sequence
    after enumeratedDomain should be optional,repeatable
    enumeratedDomain
    |- list
    | |- code
    | |- def
    | |- source
    |
    |- entity
    | |- entityID
    | |- codeColumn
    | |- codeDefinitionColumn
    |
    |- externalSource

storage type issues


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 337, https://projects.ecoinformatics.org/ecoinfo/issues/337
Original Date: 2001-11-29
Original Assignee: Chad Berkley


KNB scientists wanted to classify storage type for attributes as "nominal",
"ordinal", "interval", rather than using the physical storage types we had
considered (e.g., text, integer, floating point). Need to clarify what the
contents of this field should be and possibly define a domain for the value-space.

revisions for distribution metadata


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 141, https://projects.ecoinformatics.org/ecoinfo/issues/141
Original Date: 2000-09-22
Original Assignee: Chad Berkley


Need revisions to the current eml-package and a possible new "eml-distribution"
metadata module that allows specification of the metadata about distributing
datasets. The distribution info can include both online information (a la
eml-package), offline information (e.g., addresses), contact information,
licensing, use constraints, copyright, and other information about distribution.

EMLParser is slow to process large EML documents

The org.ecoinformatics.eml.EMLParser does not perform well when processing large EML documents (for instance, a document with 250 to 1000 fully fleshed out attribute elements defined). It can take 10, 30, 45 or more minutes to validate a document -- the duration scales with document size.

To try to alleviate this, change the parser to use a SAX-based model rather than a DOM.

org.ecoinformatics.eml.EMLParser uses two methods to validate a document: parseKeys() and parseKeyrefs(), both of which call getPathContent() and pass in an XPath selector. getPathContent() creates a DOM and passes back an org.w3c.dom.NodeList.

See the attached file as an example.

eml250.xml.txt
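
The SAX-based approach proposed above can be sketched as follows. This uses Python's standard `xml.sax` rather than the Java EMLParser, so treat it as an illustration of the technique and not as the project's implementation. It assumes that keys are `id` attributes and that keyrefs are the text of `references` elements, as in the example document in the introduction; the class and function names are hypothetical.

```python
import xml.sax


class IdReferenceHandler(xml.sax.ContentHandler):
    """Collect id attributes and <references> text in one streaming pass."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.duplicate_ids = set()
        self.references = []
        self._buffer = None  # a list while inside a <references> element

    def startElement(self, name, attrs):
        element_id = attrs.get("id")
        if element_id is not None:
            if element_id in self.ids:
                self.duplicate_ids.add(element_id)
            self.ids.add(element_id)
        if name == "references":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "references" and self._buffer is not None:
            self.references.append("".join(self._buffer).strip())
            self._buffer = None


def check_ids_and_references(eml_text):
    """Return (duplicate ids, references that match no id)."""
    handler = IdReferenceHandler()
    xml.sax.parseString(eml_text.encode("utf-8"), handler)
    # Resolve references only after the whole document has streamed past,
    # so a reference may legally appear before the id it points to.
    dangling = [ref for ref in handler.references if ref not in handler.ids]
    return sorted(handler.duplicate_ids), dangling


SAMPLE = (
    "<eml><dataset>"
    '<creator id="jones"/>'
    "<contact><references>jones</references></contact>"
    "<publisher><references>nobody</references></publisher>"
    "</dataset></eml>"
)
duplicates, dangling = check_ids_and_references(SAMPLE)
print(duplicates, dangling)
```

Because no tree is built and no XPath is evaluated, the work is a single pass over the document with set lookups, which is the behavior the issue is asking for on large attribute lists.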

eml-entity changes needed


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 483, https://projects.ecoinformatics.org/ecoinfo/issues/483
Original Date: 2002-05-01
Original Assignee: Chad Berkley


Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad, Peter

  1. name all entity types using camel caps (e.g., dataTable)
  2. otherEntity stays in eml-entity module, others separate modules
  3. entity types are: otherEntity, dataTable, dataView, StoredProcedure,
    spatialImage, spatialGrid, spatialVector
  4. expand names to use long names for all elements, including spatial types
  5. add "spatialReference" module -- not sure how this will be tied to the other
    spatial types
  6. move "spatialRepresentationType" to the spatialRepresentation ComplexType
  7. Possibly put connection info into the entity description, although I think we
    decided against this by the end (putting it in resource instead)

Peter's notes include a mention of "change logs" which I don't understand. Peter?

Eml documentation for Seminars & LTER sites


Author Name: David Blankman (David Blankman)
Original Redmine Issue: 365, https://projects.ecoinformatics.org/ecoinfo/issues/365
Original Date: 2001-12-03
Original Assignee: David Blankman


This will be an overview of EML and its relationship to Morpho. It will also be a practical guide for
users to understand how to take conventionally reported metadata, such as text documents or
other legacy systems, and manually enter it into Morpho. (This is not to be confused with
automated conversions of metadata into EML).

eml-constraint overlaps with packaging concepts


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 428, https://projects.ecoinformatics.org/ecoinfo/issues/428
Original Date: 2002-02-14
Original Assignee: Matt Jones


The current incarnation of eml-constraint allows the enumeration and definition
of integrity constraints that apply to entities. These are currently drawn from
the relational model, including UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK
constraints. It may also be extended to include other types of relationships
between entities that are not part of the relational model.

The "triple" element allows us to create arbitrary relationships between
identifiable objects in EML, and is used for associating data with metadata, and
groups of metadata and data objects together as a "package". This usage is very
similar to the relational model, in that it allows us to define 3-valued tuples
in a graph structure. Constraints between entities could conceivably be modeled
using this infrastructure, probably with some modifications to the concept of a
"relationship".
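
To make the comparison concrete, here is a minimal Python sketch of a triple store that holds both packaging relationships and one relational constraint. Everything in it is an assumption made for illustration: the identifiers, the relationship names ("describes", "isForeignKeyTo"), and the `#column` suffix used to point at an attribute are not EML vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relationship: str
    obj: str


# Hypothetical identifiers and relationship names throughout.
triples = [
    # Packaging: one metadata object describes two data objects.
    Triple("jones.14.1", "describes", "jones.15.1"),
    Triple("jones.14.1", "describes", "jones.16.1"),
    # A FOREIGN KEY constraint squeezed into the same three-part shape.
    Triple("jones.16.1#site_id", "isForeignKeyTo", "jones.15.1#site_id"),
]


def objects_of(subject, relationship):
    """Return every object linked to subject by the given relationship."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.relationship == relationship
    ]


package_members = objects_of("jones.14.1", "describes")
foreign_key_targets = objects_of("jones.16.1#site_id", "isForeignKeyTo")
```

A FOREIGN KEY links two identifiable things, so it fits the shape. A CHECK constraint carries an expression rather than a link between two objects, and it is less obvious how it would fit without changing what a "relationship" means.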

So the question arises: should we try to develop a unified approach to the
specification of constraints and the specification of packages? It might be
more elegant, but possibly at the cost of simplicity and ease of use. My gut
feeling is that this is not something we should pursue, but I would like to hear
other people's reasons for or against it.

eml-entity needs header boolean


Author Name: Chad Berkley (Chad Berkley)
Original Redmine Issue: 440, https://projects.ecoinformatics.org/ecoinfo/issues/440
Original Date: 2002-03-05
Original Assignee: Matt Jones


eml-entity needs a field that indicates whether the first line of the entity is a header row. Currently, we have no way to know which row to start processing on. Morpho collects this metadata but has no place to store it.
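
A small Python sketch of why a reader needs this flag. The `has_header` parameter is a hypothetical stand-in for the requested field, and the sample data is made up.

```python
import csv
import io


def split_header(text, has_header):
    """Return (column_names, data_rows) for delimited text.

    has_header stands in for the requested eml-entity field; without it
    a reader cannot tell column names from the first data record.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


SAMPLE = "site,temp\nA,12.5\nB,14.1\n"

names, data = split_header(SAMPLE, has_header=True)
_, raw = split_header(SAMPLE, has_header=False)
```

With the flag set, processing starts on the second line; without it, the column names are read as a data record.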

need revisions to eml-file and eml-variable


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 142, https://projects.ecoinformatics.org/ecoinfo/issues/142
Original Date: 2000-09-22
Original Assignee: Chad Berkley


Need to revise eml-file and eml-variable to more fully support the description
of the structure and content of ASCII data files. Need to be able to specify
relational constraints among various data entities, possibly using a new
module. Need to be able to specify binary data formats on a
per-data-entity basis.

decompose eml identifiers into familyid and revision


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 335, https://projects.ecoinformatics.org/ecoinfo/issues/335
Original Date: 2001-11-29
Original Assignee: Matt Jones


Current EML identifiers are strings that each symbolize a unique revision of an
object (e.g., jones.14.1). The same identifier should always be associated with
the same stream of bytes (i.e., checksums would match).

The suggestion is that EML identifiers be decomposed into two parts. The first
part is a "family" id (string) that represents a group of related objects. The
second is a revision # (integer) that indicates the revision number of one of
the objects in the family. The combination of the familyid and revisionnum
would always be unique, and would be usable as an accession number. In XML,
this could look something like:

jones.43 13

Questions remain.

  1. Would revision be required in eml, or optional?
    If optional, then EML would allow description of objects that are not unique.
    Is this a good thing that we want to encourage/allow as a community?
  2. For citation in print publications or other non-xml environments, how would
    one refer to the combination of familyid and revisionid?
    Previously we were able to use the whole string -- how do we combine the parts
    together now? Can we still concatenate them with a separator character?
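
One way to picture the decomposition and the concatenation raised in question 2 is the Python sketch below. The type name, the function names, and the choice of "." as separator are assumptions for illustration only; nothing here is part of EML.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    family_id: str  # e.g. "jones.43"
    revision: int   # e.g. 13


def parse_identifier(text, separator="."):
    """Split 'jones.43.13' at the last separator into family id and revision."""
    family_id, _, revision = text.rpartition(separator)
    if not family_id or not revision.isdecimal():
        raise ValueError(f"not a family id plus integer revision: {text!r}")
    return Identifier(family_id, int(revision))


def format_identifier(identifier, separator="."):
    """Join family id and revision back into one citable string."""
    return f"{identifier.family_id}{separator}{identifier.revision}"


parsed = parse_identifier("jones.14.1")
citable = format_identifier(Identifier("jones.43", 13))
```

Under this scheme a bare family id such as "jones.43" would itself parse as family "jones", revision 43, so making the revision optional would leave the concatenated form ambiguous.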

eml spec overview document


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 471, https://projects.ecoinformatics.org/ecoinfo/issues/471
Original Date: 2002-04-15
Original Assignee: Chris Jones


Need an overview document that gives the background and rationale for EML. This
would likely have both normative and non-normative sections. It would include an
overview of the structure of EML, the rationale for that structure, and its
intended use, and would describe packaging and triples in detail. It would
probably have a normative appendix that defines the semantics of every field,
which could be auto-generated from the XSD source documentation.
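
As a rough idea of what auto-generating the appendix could involve, the Python sketch below pulls a name and a definition for each element out of a schema. It assumes the definitions sit in plain xs:documentation children, which may not match how the EML schemas structure their annotations. The sample schema and its wording are made up.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A tiny stand-in schema; element names and wording are invented.
SAMPLE_XSD = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="title">
    <xs:annotation>
      <xs:documentation>A descriptive title for the resource.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="pubDate"/>
</xs:schema>"""


def field_definitions(xsd_text):
    """Yield (element name, documentation text) for every named xs:element."""
    root = ET.fromstring(xsd_text)
    for element in root.iter(f"{{{XS}}}element"):
        name = element.get("name")
        if name is None:
            continue  # ref="..." elements carry no definition of their own
        doc = element.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        text = " ".join((doc.text or "").split()) if doc is not None else ""
        yield name, text


appendix = list(field_definitions(SAMPLE_XSD))
```

Elements that come back with an empty definition, like pubDate here, would show where the schema documentation still has gaps.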

Chris -- you started an outline for this. Can you recreate it here or in a
document in CVS?

We should have this for the 2.0.0 release but will not likely have it for the
beta7 release.

Revise eml-access module


Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 140, https://projects.ecoinformatics.org/ecoinfo/issues/140
Original Date: 2000-09-22
Original Assignee: Chris Jones


Need revisions to eml-access to allow XML-based communication between the
dmanclient and metacat servlet. Probably need to move all of the "distribution"
related fields in the old eml-access to a separate "distribution" module.
