
Comments (4)

rogerburks avatar rogerburks commented on July 20, 2024

I have a vision of how a one-stop shop for taxon name operations could work. It builds on the fact that nomenclatural systems have a number of standard properties built into them. Family-group names end in a standardized way. Genus-group names must be capitalized. Specific epithets are not capitalized. Additionally, nomenclature has a number of rules that are meant to make name formation objective. We always want to use these properties when they can help us. I have a rule for this:

If something should be done the same way every time, a computer should be doing it.

Something follows from this:

If we are telling the computer which objectively determined choice we are making, we are doing it wrong. The computer should automatically move forward with the objectively determined choice.

This is because computers carry out operations in a far more consistent way than people do. People treat objective decisions in about the same way that they treat subjective decisions, which can lead to errors and bad decisions.

Additionally, we can suggest that data entry has some hoped-for properties: efficiency, accuracy, and completeness. My suggestion aims to improve efficiency and accuracy, and arguably completeness.

  • The task should open with a single autocomplete field. You start typing the name into the field. If it autocompletes to the correct choice, you move forward with an edit of it. If it cannot autocomplete, you enter a new name.

  • Since names are standardized, the task can attempt to determine information about it. Let us say that it ends in -inae, such as Cleonyminae. The task suggests a few conclusions based on this:

    • This is a name governed by the ICZN Code
    • This is a family-group name
    • This is a subfamily name
    • Its genitive stem is the part excluding the -inae suffix
  • These suggested conclusions should have an option to be overridden, but they would usually be true and they save time if they are true.

  • If the name has been found, the option is to edit.

    • The task should automatically place at the top of suggestions two options: elevating one rank, or lowering one rank within the rank group. Let us say that we are elevating its rank. The suggested rank is family, and the spelling Cleonymidae is automatically suggested. Given that this name has appeared in the database before, and that the current parent Pteromalidae is still a family, the task automatically suggests that the name Cleonymidae no longer has the parent taxon Pteromalidae, and now instead has the next-highest parent not of family rank: Chalcidoidea. This is because a valid family cannot have another family as parent. It is always possible that someone is doing things in the wrong order of course, and so this can be overridden. What cannot be overridden is the current valid spelling. The family name is automatically Cleonymidae because this is determined objectively through the ICZN Code. No possibility of typing it wrong: the computer handles it.
    • The task then suggests that all tribes of Cleonyminae be elevated to subfamily rank. These tribes are all listed with options. We override this for all except one: Cleonymini. The task changes nothing at that point, since Cleonymini has no subtribes, and because both Cleonyminae and Cleonymini are Coordinated names.
    • At this stage my concept of highest_valid_rank of Cleonym- is used, and it changes from ICZN subfamily to ICZN family. Because of this, Cleonym- is automatically valid at all family group ranks below highest_valid_rank. Cleonym- remains not valid at superfamily rank, since Chalcidoidea is its parent, and since highest_valid_rank=family already excludes that possibility.
    • Cleonym- is also automatically used at its highest_valid_rank, since it makes no sense to elevate a name to a rank and then deprecate its use at that rank. The user of the task does not need to tell the task things like this. It should be automatic. If we see something like this listed wrongly, it is a sign that we entered something incorrectly, which is a wonderful thing to have visible in a task. The logic should also be sequential, so that the origin of an incorrect entry can quickly be found.
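The suffix-based suggestions and the automatic respelling described above can be sketched roughly as follows. This is only an illustration, not how TaxonWorks does it: the function names `infer_family_group`, `spell`, and `elevate` are hypothetical, and the only facts relied on are the standard ICZN family-group suffixes (-oidea, -idae, -inae, -ini, -ina).

```python
# Standard ICZN family-group suffixes, ordered from highest to lowest rank.
FAMILY_GROUP_SUFFIXES = {
    "superfamily": "oidea",
    "family": "idae",
    "subfamily": "inae",
    "tribe": "ini",
    "subtribe": "ina",
}


def infer_family_group(name):
    """Suggest (rank, stem) from the suffix, or None if no suffix matches.

    This is only a suggestion that the user may override.
    """
    for rank, suffix in FAMILY_GROUP_SUFFIXES.items():
        if name.endswith(suffix) and len(name) > len(suffix):
            return rank, name[: -len(suffix)]
    return None


def spell(stem, rank):
    """Form the spelling of a stem at a given family-group rank."""
    return stem + FAMILY_GROUP_SUFFIXES[rank]


def elevate(name):
    """Suggest the spelling one rank higher within the family group."""
    inferred = infer_family_group(name)
    if inferred is None:
        raise ValueError(f"{name!r} has no family-group suffix")
    rank, stem = inferred
    ranks = list(FAMILY_GROUP_SUFFIXES)
    position = ranks.index(rank)
    if position == 0:
        raise ValueError(f"{name!r} is already at the highest family-group rank")
    return spell(stem, ranks[position - 1])


print(infer_family_group("Cleonyminae"))
print(elevate("Cleonyminae"))
```

The inference has to remain overridable because a suffix match is not proof: a genus-group name that happens to end in -ina would be suggested as a subtribe by this sketch. The spelling at the new rank, in contrast, is formed by the code and never typed.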

Let us say instead that we are decreasing the rank of a family but keeping it as a subfamily. The task knows that a taxon cannot be the same rank as its parent. Therefore, it automatically suggests that any included subfamilies be treated as tribes. This can be overridden, but when true it saves time.
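Both rank-change cases rest on the same constraint: a taxon cannot have the same rank as its parent. A minimal sketch of the two suggestions, assuming a simple ordered list of family-group ranks and hypothetical helper names (`suggest_parent`, `suggest_child_ranks`), might look like this:

```python
# Family-group ranks from highest to lowest (assumed ordering for this sketch).
RANKS = ["superfamily", "family", "subfamily", "tribe", "subtribe"]


def suggest_parent(new_rank, ancestors):
    """Return the nearest ancestor whose rank is strictly higher than new_rank.

    ancestors is a list of (name, rank) pairs, nearest parent first.
    """
    for name, rank in ancestors:
        if RANKS.index(rank) < RANKS.index(new_rank):
            return name
    return None


def suggest_child_ranks(parent_new_rank, children):
    """Suggest new ranks for children that are no longer below their parent.

    children maps a child name to its current rank. Only children that need
    to change are returned; each is suggested one rank below the parent.
    """
    parent_position = RANKS.index(parent_new_rank)
    suggestions = {}
    for name, rank in children.items():
        if RANKS.index(rank) <= parent_position:
            if parent_position + 1 >= len(RANKS):
                raise ValueError("no lower family-group rank is available")
            suggestions[name] = RANKS[parent_position + 1]
    return suggestions


# Elevating Cleonyminae to family: Pteromalidae can no longer be the parent.
print(suggest_parent("family", [("Pteromalidae", "family"),
                                ("Chalcidoidea", "superfamily")]))

# Lowering a family to subfamily: included subfamilies are suggested as tribes.
# "A-" and "B-" are placeholder stems, not real names.
print(suggest_child_ranks("subfamily", {"A-": "subfamily", "B-": "tribe"}))
```

Both functions only produce suggestions, so the override described above would be applied to their output rather than built into them.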

Now I can suggest some features of the model that should help things tremendously:

  • I suggest that the family-group stem is an important part of the model of a family-group name. It can have a property highest_valid_rank that controls some aspects of how it is displayed. The child taxa of a family-group name need only be children of the stem. They do not need to be children of the rank, because this is automatic in nearly all cases (except incertae sedis, discussed below). The children of Cleonym- are also children of whatever coordinate forms it has. Children of other tribes have another family group stem as the parent, and that family group stem has a highest_valid_rank that is lower than family and has another family-group name within Cleonymidae, such as Cleonym- itself or another stem with priority, as its parent.

In the case of incertae sedis this could be overridden, but this also gives an obvious way to check that incertae sedis is properly applied. Any subfamilies of Cleonym- are also valid names, but their highest_valid_rank is subfamily. There are no issues with included genera, because generic names cannot elevate into the family group of ranks. If Cleonyminae had a number of subfamilies and no used tribes, genera would have the stem of their subfamily as the parent. If any of these family-group names had a number of tribes, genera would have the stem of their tribe as parent. This constructs a resilient system that makes it easy to perform housekeeping actions with them later. If we designate that there are multiple valid subfamilies in Cleonymidae, then Cleonym- is automatically switched to having used_as_subfamily=true. Otherwise, if Cleonymidae has only one valid subfamily (Cleonyminae) then the stem automatically is switched to having the property used_as_subfamily=false.

  • For subgenera and subspecies, the highest_valid_rank can be used to disambiguate their rank. The nominotypical subgenus has a highest_valid_rank of genus, but is automatically valid at all genus group ranks. The parent of species will be the generic name, and it does not break, no matter if we use the subgenus coordinated name or not. If the subgenus coordinated form is not used (again this is automatic, we don't use subgenera when there is nothing but the nominotypical subgenus), then the italicized name does not display it. Any other subgenera have a highest_valid_rank of subgenus. This should also make housekeeping much easier when genera and subgenera change rank. There should be no issues, since the italicized name is formed through an objective process, and again takes into account the property used_as_subgenus or used_as_subspecies when determining if a subgenus or subspecies name is displayed.
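To make the stem-centred model more concrete, here is a small sketch. The class name `FamilyGroupStem`, its methods, and the function `italicized_name` are assumptions made for illustration; only `highest_valid_rank` and `used_as_subgenus` come from the proposal itself. An invalid name is represented by `highest_valid_rank = None`.

```python
from dataclasses import dataclass
from typing import List, Optional

# (rank, suffix) pairs for the family group, highest rank first.
FAMILY_GROUP = [
    ("superfamily", "oidea"),
    ("family", "idae"),
    ("subfamily", "inae"),
    ("tribe", "ini"),
    ("subtribe", "ina"),
]


@dataclass
class FamilyGroupStem:
    stem: str
    highest_valid_rank: Optional[str]  # None means the name is invalid

    def valid_ranks(self) -> List[str]:
        """All family-group ranks at or below highest_valid_rank."""
        if self.highest_valid_rank is None:
            return []
        ranks = [rank for rank, _ in FAMILY_GROUP]
        return ranks[ranks.index(self.highest_valid_rank):]

    def name_at(self, rank: str) -> str:
        """Spell the coordinate form at a rank where the stem is valid."""
        if rank not in self.valid_ranks():
            raise ValueError(f"{self.stem}- is not valid at rank {rank}")
        return self.stem + dict(FAMILY_GROUP)[rank]


def italicized_name(genus, species, subgenus=None, used_as_subgenus=False):
    """Show the subgenus only when the subgenus coordinate form is in use."""
    if subgenus and used_as_subgenus:
        return f"{genus} ({subgenus}) {species}"
    return f"{genus} {species}"


cleonym = FamilyGroupStem("Cleonym", "subfamily")
cleonym.highest_valid_rank = "family"  # the elevation described earlier
print(cleonym.valid_ranks())
print(cleonym.name_at("family"))
print(italicized_name("Pteromalus", "puparum", "Pteromalus", used_as_subgenus=False))
```

Children would point at the stem object rather than at one of its spellings, so changing `highest_valid_rank` does not require touching any child record.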

*Also, any name that is not valid is not used, and any name that is used is valid. However, some valid coordinated names are not used. This can probably be made automatic: if the parent has no other valid children of that rank, and the parent has the same stem (for family group names) or name (for genus group and species group names) as the child taxon, then the child taxon is not used. In reality, coordinate forms need not be separate names, and should not be. They simply have a different (automatically determined) suffix in the case of family group names, or a different (automatically determined) position in italicized names. There is no need to create a separate name Cleonymini to hold genera. The tribe Cleonymini will never be in a separate group from its other coordinate forms, and therefore does not need the properties of a separate name. Cleonym- therefore serves for all its coordinates, taking on different suffixes when used at different ranks. Likewise, genus group names and species group names should not be different from their parent nominotypical form. The genus Pteromalus will never be in a separate group from the subgenus Pteromalus, and therefore the subgenus does not need the full suite of name properties, and the name of its parent can be duplicated in italicized names when needed.
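The automatic "used" rule above could be expressed along these lines. This is a sketch under stated assumptions: the function name `coordinate_is_used` and the representation of children as (stem, rank) pairs are hypothetical, and "B-" is a placeholder stem.

```python
def coordinate_is_used(child_stem, child_rank, parent_stem, valid_children):
    """Decide whether a valid child is actually used at its rank.

    valid_children lists (stem, rank) pairs for all valid children of the
    parent, including the child itself. A coordinate form that shares its
    parent's stem is used only if the parent has other valid children at
    that same rank.
    """
    if child_stem != parent_stem:
        return True  # a child with its own stem is always used
    others_at_rank = [
        stem for stem, rank in valid_children
        if rank == child_rank and stem != child_stem
    ]
    return len(others_at_rank) > 0


# Cleonymidae with a single valid subfamily: Cleonyminae is valid but not used.
print(coordinate_is_used("Cleonym", "subfamily", "Cleonym",
                         [("Cleonym", "subfamily")]))

# With a second valid subfamily, Cleonyminae comes into use.
print(coordinate_is_used("Cleonym", "subfamily", "Cleonym",
                         [("Cleonym", "subfamily"), ("B", "subfamily")]))
```

The result corresponds to the `used_as_subfamily` property mentioned earlier. The same check would apply to genus-group and species-group names by comparing names instead of stems.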

**If one cites the parent, what happens? If a family group name is the parent and is used at a tribe level, the parent is the tribe. If it is not used at any level other than family, then the parent is the family. What about Chalcidoidea? In this case, Chalcidoidea can be a parent of families normally, and this is fine as long as family is their highest_valid_rank. If Chalcidoidea is parent of an incertae sedis subfamily, then only the superfamily rank form Chalcidoidea is allowed to be the parent, the highest_valid_rank. Likewise, if Pteromalidae is the parent of an incertae sedis genus, then Pteromalidae is the parent, not Pteromalini. I suppose it is possible that one would want to have an incertae sedis genus that has Pteromalinae (the subfamily) as its parent, and not Pteromalidae--in that case it is probably a good idea to be able to specify "Pteromalinae (subfamily rank form of Pteromal-)" as the parent only for incertae sedis taxa--which also are not allowed to have the next higher rank as parents. Therefore, an incertae sedis genus cannot have "Pteromalini (tribe rank form of Pteromal-)" as its parent, or it would not logically be incertae sedis. It makes no sense to provide this special option for taxa that are not incertae sedis. I also do not want to expand this special option to function for everything--the system mostly runs itself when incertae sedis does not come into play, and I would prefer not to lose that convenience.

In these senses, this procedure helps indicate what family-group names actually are. Chalcidoidea represents "Chalcid- at superfamily rank". Chalcidini represents "Chalcid- at tribe rank", etc. Fitting logically in how priority works, Chalcidini does not separately compete with other tribes for priority, and Chalcidinae does not independently compete with subfamilies for priority, and so forth. The family group stem Chalcid- competes for priority exactly one time with all included family group names. This is an additional reason why its coordinate family-group forms should not be separate taxon names.

What if the name is invalid? highest_valid_rank is null.

These should construct a completely solid and logically consistent system that will make nomenclatural housekeeping relatively easy.

It also can handle scenarios such as "we started using subtribes", or "we stopped using subtribes", or "we started using infratribes" and so forth.

from taxonworks.

JimWoolley avatar JimWoolley commented on July 20, 2024


rogerburks avatar rogerburks commented on July 20, 2024

At least from my perspective I think it does help. Since all coordinate forms have the same type (genus in the case of family-group names), they would all be automatically linked. Location of non-types could be done using the table that you suggest, although it would also likely be rapid enough through assigning parent family-group stems as well—autocompletes work with taxon ID numbers by the way, which is probably the fastest and most failsafe way to use them.

Additionally, the coordinate forms are treated as valid (whenever they are actually valid at a given rank). This should result in proper behavior of the names within the model.


mjy avatar mjy commented on July 20, 2024

Duplicated as #3584.

