Add missing nomenclatural code #815

Open
mdoering opened this issue Nov 14, 2024 · 14 comments

@mdoering
Member

Some sources and sectors are missing a nomenclatural code. This makes rendering difficult, but more importantly it hampers querying for bad homonyms, such as duplicate accepted genera within the same code:

https://www.checklistbank.org/dataset/3/duplicates?category=uninomial&codeDifferent=false&limit=500&minSize=2&mode=STRICT&offset=0&rank=genus&sourceDatasetKey=1141&status=accepted

There are 42 sources with missing nomenclatural code:

clb=> select s.subject_dataset_key, count(*) as cnt from name n join sector s on s.dataset_key=n.dataset_key and s.id=n.sector_key where n.dataset_key=3 and n.code is null group by 1 order by 1;
 subject_dataset_key |  cnt  
---------------------+-------
                1010 |  5759
                1011 |  2607
                1026 |  1065
                1031 |   307
                1033 |    28
                1037 |   323
                1042 |   502
                1046 |   118
                1052 |   377
                1054 |   286
                1055 |  1315
                1068 |  2597
                1070 |    86
                1078 |    34
                1080 |    12
                1082 |    76
                1104 |     4
                1113 |  1262
                1118 |     8
                1119 |    23
                1120 |   491
                1138 |   133
                1139 |    18
                1142 |   160
                1143 |    66
                1144 |   305
                1146 |     1
                1161 |   303
                1171 |  1532
                1172 |    33
                1190 |   562
                1201 |    21
                2007 |   673
                2073 |   230
                2141 |   284
               54170 |     1
              124661 |  8184
              125101 |     6
              268676 | 24620
              279229 |     8
              296427 |    33
              298081 |  4016
(42 rows)

With the following 136 sectors:

clb=> select s.subject_dataset_key, s.id, count(*) as cnt from name n join sector s on s.dataset_key=n.dataset_key and s.id=n.sector_key where n.dataset_key=3 and n.code is null group by 1,2 order by 1,2;
 subject_dataset_key |  id  |  cnt  
---------------------+------+-------
                1010 | 1543 |   272
                1010 | 1544 |    11
                1010 | 1545 |     9
                1010 | 1595 |    14
                1010 | 2013 |  5443
                1010 | 2014 |     3
                1010 | 2015 |     3
                1010 | 2016 |     2
                1010 | 2017 |     2
                1011 | 2139 |  2607
                1026 | 2106 |     6
                1026 | 2107 |     4
                1026 | 2108 |     5
                1026 | 2109 |     4
                1026 | 2110 |   160
                1026 | 2111 |     5
                1026 | 2112 |     2
                1026 | 2113 |   387
                1026 | 2114 |    69
                1026 | 2115 |     6
                1026 | 2116 |    11
                1026 | 2117 |    11
                1026 | 2118 |    13
                1026 | 2119 |    77
                1026 | 2120 |     3
                1026 | 2121 |    21
                1026 | 2122 |     2
                1026 | 2123 |     2
                1026 | 2124 |   273
                1026 | 2125 |     4
                1031 | 1761 |   307
                1033 |  584 |     1
                1033 |  613 |    27
                1037 |   56 |    52
                1037 |  295 |   246
                1037 |  296 |    18
                1037 |  297 |     1
                1037 |  298 |     1
                1037 |  299 |     1
                1037 |  300 |     1
                1037 |  301 |     1
                1037 |  302 |     1
                1037 |  303 |     1
                1042 |  260 |   502
                1046 | 1745 |    86
                1046 | 1746 |    32
                1052 | 2137 |   377
                1054 | 2092 |   286
                1055 |  255 |  1103
                1055 |  256 |    61
                1055 |  257 |   123
                1055 |  292 |    26
                1055 |  293 |     2
                1068 |  291 |  2597
                1070 |  730 |    86
                1078 |  736 |    34
                1080 |  733 |    12
                1082 |  735 |    76
                1104 |  521 |     4
                1113 |  472 |  1262
                1118 |  280 |     8
                1119 |  471 |    23
                1120 |  646 |   491
                1138 | 1979 |     8
                1138 | 1980 |     2
                1138 | 1981 |     2
                1138 | 1982 |    73
                1138 | 1983 |     5
                1138 | 1984 |     3
                1138 | 1985 |    12
                1138 | 1986 |     2
                1138 | 1987 |     2
                1138 | 1988 |     2
                1138 | 1989 |     2
                1138 | 1990 |     8
                1138 | 1991 |     3
                1138 | 1992 |     2
                1138 | 1993 |     2
                1138 | 1994 |     2
                1138 | 1995 |     3
                1139 |  738 |    18
                1142 | 2127 |   160
                1143 | 2126 |    66
                1144 | 2089 |   305
                1146 | 1778 |     1
                1161 | 1367 |   303
                1171 | 2091 |  1532
                1172 | 1753 |    33
                1190 |  532 |   562
                1201 |  619 |    21
                2007 |   25 |     2
                2007 |   59 |     3
                2007 |   60 |     5
                2007 |   61 |     2
                2007 |   62 |     8
                2007 |   63 |     4
                2007 |   64 |     2
                2007 |   73 |     2
                2007 |  443 |     2
                2007 |  444 |     2
                2007 |  445 |   292
                2007 |  446 |    21
                2007 |  448 |     6
                2007 |  449 |    16
                2007 |  450 |     4
                2007 |  451 |    88
                2007 |  452 |    60
                2007 |  453 |    31
                2007 |  454 |     8
                2007 |  455 |    12
                2007 |  460 |     9
                2007 |  461 |     8
                2007 |  462 |     5
                2007 |  463 |     2
                2007 |  576 |    36
                2007 |  577 |     1
                2007 |  578 |     2
                2007 |  579 |     2
                2007 |  580 |    38
                2073 |  669 |    30
                2073 |  670 |   182
                2073 |  671 |    18
                2141 |   55 |   284
               54170 | 1842 |     1
              124661 | 2164 |  8182
              124661 | 2167 |     2
              125101 | 1808 |     6
              268676 | 2007 |   221
              268676 | 2008 | 16733
              268676 | 2159 |  7666
              279229 | 2018 |     2
              279229 | 2019 |     2
              279229 | 2020 |     2
              279229 | 2021 |     2
              296427 | 2248 |    33
              298081 | 2274 |  4016
(136 rows)
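
As a sanity check, the per-sector counts should roll up to the per-source counts in the first result set. A minimal Python sketch, using a handful of rows copied from the two query results above; the function and variable names are illustrative only:

```python
from collections import defaultdict

# (subject_dataset_key, sector_id, cnt) rows copied from the sector-level query
sector_rows = [
    (1010, 1543, 272), (1010, 1544, 11), (1010, 1545, 9), (1010, 1595, 14),
    (1010, 2013, 5443), (1010, 2014, 3), (1010, 2015, 3), (1010, 2016, 2),
    (1010, 2017, 2),
    (1046, 1745, 86), (1046, 1746, 32),
    (124661, 2164, 8182), (124661, 2167, 2),
]

# cnt per subject_dataset_key copied from the source-level query
source_counts = {1010: 5759, 1046: 118, 124661: 8184}


def rollup(rows):
    """Sum the sector-level counts per subject dataset key."""
    totals = defaultdict(int)
    for dataset_key, _sector_id, cnt in rows:
        totals[dataset_key] += cnt
    return dict(totals)


totals = rollup(sector_rows)
# Collect any dataset whose summed sector counts differ from the source count
mismatches = {
    key: (totals.get(key), expected)
    for key, expected in source_counts.items()
    if totals.get(key) != expected
}
print(mismatches or "sector counts match source counts")
```

For the three sources sampled here the sums agree, so both queries describe the same set of names.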

Because all names in a given sector usually follow the same code, the sector's code setting can be used to specify a default code that is applied during a sync.
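
To illustrate the idea, here is a minimal Python sketch of such a default. It is not ChecklistBank's actual sync code: the `Name` and `Sector` classes and the `apply_default_code` function are hypothetical, and the sketch assumes the default only fills in missing codes rather than overriding codes supplied by the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Name:
    scientific_name: str
    code: Optional[str] = None  # e.g. "ICZN" or "ICN"; None when the source gives no code


@dataclass
class Sector:
    id: int
    default_code: Optional[str] = None  # hypothetical stand-in for the sector code setting


def apply_default_code(sector: Sector, names: List[Name]) -> int:
    """Fill in the sector default for names without a code; return how many changed."""
    if sector.default_code is None:
        return 0
    changed = 0
    for name in names:
        if name.code is None:  # assumption: never override an existing code
            name.code = sector.default_code
            changed += 1
    return changed


# Made-up sector with a botanical default; one name already carries a code
names = [Name("Abies"), Name("Picea", code="ICN")]
changed = apply_default_code(Sector(id=0, default_code="ICN"), names)
print(changed, [n.code for n in names])
```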

@mdoering mdoering added the feedback User feedback label Nov 14, 2024
@mdoering
Member Author

@yroskov, if you could give me a list of sector keys per code, I could update their settings in the database directly to save manual config work.

@yroskov

yroskov commented Nov 14, 2024

@mdoering, unfortunately, I can recognize neither the GSD names nor the sector Latin names. I can't do anything about it. Could you please give me a list of GSDs and sectors with missing Codes? Direct links to the appropriate CLB locations would be even better.

Most importantly, how do these missing nomenclatural code cases reappear? As far as I remember, we applied Codes to all spotted cases a year ago. How do we have "non-Coded" sectors again? Where is the problem, and how can we avoid it?

@mdoering
Member Author

You would need to look up the dataset or sector by its key.
For example, for the list of 136 sectors, take the middle id column and use it in this link in place of 1543:

https://www.checklistbank.org/catalogue/3/sector/sync?sectorKey=1543
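
A small Python sketch that generates these links for a list of sector keys. The URL pattern is the one given above; the function name `sector_sync_url` is made up for this example.

```python
from urllib.parse import urlencode

# Sector sync page of catalogue 3, as in the example link above
BASE = "https://www.checklistbank.org/catalogue/3/sector/sync"


def sector_sync_url(sector_key: int) -> str:
    """Build the sync page link for one sector key."""
    return f"{BASE}?{urlencode({'sectorKey': sector_key})}"


# First few sector ids from the middle column of the 136-sector list
for key in (1543, 1544, 1545):
    print(sector_sync_url(key))
```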

@mdoering
Member Author

Are you sure you changed all the sectors before? I don't think the sector settings have changed; I bet they simply were never there.

@yroskov

yroskov commented Nov 15, 2024

I am sure, we fixed Code assignment for all GSDs (well, not for sectors).

@yroskov

yroskov commented Nov 15, 2024

Code application in the CoL should depend on the kingdom:

kingdom Animalia = ICZN
kingdom Archaea = ICNP
kingdom Bacteria = ICNP
kingdom Chromista = ICN
kingdom Fungi = ICN
kingdom Plantae = ICN
kingdom Protozoa = ICZN
Viruses = ICVCN

If you implement this feature in the CLB, we'll have no problem with Code assignment in the present and in the future, I guess.
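
A minimal sketch of this lookup in Python, assuming the kingdom of each name is known. The table is the one listed above; the function name `code_for_kingdom` is hypothetical, and this is not an existing CLB feature.

```python
from typing import Optional

# Kingdom-to-Code table exactly as proposed in the list above
CODE_BY_KINGDOM = {
    "Animalia": "ICZN",
    "Archaea": "ICNP",
    "Bacteria": "ICNP",
    "Chromista": "ICN",
    "Fungi": "ICN",
    "Plantae": "ICN",
    "Protozoa": "ICZN",
    "Viruses": "ICVCN",
}


def code_for_kingdom(kingdom: str) -> Optional[str]:
    """Return the proposed Code for a kingdom, or None if the kingdom is not listed."""
    return CODE_BY_KINGDOM.get(kingdom.strip().capitalize())


print(code_for_kingdom("Fungi"))
print(code_for_kingdom("incertae sedis"))
```

Returning `None` for an unlisted kingdom keeps such names visible as "still missing a Code" instead of silently guessing.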

@mdoering
Member Author

Chromista and Protozoa are probably not that simple. We already have over 12k zoological Chromista names and 4.3k botanical protozoan algae.

@mdoering
Member Author

I am sure, we fixed Code assignment for all GSDs (well, not for sectors).

GSDs maybe, but then you clearly missed the larger mixed ones like ITIS. It also requires a new import after the setting has been applied; I am not sure whether that was done for all of them.

@mdoering
Member Author

Maybe we can split the task: I will try to solve all sectors whose code is clear from the kingdom assignment, and you can focus on the Chromista and Protozoa sectors?
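
A rough Python sketch of that split, assuming we already know the kingdom each sector falls under. The sector keys and kingdoms in the example are made up for illustration, and `split_sectors` is a hypothetical helper.

```python
# Kingdoms where the code cannot be derived from the kingdom alone
AMBIGUOUS_KINGDOMS = {"Chromista", "Protozoa"}


def split_sectors(sector_kingdoms):
    """Partition sector keys into (clear, needs_review) by their kingdom."""
    clear, needs_review = [], []
    for sector_key, kingdom in sorted(sector_kingdoms.items()):
        if kingdom in AMBIGUOUS_KINGDOMS:
            needs_review.append(sector_key)
        else:
            clear.append(sector_key)
    return clear, needs_review


# Made-up sector keys and kingdoms, NOT taken from the catalogue
example = {9001: "Animalia", 9002: "Chromista", 9003: "Plantae", 9004: "Protozoa"}
clear, needs_review = split_sectors(example)
print("clear:", clear)
print("needs review:", needs_review)
```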

@yroskov

yroskov commented Nov 15, 2024

My point is simple: it doesn't matter under which Code chromistean and protozoan taxa were originally described. Once they are placed in CoL under the kingdom Chromista, they are regulated by the ICN, and if taxa are placed in the kingdom Protozoa, they are governed by the ICZN.

Would you agree that there is a logic behind such a simplification? This pragmatic approach will make your task and mine much easier, and it will not confuse CoL users.

@mdoering
Member Author

mdoering commented Nov 15, 2024

If that reflects reality, yes. It would be a lot simpler. When is a name published according to a specific code? Can all names be governed by any code or do they have to be explicitly published with one code in mind? What about the authorship style? I am still not entirely sure how to best deal with ambiregnal names.

@DaveNicolson

Not to further add to the complexity, but the bacterial group cyanobacteria (which we are adding as GSD in ITIS this month) follows the ICN ('botanical' Code) rather than the ICNP, despite its placement within Bacteria. For some time it was treated under both Codes re rules for availability, effective/valid publication, and so on. Hopefully those days are over, as it made everything much messier & harder to deal with, which led us to add this comment to the group a decade ago:

Cyanobacteria are now considered to be bacteria, but their nomenclature has traditionally been treated under the 'botanical' Code of nomenclature, rather than under the separate bacterial Code (see Preamble item 8 of http://www.iapt-taxon.org/nomen/main.php?page=pre&emph=cyanobacteria). However, there are also indications that recently the 'bacteriological' Code has begun to ALSO cover the cyanobacteria (see 'General Consideration 5' at http://www.bacterio.net/-code.html ). In any case, there are only a handful of cyanobacterial names that have any formal standing under the 'bacteriological' Code. Further work will be required before ITIS can consider this group complete.

@yroskov

yroskov commented Nov 15, 2024

Our task is to ensure the integrity of the final product. If you look through different journals, Floras/Faunas, and monographs/books, you'll find that the style of data presentation (including scientific names, synonymy, nomenclatural comments and citations) may vary and is defined by the editorial rules. Thus, the provisions of the Code are always interpreted through editorial practice.

If CoL decides that the presentation style of scientific names is actually Kingdom=Code related, then so be it.

Can all names be governed by any code or do they have to be explicitly published with one code in mind?

At the time of the nomenclatural act, a name might be regulated by one Code, and later by another Code, following its actual placement in the classification.

What about the authorship style?

I'd like to see CoL embrace innovation and add the year in the authorstrings of botanical names: Amoria hybrida (L., 1753) C.Presl, 1831.
Paul Kirk introduced this style in the Species Fungorum many years ago: http://www.catalogueoflife.org/annual-checklist/2019/search/all/key/boletus/fossil/1/match/1.
Technically, you can do this now for most (if not all) botanical names in the CoL. We and our users will get a lot of benefits from the management and usage of enhanced names (priority, homonymy, etc.).

It would be more difficult to convince zoologists to add the author of the subsequent combination in zoological names. Millions of combinations in Zoology do not keep this important information. However, we have such an inspiring example from Erik van Nieukerken: http://www.catalogueoflife.org/annual-checklist/2019/browse/classification/kingdom/Animalia/phylum/Arthropoda/class/Insecta/order/Lepidoptera/superfamily/Nepticuloidea/fossil/1/match/1

@mdoering
Member Author

mdoering commented Nov 18, 2024

I don't think COL should apply the simple code-by-kingdom approach.
Instead we should follow what the specialists in those groups do and what the codes tell us.
To me that clearly means mixing the codes for the names under Protozoa and Chromista.

Citing from ICZN:

The ICZN only applies to animal names, and not to names of plants, fungi, bacteria or viruses, which are covered by separate codes of nomenclature.

Animals include metazoans and protists that have been historically considered in the Kingdom Animalia (i.e., protists that do not primarily use photosynthesis as an energy source, if so they are generally considered to be plants and will fall under the botanical Code, ICN). Although higher-level classifications have changed with modern research, which Code (Zoological, Botanical or Bacterial) covers a particular taxon generally remains constant, as it is agreed that the ultimate goal of nomenclatural rules is to maintain stability in names and not to reflect perspectives on phylogenetic relationships. Thus, for example, fungi remain covered by the Botanical Code, and there is little interest in changing this, despite modern consensus that they have a sister relationship with animals and not plants. Protists that have characteristics of both animals and plants are considered ‘ambiregnal’ and are treated following the rules of both the ICZN and ICN.

Citing from ICN:

The provisions of this Code apply to all organisms traditionally treated as algae, fungi, or plants, whether fossil or non-fossil, including blue-green algae (Cyanobacteria), chytrids, oomycetes, slime moulds, and photosynthetic protists with their taxonomically related non-photosynthetic groups (but excluding Microsporidia).

Other resources:
