The long-term goal of this project is to create and maintain a FishBase-like information system for all non-fish aquatic organisms, ca. 400,000 spp. Of these, marine organisms (about 240,000 spp.) are the target of the current project phase. It will not provide yet another authority list of species, but rather, for each species included, make available the biological and ecological information necessary to conduct biodiversity and ecosystem studies, taking advantage of lists of species already available on paper and electronically, and using the scientific names as a ‘hook’ to organize biodiversity information.
Since the number of species is huge, SeaLifeBase has set priorities in its encoding strategy, with short-term goals set on an annual basis. Working on one or two island ecosystems at a time, the project moves closer to its goal of assigning species to large marine ecosystems (66 ecosystems worldwide).
Our short-term objectives for 2015 are:
SeaLifeBase has established links with relevant data providers (as a policy, the websites linked are preferably those that mention their sources):
LAMP: Linux, Apache, MySQL, PHP (resp. operating system, web server, DBMS, programming language).
The FishBase database and website structures were used as a shell. Graphical charts were progressively modified and fields were adapted in tables where some aspects of the taxonomic groups required changes.
The FishBase IT Team is consulted on suggested and new developments in SeaLifeBase. Changes in SeaLifeBase approved by the FishBase IT Team are adopted in FishBase, and all changes in FishBase are concurrently adopted in SeaLifeBase.
A classification to the class level, and when available to the order level, is the taxonomic backbone. It is primarily based on the Catalogue of Life/Species 2000 higher hierarchy, and then follows Tree of Life for groups not yet in that classification. A classification to the order level (when no stable phylogeny exists) follows primarily the Catalogue of Life/Species 2000 higher hierarchy, then ITIS, then dedicated published classifications for groups not yet in these databases.
Subspecies are not taken into account, but are mentioned in a comment field.
Searches are conducted in the databases with relevant information (Zoological Record, ASFA, CISTI, FishLit, etc.), and the priorities for data encoding are informed by the short- and long-term objectives for SeaLifeBase.
The web is searched following a guideline strategy, starting from well-known biodiversity portals (CBD, GBIF, Diversitas, UNEP, WWF, and dedicated websites of universities, museums, and research institutions).
Development of a web crawler to check for updates on important deep-linked websites.
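As an illustration only, one common way for such a crawler to detect updates is to compare a digest of each page's current content against the digest stored at the previous visit. The sketch below assumes the pages have already been fetched; the function names and the dictionary layout are hypothetical and are not taken from SeaLifeBase.

```python
import hashlib


def content_digest(page_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def find_updated(previous_digests: dict, fetched_pages: dict) -> list:
    """Return URLs whose content is new or differs from the stored digest.

    previous_digests maps URL -> digest recorded at the last visit.
    fetched_pages maps URL -> raw bytes fetched during this visit.
    """
    updated = []
    for url, page_bytes in fetched_pages.items():
        if previous_digests.get(url) != content_digest(page_bytes):
            updated.append(url)
    return updated
```

A digest comparison flags any byte-level change, including cosmetic ones such as a changed footer date, so a real crawler would probably need to normalize pages before hashing.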
All information at species level is hooked to scientific names. It is crucial to have the list of species very quickly at the beginning of the project, if possible from electronic lists.
It is also crucial to identify taxonomic references to validate the choice of the current accepted names, and to link the names to these references when available on the web.
The lists are extracted in the following decreasing order:
Unless synonyms are available in electronic format, entering them is not a priority; the only synonyms encoded are those used in sources of other information.
The source is always recorded, as well as the type to allow the user to assess the reliability of the name.
Common Names
Common names in English and other languages are entered only when available in electronic format or from compilations. However, groups whose common names are well known, such as marine mammals, are prioritized.
Some common names were already entered in Species 2000 by a FishBase team member.
The lists are extracted in the following decreasing priority:
In addition to country, the state/provincial levels are considered. A geographic standard was established for the marine areas on the same model as the TDWG geographic standard for the terrestrial areas.
Distribution by country and subdivisions: from printed compilations and monographs (FAO publications first), from published distribution maps, from country lists, in that order.
Distribution by FAO areas: from FAO publications, from published distribution maps, from distribution by country above, in that order.
Distribution by ecosystem: from distribution by FAO area and country above and from printed compilations. Distribution by Large Marine Ecosystems will be by oceans and then by principal seas.
Distribution by depth: from printed compilations and monographs (FAO publications first).
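The source orders given above for country, FAO area, and depth can be read as ranked lists. The following is a minimal sketch, assuming each candidate record carries a `source_type` label; the dictionary keys, labels, and record layout are hypothetical and chosen only for illustration. Distribution by ecosystem is left out because the text describes it as derived from other distributions rather than as a ranked list.

```python
# Priority orders transcribed from the text above (highest priority first).
# Keys and labels are illustrative, not actual SeaLifeBase field names.
SOURCE_PRIORITY = {
    "country": [
        "FAO publication",
        "printed compilation or monograph",
        "published distribution map",
        "country list",
    ],
    "fao_area": [
        "FAO publication",
        "published distribution map",
        "distribution by country",
    ],
    "depth": [
        "FAO publication",
        "printed compilation or monograph",
    ],
}


def pick_preferred(kind, records):
    """Return the record with the highest-priority source for this kind.

    Records whose source type is not in the priority list are ignored.
    Returns None when no usable record exists.
    """
    order = SOURCE_PRIORITY[kind]
    usable = [r for r in records if r["source_type"] in order]
    if not usable:
        return None
    return min(usable, key=lambda r: order.index(r["source_type"]))
```

For example, given one record from a country list and one from a published distribution map, `pick_preferred("country", records)` returns the map-based record.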
Maximal Size
These data are crucial for biodiversity and ecosystem studies, but are rarely available from electronic sources.
This information is extracted on an opportunistic basis, mainly from FAO publications and printed monographs. Targeted searches were performed for important species, e.g., threatened, invasive, and commercially important species.
Conservation Status
The IUCN and CITES databases are explored and linked for the threatened and commercial status of species, respectively.
Habitats
This information is rarely available from electronic sources. Moreover, various standards are used, and they may depend on the taxonomic group. The FishBase standard is used after being reassessed and completed for invertebrates.
Targeted and Prioritized Taxonomic Groups
The phylum- to class-group levels are classified as small, medium or large groups.
This encoding strategy has changed since the completion of all the small and many of the medium groups in 2007. Each encoder’s weekly programme now consists of encoding data for 50% of the remaining groups and encoding life history parameters for targeted species groups. In addition, the 5 remaining encoders are each in charge of special topics, viz.: faunal lists, life-history parameters, ecological parameters, pictures and targeted reference-searches.
Note that the rapid completion and availability of results and data on small and medium groups were psychologically important to encoders and donors alike, as they provide a measure of the project's achievements. Moving forward from encoding scientific names to encoding, e.g., life-history parameters gave the encoders a sense of accomplishment in spite of the huge tasks still ahead. Our achievable short-term targets and their completion provide us with milestones against which we measure our accomplishments. This strategy has so far proved useful.
Data Encoding Progress Indicators
Some indicators of data encoding progress were established at the beginning of the project, reflecting the completion of data encoding by taxonomic group and the advancement of the project relative to the expected number of species.
It is important to consider these indicators at various taxonomic levels, from phylum to species, as we can show rapid completion at the phylum to family levels, whereas completion at the genus and species levels is a long-term goal.
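One simple way to express such an indicator is the ratio of encoded taxa to expected taxa at each rank. The sketch below is an assumption about how this could be computed, not the project's actual formula; the function name and the input layout are hypothetical.

```python
def completion_by_rank(encoded, expected):
    """Return the fraction of expected taxa already encoded at each rank.

    encoded maps rank -> number of taxa encoded so far.
    expected maps rank -> number of taxa expected in total.
    Ranks with no expected taxa are skipped to avoid dividing by zero.
    """
    return {
        rank: encoded.get(rank, 0) / total
        for rank, total in expected.items()
        if total > 0
    }
```

Computed per rank, the indicator makes the contrast described above visible: higher ranks approach 1.0 quickly, while the species-level fraction grows slowly.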