---
title: "Comparison of CoordinateCleaner to other tools"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Comparison of CoordinateCleaner to other tools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Background

Erroneous database entries and problematic geographic coordinates are a central issue in biogeography, and a set of tools is available to address different dimensions of the problem. CoordinateCleaner focuses on the fast and reproducible flagging of large numbers of records, and provides additional functions to detect dataset-level and fossil-specific biases. In the R environment, the [scrubr](https://github.com/ropensci-archive/scrubr) and [biogeo](https://CRAN.R-project.org/package=biogeo) packages offer cleaning approaches complementary to CoordinateCleaner. The scrubr package provides basic geographic cleaning (comparable to `cc_dupl`, `cc_zero` and `cc_count` in CoordinateCleaner) and adds options to clean taxonomic names (see also [taxize](https://github.com/ropensci/taxize)) and date information. biogeo includes some basic automated geographic cleaning (similar to `cc_val`, `cc_count` and `cc_outl`) but focuses on correcting suspicious coordinates manually using environmental information.

Table 1. Function-by-function comparison of CoordinateCleaner, scrubr and biogeo.


| Functionality | CoordinateCleaner 2.0-2 | scrubr 0.1.1 | biogeo 1.0 | Percent overlap |
|--------------------------------|--------------|------------------|------------------|--------------------|
| Missing coordinates | cc_val | coord_incomplete | missingvalsexclude | 100% |
| Coordinates outside CRS | cc_val | coord_impossible | - | 100% |
| Duplicated records | cc_dupl | dedup | duplicatesexclude | The aim is identical, methods differ |
| 0/0 coordinates | cc_zero | coord_unlikely | - | 100% |
| Identical lon/lat | cc_equ | - | - | 0% |
| Country capitals | cc_cap | - | - | 0% |
| Political unit centroids | cc_cen | - | - | 0% |
| Coordinates incongruent with additional location information | cc_count | coord_within | errorcheck, quickclean | 100% |
| Coordinates assigned to GBIF headquarters | cc_gbif | - | - | 0% |
| Coordinates assigned to the location of biodiversity institutions | cc_inst | - | - | 0% |
| Coordinates outside natural range | cc_iucn | - | - | 0% |
| Spatial outliers | cc_outl | - | outliers | 50%, biogeo uses environmental distance |
| Coordinates within the ocean | cc_sea | - | - | 0% |
| Coordinates in urban area | cc_urb | - | - | 0% |
| Coordinate conversion error | dc_ddmm | - | - | 0% |
| Rounded coordinates/rasterized collection | dc_round | - | precisioncheck | 20%, biogeo tests for predefined rasters |
| Fossils: invalid age range | tc_equal | - | - | 0% |
| Fossils: excessive age range | tc_range | - | - | 0% |
| Fossils: temporal outlier | tc_outl | - | - | 0% |
| Fossils: PyRate interface | WritePyrate | - | - | 0% |
| Wrapper functions to run all tests | CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS | - | - | 0% |
| Database of biodiversity institutions | institutions | - | - | 0% |
| Taxonomic cleaning | - | tax_no_epithet | - | 0% |
| Missing date | - | date_missing | - | 0% |
| Add date | - | date_create | - | 0% |
| Date format | - | date_standardize | - | 0% |
| Reformatting coordinate annotation | - | - | a large set of functions | 0% |
| Correcting coordinates using guessing and environmental distance | - | - | a large set of functions | 0% |
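
To make the first rows of Table 1 more concrete, the following is a minimal base-R sketch of the kind of checks behind the basic tests (missing coordinates, coordinates outside the valid range, 0/0 coordinates, identical lon/lat, and duplicated records). It is an illustration only and not the implementation used by CoordinateCleaner, scrubr or biogeo, whose tests may differ in detail (for example in tolerances or in how duplicates are defined). The helper `flag_basic`, the toy data and the column names `species`, `lon` and `lat` are assumptions made for this example.

```r
# Toy occurrence table; the column names are illustrative assumptions.
occ <- data.frame(
  species = c("A", "A", "A", "B", "B", "A"),
  lon     = c(12.5, NA, 0, 200, 45, 12.5),
  lat     = c(41.9, 10, 0, 10, 45, 41.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package).
# Returns one logical column per check; TRUE marks a potential problem.
flag_basic <- function(x, lon = "lon", lat = "lat", species = "species") {
  lo <- x[[lon]]
  la <- x[[lat]]
  is_missing <- is.na(lo) | is.na(la)
  ok <- !is_missing  # only evaluate the numeric checks on complete pairs
  data.frame(
    missing      = is_missing,                           # cf. "Missing coordinates"
    out_of_range = ok & (abs(lo) > 180 | abs(la) > 90),  # cf. "Coordinates outside CRS"
    zero         = ok & lo == 0 & la == 0,               # cf. "0/0 coordinates"
    equal        = ok & abs(lo) == abs(la),              # cf. "Identical lon/lat"
    duplicate    = duplicated(x[, c(species, lon, lat)]) # cf. "Duplicated records"
  )
}

flags <- flag_basic(occ)

# Keep only records that pass every check.
clean <- occ[rowSums(flags) == 0, ]

print(flags)
print(clean)
```

Returning one column per check, rather than a single pass/fail vector, keeps it visible which test caused a record to be dropped. Note that a 0/0 record also has identical longitude and latitude, so it is flagged by both checks in this sketch.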