---
title: "Finding data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Finding data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

``` r
library(rnaturalearth)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.5, PROJ 9.4.0; sf_use_s2() is TRUE
```

## Available data

Many datasets can be downloaded from [Natural Earth](https://www.naturalearthdata.com/) with `ne_download()`. They are divided into two main categories: _physical_ and _cultural_ vector data. The `df_layers_physical` and `df_layers_cultural` data frames included in the `rnaturalearth` package show which layers can be downloaded and at which scales.

### Physical vector data

``` r
data(df_layers_physical)

knitr::kable(
  df_layers_physical,
  caption = "physical vector data available via ne_download()"
)
```

Table: physical vector data available via ne_download()

|layer                              | scale10| scale50| scale110|
|:----------------------------------|-------:|-------:|--------:|
|antarctic_ice_shelves_lines        |       1|       1|        0|
|antarctic_ice_shelves_polys        |       1|       1|        0|
|coastline                          |       1|       1|        1|
|geographic_lines                   |       1|       1|        1|
|geography_marine_polys             |       1|       1|        1|
|geography_regions_elevation_points |       1|       1|        1|
|geography_regions_points           |       1|       1|        1|
|geography_regions_polys            |       1|       1|        1|
|glaciated_areas                    |       1|       1|        1|
|lakes                              |       1|       1|        1|
|lakes_europe                       |       1|       0|        0|
|lakes_historic                     |       1|       1|        0|
|lakes_north_america                |       1|       0|        0|
|lakes_pluvial                      |       1|       0|        0|
|land                               |       1|       1|        1|
|land_ocean_label_points            |       1|       0|        0|
|land_ocean_seams                   |       1|       0|        0|
|land_scale_rank                    |       1|       0|        0|
|minor_islands                      |       1|       0|        0|
|minor_islands_coastline            |       1|       0|        0|
|minor_islands_label_points         |       1|       0|        0|
|ocean                              |       1|       1|        1|
|ocean_scale_rank                   |       1|       0|        0|
|playas                             |       1|       1|        0|
|reefs                              |       1|       0|        0|
|rivers_europe                      |       1|       0|        0|
|rivers_lake_centerlines            |       1|       1|        1|
|rivers_lake_centerlines_scale_rank |       1|       1|        0|
|rivers_north_america               |       1|       0|        0|

Based on the previous table, we know that we can download the `ocean` vector at small scale (110).
Note that the scale can be given either as a number (`110`, `50`, `10`) or as the corresponding name (`small`, `medium`, `large`).

``` r
plot(
  ne_download(type = "ocean", category = "physical", scale = "small")[
    "geometry"
  ],
  col = "lightblue"
)
#> Reading 'ne_110m_ocean.zip' from naturalearth...
```

![](finding-data.Rmd-3-1.png)

### Cultural vector data

``` r
data(df_layers_cultural)

knitr::kable(
  df_layers_cultural,
  caption = "cultural vector data available via ne_download()"
)
```

Table: cultural vector data available via ne_download()

|layer                                           | scale10| scale50| scale110|
|:-----------------------------------------------|-------:|-------:|--------:|
|admin_0_antarctic_claim_limit_lines             |       1|       0|        0|
|admin_0_antarctic_claims                        |       1|       0|        0|
|admin_0_boundary_lines_disputed_areas           |       1|       1|        0|
|admin_0_boundary_lines_land                     |       1|       1|        1|
|admin_0_boundary_lines_map_units                |       1|       0|        0|
|admin_0_boundary_lines_maritime_indicator       |       1|       1|        0|
|admin_0_boundary_map_units                      |       0|       1|        0|
|admin_0_breakaway_disputed_areas                |       0|       1|        0|
|admin_0_countries                               |       1|       1|        1|
|admin_0_countries_lakes                         |       1|       1|        1|
|admin_0_disputed_areas                          |       1|       0|        0|
|admin_0_disputed_areas_scale_rank_minor_islands |       1|       0|        0|
|admin_0_label_points                            |       1|       0|        0|
|admin_0_map_subunits                            |       1|       1|        0|
|admin_0_map_units                               |       1|       1|        1|
|admin_0_pacific_groupings                       |       1|       1|        1|
|admin_0_scale_rank                              |       1|       1|        1|
|admin_0_scale_rank_minor_islands                |       1|       0|        0|
|admin_0_seams                                   |       1|       0|        0|
|admin_0_sovereignty                             |       1|       1|        1|
|admin_0_tiny_countries                          |       0|       1|        1|
|admin_0_tiny_countries_scale_rank               |       0|       1|        0|
|admin_1_label_points                            |       1|       0|        0|
|admin_1_seams                                   |       1|       0|        0|
|admin_1_states_provinces                        |       1|       1|        1|
|admin_1_states_provinces_lakes                  |       1|       1|        1|
|admin_1_states_provinces_lines                  |       1|       1|        1|
|admin_1_states_provinces_scale_rank             |       1|       1|        1|
|airports                                        |       1|       1|        0|
|parks_and_protected_lands_area                  |       1|       0|        0|
|parks_and_protected_lands_line                  |       1|       0|        0|
|parks_and_protected_lands_point                 |       1|       0|        0|
|parks_and_protected_lands_scale_rank            |       1|       0|        0|
|populated_places                                |       1|       1|        1|
|populated_places_simple                         |       1|       1|        1|
|ports                                           |       1|       1|        0|
|railroads                                       |       1|       0|        0|
|railroads_north_america                         |       1|       0|        0|
|roads                                           |       1|       0|        0|
|roads_north_america                             |       1|       0|        0|
|time_zones                                      |       1|       0|        0|
|urban_areas                                     |       1|       1|        0|
|urban_areas_landscan                            |       1|       0|        0|

``` r
plot(
  ne_download(
    type = "airports",
    category = "cultural",
    scale = 10L
  )["geometry"],
  pch = 21L,
  bg = "grey"
)
#> Reading 'ne_10m_airports.zip' from naturalearth...
```

![](finding-data.Rmd-5-1.png)

## Searching for countries and continents

In this article, we explore how we can search for data available to download within `rnaturalearth`. Let's begin by loading country data using the `read_sf()` function from the `sf` package. In the following code snippet, we read the Natural Earth dataset, which contains information about the sovereignty of countries.

``` r
df <- read_sf(
  "/vsizip/vsicurl/https://naciscdn.org/naturalearth/10m/cultural/ne_10m_admin_0_sovereignty.zip"
)

head(df)
#> Simple feature collection with 6 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
#> Geodetic CRS:  WGS 84
#> # A tibble: 6 × 169
#>   featurecla     scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE  TLC   ADMIN ADM0_A3
#> 1 Admin-0 sover…         5         2 Indonesia  IDN           0     2 Sove… 1     Indo… IDN
#> 2 Admin-0 sover…         5         3 Malaysia   MYS           0     2 Sove… 1     Mala… MYS
#> 3 Admin-0 sover…         0         2 Chile      CHL           0     2 Sove… 1     Chile CHL
#> 4 Admin-0 sover…         0         3 Bolivia    BOL           0     2 Sove… 1     Boli… BOL
#> 5 Admin-0 sover…         0         2 Peru       PER           0     2 Sove… 1     Peru  PER
#> 6 Admin-0 sover…         0         2 Argentina  ARG           0     2 Sove… 1     Arge… ARG
#> # ℹ 158 more variables: GEOU_DIF, GEOUNIT, GU_A3, SU_DIF,
#> #   SUBUNIT, SU_A3, BRK_DIFF, NAME, NAME_LONG, BRK_A3,
#> #   BRK_NAME, BRK_GROUP, ABBREV, POSTAL, FORMAL_EN,
#> #   FORMAL_FR, NAME_CIAWF, NOTE_ADM0, NOTE_BRK, NAME_SORT,
#> #   NAME_ALT, MAPCOLOR7, MAPCOLOR8, MAPCOLOR9, MAPCOLOR13,
#> #   POP_EST, POP_RANK, POP_YEAR, GDP_MD, GDP_YEAR,
#> #   ECONOMY, INCOME_GRP, FIPS_10, ISO_A2, ISO_A2_EH,
#> #   ISO_A3
, ISO_A3_EH, ISO_N3, ISO_N3_EH, UN_A3, WB_A2,
#> #   WB_A3, WOE_ID, WOE_ID_EH, WOE_NOTE, ADM0_ISO,
#> #   ADM0_DIFF, ADM0_TLC, ADM0_A3_US, ADM0_A3_FR, ADM0_A3_RU,
#> #   ADM0_A3_ES, ADM0_A3_CN, ADM0_A3_TW, ADM0_A3_IN, ADM0_A3_NP,
#> #   ADM0_A3_PK, ADM0_A3_DE, ADM0_A3_GB, ADM0_A3_BR, ADM0_A3_IL,
#> #   ADM0_A3_PS, ADM0_A3_SA, ADM0_A3_EG, ADM0_A3_MA, ADM0_A3_PT,
#> #   ADM0_A3_AR, ADM0_A3_JP, ADM0_A3_KO, ADM0_A3_VN, ADM0_A3_TR,
#> #   ADM0_A3_ID, ADM0_A3_PL, ADM0_A3_GR, ADM0_A3_IT, ADM0_A3_NL,
#> #   ADM0_A3_SE, ADM0_A3_BD, ADM0_A3_UA, ADM0_A3_UN, ADM0_A3_WB,
#> #   CONTINENT, REGION_UN, SUBREGION, REGION_WB, NAME_LEN,
#> #   LONG_LEN, ABBREV_LEN, TINY, HOMEPART, MIN_ZOOM,
#> #   MIN_LABEL, MAX_LABEL, LABEL_X, LABEL_Y, NE_ID,
#> #   WIKIDATAID, NAME_AR, NAME_BN, NAME_DE, …
```

### Finding countries

One way to find a country is to search within the `ADMIN` vector. Let's start by plotting the first six countries.

``` r
lapply(
  df$ADMIN[1L:6L],
  \(x) plot(ne_countries(country = x)["geometry"], main = x)
)
```

Suppose we want the polygons for the US. How should the country name be spelled?

``` r
ne_countries(country = "USA")
ne_countries(country = "United States")
ne_countries(country = "United States Of America")
ne_countries(country = "United States of America")
```

One option is to search the `ADMIN` vector with a regular expression to find every name that contains the word _states_.

``` r
grep("states", df$ADMIN, ignore.case = TRUE, value = TRUE)
#> [1] "United States of America"       "Federated States of Micronesia"
```

With the exact name in hand, we can now get the data.

``` r
plot(ne_countries(country = "United States of America")["geometry"])
```

![](finding-data.Rmd-10-1.png)

### Continents

Finally, let's plot each continent using the `ne_countries()` function with its `continent` argument.
``` r
unique(df$CONTINENT)
#> [1] "Asia"                    "South America"           "Europe"
#> [4] "Africa"                  "North America"           "Oceania"
#> [7] "Antarctica"              "Seven seas (open ocean)"
```

``` r
lapply(
  unique(df$CONTINENT),
  \(x) plot(
    ne_countries(
      continent = x,
      scale = "medium"
    )["geometry"],
    main = x
  )
)
```
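The regular-expression search used in the "Finding countries" section can be wrapped in a small reusable helper. The sketch below is illustrative only: `find_names()` is a hypothetical function, not part of `rnaturalearth`, and the `admin` vector is a hand-copied subset of the `ADMIN` values shown earlier, so the example runs without downloading anything.

``` r
# Hypothetical helper (not part of rnaturalearth): case-insensitive
# regular-expression search over a character vector of names.
find_names <- function(pattern, names) {
  grep(pattern, names, ignore.case = TRUE, value = TRUE)
}

# A few ADMIN values copied by hand from the output shown in this vignette.
admin <- c(
  "Indonesia", "Malaysia", "Chile", "Bolivia", "Peru", "Argentina",
  "United States of America", "Federated States of Micronesia"
)

# Every name containing "states", regardless of case.
find_names("states", admin)
#> [1] "United States of America"       "Federated States of Micronesia"

# Anchoring the pattern narrows the search to names starting with "united".
find_names("^united", admin)
#> [1] "United States of America"
```

With the full dataset loaded, the same helper could be called as `find_names("states", df$ADMIN)`, and the exact spelling it returns passed to `ne_countries(country = ...)`.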