Title: | Download Data from the European Social Survey on the Fly |
---|---|
Description: | Download data from the European Social Survey directly from their website <http://www.europeansocialsurvey.org/>. There are two families of functions that allow you to download and interactively check all countries and rounds available. |
Authors: | Jorge Cimentada [aut, cre], Thomas Leeper [rev] (Thomas reviewed the package for rOpenSci, see https://github.com/ropensci/software-review/issues/201), Nujcharee Haswell [rev] (Nujcharee reviewed the package for rOpenSci, see https://github.com/ropensci/software-review/issues/201), Jorge Lopez [ctb], François Briatte [ctb] |
Maintainer: | Jorge Cimentada <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.8 |
Built: | 2024-11-27 03:54:20 UTC |
Source: | https://github.com/ropensci/essurvey |
Download integrated rounds separately for countries from the European Social Survey
import_country(country, rounds, ess_email = NULL, format = NULL)

import_all_cntrounds(country, ess_email = NULL, format = NULL)

download_country(
  country,
  rounds,
  ess_email = NULL,
  output_dir = getwd(),
  format = "stata"
)
country |
a character of length 1 with the full name of the country.
Use |
rounds |
a numeric vector with the rounds to download. See |
ess_email |
a character vector with your email, such as "[email protected]".
If you haven't registered on the ESS website, create an account at
http://www.europeansocialsurvey.org/user/new. A preferred method is to log in
through |
format |
the format from which to download the data. By default it is NULL for |
output_dir |
a character vector with the output directory in case you want to
only download the files using |
Use import_country to download specified rounds for a given country and import them to R. import_all_cntrounds will download all rounds for a given country by default and download_country will download rounds and save them in a specified format in the supplied directory.
The format argument from import_country should not matter to the user because the data is read into R either way. However, different formats might have different handling of the encoding of some questions. This option was preserved so that the user can switch between formats if any encoding errors are found in the data. For more details see the discussion here.
For this particular argument, 'sas' is not supported because the data formats have changed between ESS waves and separate formats require different functions to be read. To preserve parsimony and avoid format errors between waves, the user should use 'spss' or 'stata'.
For import_country, if length(rounds) is 1, it returns a tibble with the latest version of that round. Otherwise it returns a list of length(rounds) containing the latest version of each round. For download_country, if output_dir is a valid directory, it saves all the rounds in the chosen format in output_dir and returns the saved directories invisibly.
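Because the return type depends on length(rounds), code that loops over rounds may want to normalise the result first. The following is a minimal sketch and not part of essurvey: as_round_list is a hypothetical helper, and the data frames are placeholders standing in for downloaded rounds (a tibble is also a data frame, so the same check applies).

```r
# Hypothetical helper (not part of essurvey): always return a list of
# data frames, whether one round or several rounds were requested.
as_round_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Placeholder data standing in for the result of import_country()
one_round <- data.frame(idno = 1:3)
many_rounds <- list(data.frame(idno = 1:3), data.frame(idno = 4:5))

# Both shapes can now be processed with the same code
vapply(as_round_list(one_round), nrow, integer(1))
vapply(as_round_list(many_rounds), nrow, integer(1))
```

Wrapping the single-round case in a list keeps downstream code free of special cases, at the cost of one extra level of indexing.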
## Not run:
set_email("[email protected]")

# Get first three rounds for Denmark
dk_three <- import_country("Denmark", 1:3)

# Only download the files, this will return nothing
temp_dir <- tempdir()

download_country(
  "Turkey",
  rounds = c(2, 4),
  output_dir = temp_dir
)

# By default, download_country downloads 'stata' files but
# you can also download 'spss' or 'sas' files.
download_country(
  "Turkey",
  rounds = c(2, 4),
  output_dir = temp_dir,
  format = 'spss'
)

# If email is not registered at ESS website, error will arise
uk_one <- import_country("United Kingdom", 5, "[email protected]")
# Error in authenticate(ess_email) :
# The email address you provided is not associated with any registered user.
# Create an account at http://www.europeansocialsurvey.org/user/new

# If selected rounds don't exist, error will arise
czech_two <- import_country("Czech Republic", c(1, 22))
# Error in country_url(country, rounds) :
# Only rounds ESS1, ESS2, ESS4, ESS5, ESS6, ESS7, ESS8 available
# for Czech Republic
## End(Not run)
Download integrated rounds from the European Social Survey
import_rounds(rounds, ess_email = NULL, format = NULL)

import_all_rounds(ess_email = NULL, format = NULL)

download_rounds(
  rounds,
  ess_email = NULL,
  output_dir = getwd(),
  format = "stata"
)
rounds |
a numeric vector with the rounds to download. See |
ess_email |
a character vector with your email, such as "[email protected]".
If you haven't registered on the ESS website, create an account at
http://www.europeansocialsurvey.org/user/new. A preferred method is to log in
through |
format |
the format from which to download the data. By default it is NULL for |
output_dir |
a character vector with the output directory in case you want to only download the files using
the |
Use import_rounds to download specified rounds and import them to R. import_all_rounds will download all rounds by default and download_rounds will download rounds and save them in a specified format in the supplied directory.
The format argument from import_rounds should not matter to the user because the data is read into R either way. However, different formats might have different handling of the encoding of some questions. This option was preserved so that the user can switch between formats if any encoding errors are found in the data. For more details see the discussion here.
For this particular argument, 'sas' is not supported because the data formats have changed between ESS waves and separate formats require different functions to be read. To preserve parsimony and avoid format errors between waves, the user should use 'spss' or 'stata'.
For import_rounds, if length(rounds) is 1, it returns a tibble with the latest version of that round. Otherwise it returns a list of length(rounds) containing the latest version of each round. For download_rounds, if output_dir is a valid directory, it saves all the rounds in the chosen format in output_dir and returns the saved directories invisibly.
## Not run:
set_email("[email protected]")

# Get first three rounds
three_rounds <- import_rounds(1:3)

temp_dir <- tempdir()

# Only download the files to output_dir, this will return nothing.
download_rounds(
  rounds = 1:3,
  output_dir = temp_dir
)

# By default, download_rounds saves a 'stata' file. You can
# also download 'spss' and 'sas' files.
download_rounds(
  rounds = 1:3,
  output_dir = temp_dir,
  format = 'spss'
)

# If rounds are repeated, will download only unique ones
two_rounds <- import_rounds(c(1, 1))

# If email is not registered at ESS website, error will arise
two_rounds <- import_rounds(c(1, 2), "[email protected]")
# Error in authenticate(ess_email) :
# The email address you provided is not associated with any registered user.
# Create an account at https://www.europeansocialsurvey.org/user/new

# If selected rounds don't exist, error will arise
two_rounds <- import_rounds(c(1, 22))
# Error in round_url(rounds) :
# ESS round 22 is not a available. Check show_rounds()
## End(Not run)
Download SDDF data by round for countries from the European Social Survey
import_sddf_country(country, rounds, ess_email = NULL, format = NULL)

import_all_sddf_cntrounds(country, ess_email = NULL, format = NULL)

download_sddf_country(
  country,
  rounds,
  ess_email = NULL,
  output_dir = getwd(),
  format = "stata"
)
country |
a character of length 1 with the full name of the country.
Use |
rounds |
a numeric vector with the rounds to download. See |
ess_email |
a character vector with your email, such as "[email protected]".
If you haven't registered on the ESS website, create an account at
http://www.europeansocialsurvey.org/user/new. A preferred method is to log in
through |
format |
the format from which to download the data. By default it is NULL for |
output_dir |
a character vector with the output directory in case you want to
only download the files using |
SDDF data (Sample Design Data Files) are data sets that contain additional columns with the sample design and weights for a given country in a given round. These additional columns are required to perform any complex weighted analysis of the ESS data. Users interested in using this data should read the description of SDDF files here and should read here for the sampling design of the country of analysis for that specific round.
Use import_sddf_country to download the SDDF data by country into R. import_all_sddf_cntrounds will download all available SDDF data for a given country by default and download_sddf_country will download SDDF data and save them in a specified format in the supplied directory.
The format argument from import_country should not matter to the user because the data is read into R either way. However, different formats might have different handling of the encoding of some questions. This option was preserved so that the user can switch between formats if any encoding errors are found in the data. For more details see the discussion here.
Additionally, given that the SDDF data is not very complete, some countries do not have SDDF data in Stata or SPSS formats. For that reason, the format argument is not used in import_sddf_country. Internally, Stata is chosen over SPSS and SPSS over SAS, in that order of preference.
For this particular argument, 'sas' is not supported because the data formats have changed between ESS waves and separate formats require different functions to be read. To preserve parsimony and avoid format errors between waves, the user should use 'stata' or 'spss'.
Starting with round 7 (inclusive), the ESS switched the layout of SDDF data. Before that round, SDDF data was published separately for each wave-country combination. From round 7 onwards, all SDDF data is released as a single integrated file with all countries combined for that given round. import_sddf_country takes care of this nuance by reading the data and filtering the chosen country automatically. download_sddf_country downloads the raw file but also reads the data into memory to subset the specific country requested. This process should be transparent to the user, but beware that reading/writing the data might delete some of its properties, such as dropping the labels or label attribute.
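The filtering step and the warning about lost attributes can be illustrated with base R alone. This is a sketch under stated assumptions, not the package's internal code: the column names cntry and prob, the country codes and the values are made up for illustration.

```r
# Made-up stand-in for an integrated SDDF file (round 7 onwards),
# where rows from all countries live in one table.
sddf <- data.frame(
  cntry = c("ES", "ES", "DK"),
  prob = c(0.2, 0.4, 0.1)
)
attr(sddf$prob, "label") <- "Inclusion probability"

# Keep only the rows of the requested country
sddf_es <- sddf[sddf$cntry == "ES", ]

nrow(sddf_es)

# The label is still present on the original column,
# but base R row subsetting does not carry it over.
attr(sddf$prob, "label")
attr(sddf_es$prob, "label")
```

If labels matter for the analysis, it may be worth comparing the attributes of a column before and after any subsetting or round trip to disk.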
For import_sddf_country, if length(rounds) is 1, it returns a tibble with the latest version of that round. Otherwise it returns a list of length(rounds) containing the latest version of each round. For download_sddf_country, if output_dir is a valid directory, it saves all the rounds in the chosen format in output_dir and returns the saved directories invisibly.
## Not run:
set_email("[email protected]")

sp_three <- import_sddf_country("Spain", 5:6)

show_sddf_cntrounds("Spain")

# Only download the files, this will return nothing
temp_dir <- tempdir()

download_sddf_country(
  "Spain",
  rounds = 5:6,
  output_dir = temp_dir
)

# By default, download_sddf_country downloads 'stata' files but
# you can also download 'spss' or 'sas' files.
download_sddf_country(
  "Spain",
  rounds = 1:8,
  output_dir = temp_dir,
  format = 'spss'
)
## End(Not run)
This function is no longer needed; please see the details section.
recode_missings(ess_data, missing_codes)

recode_numeric_missing(x, missing_codes)

recode_strings_missing(y, missing_codes)
ess_data |
data frame or |
missing_codes |
a character vector with values 'Not applicable', 'Refusal', 'Don't Know', 'No answer' or 'Not available'. By default all values are chosen. Note that the wording is case sensitive. |
x |
a |
y |
a character vector |
Data from the European Social Survey is always accompanied by a script that recodes the categories 'Not applicable', 'Refusal', 'Don't Know', 'No answer' and 'Not available' to missing. This function recodes these categories to NA.
The European Social Survey now provides these values recoded automatically in Stata data files. These missing categories are now read as missing values by read_dta, which reads the missing categories correctly from Stata. For an example of how these values are coded, see here.
Old details:
When downloading data directly from the European Social Survey's website, the downloaded .zip file contains a script that recodes some categories as missings in Stata and SPSS formats.
For recoding numeric variables, recode_numeric_missing uses the labels provided by the labelled class to delete the labels matched in missing_codes. For the character variables, matching is done with the underlying number assigned to each category, namely 6, 7, 8, 9 and 9 for 'Not applicable', 'Refusal', 'Don't Know', 'No answer' and 'Not available'.
The functions are a direct translation of the Stata script that comes along when downloading one of the rounds. The Stata script is the same for all rounds and all countries, meaning that these functions work for all rounds.
The same data frame or tibble but with values 'Not applicable', 'Refusal', 'Don't Know', 'No answer' and 'Not available' recoded as NA.
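As a rough illustration of what label-based recoding amounts to, the base R sketch below replaces labelled missing categories with NA and removes their labels. It is not the package's implementation: recode_by_label is a hypothetical helper, and the vector, its labels and the codes 77 and 88 are example values chosen for this sketch.

```r
# Hypothetical helper (not part of essurvey): set values whose label is
# listed in `missing_codes` to NA and drop those labels.
recode_by_label <- function(x, missing_codes) {
  labels <- attr(x, "labels")
  is_missing_label <- names(labels) %in% missing_codes
  x[x %in% labels[is_missing_label]] <- NA
  attr(x, "labels") <- labels[!is_missing_label]
  x
}

# Example vector with a "labels" attribute (values are illustrative)
tv <- c(1, 3, 77, 88, 2)
attr(tv, "labels") <- c("No time at all" = 0, "Refusal" = 77, "Don't know" = 88)

tv_recoded <- recode_by_label(tv, c("Refusal", "Don't know"))

# Labelled missing categories are now NA and no longer distort the mean
tv_recoded
mean(tv_recoded, na.rm = TRUE)
```

Matching is done on the label text, so the sketch is case sensitive in the same way the missing_codes argument is described to be.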
## Not run:
seven <- import_rounds(7, your_email)

attr(seven$tvtot, "labels")
mean(seven$tvtot, na.rm = TRUE)

names(table(seven$lnghom1))
# First three are actually missing values

seven_recoded <- recode_missings(seven)

attr(seven_recoded$tvtot, "labels")
# All missings have been removed
mean(seven_recoded$tvtot, na.rm = TRUE)

names(table(seven_recoded$lnghom1))
# All missings have been removed

# If you want to operate on specific variables
# you can use other recode_*_missing
seven$tvtot <- recode_numeric_missing(seven$tvtot)

# Recode only 'Don't know' and 'No answer' to missing
seven$tvpol <- recode_numeric_missing(seven$tvpol, c("Don't know", "No answer"))

# The same can be done with recode_strings_missing
## End(Not run)
Save your ESS email as an environment variable
set_email(ess_email)
ess_email |
a character string with your registered email. |
You should only run set_email() once and every import_ and download_ function should work fine. Make sure your email is registered at http://www.europeansocialsurvey.org/ before setting the email.
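Since the email is saved as an environment variable, the general mechanism can be sketched with base R's Sys.setenv() and Sys.getenv(). The variable name ESS_EMAIL_DEMO and the address below are placeholders chosen for this sketch; the name the package uses internally is not stated here.

```r
# Placeholder name and address, for illustration only
Sys.setenv(ESS_EMAIL_DEMO = "someone@example.org")

# Later calls in the same R session can read the value back
stored <- Sys.getenv("ESS_EMAIL_DEMO")
stored

# After unsetting, the variable comes back as an empty string
Sys.unsetenv("ESS_EMAIL_DEMO")
Sys.getenv("ESS_EMAIL_DEMO")
```

Values set with Sys.setenv() last only for the current R session, so a new session needs the variable to be set again.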
## Not run:
set_email("[email protected]")
import_rounds(1)
## End(Not run)
Return available countries in the European Social Survey
show_countries()
character vector with available countries
## Not run:
show_countries()
## End(Not run)
Return available rounds for a country in the European Social Survey
show_country_rounds(country)
country |
A character of length 1 with the full name of the country.
Use |
numeric vector with available rounds for country
## Not run:
show_country_rounds("Spain")

show_country_rounds("Turkey")
## End(Not run)
Return available rounds in the European Social Survey
show_rounds()
numeric vector with available rounds
## Not run:
show_rounds()
## End(Not run)
Return countries that participated in all of the specified rounds.
show_rounds_country(rounds, participate = TRUE)
rounds |
A numeric vector specifying the rounds from which to return the countries.
Use |
participate |
A logical that controls whether to show participating countries in that/those
rounds or countries that didn't participate. Set to |
show_rounds_country returns the countries that participated in all of the specified rounds. That is, show_rounds_country(1:2) will return countries that participated both in round 1 and round 2. Conversely, if participate = FALSE it will return the countries that did not participate in both round 1 and round 2.
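The "all of the specified rounds" rule behaves like a set intersection. The sketch below shows only the participate = TRUE case on invented data: the country names and rounds are placeholders, not real ESS participation records, and countries_in_all is a hypothetical helper rather than the package's code.

```r
# Invented participation data: rounds in which each placeholder country took part
participation <- list(
  "Country A" = c(1, 2, 3),
  "Country B" = c(2, 3),
  "Country C" = c(1, 2)
)

# Hypothetical helper: names of countries present in every requested round
countries_in_all <- function(rounds, participation) {
  in_all <- vapply(participation, function(x) all(rounds %in% x), logical(1))
  names(participation)[in_all]
}

# Countries present in both round 1 and round 2
countries_in_all(1:2, participation)

# Countries present in round 2
countries_in_all(2, participation)
```

With this toy data, asking for more rounds can only shrink the result, which is the behaviour to expect from an "all rounds" rule.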
A character vector with the country names
## Not run:
# Return countries that participated in round 2
show_rounds_country(2)

# Return countries that participated in all rounds
show_rounds_country(1:8)

# Return countries that didn't participate in the first three rounds
show_rounds_country(1:3, participate = FALSE)
## End(Not run)
Return available SDDF rounds for a country in the European Social Survey
show_sddf_cntrounds(country, ess_email = NULL)
country |
A character of length 1 with the full name of the country.
Use |
ess_email |
a character vector with your email, such as "[email protected]".
If you haven't registered on the ESS website, create an account at
http://www.europeansocialsurvey.org/user/new. A preferred method is to log in
through |
SDDF data are the equivalent weight data used to analyze the European Social Survey properly. For more information, see the details section of import_sddf_country.
As an exception to the show_* family of functions, show_sddf_cntrounds needs your ESS email to check which rounds are available. Be sure to add it with set_email.
numeric vector with available rounds for country
## Not run:
set_email("[email protected]")

show_sddf_cntrounds("Spain")
## End(Not run)
This function returns the available rounds for any theme from show_themes. However, contrary to show_country_rounds, themes cannot be downloaded as separate datasets. This and the show_themes function serve purely informative purposes.
show_theme_rounds(theme)
theme |
A character of length 1 with the full name of the theme.
Use |
numeric vector with available rounds for theme
## Not run:
chosen_theme <- show_themes()[3]

# In which rounds was the topic of 'Democracy' asked?
show_theme_rounds(chosen_theme)

# And politics?
show_theme_rounds("Politics")
## End(Not run)
This function returns the available themes in the European Social Survey. However, contrary to show_countries and show_country_rounds, themes cannot be downloaded as separate datasets. This and show_theme_rounds serve purely informative purposes.
show_themes()
character vector with available themes
## Not run:
show_themes()
## End(Not run)