Using rscorecard to download data from the College Scorecard API requires two steps:

  1. Setting your API key
  2. Making a request

1. Setting your API key

If you don’t already have one, request your (free) API key from https://api.data.gov/signup. It should only take a few moments to register and receive your key.

Once you have your key, you can store it using sc_key(). The sc_key() command stores your key in the environment variable DATAGOV_API_KEY, where it persists until the R session is closed. If you don’t supply a key value argument, sc_get() searches your R environment for DATAGOV_API_KEY and completes the data request if the key is found.

# NB: You must use a real key, of course... 
sc_key("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
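
To confirm that a key is visible to the current session, one option is to read the environment variable directly. The sketch below assumes only that the key lives in DATAGOV_API_KEY, as described above; the helper name has_datagov_key() is hypothetical and not part of rscorecard.

```r
# Hypothetical helper (not part of rscorecard): returns TRUE when the
# DATAGOV_API_KEY environment variable holds a non-empty value.
has_datagov_key <- function() {
  nzchar(Sys.getenv("DATAGOV_API_KEY"))
}

# Print a reminder if no key is set in this session.
if (!has_datagov_key()) {
  message("DATAGOV_API_KEY is not set; call sc_key() with your key first.")
}
```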

For a more permanent solution, add the following line (with your actual key) to your .Renviron file. See this appendix for more information.

# NB: You must use a real key, of course... 
DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
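
R reads .Renviron at startup, so the key is normally picked up after restarting R. To load it into a session that is already running, base R's readRenviron() can re-read the file. The sketch below assumes the file sits at ~/.Renviron; the helper name reload_datagov_key() is hypothetical and not part of rscorecard.

```r
# Hypothetical helper (not part of rscorecard): re-read an .Renviron-style
# file into the running session and report whether the key is now set.
reload_datagov_key <- function(path = "~/.Renviron") {
  if (!file.exists(path)) {
    return(FALSE)  # nothing to read
  }
  readRenviron(path)
  nzchar(Sys.getenv("DATAGOV_API_KEY"))
}

# TRUE if the file exists and defines a non-empty DATAGOV_API_KEY
reload_datagov_key()
```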

2. Making a request

Each request requires the following four commands piped together using %>%:

  1. sc_init()
  2. sc_filter()
  3. sc_select()
  4. sc_get()

The command chain must begin with sc_init() and end with sc_get(). All other commands can come in any order.
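
As an illustration of the ordering rule, the sketch below places sc_select() before sc_filter(). It is guarded so that it runs only when rscorecard is installed and a key is set, since sc_get() contacts the API. The object name df_reordered and the reduced filter are chosen for illustration, and the pipe is assumed to be available once the package is attached.

```r
# Run only when rscorecard is installed and a key is available,
# because sc_get() sends a request to the API.
if (requireNamespace("rscorecard", quietly = TRUE) &&
    nzchar(Sys.getenv("DATAGOV_API_KEY"))) {
  library(rscorecard)
  # sc_init() first, sc_get() last; sc_select() and sc_filter() swapped.
  df_reordered <- sc_init() %>%
    sc_select(unitid, instnm, stabbr) %>%
    sc_filter(region == 2) %>%
    sc_get()
}
```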

The request below should return a tibble with the name, IPEDS ID, state, and degree-seeking undergrad enrollment of all primarily Baccalaureate colleges in the Mid East region located in rural areas:

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds year 
#>    <int> <chr>                                                <chr>  <int> <chr>
#> 1 191676 Houghton College                                     NY       981 late…
#> 2 194392 Paul Smiths College of Arts and Science              NY       740 late…
#> 3 196051 SUNY Morrisville                                     NY      2758 late…
#> 4 197230 Wells College                                        NY       463 late…
#> 5 214625 Pennsylvania State University-Penn State New Kensin… PA       567 late…
#> 6 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       401 late…

Because we didn’t include a specific year, the latest data are returned. We could have asked for the latest data explicitly using sc_year("latest"):

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_year("latest") %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds year 
#>    <int> <chr>                                                <chr>  <int> <chr>
#> 1 191676 Houghton College                                     NY       981 late…
#> 2 194392 Paul Smiths College of Arts and Science              NY       740 late…
#> 3 196051 SUNY Morrisville                                     NY      2758 late…
#> 4 197230 Wells College                                        NY       463 late…
#> 5 214625 Pennsylvania State University-Penn State New Kensin… PA       567 late…
#> 6 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       401 late…

For a prior year’s data, change the value in sc_year():

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_year(2005) %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds  year
#>    <int> <chr>                                                <chr>  <int> <dbl>
#> 1 191676 Houghton College                                     NY      1368  2005
#> 2 194392 Paul Smiths College of Arts and Science              NY       841  2005
#> 3 196051 SUNY Morrisville                                     NY      2964  2005
#> 4 197230 Wells College                                        NY       407  2005
#> 5 214625 Pennsylvania State University-Penn State New Kensin… PA       733  2005
#> 6 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       565  2005
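
To compare several years, one approach is to repeat the request once per year and stack the results. The sketch below is an assumption about how this could be done, not a documented rscorecard feature: get_year() and df_years are hypothetical names, dplyr is assumed to be installed, and the block runs only when a key is set.

```r
# Run only when both packages are installed and a key is available,
# because each call to sc_get() sends a request to the API.
if (requireNamespace("rscorecard", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE) &&
    nzchar(Sys.getenv("DATAGOV_API_KEY"))) {
  library(rscorecard)
  # Hypothetical helper: the same request as above for a single year.
  get_year <- function(yr) {
    sc_init() %>%
      sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>%
      sc_select(unitid, instnm, stabbr, ugds) %>%
      sc_year(yr) %>%
      sc_get()
  }
  # One request per year, then stack the tibbles row-wise.
  df_years <- dplyr::bind_rows(lapply(c(2005, 2006), get_year))
}
```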

Field of study data

In the fall of 2019, the College Scorecard released field of study-level data elements (at the 4-digit CIP code level). These data elements can be requested alongside institution-level data:

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds, cipcode, cipdesc, debt_mdn) %>%
    sc_year("latest") %>% 
    sc_get()
#> Request complete!
## filter to show only those with non-NA values for median debt
df %>% dplyr::filter(!is.na(debt_mdn))
#> # A tibble: 712 x 8
#>    unitid instnm    stabbr  ugds cipcode cipdesc                  debt_mdn year 
#>     <int> <chr>     <chr>  <int> <chr>   <chr>                       <int> <chr>
#>  1 191676 Houghton… NY       981 0105    Agricultural and Domest…    19500 late…
#>  2 191676 Houghton… NY       981 0501    Area Studies.               19500 late…
#>  3 191676 Houghton… NY       981 0901    Communication and Media…    19500 late…
#>  4 191676 Houghton… NY       981 1101    Computer and Informatio…    19500 late…
#>  5 191676 Houghton… NY       981 1104    Information Science/Stu…    19500 late…
#>  6 191676 Houghton… NY       981 1312    Teacher Education and P…    19500 late…
#>  7 191676 Houghton… NY       981 1313    Teacher Education and P…    19500 late…
#>  8 191676 Houghton… NY       981 1314    Teaching English or Fre…    19500 late…
#>  9 191676 Houghton… NY       981 1412    Engineering Physics.        19500 late…
#> 10 191676 Houghton… NY       981 1609    Romance Languages, Lite…    19500 late…
#> # … with 702 more rows

Important note:

The mapping scheme of data across years isn’t consistent across data elements. From the technical documentation for institution-level data:

The data contain diverse measures of institutional performance constructed both with an eye towards the type of information that would be most useful to prospective students, as well as towards how the measures might promote accountability for institutions. The measures require different definitions of cohorts. Users of the data should be aware of this, particularly when constructing analyses of the relationship between different measures. Moreover, reporting inaccuracies in some data elements used for cohort definitions are also important. (p. 37)

That is, while the reporting year (e.g., sc_year(2016)) may be the same, the measurement year may not directly align. The same holds true when trying to align institution-level data with field of study-level data (see the technical documentation for field of study-level data for more information).

The upshot is that rscorecard will return data based on what the API call returns, but the user should take care to ensure that returned data elements align with expectations and project needs.

More information and examples

For more information about each command, see Commands.

For more examples, see More examples.