Using rscorecard to download data from the College Scorecard API requires two steps:

  1. Setting your API key
  2. Making a request

1. Setting your API key

If you don’t already have one, request your (free) API key from https://api.data.gov/signup. It should only take a few moments to register and receive your key.

Once you have your key, you can store it using sc_key(). The sc_key() command stores your key in the environment variable DATAGOV_API_KEY, where it persists until the R session is closed. If you don’t pass a key argument to sc_get(), it searches your R environment for DATAGOV_API_KEY and completes the data request if the key is found.

# NB: You must use a real key, of course... 
sc_key('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

If you want a more permanent solution, you can add the following line (with your actual key, of course) to your .Renviron file. See this appendix for more information.

# NB: You must use a real key, of course... 
DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
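Whichever way you store the key, it can be worth confirming that the current session actually sees it. The check below is a minimal sketch using only base R; it reports the key's length rather than printing the key itself. Note that R reads .Renviron at startup, so a newly added line takes effect only after restarting R (or after calling readRenviron() on the file).

```r
# Look up the environment variable that sc_get() searches for.
# Sys.getenv() returns an empty string when the variable is unset.
key <- Sys.getenv("DATAGOV_API_KEY")

if (nzchar(key)) {
  message("DATAGOV_API_KEY is set (", nchar(key), " characters).")
} else {
  message("DATAGOV_API_KEY is not set; call sc_key() or edit .Renviron.")
}
```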

Simple request

Each request requires the following four commands piped together using %>%:

  1. sc_init()
  2. sc_filter()
  3. sc_select()
  4. sc_get()

The command chain must begin with sc_init() and end with sc_get(). All other commands can come in any order.
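As an illustration of the ordering rule, the sketch below places sc_select() before sc_filter(); per the rule above, this should be equivalent to putting the filter first. The filter and variable names are the ones used in the examples in this section. Loading dplyr here is only an assumption made to guarantee that %>% is available, and the request is skipped when no key is set.

```r
library(rscorecard)
library(dplyr)  # loaded here only to make %>% available

# Only attempt the request when a key is available in the environment.
if (nzchar(Sys.getenv("DATAGOV_API_KEY"))) {
  df_alt <- sc_init() %>%                      # must come first
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) %>%
    sc_get()                                   # must come last
}
```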

The request below should return a tibble with the name, IPEDS ID, state, and degree-seeking undergraduate enrollment of all primarily Baccalaureate colleges in the Mid East region located in rural areas:

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds year 
#>    <int> <chr>                                                <chr>  <int> <chr>
#> 1 196051 SUNY Morrisville                                     NY      2758 late…
#> 2 214625 Pennsylvania State University-Penn State New Kensin… PA       567 late…
#> 3 194392 Paul Smiths College of Arts and Science              NY       740 late…
#> 4 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       401 late…
#> 5 191676 Houghton College                                     NY       981 late…
#> 6 197230 Wells College                                        NY       463 late…

Because we didn’t include a specific year, the latest data are returned. We could have specifically asked for the latest data using sc_year('latest'):

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_year('latest') %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds year 
#>    <int> <chr>                                                <chr>  <int> <chr>
#> 1 196051 SUNY Morrisville                                     NY      2758 late…
#> 2 214625 Pennsylvania State University-Penn State New Kensin… PA       567 late…
#> 3 194392 Paul Smiths College of Arts and Science              NY       740 late…
#> 4 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       401 late…
#> 5 191676 Houghton College                                     NY       981 late…
#> 6 197230 Wells College                                        NY       463 late…

For a prior year’s data, change the value in sc_year():

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_year(2005) %>% 
    sc_get()
#> Request complete!
df
#> # A tibble: 6 x 5
#>   unitid instnm                                               stabbr  ugds  year
#>    <int> <chr>                                                <chr>  <int> <dbl>
#> 1 196051 SUNY Morrisville                                     NY      2964  2005
#> 2 214625 Pennsylvania State University-Penn State New Kensin… PA       733  2005
#> 3 194392 Paul Smiths College of Arts and Science              NY       841  2005
#> 4 214643 Pennsylvania State University-Penn State Wilkes-Bar… PA       565  2005
#> 5 191676 Houghton College                                     NY      1368  2005
#> 6 197230 Wells College                                        NY       407  2005
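To compare several years, one option is to repeat the request once per year and stack the results. The following is a rough sketch, not a built-in feature: get_year() and df_years are hypothetical names, the years are illustrative, and it assumes sc_year() accepts a numeric value held in a variable. Each year is a separate API call. Because the outputs above show the year column as character for 'latest' and numeric for a specific year, the sketch sticks to numeric years so the pieces combine cleanly.

```r
library(rscorecard)
library(dplyr)

# Hypothetical helper: run the request from the examples above for one year.
get_year <- function(yr) {
  sc_init() %>%
    sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) %>%
    sc_select(unitid, instnm, stabbr, ugds) %>%
    sc_year(yr) %>%
    sc_get()
}

years <- c(2005, 2010)  # illustrative years

# Only attempt the requests when a key is available in the environment.
if (nzchar(Sys.getenv("DATAGOV_API_KEY"))) {
  df_years <- bind_rows(lapply(years, get_year))
}
```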

Field of study data

In the fall of 2019, the College Scorecard released data elements at the field of study level (4-digit CIP code level). These data elements can be requested alongside institution-level data:

df <- sc_init() %>% 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) %>% 
    sc_select(unitid, instnm, stabbr, ugds, cipcode, cipdesc, debt_mdn) %>%
    sc_year("latest") %>% 
    sc_get()
#> Request complete!
## filter to show only those with non-NA values for median debt
df %>% dplyr::filter(!is.na(debt_mdn))
#> # A tibble: 216 x 8
#>    unitid instnm      stabbr  ugds cipcode cipdesc                debt_mdn year 
#>     <int> <chr>       <chr>  <int> <chr>   <chr>                     <dbl> <chr>
#>  1 196051 SUNY Morri… NY      2758 0100    Agriculture, General.     12000 late…
#>  2 196051 SUNY Morri… NY      2758 0101    Agricultural Business…    12000 late…
#>  3 196051 SUNY Morri… NY      2758 0101    Agricultural Business…    12000 late…
#>  4 196051 SUNY Morri… NY      2758 0102    Agricultural Mechaniz…    12000 late…
#>  5 196051 SUNY Morri… NY      2758 0102    Agricultural Mechaniz…    12000 late…
#>  6 196051 SUNY Morri… NY      2758 0103    Agricultural Producti…    12000 late…
#>  7 196051 SUNY Morri… NY      2758 0103    Agricultural Producti…    12000 late…
#>  8 196051 SUNY Morri… NY      2758 0105    Agricultural and Dome…    12000 late…
#>  9 196051 SUNY Morri… NY      2758 0109    Animal Sciences.          12000 late…
#> 10 196051 SUNY Morri… NY      2758 0111    Plant Sciences.           12000 late…
#> # … with 206 more rows
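Notice in the output above that institution-level columns such as ugds repeat on every field of study row. The sketch below shows one way to guard against double counting when aggregating them. It uses a small hand-built tibble (fos, a hypothetical name) with three rows copied from the output above, so it runs without an API call.

```r
library(dplyr)

# Three rows copied from the output above; `fos` is a hypothetical name.
fos <- tibble(
  unitid   = c(196051L, 196051L, 196051L),
  instnm   = "SUNY Morrisville",
  stabbr   = "NY",
  ugds     = 2758L,
  cipcode  = c("0100", "0101", "0102"),
  debt_mdn = 12000
)

# Collapse to one row per institution before aggregating
# institution-level values.
inst <- fos %>% distinct(unitid, instnm, stabbr, ugds)

sum(fos$ugds)   # counts the same enrollment once per field of study row
sum(inst$ugds)  # counts it once per institution
```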

Important note:

The mapping scheme of data across years isn’t consistent across data elements. From the technical documentation for institution-level data:

The data contain diverse measures of institutional performance constructed both with an eye towards the type of information that would be most useful to prospective students, as well as towards how the measures might promote accountability for institutions. The measures require different definitions of cohorts. Users of the data should be aware of this, particularly when constructing analyses of the relationship between different measures. Moreover, reporting inaccuracies in some data elements used for cohort definitions are also important. (p. 37)

That is, while the reporting year (e.g., sc_year(2016)) may be the same, the measurement year may not directly align. The same holds true when trying to align institution-level data with field of study-level data (see the technical documentation for field of study-level data for more information).

The upshot is that rscorecard will return data based on what the API call returns, but the user should take care to ensure that returned data elements align with expectations and project needs.

More information and examples

For more information about each command, see Commands.

For more examples, see More examples.