Securing data with duawranglr • duawranglr

This vignette walks through the process for using duawranglr. It assumes that the data administrator and researcher have executed a data usage agreement (DUA) with three potential levels of data restrictions and created a crosswalk spreadsheet in Excel.

Administrative file to be wrangled

The raw administrative data file that needs to be processed looks like this:

sid	sname	dob	gender	raceeth	tid	tname	zip	mathscr	readscr
000-00-0001	Schaefer	19900114	0	2	1	Smith	22906	515	496
000-00-0002	Hodges	19900225	0	1	1	Smith	22906	488	489
000-00-0003	Kirby	19900305	0	4	1	Smith	22906	522	498
000-00-0004	Estrada	19900419	0	3	1	Smith	22906	516	524
000-00-0005	Nielsen	19900530	1	2	1	Smith	22906	483	509
000-00-0006	Dean	19900621	1	1	2	Brown	22906	503	523
000-00-0007	Hickman	19900712	1	1	2	Brown	22906	539	509
000-00-0008	Bryant	19900826	0	2	2	Brown	22906	499	490
000-00-0009	Lynch	19900902	1	3	2	Brown	22906	499	493

And we have a codebook:

sid: Student social security number
sname: Student’s last name
dob: Student’s date of birth
gender: Indicator for student gender identification
raceeth: Factor variable indicatings student’s racial/ethnic identification
tid: ID variable for student’s teacher
tname: Last name of student’s teacher
zip: Student’s home address zip code
mathscr: Student’s end-of-year test math score
readscr: Student’s end-of-year test reading score

The admin_data.csv file contains observations for 9 students and has 10 variables associated with each observation. Of these, 1 uniquely identifies each student, 6 are associated with the student’s personal characteristics, 2 with each student’s teacher, and 2 with the student’s test scores in reading and math.

It appears that the school uses the student’s social security number to uniquely identify each student. As researchers interested in test scores, we have no need for this highly protected data element other than for its ability to uniquely identify a student or allow linking to other records. Since we do not need to link to other records at the moment, any unique number or string will work for our purposes. Similarly, we don’t really need the student’s last name.

Besides math (mathscr) and reading (readscr) scores, we may be interested in some of the other covariates. It’s likely that many of these data elements, however, also carry restrictions of varying severity. For example, the school may be able to share the student’s race/ethnicity and gender (provided the student is not otherwise identified) with most approved researchers, but can only share teachers’ names (tid) under more tightly restricted scenarios.

This is where our DUA crosswalk file comes in handy.

Set DUA

The first step in the process is to set the DUA crosswalk file. The crosswalk file can be in many different formats and, in most cases, will be read in automatically no matter the type. (If using a delimited file that isn’t a comma- or tab-separated value format, give the delimiter argument the delimiter string; if using an Excel file with more than one sheet, give the sheet argument the sheet name or number.) If successful, you will get message telling you so.

library(tidyverse)
library(duawranglr)

## get crosswalk and admin data files
dua_cw_file <- system.file('extdata', 'dua_cw.csv', package = 'duawranglr')
admin_file <- system.file('extdata', 'admin_data.csv', package = 'duawranglr')

## set the DUA crosswalk
set_dua_cw(dua_cw_file)

-- duawranglr note -----------------------------------------------------------------------
DUA crosswalk has been set!

Check DUA options

In case you’ve forgotten the data elements that are restricted at a particular level, you can check them using the see_dua_options() function with the level argument set to the appropriate level. If you want to compare restrictions across more than one level, you can give the level argument a vector.

## compare level II and III restrictions
see_dua_options(level = c('level_ii', 'level_iii'))
------------------------------------------------------------------------------------------
LEVEL NAME: level_ii
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - dob
 - sid
 - sname
 - tname
 - zip
 
------------------------------------------------------------------------------------------
LEVEL NAME: level_iii
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - sid
 - sname
 - tname
 
------------------------------------------------------------------------------------------

Alternately, you can see restrictions at all levels if you leave the level argument at its default NULL value.

## check all level restrictions
see_dua_options()
------------------------------------------------------------------------------------------
LEVEL NAME: level_i
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - dob
 - gender
 - raceeth
 - sid
 - sname
 - tid
 - tname
 - zip
 
------------------------------------------------------------------------------------------
LEVEL NAME: level_ii
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - dob
 - sid
 - sname
 - tname
 - zip
 
------------------------------------------------------------------------------------------
LEVEL NAME: level_iii
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - sid
 - sname
 - tname
 
------------------------------------------------------------------------------------------

Set DUA level

After consultation with our data partner, we’ve decided that data for this project need to be set at Level II. Because no level allows us to use the current unique ID, sid, we also need to deidentify the data. We could just delete the sid column, but for reasons discussed below, it will be better if we use it to make new, non-identifiable but unique IDs. Therefore, we use additional arguments in set_dua_level() to note that deidentification is required and set the targeted ID column.

## set DUA level
set_dua_level('level_ii', deidentify_required = TRUE, id_column = 'sid')
-- duawranglr note -----------------------------------------------------------------------
Unique IDs in [ sid ] must be deidentified; use -deid_dua()-.

Check DUA level

As we’re preparing the data, we can check our restriction level and the data element names it restricts using see_dua_level().

## see set DUA level 
see_dua_level(show_restrictions = TRUE)
------------------------------------------------------------------------------------------
You have set restrictions at [ level_ii ]
------------------------------------------------------------------------------------------

RESTRICTED VARIABLE NAMES:
 - dob
 - sid
 - sname
 - tname
 - zip
 
------------------------------------------------------------------------------------------

Administrative data

After loading some libraries, we’ll first read in the raw administrative data file and confirm that it has nine observations and the data elements we expect.

## read in raw administrative data
df <- read_dua_file(admin_file)
df

 [38;5;246m# A tibble: 9 × 10 [39m
  sid         sname    dob      gender raceeth   tid tname   zip mathscr readscr
   [3m [38;5;246m<chr> [39m [23m        [3m [38;5;246m<chr> [39m [23m     [3m [38;5;246m<chr> [39m [23m      [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<chr> [39m [23m  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m
 [38;5;250m1 [39m 000-00-0001 Schaefer 19900114      0       2     1 Smith  [4m2 [24m [4m2 [24m906     515     496
 [38;5;250m2 [39m 000-00-0002 Hodges   19900225      0       1     1 Smith  [4m2 [24m [4m2 [24m906     488     489
 [38;5;250m3 [39m 000-00-0003 Kirby    19900305      0       4     1 Smith  [4m2 [24m [4m2 [24m906     522     498
 [38;5;250m4 [39m 000-00-0004 Estrada  19900419      0       3     1 Smith  [4m2 [24m [4m2 [24m906     516     524
 [38;5;250m5 [39m 000-00-0005 Nielsen  19900530      1       2     1 Smith  [4m2 [24m [4m2 [24m906     483     509
 [38;5;250m6 [39m 000-00-0006 Dean     19900621      1       1     2 Brown  [4m2 [24m [4m2 [24m906     503     523
 [38;5;250m7 [39m 000-00-0007 Hickman  19900712      1       1     2 Brown  [4m2 [24m [4m2 [24m906     539     509
 [38;5;250m8 [39m 000-00-0008 Bryant   19900826      0       2     2 Brown  [4m2 [24m [4m2 [24m906     499     490
 [38;5;250m9 [39m 000-00-0009 Lynch    19900902      1       3     2 Brown  [4m2 [24m [4m2 [24m906     499     493

Deidentify data

Single file or no existing crosswalk

We indicated that the data need to be deidentified, so a good first step in cleaning the raw data is to convert unique student id, sid, into a similarly unique, but unidentifiable value.

Why not just generate some random string for each value? Though we don’t care to merge these data with other files, we may need to do so in the future. If we randomly generate new IDs, discarding the old ones in the process, we will be stuck.

The deid_dua() function does two things:

It uses a secure SHA-2 algorithm to convert sensitive IDs into unique hexadecimal strings that cannot be reverted back to the originial IDs (important in the case such as ours when the unique ID is the student’s social security number);
It has the option to save a crosswalk file that links the old secure IDs to the new IDs.

Clearly, it defeats the purpose of deidentifying IDs if a crosswalk between old and new travels with the new data. But if the crosswalk file is keep in a secure location, perhaps on the same server that hosts the raw administrative data, then old IDs can be retrieved if necessary by those with the proper clearance to do so.

## deidentify data
df <- deid_dua(df, write_crosswalk = TRUE, id_length = 20)

Here’s what the saved crosswalk looks like:

 [38;5;246m# A tibble: 9 × 2 [39m
  sid         id                  
   [3m [38;5;246m<chr> [39m [23m        [3m [38;5;246m<chr> [39m [23m               
 [38;5;250m1 [39m 000-00-0001 a14856cc5d00f4513132
 [38;5;250m2 [39m 000-00-0002 a141ce13114a97751e1b
 [38;5;250m3 [39m 000-00-0003 de520f632a2c321ce5e6
 [38;5;250m4 [39m 000-00-0004 889d833f94ed7f37fd28
 [38;5;250m5 [39m 000-00-0005 2993f4bda3cd8631f5c2
 [38;5;250m6 [39m 000-00-0006 86c8de9a8d63f7d9f59b
 [38;5;250m7 [39m 000-00-0007 cdb300787c0b998890f7
 [38;5;250m8 [39m 000-00-0008 ef91ae029e71a6e0dbcc
 [38;5;250m9 [39m 000-00-0009 0fb2736cec2cd1214303

And here now is the data frame:

## show data frame
df
 [38;5;246m# A tibble: 9 × 10 [39m
  id                   sname    dob      gender raceeth   tid tname   zip mathscr readscr
   [3m [38;5;246m<chr> [39m [23m                 [3m [38;5;246m<chr> [39m [23m     [3m [38;5;246m<chr> [39m [23m      [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<chr> [39m [23m  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m
 [38;5;250m1 [39m a14856cc5d00f4513132 Schaefer 19900114      0       2     1 Smith  [4m2 [24m [4m2 [24m906     515     496
 [38;5;250m2 [39m a141ce13114a97751e1b Hodges   19900225      0       1     1 Smith  [4m2 [24m [4m2 [24m906     488     489
 [38;5;250m3 [39m de520f632a2c321ce5e6 Kirby    19900305      0       4     1 Smith  [4m2 [24m [4m2 [24m906     522     498
 [38;5;250m4 [39m 889d833f94ed7f37fd28 Estrada  19900419      0       3     1 Smith  [4m2 [24m [4m2 [24m906     516     524
 [38;5;250m5 [39m 2993f4bda3cd8631f5c2 Nielsen  19900530      1       2     1 Smith  [4m2 [24m [4m2 [24m906     483     509
 [38;5;250m6 [39m 86c8de9a8d63f7d9f59b Dean     19900621      1       1     2 Brown  [4m2 [24m [4m2 [24m906     503     523
 [38;5;250m7 [39m cdb300787c0b998890f7 Hickman  19900712      1       1     2 Brown  [4m2 [24m [4m2 [24m906     539     509
 [38;5;250m8 [39m ef91ae029e71a6e0dbcc Bryant   19900826      0       2     2 Brown  [4m2 [24m [4m2 [24m906     499     490
 [38;5;250m9 [39m 0fb2736cec2cd1214303 Lynch    19900902      1       3     2 Brown  [4m2 [24m [4m2 [24m906     499     493

Links across multiple files with existing crosswalk

If the deidentified data frame is built from multiple files (e.g., a panel data set of observations across years), then we’ll want to reuse an existing crosswalk. Otherwise, the same original ID will end up with multiple new IDs and we won’t be able to link observations across data sets.

Let’s say we already have master crosswalk file that looks like this:

 [38;5;246m# A tibble: 9 × 2 [39m
  sid         id                  
   [3m [38;5;246m<chr> [39m [23m        [3m [38;5;246m<chr> [39m [23m               
 [38;5;250m1 [39m 000-00-0001 db3681caa7e4789c9a99
 [38;5;250m2 [39m 000-00-0002 8e13af4fbb998c26348f
 [38;5;250m3 [39m 000-00-0003 2c7f2f98f9ee0e3b69ba
 [38;5;250m4 [39m 000-00-0004 ed7041ab2076a84fe611
 [38;5;250m5 [39m 000-00-0005 d4180e00af840a7a8e29
 [38;5;250m6 [39m 000-00-0006 9d42b365e2e49989b620
 [38;5;250m7 [39m 000-00-0007 a997bd9ffc4ee8030081
 [38;5;250m8 [39m 000-00-0008 43fc27899df21721b0c5
 [38;5;250m9 [39m 000-00-0009 e3150a7010e9a08d52f0

Rather than create new IDs, we can use the existing_crosswalk argument to read in and use the new IDs we’ve already made. Everything else works the same as before.

df <- deid_dua(df, existing_crosswalk = 'master_crosswalk.csv')

The new ID values now match those from the crosswalk.

df
 [38;5;246m# A tibble: 9 × 10 [39m
  id                   sname    dob      gender raceeth   tid tname   zip mathscr readscr
   [3m [38;5;246m<chr> [39m [23m                 [3m [38;5;246m<chr> [39m [23m     [3m [38;5;246m<chr> [39m [23m      [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<chr> [39m [23m  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m
 [38;5;250m1 [39m db3681caa7e4789c9a99 Schaefer 19900114      0       2     1 Smith  [4m2 [24m [4m2 [24m906     515     496
 [38;5;250m2 [39m 8e13af4fbb998c26348f Hodges   19900225      0       1     1 Smith  [4m2 [24m [4m2 [24m906     488     489
 [38;5;250m3 [39m 2c7f2f98f9ee0e3b69ba Kirby    19900305      0       4     1 Smith  [4m2 [24m [4m2 [24m906     522     498
 [38;5;250m4 [39m ed7041ab2076a84fe611 Estrada  19900419      0       3     1 Smith  [4m2 [24m [4m2 [24m906     516     524
 [38;5;250m5 [39m d4180e00af840a7a8e29 Nielsen  19900530      1       2     1 Smith  [4m2 [24m [4m2 [24m906     483     509
 [38;5;250m6 [39m 9d42b365e2e49989b620 Dean     19900621      1       1     2 Brown  [4m2 [24m [4m2 [24m906     503     523
 [38;5;250m7 [39m a997bd9ffc4ee8030081 Hickman  19900712      1       1     2 Brown  [4m2 [24m [4m2 [24m906     539     509
 [38;5;250m8 [39m 43fc27899df21721b0c5 Bryant   19900826      0       2     2 Brown  [4m2 [24m [4m2 [24m906     499     490
 [38;5;250m9 [39m e3150a7010e9a08d52f0 Lynch    19900902      1       3     2 Brown  [4m2 [24m [4m2 [24m906     499     493

Updates to existing crosswalk

In our example, we have nine students in the current file. Let’s say that though we have a crosswalk, it only has new IDs for the first five observations:

 [38;5;246m# A tibble: 5 × 2 [39m
  sid         id                  
   [3m [38;5;246m<chr> [39m [23m        [3m [38;5;246m<chr> [39m [23m               
 [38;5;250m1 [39m 000-00-0001 db3681caa7e4789c9a99
 [38;5;250m2 [39m 000-00-0002 8e13af4fbb998c26348f
 [38;5;250m3 [39m 000-00-0003 2c7f2f98f9ee0e3b69ba
 [38;5;250m4 [39m 000-00-0004 ed7041ab2076a84fe611
 [38;5;250m5 [39m 000-00-0005 d4180e00af840a7a8e29

If the existing crosswalk doesn’t have values for all observations, then deid_dua() will:

Match old IDs with new IDs that do exist in the crosswalk
Generate new IDs for the old IDs that don’t exist in the crosswalk
Update and save the crosswalk

The command is the same for a partial crosswalk as for a complete crosswalk.

df <- deid_dua(df, existing_crosswalk = 'crosswalk_partial.csv')

Notice that the new IDs for the first five observations match those that were already in the existing crosswalk. The last four are new.

df
 [38;5;246m# A tibble: 9 × 10 [39m
  id                   sname    dob      gender raceeth   tid tname   zip mathscr readscr
   [3m [38;5;246m<chr> [39m [23m                 [3m [38;5;246m<chr> [39m [23m     [3m [38;5;246m<chr> [39m [23m      [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<chr> [39m [23m  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m
 [38;5;250m1 [39m db3681caa7e4789c9a99 Schaefer 19900114      0       2     1 Smith  [4m2 [24m [4m2 [24m906     515     496
 [38;5;250m2 [39m 8e13af4fbb998c26348f Hodges   19900225      0       1     1 Smith  [4m2 [24m [4m2 [24m906     488     489
 [38;5;250m3 [39m 2c7f2f98f9ee0e3b69ba Kirby    19900305      0       4     1 Smith  [4m2 [24m [4m2 [24m906     522     498
 [38;5;250m4 [39m ed7041ab2076a84fe611 Estrada  19900419      0       3     1 Smith  [4m2 [24m [4m2 [24m906     516     524
 [38;5;250m5 [39m d4180e00af840a7a8e29 Nielsen  19900530      1       2     1 Smith  [4m2 [24m [4m2 [24m906     483     509
 [38;5;250m6 [39m cab9fea86378585886d9 Dean     19900621      1       1     2 Brown  [4m2 [24m [4m2 [24m906     503     523
 [38;5;250m7 [39m f00b55e09be40f300f42 Hickman  19900712      1       1     2 Brown  [4m2 [24m [4m2 [24m906     539     509
 [38;5;250m8 [39m 8fa969f0b60f92534a85 Bryant   19900826      0       2     2 Brown  [4m2 [24m [4m2 [24m906     499     490
 [38;5;250m9 [39m de1a5fbc783036fd711b Lynch    19900902      1       3     2 Brown  [4m2 [24m [4m2 [24m906     499     493

Looking at the partial crosswalk, we see that it now has four new rows with new IDs each for the observations it didn’t have before.

 [38;5;246m# A tibble: 9 × 2 [39m
  sid         id                  
   [3m [38;5;246m<chr> [39m [23m        [3m [38;5;246m<chr> [39m [23m               
 [38;5;250m1 [39m 000-00-0001 db3681caa7e4789c9a99
 [38;5;250m2 [39m 000-00-0002 8e13af4fbb998c26348f
 [38;5;250m3 [39m 000-00-0003 2c7f2f98f9ee0e3b69ba
 [38;5;250m4 [39m 000-00-0004 ed7041ab2076a84fe611
 [38;5;250m5 [39m 000-00-0005 d4180e00af840a7a8e29
 [38;5;250m6 [39m 000-00-0006 cab9fea86378585886d9
 [38;5;250m7 [39m 000-00-0007 f00b55e09be40f300f42
 [38;5;250m8 [39m 000-00-0008 8fa969f0b60f92534a85
 [38;5;250m9 [39m 000-00-0009 de1a5fbc783036fd711b

Should we encounter those students in future files, deid_dua() will use the new IDs we just created.

Check data frame

If we try to write the data frame using the write_dua_df() function, we get an error.

## write data to disk with one last check
write_dua_df(df, 'cleaned_data.csv', output_type = 'csv')
-- duawranglr note -----------------------------------------------------------------------
Data set has not yet passed check. Run -check_dua_restrictions()- to check status.

Right, we haven’t removed all the restricted data elements. Following the directions, we can check to see what still needs to be removed using the check_dua_restrictions() function.

## check
check_dua_restrictions(df)
-- duawranglr note -----------------------------------------------------------------------
The following variables are not allowed at the current data usage level restriction [
level_ii ] and MUST BE REMOVED before saving:
 - sname
 - dob
 - tname
 - zip

We’ve successfully removed sid already (when we deidentified the data frame), but still have to remove the student’s last name, date of birth, teacher’s name, and zip code to meet level II restrictions. Once we remove those columns, we can check again.

## remove restricted columns
df <- df %>% select(-c(sname, dob, tname, zip))

## check again
check_dua_restrictions(df)
-- duawranglr note -----------------------------------------------------------------------
Data set has passed check and may be saved.

Success! And to be sure, here’s what our data frame looks like now:

df
 [38;5;246m# A tibble: 9 × 6 [39m
  id                   gender raceeth   tid mathscr readscr
   [3m [38;5;246m<chr> [39m [23m                  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m  [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m    [3m [38;5;246m<int> [39m [23m
 [38;5;250m1 [39m db3681caa7e4789c9a99      0       2     1     515     496
 [38;5;250m2 [39m 8e13af4fbb998c26348f      0       1     1     488     489
 [38;5;250m3 [39m 2c7f2f98f9ee0e3b69ba      0       4     1     522     498
 [38;5;250m4 [39m ed7041ab2076a84fe611      0       3     1     516     524
 [38;5;250m5 [39m d4180e00af840a7a8e29      1       2     1     483     509
 [38;5;250m6 [39m cab9fea86378585886d9      1       1     2     503     523
 [38;5;250m7 [39m f00b55e09be40f300f42      1       1     2     539     509
 [38;5;250m8 [39m 8fa969f0b60f92534a85      0       2     2     499     490
 [38;5;250m9 [39m de1a5fbc783036fd711b      1       3     2     499     493

Write cleaned data frame to disk

Now that we’ve passed our check, we can write the level II secure data frame to disk. Just like the set_dua_cw() function, which automates reading in many types of files, write_dua_df() will write many types of files. See ?write_dua_df for options.

## write data to disk 
write_dua_df(df, 'cleaned_data_lev_ii.csv', output_type = 'csv')

Interactive template

Particularly for the first few times you use this package, you may need help remembering the steps. To help the process, the interactive make_dua_template() function will help you make a template script that you can then modify to meet your data cleaning needs. When called, the function will ask you a few yes or no questions and, based on your answers, build a template script that pre-fills some function arguments.

An example template script is printed below.

## save template to disk
make_dua_template('clean_data.R')

EXAMPLE

################################################################################
##
## [ Proj ] < general project name >
## [ File ] clean_data.R
## [ Auth ] < author name >
## [ Init ] 13 July 2026
##
################################################################################
 
## ---------------------------
## libraries
## ---------------------------
 
## NOTES: Include additional libraries using either -library()- or -require()-
## functions here.
 
## ---------------------------
## set DUA crosswalk
## ---------------------------
 
## NOTES: Choose the DUA agreement crosswalk file if you didn't when setting up
## the template. If the file is a delimited file that isn't a CSV or TSV, be
## sure to indicate the delimiter string with the -delimiter- argument.
## Similarly if the crosswalk is in an Excel file on any sheet beyond the
## first, set the -sheet- argument to the correct sheet.
 
set_dua(dua = '< dua crosswalk file name >')
 
## ---------------------------
## set DUA level
## ---------------------------
 
## NOTES: Choose the DUA agreement crosswalk level. If you indicated that the
## data should be deidentified, those options, including the ID column if
## choosen, are included below. If you did not indicate the name of the ID
## column to be deidentified, add its name after the -id_column- argument.
## 
## If you did not indicate that the data should be deidentified, but they
## should be, see ?deid_dua().
 
set_dua_level(level = '< level name >')
 
## ---------------------------
## data cleaning
## ---------------------------
 
## NOTES: Use standard scripts to build and clean data set here.
 
## ---------------------------
## check DUA restrictions
## ---------------------------
 
## NOTES: If your data frame includes restricted data elements or should have
## been deidentified and has not been, -check_dua_restrictions()- will return
## an error and stop. Fix above and rerun or set -remove_protected- arguement
## to TRUE to automatically remove restricted columns.
 
check_dua_restrictions(df = '< data frame >')
 
## ---------------------------
## write cleaned file
## ---------------------------
 
## NOTES: Write cleaned file to disk. Select the file type (e.g., CSV, TSV,
## Stata, Rdata) and include additional arguments required by -haven- or base R
## writing functions.
 
write_dua_df(df = '< data frame >', output_type = '< output file type >'
 
## -----------------------------------------------------------------------------
## end script
################################################################################