Collapsing

In some situations, you may want to use encodefrom() to collapse values, that is, to group a larger set of unique raw values into a smaller set of clean values and labels. For example, suppose you have the following data set, which gives each state’s census division number and name:

Data

id	state	cendiv	cendiv_name
1	AL	6	East South Central
2	AK	9	Pacific
3	AZ	8	Mountain
4	AR	7	West South Central
5	CA	9	Pacific
6	CO	8	Mountain
7	CT	1	New England
8	DE	5	South Atlantic
10	FL	5	South Atlantic
12	HI	9	Pacific
14	IL	3	East North Central
15	IN	3	East North Central
16	IA	4	West North Central
31	NJ	2	Middle Atlantic
33	NY	2	Middle Atlantic

Instead of using the nine census divisions, you want to group states into the four census regions. You have the following crosswalk, which maps each division to its region:

Crosswalk

cendiv	cenreg	cenregnm
1	1	Northeast
2	1	Northeast
3	2	Midwest
4	2	Midwest
5	3	South
6	3	South
7	3	South
8	4	West
9	4	West

As long as (1) raw values are unique in the crosswalk and (2) clean and label columns have a 1:1 match, you can use encodefrom() to collapse categories as you move from raw to clean values.
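Both conditions can be checked before encoding. The following is a minimal sketch in base R, not part of crosswalkr: the helper name check_crosswalk() and the object name crosswalk are hypothetical, and the data frame simply reproduces the crosswalk table shown above.

```r
## crosswalk from the table above, as a base R data frame
crosswalk <- data.frame(
    cendiv = 1:9,
    cenreg = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
    cenregnm = c('Northeast', 'Northeast', 'Midwest', 'Midwest',
                 'South', 'South', 'South', 'West', 'West')
)

## hypothetical helper (not a crosswalkr function):
## returns TRUE only if both conditions hold
check_crosswalk <- function(cw, raw, clean, label) {
    ## condition 1: each raw value appears only once
    raw_unique <- anyDuplicated(cw[[raw]]) == 0
    ## condition 2: each clean value pairs with exactly one label,
    ## and each label pairs with exactly one clean value
    pairs <- unique(cw[, c(clean, label)])
    one_to_one <- anyDuplicated(pairs[[clean]]) == 0 &&
        anyDuplicated(pairs[[label]]) == 0
    raw_unique && one_to_one
}

check_crosswalk(crosswalk, raw = 'cendiv', clean = 'cenreg',
                label = 'cenregnm')
```

For the crosswalk above the check returns TRUE: the nine division codes are all distinct, and the four region codes map one-to-one onto the four region names. A crosswalk that gave the same region code two different names would fail the second condition.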

library(crosswalkr)
library(dplyr)
library(haven)

## data
df <- tibble(id = c(1:8,10,12,14:16,31,33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                        'West South Central','Pacific','Mountain','New England',
                        'South Atlantic','South Atlantic','Pacific',
                        'East North Central','East North Central',
                        'West North Central','Middle Atlantic',
                        'Middle Atlantic'))
             
## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))

## encode new column
df <- df %>% 
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))

df

## # A tibble: 15 × 5
##       id state cendiv cendiv_name        cenreg       
##    <dbl> <chr>  <dbl> <chr>              <dbl+lbl>    
##  1     1 AL         6 East South Central 3 [South]    
##  2     2 AK         9 Pacific            4 [West]     
##  3     3 AZ         8 Mountain           4 [West]     
##  4     4 AR         7 West South Central 3 [South]    
##  5     5 CA         9 Pacific            4 [West]     
##  6     6 CO         8 Mountain           4 [West]     
##  7     7 CT         1 New England        1 [Northeast]
##  8     8 DE         5 South Atlantic     3 [South]    
##  9    10 FL         5 South Atlantic     3 [South]    
## 10    12 HI         9 Pacific            4 [West]     
## 11    14 IL         3 East North Central 2 [Midwest]  
## 12    15 IN         3 East North Central 2 [Midwest]  
## 13    16 IA         4 West North Central 2 [Midwest]  
## 14    31 NJ         2 Middle Atlantic    1 [Northeast]
## 15    33 NY         2 Middle Atlantic    1 [Northeast]
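
Once the divisions are collapsed, the labelled column can be summarized by region. The sketch below is illustrative only: it assumes the new column behaves like a vector built with haven::labelled(), and it rebuilds a stand-in for that column by hand (the object names regions and region_counts are hypothetical) so that the example runs on its own.

```r
library(dplyr)
library(haven)

## stand-in for the state and cenreg columns shown above,
## built directly with haven::labelled() rather than encodefrom()
regions <- tibble(
    state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
              'IL','IN','IA','NJ','NY'),
    cenreg = labelled(c(3,4,4,3,4,4,1,3,3,4,2,2,2,1,1),
                      labels = c(Northeast = 1, Midwest = 2,
                                 South = 3, West = 4))
)

## convert labelled values to a factor, then count states per region
region_counts <- regions %>%
    mutate(cenreg_fct = as_factor(cenreg)) %>%
    count(cenreg_fct)

region_counts
```

With the 15 states in this example, the counts come out to 3 for Northeast, 3 for Midwest, 4 for South, and 5 for West.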

2026-02-06
