#Example code below: Read in excel file and rename columns so that it plays well with the other years' files.
<- read_xlsx("Fis_Fut_Rev_2023_Final.xlsx") %>%
revenue_fy23 rename(fy = 'FY',
fund = 'FUND',
fund_name = 'FUND NAME',
agency = 'AGENCY',
agency_name = 'AGENCY NAME',
source = 'REVENUE SOURCE',
source_name = 'REV SRC NAME',
receipts = 'AMOUNT'
)
<- read_xlsx("Fis_Fut_Exp_2023_Final.xlsx") %>%
exp_fy23 rename(fy = 'FY',
fund = 'FUND',
fund_name = 'FUND NAME',
agency = 'AGENCY',
agency_name = 'AGENCY NAME',
appr_org = 'ORGANIZATION',
org_name = 'ORGANIZATION NAME',
obj_seq_type = 'APPROPRIATION',
wh_approp_name = 'APPROPRIATION NAME',
# exp_net_xfer = 'NET OF TRANS AMOUNT',
expenditure = 'EXPENDED'
)
# %>%
# # these come from ioc_source file after merging
# mutate(data_source = "exp IOC Aug 2022",
# object = ,
# seq = ,
# type = ,
# fund_cat = FIND_COLUMN, #create fund_cat column
# fund_cat_name = FIND_NAME) # create fund_cat_name column
13 Gathering and Cleaning Data!
Information for the Fiscal Futures all funds database comes from detailed information provided by the Illinois Comptroller’s office. Currently the database includes 300,000 individual records for fiscal years 1998 to 2022.
- Do the FOIA request. In a week or so, they send the expenditure and revenue data as excel files.
- Checks whether there are any new agencies, re-used funds etc. Create a list of funds, agencies, fund names, etc. for the new year and compare it to the immediate prior year to identify new funds.
- Update the funds_ab_in file which shows the use of funds. Use criteria to determine if the new funds should be in or out of the all-funds frame.
- Change the variable names to be consistent with other files such as AGENCYNAME –> agency_name
- Once variable names are shared over all years of data, combine past years with newest year. All revenue files are in a `revenue` folder that I reference when I set the working directory. When adding new fiscal years, put the the newest year of data for revenue and expenditures in their respective folders.
Normally, when your receive the new fiscal year files from the Comptrollers office, you will need to change the variable names so that they are consistent with past years. This is an example of reading in the new file and changing the variable names. They seem to change almost every year in the file received from the FOIA so if the code breaks here, check to make sure that the columns you are trying to rename exist and are spelled correctly! Once variables are the same, you will want to save the file as a csv file in its Revenue/Expenditure file and bind all past years and the current year together in one dataframe.
The code chunk below takes the .dta files for all fiscal years before FY 2022 and binds them together. Variable names were manually changed by past researchers so that they were consistent across years.
For FY 2023 and after, .dta files can be avoided entirely and .csv or excel files will be used. All files before this year had been saved and passed on as .dta files for Stata code before the transition to R in Fall 2022.
For years after fy22: add line of code to bind csv files after binding the dta files together. Variable names must be identical to merge files together.
Code below reads in the csv files created in chunks above (allrevfiles.csv and allrexpfiles.csv). These files contain all years of data combined into one file BEFORE any recoding is done. Do not use this file for summing categories because it is just an inbetween step before recoding revenue and expenditure categories. It also does not contain changes in funds or agencies.
Code
# combined in past chunks called create-rev-csv and create-exp-csv
<- read_csv("./data/allrevfiles_Nov132023.csv") #combined but not recoded
allrevfiles23
<- read_csv("./data/allexpfiles_Nov132023.csv") #combined but not recoded allexpfiles23
13.0.1 Inspecting new FY23 data files
Data files for closed years have been obtained from IOC. The numbers of funds, agencies, organizations, and revenue sources below were found by using pivot tables in the codebook files in the FY2022 Box folder. This could also be done using R and grouping new files by fund, agency, source number, source names, etc..
Revenue File:
- 684 Fund Numbers
- 80 Agencies
- 1184 Revenue source numbers
- 1156 revenue source names
Expenditure File:
- 708 Fund Numbers
- 107 Agencies
- 98 Organization Numbers
- 313 Organization names
13.0.2 Finding new agencies and funds
General steps:
- Identify new and reused funds for newest fiscal year.
- Recode funds to take into account different fund numbers/names over the years. See [Recoding New and Reused Funds] for code chunk that does this.
- Update
fund_ab_in_CURRENTFY.xlsx
with any changes from previous fiscal year.
New Agencies, Funds, and Organizations from Expenditure files:
- Using the code below, I found 2 agencies, multiple funds, and a couple organizations that had not been specifically mentioned in the funds_ab_in file - AWM, FY2022.
- Note: One new org name and org number combo has an expenditure of $600 million for pensions payment.
Code
#allrevfiles23 <- read_csv("allrevfiles23.csv") #combined but not recoded
#allexpfiles23 <- read_csv("allexpfiles23.csv") #combined but not recoded
<- 2023 # Create variable for current fiscal year to save time for updating in future years.
cur_fy
#### From Expenditure Data #####
# agencies referenced in any year before 2023:
<- allexpfiles23 %>%
agencies_past filter(fy < cur_fy) %>%
mutate(agency == as.character(agency)) %>%
group_by(agency, agency_name) %>% unique() %>%
summarize(expenditure = sum(expenditure, na.rm = TRUE)) %>%
drop_na() %>%
arrange(agency)
# agencies_past # 148 agencies ever
# agencies in 2023 data:
<- allexpfiles23 %>%
agencies23 filter(fy == cur_fy) %>%
mutate(agency == as.character(agency)) %>%
group_by(agency, agency_name) %>%
summarize(expenditure = sum(expenditure, na.rm = TRUE))
# 280 and 533 are new agency codes from old code cleaning methodology.
# When creating fiscal future categories for the fiscal gap, 280 is assigned to Group = 920 ( Judicial) expenditure category
# 533 was assigned to "949. Other Boards and Commissions"
# no new agencies from fy22 to fy23:
anti_join(agencies23, agencies_past,
by = c("agency", "agency_name")) %>%
arrange(agency)
<- allexpfiles23 %>%
funds_past filter(fy < cur_fy) %>%
mutate(fund == as.character(fund)) %>%
group_by(fund, fund_name) %>%
summarize(count = n(), Expenditure = sum(expenditure, na.rm = TRUE)) %>%
drop_na()
<- allexpfiles23 %>%
funds23 filter(fy == cur_fy) %>%
mutate(fund == as.character(fund)) %>%
group_by(fund, fund_name) %>%
summarize(count = n(), Expenditure = sum(expenditure, na.rm = TRUE)) %>%
unique()
# 26 funds were in FY23 data that were not in past data:
anti_join(funds23, funds_past,
by = c("fund", "fund_name")) %>%
arrange(fund)
# Each year these must examined closely to determine if they are a new fund, a reused fund, or if there is just a slight difference in the name of the fund
# orgs_past for orgs in the past = 920 org groups ever
<- allexpfiles23 %>%
orgs_past filter(fy < cur_fy) %>%
mutate(appr_org == as.character(appr_org)) %>%
group_by(appr_org, org_name) %>% unique() %>%
summarize(Expenditure = sum(expenditure, na.rm = TRUE)) %>%
drop_na()
# orgs_past # 916 org groups ever
<- allexpfiles23 %>%
orgs23 filter(fy == cur_fy) %>%
mutate(appr_org = as.character(appr_org)) %>%
group_by(appr_org, org_name) %>%
summarize(Expenditure = sum(expenditure, na.rm = TRUE))
# orgs22 # 399 org groups this year
# 3 org number and org name combos are new for FY2023:
# Usually not a big deal, orgs are not used much in the gap calculation.
# but good to stay aware and look for any change that may be important.
anti_join(orgs23, orgs_past,
by = c("appr_org", "org_name")) %>%
arrange(appr_org)
Largest change for FY23 was the $184 million Grocery Tax Replacement fund and $20 million for fund 0156.
New Revenue Funds, Sources, and New Agencies:
Code
#### From Revenue Data ####
# agencies_past # 108 agencies ever
<- allrevfiles23 %>% filter(fy < cur_fy) %>%
agencies_past mutate(agency == as.character(agency)) %>%
group_by(agency, agency_name) %>% unique() %>%
summarize(Receipts = sum(receipts, na.rm = TRUE)) %>%
drop_na()
<- allrevfiles23 %>%
agencies23 filter(fy == cur_fy) %>%
mutate(agency == as.character(agency)) %>%
group_by(agency, agency_name) %>%
summarize(Receipts = sum(receipts, na.rm = TRUE))
# Finds 533 and 280 again.
anti_join(agencies23, agencies_past, by = c("agency", "agency_name")) %>%
arrange(agency)
<- allrevfiles23 %>%
funds_past filter(fy < cur_fy) %>%
mutate(fund == as.character(fund)) %>%
group_by(fund, fund_name) %>%
summarize(count = n(), Receipts = sum(receipts, na.rm = TRUE)) %>%
drop_na()
<- allrevfiles23 %>%
funds23 filter(fy == cur_fy) %>%
mutate(fund == as.character(fund)) %>%
group_by(fund, fund_name) %>%
summarize(count = n(), Receipts = sum(receipts, na.rm = TRUE)) %>%
unique() %>%
drop_na()
# 13 revenue funds were in FY23 revenue data that were not in past data
# some could be small fund name changes
anti_join(funds23, funds_past, by = c("fund", "fund_name")) %>%
arrange(fund)
<- allrevfiles23 %>%
sources_past filter(fy < cur_fy) %>%
mutate(source == as.character(source)) %>%
group_by(source, source_name) %>%
summarize(count = n(), Receipts = sum(receipts, na.rm = TRUE)) %>%
drop_na()
<- allrevfiles23 %>%
sources23 filter(fy == cur_fy) %>%
mutate(source == as.character(source)) %>%
group_by(source, source_name) %>%
summarize(count = n(), Receipts = sum(receipts, na.rm = TRUE)) %>%
unique()
# 16 revenue sources were in FY23 data that were not in past data
# some could be small source name changes:
anti_join(sources23, sources_past, by = c("source", "source_name")) %>%
arrange(source)
13.0.3 Recoding New and Reused Funds
New or reused funds were manually added to the funds_ab_in.xlsx
file and determined if they should or should not be included in Fiscal Future calculations.
For funds that were reused once, a 9 replaces the 0 as the first digit. If reused twice, then the first two values are 10.
- Ex. 0350 –> 9350 because its use changed.
- Ex. 0367 becomes 10367 because its use has changed twice now. There was fund 0367 originally, then its use changed and it was recoded as 9367, and now it changed again so it is a 10367.
- Excel file also has alternative ways to name funds (e.g. 0397-A and 0397-B) and variables for the year that the fund stopped being used.
New or reused funds revenue file recoding:
# if first character is a 0, replace with a 9 if its purpose has changed
<- allrevfiles23 %>%
rev_1998_2023
mutate(fund = ifelse(fy < 2002 & fund %in% c("0730", "0241", "0350", "0367", "0381", "0382", "0526", "0603", "0734", "0913", "0379"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse (fy < 2008 & fund %in% c("0027", "0033", "0037", "0058", "0062", "0066", "0075", "0083", "0116", "0119", "0120", "0122", "0148", "0149", "0157", "0158", "0166", "0194", "0201", "0209", "0211", "0217", "0223", "0231", "0234", "0253", "0320", "0503", "0505", "0512", "0516", "0531", "0532", "0533", "0547", "0563", "0579", "0591", "0606", "0616", "0624", "0659", "0662", "0665", "0676", "0710",
"0068", "0076", "0115", "0119", "0168", "0182", "0199", "0241", "0307", "0506", "0509", "0513"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2016 & fund %in% c("0263", "0399", "0409"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2017 & fund == "0364", str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2018 & fund %in% c("0818", "0767", "0671", "0593", "0578"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy>1999 & fy < 2018 & fund == "0231", "10231", fund) ) %>%
mutate(fund = ifelse(fy < 2019 & fund %in% c("0161", "0489", "0500", "0612", "0893", "0766"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2020 & fund %in% c("0254", "0304", "0324", "0610", "0887", "0908", "0939", "0968"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2021 & fund %in% c("0255", "0325", "0348", "0967", "0972"), str_replace(fund, "0","9"), fund) ) %>%
#2022 changes
mutate(fund = ifelse(fy < 2022 & fund %in% c("0110","0165","0351", "0392", "0393", "0422", "0544", "0628", "0634", "0656", "0672", "0683", "0723", "0742", "0743"), str_replace(fund, "0","9"), as.character(fund))) %>% # replaces first 0 it finds with a 9
mutate(fund = ifelse(fy < 2022 & fund == "0367", "10367", as.character(fund)) # fund reused for 3rd time
%>%
)
# 2023 fund changes
mutate(fund = ifelse(fy < 2023 & fund %in% c("0099","0210","0246", "0250", "0264", "0268", "0300", "0311", "0403", "0448","0645", "0727", "0729", "0791"), str_replace(fund,"0", "9"), as.character(fund))) %>%
mutate(fund = ifelse(fy < 2023 & fund == "0734" , "10734", as.character(fund)),
fund = ifelse(fy<2023 & fund == "0820", "10820", as.character(fund) ) # fund reused for 3rd time
%>%
)
# other fund changes noticed on April 1, 2024
mutate(fund = ifelse(fy < 2023 & fund %in% c("0490", "0684", "0747", "0869"), str_replace(fund,"0", "9"), as.character(fund)))
Expenditure recoding:
Code
# if first character is a 0, replace with a 9
<- allexpfiles23 %>%
exp_1998_2023
mutate(fund = ifelse(fy < 2002 & fund %in% c("0730", "0241", "0350", "0367", "0381", "0382", "0526", "0603", "0734", "0913", "0379"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2008 & fund %in% c("0027", "0033", "0037", "0058", "0062", "0066", "0075", "0083", "0116", "0119", "0120", "0122", "0148", "0149", "0157", "0158", "0166", "0194", "0201", "0209", "0211", "0217", "0223", "0231", "0234", "0253", "0320", "0503", "0505", "0512", "0516", "0531", "0532", "0533", "0547", "0563", "0579", "0591", "0606", "0616", "0624", "0659", "0662", "0665", "0676", "0710",
"0068", "0076", "0115", "0119", "0168", "0182", "0199", "0241", "0307", "0506", "0509", "0513"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2016 & fund %in% c("0263", "0399", "0409"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2017 & fund == "0364", str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2018 & fund %in% c("0818", "0767", "0671", "0593", "0578"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy>1999 & fy < 2018 & fund == "0231", "10231", fund) ) %>%
mutate(fund = ifelse(fy < 2019 & fund %in% c("0161", "0489", "0500", "0612", "0893", "0766"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2020 & fund %in% c("0254", "0304", "0324", "0610", "0887", "0908", "0939", "0968"), str_replace(fund, "0","9"), fund)) %>%
mutate(fund = ifelse(fy < 2021 & fund %in% c("0255", "0325", "0348", "0967", "0972"), str_replace(fund, "0","9"), fund)) %>%
#2022 changes
mutate(fund = ifelse(fy < 2022 & fund %in% c("0110","0165","0351", "0392", "0393", "0422", "0544", "0628", "0634", "0656", "0672", "0683","0723", "0742", "0743"), str_replace(fund, "0","9"), as.character(fund))) %>% # replaces first 0 it finds with a 9
mutate(fund = ifelse(fy < 2022 & fund == "0367", "10367", as.character(fund)) # fund reused for 3rd time
%>%
)
# 2023 fund changes
mutate(fund = ifelse(fy < 2023 & fund %in% c("0099","0210","0246", "0250", "0264", "0268", "0300", "0311", "0403", "0448","0645", "0727", "0729", "0791"), str_replace(fund,"0", "9"), as.character(fund))) %>%
mutate(fund = ifelse(fy < 2023 & fund == "0734" , "10734", as.character(fund)),
fund = ifelse(fy<2023 & fund == "0820", "10820", as.character(fund) ) # fund reused for 3rd time
%>%
)
# other fund changes noticed on April 1, 2024
mutate(fund = ifelse(fy < 2023 & fund %in% c("0490", "0684", "0747", "0869"), str_replace(fund,"0", "9"), as.character(fund)))
Code
<- exp_1998_2023 %>%
agencies_exp distinct(agency, agency_name) %>%
filter(!is.na(agency_name)) %>%
mutate(agency = as.character(agency))
<- exp_1998_2023 %>%
funds_exp distinct(fund, fund_name) %>%
filter(!is.na(fund_name))
= readxl::read_excel("data/funds_ab_in_2023.xlsx")
funds_ab_in_2023
<- exp_1998_2023 %>%
exp_temp select(-fund_name) %>%
arrange(fund, fy) %>%
# join funds_ab_in_2023 to exp_temp
left_join(funds_ab_in_2023, by = "fund")
# Agency == 799 for Statutory transfers
# Object == 1993 is for Interfund cash transfers
<- exp_temp %>%
exp_temp mutate(transfer = ifelse(org_name == "TRANSFERS", 1, 0),
trans_agency = ifelse(org_name == "TRANSFERS",
str_sub(obj_seq_type,1,3), NA),
trans_type = ifelse(org_name == "TRANSFERS",
str_sub(obj_seq_type, 4,9), NA)) %>%
mutate(
object = ifelse(transfer == 0, as.character(str_sub(obj_seq_type, 1, 4)), NA_character_), # appropriation object
sequence = ifelse(transfer == 0, as.character(str_sub(obj_seq_type, 5,6)), NA_character_),
type = ifelse(transfer == 0, str_sub(obj_seq_type, 7,8), NA_character_) # appropriation type
)
- the initial combined and years of data are saved as dataframes named
exp_1998_2023
andrev_1998_2023
. These are then saved as exp_temp and rev_temp while recoding variables. This is BEFORE creating Fiscal Future category groups and further data cleaning done in the Calculating the Fiscal Gap page.
Update Agencies: Some agencies have merged with others or changed names over time.
[[ TO DO: Add appendix item of Current agencies (with indented past agencies) from the Methodology Document. ]]
Code
# recodes old agency numbers to consistent agency number
<- exp_temp %>%
exp_temp
mutate(agency = case_when(
=="438"| agency=="475" |agency == "505") ~ "440",
(agency# financial institution & professional regulation &
# banks and real estate --> coded as financial and professional reg
== "473" ~ "588", # nuclear safety moved into IEMA
agency
=="531" | agency =="577") ~ "532", # coded as EPA
(agency
=="556" | agency == "538") ~ "406", # coded as agriculture
(agency
== "560" ~ "592", # IL finance authority (fire trucks and agriculture stuff)to state fire marshal
agency
== "570" & fund == "0011" ~ "494", # city of Chicago road fund to transportation
agency TRUE ~ (as.character(agency))))
#
# exp_temp <- exp_temp %>% filter(!is.na(fy))
# ## Fill in missing names for funds
# exp_temp <- exp_temp %>%
# select(-c(agency_name)) %>%
# left_join(funds_exp) %>%
# left_join(agencies_exp)
For aggregating revenue, use the rev_1998_2023
file, join the funds_ab_in_2023
file to it, and then join the ioc_source_type
file. Remember: You need to update the funds_ab_in
and ioc_source_type
file every year!
<- inner_join(rev_1998_2023, funds_ab_in_2023, by = "fund") %>%
rev_temp arrange(source)
<- rev_temp %>% distinct(fund, fund_name_ab) %>% filter(!is.na(fund_name_ab))
funds_rev
<- rev_temp %>% distinct(agency, agency_name) %>% filter(!is.na(agency_name)) %>%
agencies_rev mutate(agency = as.character(agency))
# need to update the ioc_source_type file every year!
<- readxl::read_xlsx("./data/ioc_source_updated23_AWM.xlsx")
ioc_source_type
<- left_join(rev_temp, ioc_source_type, by = "source")
rev_temp
<- rev_temp %>% distinct(source, source_name_AWM) %>% filter(!is.na(source_name_AWM))
sources_rev
# fund info to revenue for all years
<- rev_temp %>%
rev_temp select(-c(fund_name, agency_name,
%>%
source_name_AWM)) left_join(funds_rev) %>%
left_join(sources_rev)
# automatically used source, source name does not match for the join to work using source_name
# recodes old agency numbers to consistent agency number
<- rev_temp %>%
rev_temp
mutate(agency = case_when(
=="438"| agency=="475" |agency == "505") ~ "440", # financial institution & professional regulation &
(agency# banks and real estate --> coded as financial and professional reg
== "473" ~ "588", # nuclear safety moved into IEMA
agency =="531" | agency =="577") ~ "532", # coded as EPA
(agency =="556" | agency == "538") ~ "406", # coded as agriculture
(agency == "560" ~ "592", # IL finance authority (fire trucks and agriculture stuff)to state fire marshal
agency == "570" & fund == "0011" ~ "494", # city of Chicago road fund to transportation
agency
TRUE ~ (as.character(agency)))) %>%
#left_join(sources_rev) %>%
left_join(agencies_rev) # add correct agency names back in
13.0.4 Export Intermediate Files
Code
write_csv(rev_temp, file = "./data/rev_temp.csv")
write_csv(exp_temp, file = "./data/exp_temp.csv")