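A note on the p-value: each coefficient's p-value comes from a significance test of the null hypothesis H0 that there is no difference in the log-odds of the outcome between the reference group (captured by the intercept) and the group defined by the explanatory variable (or one of its categories). In other words, H0 says the difference between the two groups equals zero: H0: b1 = 0 versus Ha: b1 ≠ 0. (For a continuous predictor, b1 is the change in log-odds per one-unit increase, so H0 says the predictor is not associated with the log-odds of the outcome.)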
If p < 0.05 (the conventional significance level), we reject H0: we have evidence that the difference between the two groups is not zero.
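As a rough illustration of where a coefficient's p-value comes from, the sketch below redoes the Wald test by hand using the sex_catMale estimate (0.316) and s.e. (0.113) from the 2019 logistic model reported later in this section. It assumes a normal reference distribution; svyglm() uses a t distribution with the design degrees of freedom, so this is only an approximation.

```r
# Wald test for a single coefficient: H0: b1 = 0 vs Ha: b1 != 0
est <- 0.316                 # log-odds coefficient (2019, sex_catMale)
se  <- 0.113                 # its standard error
z   <- est / se              # test statistic
p   <- 2 * pnorm(-abs(z))    # two-sided p-value (normal approximation)
reject_h0 <- p < 0.05        # decision at the 5% level
round(p, 3)
reject_h0
```

The rounded value agrees with the p = 0.005 shown for sex_catMale in the 2019 column of the table.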
Log-odds are not the most intuitive to interpret. Instead of discussing the change in the log-odds, we can calculate the odds ratio for a given variable by exponentiating the coefficient.
An odds ratio of x is read as: those in this group "have x times the odds of the outcome of interest compared to those in the reference group," holding the other variables in the model constant.
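A minimal sketch of the exponentiation step, using three 2019 log-odds coefficients and standard errors copied from the table in this section. The `or * se` line is an assumption about how the odds-ratio standard errors were produced: it is the delta-method approximation, and it reproduces the table's 0.155 for sex_catMale. The confidence interval is built on the log-odds scale and then exponentiated.

```r
# 2019 log-odds coefficients and s.e. (from the model table)
b  <- c(sex_catMale = 0.316, age_cat25to34 = 1.389, NCHILD = -0.166)
se <- c(sex_catMale = 0.113, age_cat25to34 = 0.338, NCHILD = 0.074)

or    <- exp(b)      # odds ratios
se_or <- or * se     # delta-method s.e. of each odds ratio

# 95% CI: compute on the log-odds scale, then exponentiate the endpoints
ci <- exp(cbind(lower = b - 1.96 * se, upper = b + 1.96 * se))

round(or, 3)
round(ci, 3)
```

For example, an odds ratio of about 1.372 for sex_catMale reads as: in 2019, men had about 1.37 times the odds of working from home compared with the reference sex category, holding the other variables constant.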
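Reference group: by default, R uses the first level of a factor as the reference category. For a factor created from character values the levels are sorted alphabetically, and for numeric codes the lowest value comes first. (For a factor outcome in a binomial model, the first level is treated as "failure" and all other levels as "success".) The reference category can be changed with relevel(x, ref = ...).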
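A small illustration of default factor ordering and relevel(), using a hypothetical three-value sex_cat vector (not the ACS data):

```r
# Hypothetical toy factor; levels are sorted alphabetically by default
sex_cat <- factor(c("Male", "Female", "Male"))
levels(sex_cat)                          # "Female" comes first, so it is the reference

# Make "Male" the reference category instead
sex_cat <- relevel(sex_cat, ref = "Male")
levels(sex_cat)                          # now "Male" comes first
```

In a model fitted after relevel(), the coefficient would be labelled sex_catFemale instead of sex_catMale.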
Relationship between Odds and Probabilities:
Odds=P/(1-P)
P=odds/(1+odds)
Odds=exp(log-odds)
P=exp(log-odds)/(1+exp(log-odds))
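The four identities above can be checked numerically. This sketch uses the 2019 intercept (−4.083) from the table in this section, i.e. the log-odds for someone at the reference level of every categorical predictor and zero on NCHILD and NCHLT5. Base R's plogis() and qlogis() are the inverse logit and the logit, respectively.

```r
log_odds <- -4.083                 # 2019 intercept (log-odds)
odds <- exp(log_odds)              # Odds = exp(log-odds)
p <- odds / (1 + odds)             # P = odds / (1 + odds)

# The same conversions with base R helpers
stopifnot(isTRUE(all.equal(p, plogis(log_odds))),      # inverse logit
          isTRUE(all.equal(odds, p / (1 - p))),        # Odds = P / (1 - P)
          isTRUE(all.equal(qlogis(p), log_odds)))      # logit(P) = log-odds
round(p, 4)
```

A predicted probability of roughly 1.7% for the reference profile in 2019 is much easier to communicate than a log-odds of −4.083.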
6.1.1.1 Separate models for each year
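Be careful when subsetting survey data. Using filter() to select the subset, instead of the subset = option for survey objects, can produce incorrect standard errors. Subsetting the survey design itself (through the subset argument, as in the svyglm() calls below, or with subset() on the design object) keeps the information about the full sample design that variance estimation needs.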
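Because the ACS design objects used here are not available in a standalone example, the sketch below uses the survey package's built-in California school data (apiclus1, a one-stage cluster sample) as a stand-in. It compares subsetting the design with filtering the rows and rebuilding the design. The point estimates match, but only the first approach preserves the full design.

```r
library(survey)
data(api)  # example school data shipped with the survey package

# Stand-in design: one-stage cluster sample of school districts (dnum)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Recommended: subset the design object; the full design is retained
est_subset <- svymean(~api00, subset(dclus1, stype == "E"))

# Risky: filter the rows first, then build a new design from what is left
elem_rows <- apiclus1[apiclus1$stype == "E", ]
dclus1_filtered <- svydesign(id = ~dnum, weights = ~pw, data = elem_rows, fpc = ~fpc)
est_filter <- svymean(~api00, dclus1_filtered)

# Point estimates agree; standard errors can differ (e.g. when whole
# clusters or strata have no rows in the subset)
res <- data.frame(approach = c("subset design", "filter data"),
                  mean = c(coef(est_subset), coef(est_filter)),
                  se = c(SE(est_subset), SE(est_filter)))
res
```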
library(survey)
library(modelsummary)

# Same predictors for both years
f_wfh <- did_wfh ~ sex_cat + age_cat + NCHILD + NCHLT5 + CIHISPEED +
  county_pop_type + race_cat

# Subset via svyglm's subset argument (not filter()) so the standard
# errors reflect the full survey design
fit_2019 <- svyglm(f_wfh,
  subset = AGE < 45 & CanWorkFromHome == "Can WFH" &
    occ_2dig_labels == "Management, Business, Science, Arts",
  family = quasibinomial(), design = dstrata2019)
fit_2021 <- svyglm(f_wfh,
  subset = AGE < 45 & CanWorkFromHome == "Can WFH" &
    occ_2dig_labels == "Management, Business, Science, Arts",
  family = quasibinomial(), design = dstrata2021)

# Each fit appears twice: once as log-odds, once exponentiated (odds ratios)
m <- list("2019 Logged Odds" = fit_2019, "2021 Logged Odds" = fit_2021,
          "2019 Odds Ratio"  = fit_2019, "2021 Odds Ratio"  = fit_2021)

# To export to Word, add output = "table.docx" (a "conf.int" statistic can also be added)
modelsummary(m,
  exponentiate = c(FALSE, FALSE, TRUE, TRUE),  # does standard and exponentiated models together
  statistic = c("s.e. = {std.error}", "p = {p.value}"),
  stars = TRUE, shape = term ~ model + statistic,
  notes = list("Subset of ACS Survey Data for 2019 and 2021",
               "Odds Ratios Shown in Table"),
  title = "Predictions for WFH in 2019 vs 2021")
Predictions for WFH in 2019 vs 2021
Each cell shows: estimate (s.e.; p)

| Term | 2019 Logged Odds | 2021 Logged Odds | 2019 Odds Ratio | 2021 Odds Ratio |
|---|---|---|---|---|
| (Intercept) | −4.083*** (0.375; <0.001) | −2.201*** (0.174; <0.001) | 0.017*** (0.006; <0.001) | 0.111*** (0.019; <0.001) |
| sex_catMale | 0.316** (0.113; 0.005) | 0.044 (0.062; 0.476) | 1.372** (0.155; 0.005) | 1.045 (0.065; 0.476) |
| age_cat25to34 | 1.389*** (0.338; <0.001) | 0.542*** (0.140; <0.001) | 4.011*** (1.355; <0.001) | 1.719*** (0.240; <0.001) |
| age_cat35to44 | 1.846*** (0.340; <0.001) | 0.749*** (0.147; <0.001) | 6.336*** (2.151; <0.001) | 2.115*** (0.310; <0.001) |
| NCHILD | −0.166* (0.074; 0.024) | −0.226*** (0.040; <0.001) | 0.847* (0.062; 0.024) | 0.798*** (0.032; <0.001) |
| NCHLT5 | 0.167 (0.107; 0.121) | 0.251*** (0.066; <0.001) | 1.181 (0.127; 0.121) | 1.285*** (0.085; <0.001) |
| CIHISPEEDLacks Access | −0.335 (0.329; 0.308) | −0.609*** (0.153; <0.001) | 0.715 (0.235; 0.308) | 0.544*** (0.083; <0.001) |
| county_pop_typeUrban Counties | 0.018 (0.245; 0.940) | 1.384*** (0.127; <0.001) | 1.019 (0.249; 0.940) | 3.989*** (0.507; <0.001) |
| race_catAsian | −0.220 (0.199; 0.270) | 0.362*** (0.102; <0.001) | 0.803 (0.160; 0.270) | 1.436*** (0.147; <0.001) |
| race_catBlack | −0.238 (0.291; 0.414) | −0.330* (0.139; 0.018) | 0.788 (0.230; 0.414) | 0.719* (0.100; 0.018) |
| race_catOther | −0.365 (0.300; 0.224) | −0.478*** (0.103; <0.001) | 0.694 (0.208; 0.224) | 0.620*** (0.064; <0.001) |
| Num.Obs. | 7344 | 7389 | 7344 | 7389 |
| R2 | 0.024 | 0.049 | 0.024 | 0.049 |
| R2 Adj. | −0.155 | −0.114 | −0.155 | −0.114 |
| AIC | 3783.8 | 9246.9 | 3783.8 | 9246.9 |
| BIC | 3851.7 | 9319.8 | 3851.7 | 9319.8 |
| Log.Lik. | −1872.464 | −4606.447 | −1872.464 | −4606.447 |
| RMSE | 0.25 | 0.47 | 0.25 | 0.47 |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Subset of ACS Survey Data for 2019 and 2021
Odds Ratios Shown in Table
6.1.2 OLS Model
# OLS (linear probability model): svyglm() uses family = gaussian() by default,
# so coefficients are changes in the probability of WFH and are not exponentiated
f_ols <- did_wfh ~ sex_cat + age_cat + NCHLT5 + CIHISPEED + county_pop_type + race_cat

m <- list(
  "2019 OLS" = svyglm(f_ols,
    subset = AGE < 45 & CanWorkFromHome == "Can WFH" &
      occ_2dig_labels == "Management, Business, Science, Arts",
    design = dstrata2019),
  "2021 OLS" = svyglm(f_ols,
    subset = AGE < 45 & CanWorkFromHome == "Can WFH" &
      occ_2dig_labels == "Management, Business, Science, Arts",
    design = dstrata2021))

# To export to Word, add output = "table.docx"
modelsummary(m,
  statistic = c("s.e. = {std.error}", "p = {p.value}"),
  stars = TRUE, shape = term ~ model + statistic,
  notes = list("Subset of ACS Survey Data for 2019 and 2021"),
  title = "OLS - Separate Models for WFH in 2019 vs 2021")
What is the predicted probability (and 95% CI) that a man aged 25 to 34 worked from home in 2019?
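# Always include the intercept for prediction.
# Specify a 1 for the intercept, the chosen value for each continuous predictor,
# and a 1 for each non-reference level of a categorical variable.
# If a predictor is at its reference level, specify a 0 or exclude it.
# Note: this contrast leaves sex_cat at its reference level; add
# "sex_catMale" = 1 to get the prediction for men.
# install.packages("faraway")
library(faraway)  # provides ilogit(), the inverse logit (base R equivalent: plogis())
ilogit(svycontrast(m1_2019, c("(Intercept)" = 1, "age_cat25to34" = 1)))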
contrast SE
contrast 0.026317 0.1151
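The question asks for a 95% CI, which the output above does not include. Applying ilogit() to the svycontrast() result appears to transform only the estimate; the printed SE (0.1151) still looks like the log-odds-scale value. A safer pattern is to build the interval on the log-odds scale and then transform the estimate and both endpoints. The sketch below shows this with a stand-in logistic model on the survey package's built-in apiclus1 data, because m1_2019 is not available here. For the model in this section, the same steps would presumably use m1_2019 and a contrast such as c("(Intercept)" = 1, "sex_catMale" = 1, "age_cat25to34" = 1), assuming the model contains those terms.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Stand-in logistic model: sch.wide (No/Yes) on school type and % free meals
fit <- svyglm(sch.wide ~ stype + meals, design = dclus1, family = quasibinomial())

# Linear contrast on the log-odds scale: a high school (stypeH = 1) with meals = 50;
# coefficients not named in the vector are set to 0
lin <- svycontrast(fit, c("(Intercept)" = 1, "stypeH" = 1, "meals" = 50))

# 95% CI on the log-odds scale, then transform estimate and endpoints
ci_logit <- confint(lin)
pred <- plogis(c(estimate = unname(coef(lin)),
                 lower = ci_logit[1, 1],
                 upper = ci_logit[1, 2]))
round(pred, 3)
```

Because plogis() is monotone, transforming the endpoints keeps the interval inside (0, 1), unlike an interval formed from an estimate ± 1.96 × SE on the probability scale.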
Recode marital status and add back into regression