Multiple Treatment Groups Impact Evaluation: Advanced Methods for Employment Policy Analysis
longworkR Package
2026-02-25
Source:vignettes/multiple-treatment-groups-evaluation.Rmd
multiple-treatment-groups-evaluation.RmdIntroduction
The Challenge of Multiple Treatment Groups
Employment policy evaluation frequently involves complex treatment scenarios where individuals may receive different types or intensities of interventions. Traditional impact evaluation methods designed for binary treatment scenarios (treatment vs. control) may be insufficient for understanding:
- Multiple Program Types: Job training vs. wage subsidies vs. placement assistance
- Treatment Intensity: Full-time vs. part-time training programs
- Sequential Treatments: Individuals receiving multiple interventions over time
- Bundled Interventions: Programs combining multiple components (training + subsidies + counseling)
- Dose-Response Relationships: How outcomes vary with treatment intensity or duration
This vignette demonstrates advanced methods for evaluating multiple
treatment scenarios using the longworkR package, focusing
on practical implementation with real-world employment data.
Methodological Framework
Core Challenges in Multiple Treatment Evaluation
- Selection Bias Across Treatment Groups: Different types of individuals may select into (or be selected for) different treatment types
- Confounding by Treatment Intensity: Individuals receiving more intensive treatments may have different baseline characteristics
- Multiple Comparison Problems: Testing multiple treatment effects increases the risk of false discoveries
- Heterogeneous Treatment Effects: Effects may vary across subgroups and treatment combinations
- Dynamic Treatment Assignment: Treatments may be assigned based on previous outcomes or characteristics
Methodological Approaches Covered
This vignette covers several methodological approaches:
- Generalized Difference-in-Differences: Extending DiD to multiple treatment groups and time periods
- Propensity Score Matching with Multiple Treatments: Using generalized propensity scores
- Dose-Response Analysis: Examining continuous treatment intensity
- Machine Learning Methods: Random forests and causal forests for heterogeneous effects
- Synthetic Control Extensions: Multiple donor pools and treatment timing
- Treatment Effect Heterogeneity Analysis: Exploring how effects vary across subgroups
Data Preparation and Setup
Sample Data Generation
Let’s begin by creating a comprehensive synthetic dataset that mirrors real-world employment policy scenarios:
# Set seed for reproducibility
set.seed(42)
# Generate comprehensive employment data with multiple treatments
generate_multi_treatment_data <- function(n_individuals = 2000,
n_time_periods = 8,
treatment_start_period = 4) {
cat("Generating multi-treatment employment data...\n")
cat("Individuals:", n_individuals, "\n")
cat("Time periods:", n_time_periods, "\n")
cat("Treatment start:", treatment_start_period, "\n\n")
# Individual-level characteristics (time-invariant)
individuals <- data.table(
cf = 1:n_individuals,
# Demographics
age = round(runif(n_individuals, 22, 55)),
gender = sample(c("M", "F"), n_individuals, replace = TRUE),
education = sample(c("basic", "secondary", "tertiary"), n_individuals,
replace = TRUE, prob = c(0.3, 0.5, 0.2)),
region = sample(c("north", "center", "south", "islands"), n_individuals,
replace = TRUE, prob = c(0.3, 0.25, 0.35, 0.1)),
# Baseline employment characteristics
baseline_employment_rate = pmax(0, pmin(1, rnorm(n_individuals, 0.6, 0.3))),
baseline_wage = pmax(800, rnorm(n_individuals, 2200, 600)),
previous_unemployment_duration = pmax(0, rexp(n_individuals, 1/180)), # days
sector_experience = sample(c("manufacturing", "services", "construction", "other"),
n_individuals, replace = TRUE, prob = c(0.25, 0.45, 0.15, 0.15)),
# Unobserved heterogeneity (affects both selection and outcomes)
motivation = rnorm(n_individuals, 0, 1),
ability = rnorm(n_individuals, 0, 1),
# Network effects and regional characteristics
regional_unemployment = rnorm(n_individuals, 0.08, 0.02),
network_quality = rnorm(n_individuals, 0, 1)
)
# Treatment assignment mechanism (realistic selection process)
# Probability of treatment depends on individual characteristics
individuals[, prob_any_treatment := plogis(
-1.2 +
0.3 * (age - mean(age)) / sd(age) + # Older individuals more likely
0.4 * (education == "basic") + # Lower education more likely
0.5 * (baseline_employment_rate < 0.4) + # Unemployed more likely
0.3 * (previous_unemployment_duration > 365) + # Long-term unemployed
-0.2 * ability + # Lower ability more likely
0.1 * motivation + # More motivated more likely
0.2 * (regional_unemployment > 0.08) # High unemployment regions
)]
# Assign treatments (mutually exclusive for simplicity)
individuals[, treatment_type := "control"]
# Sample treatment recipients
treatment_eligible <- which(runif(n_individuals) < individuals$prob_any_treatment)
# Among eligible, assign specific treatment types
treatment_types <- c("job_training", "wage_subsidy", "placement_assistance", "comprehensive")
for (i in treatment_eligible) {
# Conditional probabilities for different treatment types
probs <- c(
0.3, # job_training
0.25 + 0.2 * (individuals$baseline_wage[i] < 1800), # wage_subsidy (more likely for low-wage)
0.25 + 0.15 * (individuals$sector_experience[i] == "services"), # placement_assistance
0.2 + 0.1 * (individuals$education[i] == "basic") # comprehensive (more likely for low-educated)
)
probs <- probs / sum(probs)
individuals$treatment_type[i] <- sample(treatment_types, 1, prob = probs)
}
# Treatment intensity (for dose-response analysis)
individuals[treatment_type != "control", treatment_intensity := runif(.N, 0.3, 1.0)]
individuals[treatment_type == "control", treatment_intensity := 0]
# Treatment duration (realistic variation)
individuals[treatment_type == "job_training", treatment_duration :=
pmax(30, rnorm(.N, 180, 60))] # 3-9 months
individuals[treatment_type == "wage_subsidy", treatment_duration :=
pmax(60, rnorm(.N, 365, 90))] # 6-15 months
individuals[treatment_type == "placement_assistance", treatment_duration :=
pmax(14, rnorm(.N, 90, 30))] # 2-4 months
individuals[treatment_type == "comprehensive", treatment_duration :=
pmax(90, rnorm(.N, 270, 90))] # 6-12 months
individuals[treatment_type == "control", treatment_duration := 0]
cat("Treatment assignment summary:\n")
print(table(individuals$treatment_type))
cat("\nTreatment intensity by type:\n")
print(individuals[treatment_type != "control",
.(mean_intensity = mean(treatment_intensity),
mean_duration = mean(treatment_duration)),
by = treatment_type])
# Generate time-varying outcomes
panel_data <- data.table()
for (period in 1:n_time_periods) {
# Time period indicators
is_pre_treatment <- period < treatment_start_period
is_treatment_period <- period >= treatment_start_period
periods_since_treatment <- pmax(0, period - treatment_start_period)
period_data <- copy(individuals)
period_data[, period := period]
period_data[, post_treatment := as.numeric(is_treatment_period)]
period_data[, periods_since_treatment := periods_since_treatment]
# Generate employment outcomes with realistic dynamics
# Base employment rate (with individual fixed effects and time trends)
period_data[, base_employment := plogis(
# Individual fixed effects
-0.5 +
0.3 * (age - mean(age)) / sd(age) +
0.2 * (education == "tertiary") - 0.1 * (education == "basic") +
0.1 * ability +
0.05 * motivation +
# Time trends and regional effects
0.02 * period + # General time trend
-regional_unemployment * 2 +
# Lagged employment (persistence)
0.6 * baseline_employment_rate +
# Random variation
rnorm(.N, 0, 0.3)
)]
# Treatment effects (realistic timing and decay)
# Job training: builds up over time, permanent gains
period_data[treatment_type == "job_training" & is_treatment_period,
employment_effect := (
0.15 * treatment_intensity * # Max effect 15pp
pmin(1, periods_since_treatment / 3) * # Builds up over 3 periods
(1 + 0.1 * ability) # Heterogeneous by ability
)]
# Wage subsidy: immediate effect, decays after treatment ends
period_data[treatment_type == "wage_subsidy" & is_treatment_period,
employment_effect := (
0.20 * treatment_intensity * # Max effect 20pp
exp(-0.1 * pmax(0, periods_since_treatment - treatment_duration/30)) # Decay after treatment
)]
# Placement assistance: quick effect, moderate persistence
period_data[treatment_type == "placement_assistance" & is_treatment_period,
employment_effect := (
0.12 * treatment_intensity * # Max effect 12pp
(1 + 0.5 * network_quality / sd(network_quality)) # Heterogeneous by networks
)]
# Comprehensive: large, persistent effects
period_data[treatment_type == "comprehensive" & is_treatment_period,
employment_effect := (
0.25 * treatment_intensity * # Max effect 25pp
pmin(1, periods_since_treatment / 2) * # Builds up over 2 periods
(1 + 0.2 * (education == "basic")) # Larger for low-educated
)]
# Set effect to 0 for control and pre-treatment
period_data[treatment_type == "control" | is_pre_treatment, employment_effect := 0]
# Final employment outcome
period_data[, employment_rate := pmax(0, pmin(1, base_employment + employment_effect))]
# Generate other employment outcomes
# Contract quality (0-1 scale)
period_data[, contract_quality := pmax(0, pmin(1,
0.6 + 0.3 * employment_rate + 0.2 * (education == "tertiary") +
0.1 * ability + 0.5 * employment_effect + rnorm(.N, 0, 0.2)
))]
# Monthly wage (conditional on employment)
period_data[, monthly_wage := ifelse(employment_rate > runif(.N),
pmax(800, baseline_wage + 200 * employment_effect +
100 * (education == "tertiary") + 50 * ability + rnorm(.N, 0, 300)),
NA_real_
)]
# Employment stability (duration of current contract)
period_data[, stability_index := pmax(0, pmin(1,
0.5 + 0.4 * employment_rate + 0.2 * contract_quality +
0.3 * employment_effect + rnorm(.N, 0, 0.2)
))]
panel_data <- rbind(panel_data, period_data)
}
# Add realistic contract type information
panel_data[, COD_TIPOLOGIA_CONTRATTUALE :=
ifelse(employment_rate > runif(.N),
sample(c("A.01.00", "A.03.00", "A.03.01", "C.01.00", "A.07.00"),
.N, replace = TRUE,
prob = c(0.3, 0.25, 0.15, 0.2, 0.1)),
NA_character_)]
# Create date variables
period_data[, `:=`(
inizio = as.Date("2020-01-01") + (period - 1) * 90, # Quarterly data
fine = as.Date("2020-01-01") + period * 90 - 1,
durata = 90
)]
# Add unique identifiers
panel_data[, over_id := .I]
panel_data[, event_period := ifelse(post_treatment == 0, "pre", "post")]
cat("\nPanel data structure:\n")
cat("Total observations:", nrow(panel_data), "\n")
cat("Individuals:", length(unique(panel_data$cf)), "\n")
cat("Time periods:", length(unique(panel_data$period)), "\n")
cat("Treatment types:", paste(unique(panel_data$treatment_type), collapse = ", "), "\n")
return(panel_data[order(cf, period)])
}
# Generate the multi-treatment dataset
employment_data <- generate_multi_treatment_data(n_individuals = 1500,
n_time_periods = 8,
treatment_start_period = 4)
#> Generating multi-treatment employment data...
#> Individuals: 1500
#> Time periods: 8
#> Treatment start: 4
#>
#> Treatment assignment summary:
#>
#> comprehensive control job_training
#> 108 1005 119
#> placement_assistance wage_subsidy
#> 127 141
#>
#> Treatment intensity by type:
#> treatment_type mean_intensity mean_duration
#> <char> <num> <num>
#> 1: job_training 0.6523739 192.32671
#> 2: wage_subsidy 0.6565235 372.06084
#> 3: placement_assistance 0.6624302 88.64191
#> 4: comprehensive 0.6409595 272.93474
#>
#> Panel data structure:
#> Total observations: 12000
#> Individuals: 1500
#> Time periods: 8
#> Treatment types: job_training, control, wage_subsidy, placement_assistance, comprehensive
# Display first few observations
cat("\nFirst few observations:\n")
#>
#> First few observations:
print(head(employment_data[, .(cf, period, treatment_type, treatment_intensity,
employment_rate, contract_quality, monthly_wage)], 10))
#> cf period treatment_type treatment_intensity employment_rate
#> <int> <int> <char> <num> <num>
#> 1: 1 1 job_training 0.8059128 0.3930364
#> 2: 1 2 job_training 0.8059128 0.4846945
#> 3: 1 3 job_training 0.8059128 0.4185177
#> 4: 1 4 job_training 0.8059128 0.4804673
#> 5: 1 5 job_training 0.8059128 0.5570636
#> 6: 1 6 job_training 0.8059128 0.7195764
#> 7: 1 7 job_training 0.8059128 0.6854107
#> 8: 1 8 job_training 0.8059128 0.6077081
#> 9: 2 1 control 0.0000000 0.4471362
#> 10: 2 2 control 0.0000000 0.4786274
#> contract_quality monthly_wage
#> <num> <num>
#> 1: 0.4867909 2316.338
#> 2: 0.8120043 NA
#> 3: 0.7824502 NA
#> 4: 1.0000000 2004.666
#> 5: 0.6721608 2058.096
#> 6: 1.0000000 2549.175
#> 7: 1.0000000 2019.549
#> 8: 1.0000000 2137.574
#> 9: 1.0000000 NA
#> 10: 0.7549518 2018.670Exploratory Data Analysis
Before diving into causal inference, let’s explore the patterns in our multi-treatment data:
# Treatment assignment balance
cat("=== TREATMENT ASSIGNMENT ANALYSIS ===\n\n")
#> === TREATMENT ASSIGNMENT ANALYSIS ===
# Overall treatment distribution
treatment_summary <- employment_data[period == 1, .(
count = .N,
pct = round(100 * .N / nrow(employment_data[period == 1]), 1),
mean_age = round(mean(age), 1),
pct_basic_edu = round(100 * mean(education == "basic"), 1),
mean_baseline_employment = round(mean(baseline_employment_rate), 3),
mean_treatment_intensity = round(mean(treatment_intensity), 3)
), by = treatment_type][order(-count)]
cat("Treatment group characteristics:\n")
#> Treatment group characteristics:
print(treatment_summary)
#> treatment_type count pct mean_age pct_basic_edu
#> <char> <int> <num> <num> <num>
#> 1: control 1005 67.0 37.2 28.5
#> 2: wage_subsidy 141 9.4 39.3 35.5
#> 3: placement_assistance 127 8.5 40.0 34.6
#> 4: job_training 119 7.9 40.0 33.6
#> 5: comprehensive 108 7.2 40.6 32.4
#> mean_baseline_employment mean_treatment_intensity
#> <num> <num>
#> 1: 0.599 0.000
#> 2: 0.525 0.657
#> 3: 0.544 0.662
#> 4: 0.594 0.652
#> 5: 0.546 0.641
# Pre-treatment outcome trends
pre_treatment_trends <- employment_data[post_treatment == 0, .(
mean_employment = mean(employment_rate, na.rm = TRUE),
mean_contract_quality = mean(contract_quality, na.rm = TRUE),
mean_wage = mean(monthly_wage, na.rm = TRUE),
sd_employment = sd(employment_rate, na.rm = TRUE)
), by = .(treatment_type, period)]
cat("\nPre-treatment outcome trends:\n")
#>
#> Pre-treatment outcome trends:
print(pre_treatment_trends[order(treatment_type, period)])
#> treatment_type period mean_employment mean_contract_quality mean_wage
#> <char> <int> <num> <num> <num>
#> 1: comprehensive 1 0.4474222 0.7516225 2353.598
#> 2: comprehensive 2 0.4443377 0.7790758 2338.431
#> 3: comprehensive 3 0.4486113 0.7558793 2450.841
#> 4: control 1 0.4256400 0.7619149 2207.326
#> 5: control 2 0.4396384 0.7535936 2241.010
#> 6: control 3 0.4439875 0.7495433 2247.203
#> 7: job_training 1 0.4368132 0.6996987 2358.099
#> 8: job_training 2 0.4440037 0.7441888 2340.129
#> 9: job_training 3 0.4489339 0.6998427 2369.252
#> 10: placement_assistance 1 0.4483288 0.7708073 2269.537
#> 11: placement_assistance 2 0.4503942 0.7394647 2360.280
#> 12: placement_assistance 3 0.4467272 0.7455702 2232.589
#> 13: wage_subsidy 1 0.4347907 0.7328996 2201.072
#> 14: wage_subsidy 2 0.4500828 0.7355784 2038.727
#> 15: wage_subsidy 3 0.4426356 0.7298872 2205.894
#> sd_employment
#> <num>
#> 1: 0.1170884
#> 2: 0.1175563
#> 3: 0.1173348
#> 4: 0.1124019
#> 5: 0.1138138
#> 6: 0.1108270
#> 7: 0.1090347
#> 8: 0.1111828
#> 9: 0.1184854
#> 10: 0.1088876
#> 11: 0.1219554
#> 12: 0.1167565
#> 13: 0.1146889
#> 14: 0.1162271
#> 15: 0.1100313
# Visualize pre-treatment trends
trend_plot <- ggplot(pre_treatment_trends,
aes(x = period, y = mean_employment, color = treatment_type)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_color_viridis_d(name = "Treatment Type") +
labs(
title = "Pre-Treatment Employment Rate Trends by Treatment Group",
subtitle = "Assessing parallel trends assumption",
x = "Period",
y = "Mean Employment Rate"
) +
theme_minimal() +
theme(legend.position = "bottom")
print(trend_plot)
# Treatment intensity distribution
intensity_plot <- employment_data[period == 1 & treatment_type != "control", ] %>%
ggplot(aes(x = treatment_intensity, fill = treatment_type)) +
geom_histogram(bins = 20, alpha = 0.7, position = "identity") +
facet_wrap(~treatment_type, scales = "free_y") +
scale_fill_viridis_d() +
labs(
title = "Treatment Intensity Distribution by Treatment Type",
x = "Treatment Intensity",
y = "Count"
) +
theme_minimal() +
theme(legend.position = "none")
print(intensity_plot)
Generalized Difference-in-Differences
Multiple Treatment Groups DiD
The first approach for handling multiple treatments is to extend the standard DiD framework to accommodate multiple treatment groups and potentially multiple time periods.
cat("=== MULTIPLE TREATMENT GROUPS DIFFERENCE-IN-DIFFERENCES ===\n\n")
#> === MULTIPLE TREATMENT GROUPS DIFFERENCE-IN-DIFFERENCES ===
# Prepare data for multiple treatment DiD
# Create binary indicators for each treatment type
did_data <- copy(employment_data)
# Create treatment dummies
treatment_types <- c("job_training", "wage_subsidy", "placement_assistance", "comprehensive")
for (treatment in treatment_types) {
col_name <- paste0("treated_", gsub("_", "", treatment))
did_data[, (col_name) := as.numeric(treatment_type == treatment)]
}
# Create interaction terms (treatment × post)
for (treatment in treatment_types) {
treat_col <- paste0("treated_", gsub("_", "", treatment))
interact_col <- paste0(treat_col, "_post")
did_data[, (interact_col) := get(treat_col) * post_treatment]
}
cat("Multiple treatment DiD data structure:\n")
#> Multiple treatment DiD data structure:
cat("Observations:", nrow(did_data), "\n")
#> Observations: 12000
cat("Treatment columns created:", sum(grepl("treated_", names(did_data))), "\n")
#> Treatment columns created: 8
cat("Interaction columns created:", sum(grepl("_post", names(did_data))), "\n\n")
#> Interaction columns created: 4
# Run multiple treatment DiD using longworkR functions
# First, prepare the data in the format expected by difference_in_differences()
# Create outcome variables list
outcome_vars <- c("employment_rate", "contract_quality", "monthly_wage", "stability_index")
# Prepare treatment indicators
did_data[, is_treated := as.numeric(treatment_type != "control")]
# Display sample of prepared data
cat("Sample of prepared DiD data:\n")
#> Sample of prepared DiD data:
print(head(did_data[, .(cf, period, treatment_type, is_treated, post_treatment,
employment_rate, contract_quality)], 8))
#> cf period treatment_type is_treated post_treatment employment_rate
#> <int> <int> <char> <num> <num> <num>
#> 1: 1 1 job_training 1 0 0.3930364
#> 2: 1 2 job_training 1 0 0.4846945
#> 3: 1 3 job_training 1 0 0.4185177
#> 4: 1 4 job_training 1 1 0.4804673
#> 5: 1 5 job_training 1 1 0.5570636
#> 6: 1 6 job_training 1 1 0.7195764
#> 7: 1 7 job_training 1 1 0.6854107
#> 8: 1 8 job_training 1 1 0.6077081
#> contract_quality
#> <num>
#> 1: 0.4867909
#> 2: 0.8120043
#> 3: 0.7824502
#> 4: 1.0000000
#> 5: 0.6721608
#> 6: 1.0000000
#> 7: 1.0000000
#> 8: 1.0000000Basic Multiple Treatment DiD
# Run basic multiple treatment DiD
tryCatch({
# Basic DiD with longworkR
basic_multi_did <- difference_in_differences(
data = did_data,
outcome_vars = c("employment_rate", "contract_quality"),
treatment_var = "is_treated",
time_var = "post_treatment",
id_var = "cf",
control_vars = c("age", "baseline_employment_rate"),
fixed_effects = "both",
verbose = TRUE
)
cat("Basic multiple treatment DiD completed successfully\n")
if (!is.null(basic_multi_did$summary_table)) {
cat("\nBasic DiD Results:\n")
print(basic_multi_did$summary_table)
}
}, error = function(e) {
cat("Error in basic multiple treatment DiD:", e$message, "\n")
# Create manual DiD estimation as fallback
cat("Creating manual DiD estimation...\n")
# Manual DiD estimation for multiple outcomes
did_results <- list()
for (outcome in c("employment_rate", "contract_quality")) {
if (outcome %in% names(did_data)) {
# Calculate group means
group_means <- did_data[!is.na(get(outcome)), .(
mean_outcome = mean(get(outcome), na.rm = TRUE),
n_obs = .N
), by = .(treatment_type, post_treatment)]
cat(sprintf("\n%s - Group means:\n", outcome))
print(group_means)
# Calculate DiD estimates for each treatment vs control
control_pre <- group_means[treatment_type == "control" & post_treatment == 0, mean_outcome]
control_post <- group_means[treatment_type == "control" & post_treatment == 1, mean_outcome]
control_change <- control_post - control_pre
treatment_effects <- list()
for (treat_type in treatment_types) {
treat_pre <- group_means[treatment_type == treat_type & post_treatment == 0, mean_outcome]
treat_post <- group_means[treatment_type == treat_type & post_treatment == 1, mean_outcome]
if (length(treat_pre) > 0 && length(treat_post) > 0) {
treat_change <- treat_post - treat_pre
did_estimate <- treat_change - control_change
treatment_effects[[treat_type]] <- list(
pre_mean = treat_pre,
post_mean = treat_post,
change = treat_change,
did_estimate = did_estimate
)
cat(sprintf(" %s: %.4f (change: %.4f, control change: %.4f)\n",
treat_type, did_estimate, treat_change, control_change))
}
}
did_results[[outcome]] <- treatment_effects
}
}
basic_multi_did <- list(manual_results = did_results)
})
#> Basic multiple treatment DiD completed successfully
#>
#> Basic DiD Results:
#> outcome estimate std_error p_value conf_lower conf_upper
#> <char> <num> <num> <num> <num> <num>
#> 1: employment_rate 0.06031928 0.004780714 0.00000e+00 0.05094908 0.06968948
#> 2: contract_quality 0.04031798 0.010047870 6.00563e-05 0.02062415 0.06001180
#> significant n_obs
#> <lgcl> <int>
#> 1: TRUE 12000
#> 2: TRUE 12000
# Create visualization of multiple treatment effects
create_multi_treatment_plot <- function(data, outcome_var, title_suffix = "") {
if (!outcome_var %in% names(data)) {
cat("Outcome variable", outcome_var, "not found in data\n")
return(NULL)
}
# Calculate means by treatment and time
plot_data <- data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE),
se_outcome = sd(get(outcome_var), na.rm = TRUE) / sqrt(.N),
n_obs = .N
), by = .(treatment_type, period, post_treatment)]
# Create the plot
p <- ggplot(plot_data, aes(x = period, y = mean_outcome, color = treatment_type)) +
geom_line(size = 1) +
geom_point(size = 2) +
geom_vline(xintercept = 3.5, linetype = "dashed", alpha = 0.7) +
scale_color_viridis_d(name = "Treatment Type") +
labs(
title = paste("Multiple Treatment Effects:", outcome_var, title_suffix),
subtitle = "Dashed line indicates treatment start",
x = "Period",
y = paste("Mean", tools::toTitleCase(gsub("_", " ", outcome_var)))
) +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(size = 12))
return(p)
}
# Create plots for multiple outcomes
employment_plot <- create_multi_treatment_plot(did_data, "employment_rate")
if (!is.null(employment_plot)) print(employment_plot)
quality_plot <- create_multi_treatment_plot(did_data, "contract_quality")
if (!is.null(quality_plot)) print(quality_plot)
Treatment-Specific Analysis
Now let’s conduct pairwise comparisons between each treatment and the control group:
cat("=== TREATMENT-SPECIFIC PAIRWISE ANALYSIS ===\n\n")
#> === TREATMENT-SPECIFIC PAIRWISE ANALYSIS ===
# Function to run pairwise DiD
run_pairwise_did <- function(data, treatment_type_focus, outcome_vars) {
# Filter data to include only focal treatment and control
pairwise_data <- data[treatment_type %in% c("control", treatment_type_focus)]
if (nrow(pairwise_data) == 0) {
cat("No data available for treatment:", treatment_type_focus, "\n")
return(NULL)
}
# Create binary treatment indicator
pairwise_data[, is_focal_treatment := as.numeric(treatment_type == treatment_type_focus)]
cat(sprintf("Analyzing %s vs Control:\n", treatment_type_focus))
cat(sprintf(" Observations: %d\n", nrow(pairwise_data)))
cat(sprintf(" Treatment group size: %d\n", sum(pairwise_data$is_focal_treatment)))
cat(sprintf(" Control group size: %d\n", sum(1 - pairwise_data$is_focal_treatment)))
# Run DiD analysis
tryCatch({
pairwise_results <- difference_in_differences(
data = pairwise_data,
outcome_vars = outcome_vars[outcome_vars %in% names(pairwise_data)],
treatment_var = "is_focal_treatment",
time_var = "post_treatment",
id_var = "cf",
control_vars = c("age", "baseline_employment_rate"),
fixed_effects = "both",
verbose = FALSE
)
return(list(
treatment_type = treatment_type_focus,
results = pairwise_results,
n_treated = sum(pairwise_data$is_focal_treatment),
n_control = sum(1 - pairwise_data$is_focal_treatment)
))
}, error = function(e) {
cat("Error in pairwise DiD for", treatment_type_focus, ":", e$message, "\n")
# Manual calculation as fallback
manual_results <- list()
for (outcome in outcome_vars[outcome_vars %in% names(pairwise_data)]) {
group_means <- pairwise_data[!is.na(get(outcome)), .(
mean_outcome = mean(get(outcome), na.rm = TRUE)
), by = .(is_focal_treatment, post_treatment)]
if (nrow(group_means) == 4) { # Should have 4 combinations
control_pre <- group_means[is_focal_treatment == 0 & post_treatment == 0, mean_outcome]
control_post <- group_means[is_focal_treatment == 0 & post_treatment == 1, mean_outcome]
treat_pre <- group_means[is_focal_treatment == 1 & post_treatment == 0, mean_outcome]
treat_post <- group_means[is_focal_treatment == 1 & post_treatment == 1, mean_outcome]
did_estimate <- (treat_post - treat_pre) - (control_post - control_pre)
manual_results[[outcome]] <- list(
did_estimate = did_estimate,
control_change = control_post - control_pre,
treatment_change = treat_post - treat_pre
)
}
}
return(list(
treatment_type = treatment_type_focus,
manual_results = manual_results,
n_treated = sum(pairwise_data$is_focal_treatment),
n_control = sum(1 - pairwise_data$is_focal_treatment)
))
})
}
# Run pairwise analysis for each treatment type
pairwise_results <- list()
outcomes_to_analyze <- c("employment_rate", "contract_quality", "stability_index")
for (treatment in treatment_types) {
result <- run_pairwise_did(did_data, treatment, outcomes_to_analyze)
if (!is.null(result)) {
pairwise_results[[treatment]] <- result
}
}
#> Analyzing job_training vs Control:
#> Observations: 8992
#> Treatment group size: 952
#> Control group size: 8040
#> Analyzing wage_subsidy vs Control:
#> Observations: 9168
#> Treatment group size: 1128
#> Control group size: 8040
#> Analyzing placement_assistance vs Control:
#> Observations: 9056
#> Treatment group size: 1016
#> Control group size: 8040
#> Analyzing comprehensive vs Control:
#> Observations: 8904
#> Treatment group size: 864
#> Control group size: 8040
# Summarize pairwise results
cat("\n=== PAIRWISE DiD RESULTS SUMMARY ===\n\n")
#>
#> === PAIRWISE DiD RESULTS SUMMARY ===
pairwise_summary <- data.table()
for (treatment in names(pairwise_results)) {
result <- pairwise_results[[treatment]]
if ("results" %in% names(result) && !is.null(result$results$summary_table)) {
# Extract from longworkR results
summary_table <- result$results$summary_table
if (inherits(summary_table, "data.table") || is.data.frame(summary_table)) {
temp_summary <- as.data.table(summary_table)
temp_summary[, treatment_type := treatment]
temp_summary[, n_treated := result$n_treated]
temp_summary[, n_control := result$n_control]
pairwise_summary <- rbind(pairwise_summary, temp_summary, fill = TRUE)
}
} else if ("manual_results" %in% names(result)) {
# Extract from manual results
for (outcome in names(result$manual_results)) {
outcome_result <- result$manual_results[[outcome]]
temp_row <- data.table(
outcome = outcome,
estimate = outcome_result$did_estimate,
treatment_type = treatment,
n_treated = result$n_treated,
n_control = result$n_control
)
pairwise_summary <- rbind(pairwise_summary, temp_row, fill = TRUE)
}
}
}
if (nrow(pairwise_summary) > 0) {
cat("Pairwise DiD Results:\n")
print(pairwise_summary[, .(treatment_type, outcome, estimate, n_treated, n_control)])
# Create coefficient plot
if ("estimate" %in% names(pairwise_summary) && "outcome" %in% names(pairwise_summary)) {
coef_plot <- ggplot(pairwise_summary, aes(x = treatment_type, y = estimate, fill = outcome)) +
geom_col(position = "dodge", alpha = 0.8) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
scale_fill_viridis_d(name = "Outcome") +
labs(
title = "Treatment-Specific DiD Estimates",
subtitle = "Pairwise comparisons vs. Control group",
x = "Treatment Type",
y = "DiD Estimate"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
print(coef_plot)
}
} else {
cat("No pairwise results available for summary\n")
}
#> Pairwise DiD Results:
#> treatment_type outcome estimate n_treated n_control
#> <char> <char> <num> <num> <num>
#> 1: job_training employment_rate 0.037899896 952 8040
#> 2: job_training contract_quality 0.053360635 952 8040
#> 3: job_training stability_index 0.055821762 952 8040
#> 4: wage_subsidy employment_rate 0.083214469 1128 8040
#> 5: wage_subsidy contract_quality 0.052172391 1128 8040
#> 6: wage_subsidy stability_index 0.026522770 1128 8040
#> 7: placement_assistance employment_rate 0.042858447 1016 8040
#> 8: placement_assistance contract_quality 0.009078897 1016 8040
#> 9: placement_assistance stability_index 0.043919354 1016 8040
#> 10: comprehensive employment_rate 0.075663835 864 8040
#> 11: comprehensive contract_quality 0.047205144 864 8040
#> 12: comprehensive stability_index 0.029855679 864 8040
Staggered Treatment Adoption
Many employment programs have staggered rollouts. Let’s analyze this scenario:
cat("=== STAGGERED TREATMENT ADOPTION ANALYSIS ===\n\n")
#> === STAGGERED TREATMENT ADOPTION ANALYSIS ===
# Create staggered treatment timing
staggered_data <- copy(employment_data)
# Assign different treatment start times by region
staggered_data[region == "north", treatment_start := 3]
staggered_data[region == "center", treatment_start := 4]
staggered_data[region == "south", treatment_start := 5]
staggered_data[region == "islands", treatment_start := 6]
# Update post_treatment indicator based on staggered timing
staggered_data[, post_treatment_staggered := as.numeric(period >= treatment_start)]
# Create relative time to treatment
staggered_data[, relative_period := period - treatment_start]
# Only keep observations within reasonable window around treatment
staggered_data <- staggered_data[relative_period >= -3 & relative_period <= 4]
cat("Staggered treatment structure:\n")
#> Staggered treatment structure:
staggered_summary <- staggered_data[, .(
n_obs = .N,
n_individuals = uniqueN(cf),
treatment_start = first(treatment_start),
n_treated = sum(treatment_type != "control")
), by = region]
print(staggered_summary)
#> region n_obs n_individuals treatment_start n_treated
#> <char> <int> <int> <num> <int>
#> 1: south 3584 512 5 1078
#> 2: islands 960 160 6 294
#> 3: north 3311 473 3 1162
#> 4: center 2840 355 4 1008
# Event study analysis for staggered adoption
event_study_data <- staggered_data[treatment_type != "control"] # Focus on treated units
if (nrow(event_study_data) > 0) {
# Calculate event study coefficients manually
# (This is a simplified version - real analysis would use proper event study methods)
event_coefficients <- event_study_data[, .(
mean_employment = mean(employment_rate, na.rm = TRUE),
mean_contract_quality = mean(contract_quality, na.rm = TRUE),
n_obs = .N
), by = relative_period]
# Reference period (period -1)
if (-1 %in% event_coefficients$relative_period) {
ref_employment <- event_coefficients[relative_period == -1, mean_employment]
ref_quality <- event_coefficients[relative_period == -1, mean_contract_quality]
event_coefficients[, employment_effect := mean_employment - ref_employment]
event_coefficients[, quality_effect := mean_contract_quality - ref_quality]
cat("\nEvent Study Coefficients (relative to period -1):\n")
print(event_coefficients[order(relative_period),
.(relative_period, employment_effect, quality_effect, n_obs)])
# Create event study plot
event_plot_data <- melt(event_coefficients[, .(relative_period, employment_effect, quality_effect)],
id.vars = "relative_period",
variable.name = "outcome",
value.name = "effect")
event_plot <- ggplot(event_plot_data, aes(x = relative_period, y = effect, color = outcome)) +
geom_line(size = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
geom_vline(xintercept = -0.5, linetype = "dotted", alpha = 0.7) +
scale_color_viridis_d(name = "Outcome",
labels = c("Employment Rate", "Contract Quality")) +
labs(
title = "Event Study: Treatment Effects Over Time",
subtitle = "Relative to treatment start (vertical line at period 0)",
x = "Relative Period (Treatment Start = 0)",
y = "Treatment Effect"
) +
theme_minimal() +
theme(legend.position = "bottom")
print(event_plot)
}
} else {
cat("No treated units available for event study analysis\n")
}
#>
#> Event Study Coefficients (relative to period -1):
#> relative_period employment_effect quality_effect n_obs
#> <num> <num> <num> <int>
#> 1: -3 -0.03605159 -0.01658141 329
#> 2: -2 -0.02113241 -0.01422966 495
#> 3: -1 0.00000000 0.00000000 495
#> 4: 0 0.02876828 0.01145003 495
#> 5: 1 0.06827231 0.01932633 495
#> 6: 2 0.09728313 0.04550405 495
#> 7: 3 0.10747513 0.04793280 446
#> 8: 4 0.12202976 0.04029082 292
Propensity Score Methods for Multiple Treatments
Generalized Propensity Score Matching
When dealing with multiple treatments, we need to extend propensity score methods to handle the multinomial treatment assignment.
cat("=== GENERALIZED PROPENSITY SCORE MATCHING ===\n\n")
#> === GENERALIZED PROPENSITY SCORE MATCHING ===
# Prepare data for propensity score analysis
# Use baseline (pre-treatment) characteristics only
baseline_data <- employment_data[period == 1] # Use first period as baseline
# Create comprehensive covariate set for propensity score estimation
covariates <- c("age", "education", "region", "baseline_employment_rate",
"baseline_wage", "previous_unemployment_duration", "sector_experience")
# Check covariate availability and create numeric versions
cat("Covariate preparation:\n")
#> Covariate preparation:
for (var in covariates) {
if (var %in% names(baseline_data)) {
cat(sprintf(" %s: available (%s)\n", var, class(baseline_data[[var]])[1]))
} else {
cat(sprintf(" %s: missing\n", var))
}
}
#> age: available (numeric)
#> education: available (character)
#> region: available (character)
#> baseline_employment_rate: available (numeric)
#> baseline_wage: available (numeric)
#> previous_unemployment_duration: available (numeric)
#> sector_experience: available (character)
# Create numeric versions of categorical variables for propensity score estimation
propensity_data <- copy(baseline_data)
# Convert categorical variables to dummy variables
propensity_data[, edu_secondary := as.numeric(education == "secondary")]
propensity_data[, edu_tertiary := as.numeric(education == "tertiary")]
propensity_data[, region_center := as.numeric(region == "center")]
propensity_data[, region_south := as.numeric(region == "south")]
propensity_data[, region_islands := as.numeric(region == "islands")]
propensity_data[, sector_services := as.numeric(sector_experience == "services")]
propensity_data[, sector_construction := as.numeric(sector_experience == "construction")]
propensity_data[, sector_other := as.numeric(sector_experience == "other")]
# Standardize continuous variables
continuous_vars <- c("age", "baseline_employment_rate", "baseline_wage", "previous_unemployment_duration")
for (var in continuous_vars) {
if (var %in% names(propensity_data)) {
mean_val <- mean(propensity_data[[var]], na.rm = TRUE)
sd_val <- sd(propensity_data[[var]], na.rm = TRUE)
new_var <- paste0(var, "_std")
propensity_data[, (new_var) := (get(var) - mean_val) / sd_val]
}
}
# Estimate generalized propensity scores using multinomial logit
# (Simplified implementation - in practice, use nnet::multinom or similar)
cat("\nEstimating generalized propensity scores...\n")
#>
#> Estimating generalized propensity scores...
# Create treatment type as factor
propensity_data[, treatment_factor := factor(treatment_type,
levels = c("control", treatment_types))]
# Calculate propensity scores manually using simple approach
# (In practice, use proper multinomial logit estimation)
calculate_simple_propensity <- function(data) {
# Count-based propensity scores (naive approach for demonstration)
treatment_counts <- table(data$treatment_type)
total_n <- nrow(data)
propensity_scores <- data.table(
treatment_type = names(treatment_counts),
raw_propensity = as.numeric(treatment_counts / total_n)
)
# Add some variation based on covariates (simplified)
result_data <- copy(data)
for (treat_type in names(treatment_counts)) {
prop_var <- paste0("prop_", gsub("_", "", treat_type))
base_prop <- treatment_counts[treat_type] / total_n
# Add covariate-based variation (simplified)
if (treat_type == "job_training") {
result_data[, (prop_var) := pmax(0.01, pmin(0.99,
base_prop + 0.1 * (education == "basic") - 0.05 * age_std))]
} else if (treat_type == "wage_subsidy") {
result_data[, (prop_var) := pmax(0.01, pmin(0.99,
base_prop + 0.08 * (baseline_employment_rate < 0.4) - 0.03 * baseline_wage_std))]
} else if (treat_type == "placement_assistance") {
result_data[, (prop_var) := pmax(0.01, pmin(0.99,
base_prop + 0.06 * sector_services - 0.02 * age_std))]
} else if (treat_type == "comprehensive") {
result_data[, (prop_var) := pmax(0.01, pmin(0.99,
base_prop + 0.12 * (education == "basic") + 0.05 * (previous_unemployment_duration > 365)))]
} else { # control
result_data[, (prop_var) := base_prop]
}
}
return(result_data)
}
propensity_results <- calculate_simple_propensity(propensity_data)
cat("Propensity score estimation completed\n")
#> Propensity score estimation completed
# Display propensity score distributions
prop_vars <- grep("^prop_", names(propensity_results), value = TRUE)
if (length(prop_vars) > 0) {
cat("\nPropensity score summary:\n")
for (prop_var in prop_vars[1:min(3, length(prop_vars))]) {
cat(sprintf("%s: mean=%.3f, sd=%.3f\n",
prop_var,
mean(propensity_results[[prop_var]], na.rm = TRUE),
sd(propensity_results[[prop_var]], na.rm = TRUE)))
}
}
#>
#> Propensity score summary:
#> prop_comprehensive: mean=0.115, sd=0.058
#> prop_control: mean=0.670, sd=0.000
#> prop_jobtraining: mean=0.110, sd=0.067
# Assess covariate balance before matching
assess_balance <- function(data, treatment_var, covariates) {
balance_results <- data.table()
for (cov in covariates) {
if (cov %in% names(data)) {
if (is.numeric(data[[cov]])) {
# For numeric variables, calculate standardized differences
cov_summary <- data[, .(
mean_val = mean(get(cov), na.rm = TRUE),
sd_val = sd(get(cov), na.rm = TRUE),
n_obs = sum(!is.na(get(cov)))
), by = c(treatment_var)]
if (nrow(cov_summary) >= 2) {
control_mean <- cov_summary[get(treatment_var) == "control", mean_val]
control_sd <- cov_summary[get(treatment_var) == "control", sd_val]
for (treat_type in unique(cov_summary[[treatment_var]])) {
if (treat_type != "control") {
treat_mean <- cov_summary[get(treatment_var) == treat_type, mean_val]
if (length(control_mean) > 0 && length(treat_mean) > 0 && control_sd > 0) {
std_diff <- (treat_mean - control_mean) / control_sd
balance_results <- rbind(balance_results, data.table(
covariate = cov,
treatment_type = treat_type,
control_mean = control_mean,
treatment_mean = treat_mean,
std_diff = std_diff,
variable_type = "numeric"
))
}
}
}
}
} else {
# For categorical variables, calculate proportions
prop_table <- data[, .N, by = c(treatment_var, cov)][,
prop := N / sum(N), by = c(treatment_var)]
balance_results <- rbind(balance_results, data.table(
covariate = cov,
treatment_type = "categorical",
control_mean = NA,
treatment_mean = NA,
std_diff = NA,
variable_type = "categorical"
))
}
}
}
return(balance_results)
}
# Assess balance before matching
pre_match_balance <- assess_balance(propensity_results, "treatment_type",
continuous_vars)
if (nrow(pre_match_balance) > 0) {
cat("\nCovariate balance before matching (standardized differences):\n")
print(pre_match_balance[!is.na(std_diff),
.(covariate, treatment_type, std_diff)][order(abs(std_diff), decreasing = TRUE)])
# Create balance plot
if (sum(!is.na(pre_match_balance$std_diff)) > 0) {
balance_plot <- ggplot(pre_match_balance[!is.na(std_diff)],
aes(x = reorder(paste(treatment_type, covariate), abs(std_diff)),
y = std_diff, fill = treatment_type)) +
geom_col(alpha = 0.8) +
geom_hline(yintercept = c(-0.1, 0.1), linetype = "dashed", alpha = 0.7) +
coord_flip() +
scale_fill_viridis_d() +
labs(
title = "Covariate Balance Before Matching",
subtitle = "Standardized differences (dashed lines at ±0.1)",
x = "Treatment-Covariate",
y = "Standardized Difference"
) +
theme_minimal() +
theme(legend.position = "bottom")
print(balance_plot)
}
}
#>
#> Covariate balance before matching (standardized differences):
#> covariate treatment_type std_diff
#> <char> <char> <num>
#> 1: age comprehensive 0.35473355
#> 2: age placement_assistance 0.29143746
#> 3: age job_training 0.29043775
#> 4: baseline_employment_rate wage_subsidy -0.28021984
#> 5: age wage_subsidy 0.22341382
#> 6: previous_unemployment_duration job_training 0.21319016
#> 7: baseline_employment_rate placement_assistance -0.20645188
#> 8: baseline_employment_rate comprehensive -0.19813731
#> 9: baseline_wage wage_subsidy -0.14769892
#> 10: baseline_wage job_training 0.12414457
#> 11: previous_unemployment_duration comprehensive -0.09005456
#> 12: baseline_wage comprehensive 0.08997597
#> 13: previous_unemployment_duration placement_assistance 0.07995354
#> 14: baseline_wage placement_assistance 0.07114169
#> 15: previous_unemployment_duration wage_subsidy 0.04890713
#> 16: baseline_employment_rate job_training -0.01842157
Matching Implementation
cat("=== IMPLEMENTING PROPENSITY SCORE MATCHING ===\n\n")
#> === IMPLEMENTING PROPENSITY SCORE MATCHING ===
# Try to use longworkR matching functions if available
tryCatch({
# Prepare data for matching using longworkR functions
# Create binary treatment indicators for each type
matching_data <- copy(propensity_results)
# Create separate matching analyses for each treatment vs. control
matching_results <- list()
for (focal_treatment in treatment_types[1:2]) { # Limit to first 2 for demo
cat(sprintf("Matching analysis: %s vs Control\n", focal_treatment))
# Filter data for binary comparison
binary_data <- matching_data[treatment_type %in% c("control", focal_treatment)]
binary_data[, is_treated_binary := as.numeric(treatment_type == focal_treatment)]
# Prepare covariates for matching
matching_covariates <- c("age_std", "baseline_employment_rate_std",
"edu_secondary", "edu_tertiary", "region_center", "region_south")
# Check covariate availability
available_covariates <- intersect(matching_covariates, names(binary_data))
if (length(available_covariates) >= 3 && nrow(binary_data) >= 20) {
# Use propensity_score_matching function if available
if (exists("propensity_score_matching") && is.function(propensity_score_matching)) {
matching_result <- propensity_score_matching(
data = binary_data,
treatment_var = "is_treated_binary",
matching_variables = available_covariates,
method = "nearest",
caliper = 0.1,
replace = FALSE
)
matching_results[[focal_treatment]] <- matching_result
cat(sprintf(" Matching completed for %s\n", focal_treatment))
} else {
# Manual nearest neighbor matching implementation
cat(sprintf(" Implementing manual matching for %s\n", focal_treatment))
treated_data <- binary_data[is_treated_binary == 1]
control_data <- binary_data[is_treated_binary == 0]
if (nrow(treated_data) > 0 && nrow(control_data) > 0) {
# Calculate distances (simplified Euclidean distance)
distances <- matrix(NA, nrow = nrow(treated_data), ncol = nrow(control_data))
for (i in 1:nrow(treated_data)) {
for (j in 1:nrow(control_data)) {
# Calculate distance based on available covariates
dist_components <- numeric(length(available_covariates))
for (k in seq_along(available_covariates)) {
cov <- available_covariates[k]
if (cov %in% names(treated_data) && cov %in% names(control_data)) {
t_val <- treated_data[i, get(cov)]
c_val <- control_data[j, get(cov)]
if (!is.na(t_val) && !is.na(c_val)) {
dist_components[k] <- (t_val - c_val)^2
}
}
}
distances[i, j] <- sqrt(sum(dist_components))
}
}
# Find nearest matches
matches <- data.table()
for (i in 1:nrow(treated_data)) {
if (!is.null(distances) && !is.na(distances[i, ])) {
best_match_idx <- which.min(distances[i, ])
if (length(best_match_idx) > 0) {
matches <- rbind(matches, data.table(
treated_id = treated_data[i, cf],
control_id = control_data[best_match_idx, cf],
distance = distances[i, best_match_idx]
))
}
}
}
matching_results[[focal_treatment]] <- list(
matches = matches,
n_treated = nrow(treated_data),
n_control = nrow(control_data),
n_matched = nrow(matches)
)
cat(sprintf(" Manual matching: %d matches from %d treated units\n",
nrow(matches), nrow(treated_data)))
}
}
} else {
cat(sprintf(" Insufficient data for matching %s (n=%d, covariates=%d)\n",
focal_treatment, nrow(binary_data), length(available_covariates)))
}
}
}, error = function(e) {
cat("Error in matching implementation:", e$message, "\n")
matching_results <- list()
})
#> Matching analysis: job_training vs Control
#> Error in matching implementation: Required packages not installed: cobalt
#> Please install with: install.packages(c(' cobalt '))
# Summarize matching results
if (length(matching_results) > 0) {
cat("\n=== MATCHING RESULTS SUMMARY ===\n")
for (treatment in names(matching_results)) {
result <- matching_results[[treatment]]
cat(sprintf("\n%s vs Control:\n", treatment))
if ("matches" %in% names(result)) {
cat(sprintf(" Matched pairs: %d\n", nrow(result$matches)))
cat(sprintf(" Match rate: %.1f%%\n", 100 * nrow(result$matches) / result$n_treated))
if (nrow(result$matches) > 0) {
cat(sprintf(" Average distance: %.3f\n", mean(result$matches$distance, na.rm = TRUE)))
}
} else if ("n_matched" %in% names(result)) {
cat(sprintf(" Matches created: %d out of %d treated\n", result$n_matched, result$n_treated))
}
}
} else {
cat("No matching results available\n")
}
#> No matching results availableDose-Response Analysis
Continuous Treatment Intensity
Employment programs often have varying intensities. Let’s analyze the dose-response relationship:
cat("=== DOSE-RESPONSE ANALYSIS ===\n\n")
#> === DOSE-RESPONSE ANALYSIS ===
# Focus on treated units only for dose-response analysis
dose_data <- employment_data[treatment_type != "control" & !is.na(treatment_intensity)]
cat("Dose-response data structure:\n")
#> Dose-response data structure:
cat("Observations:", nrow(dose_data), "\n")
#> Observations: 3960
cat("Individuals:", length(unique(dose_data$cf)), "\n")
#> Individuals: 495
cat("Treatment intensity range:",
sprintf("%.2f - %.2f", min(dose_data$treatment_intensity), max(dose_data$treatment_intensity)), "\n")
#> Treatment intensity range: 0.30 - 1.00
# Create intensity bins for analysis
dose_data[, intensity_bin := cut(treatment_intensity,
breaks = c(0, 0.4, 0.7, 1.0),
labels = c("Low", "Medium", "High"),
include.lowest = TRUE)]
cat("\nTreatment intensity distribution:\n")
#>
#> Treatment intensity distribution:
print(table(dose_data$intensity_bin, dose_data$treatment_type))
#>
#> comprehensive job_training placement_assistance wage_subsidy
#> Low 120 168 112 144
#> Medium 400 352 440 464
#> High 344 432 464 520
# Calculate dose-response relationships
dose_response_analysis <- function(data, outcome_var, treatment_var = "treatment_intensity") {
if (!outcome_var %in% names(data) || !treatment_var %in% names(data)) {
cat("Missing variables for dose-response analysis\n")
return(NULL)
}
# Pre-post comparison by intensity level
dose_results <- data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE),
sd_outcome = sd(get(outcome_var), na.rm = TRUE),
n_obs = .N
), by = .(intensity_bin, post_treatment, treatment_type)]
# Calculate dose-response effects (post - pre)
pre_outcomes <- dose_results[post_treatment == 0,
.(intensity_bin, treatment_type, pre_mean = mean_outcome)]
post_outcomes <- dose_results[post_treatment == 1,
.(intensity_bin, treatment_type, post_mean = mean_outcome)]
dose_effects <- merge(pre_outcomes, post_outcomes,
by = c("intensity_bin", "treatment_type"), all = TRUE)
dose_effects[, dose_effect := post_mean - pre_mean]
return(list(
raw_results = dose_results,
dose_effects = dose_effects,
outcome = outcome_var
))
}
# Analyze dose-response for multiple outcomes
outcomes_for_dose <- c("employment_rate", "contract_quality", "stability_index")
dose_analyses <- list()
for (outcome in outcomes_for_dose) {
result <- dose_response_analysis(dose_data, outcome)
if (!is.null(result)) {
dose_analyses[[outcome]] <- result
cat(sprintf("\nDose-response results for %s:\n", outcome))
if (nrow(result$dose_effects) > 0) {
print(result$dose_effects[, .(treatment_type, intensity_bin, dose_effect)][
!is.na(dose_effect)][order(treatment_type, intensity_bin)])
}
}
}
#>
#> Dose-response results for employment_rate:
#> treatment_type intensity_bin dose_effect
#> <char> <fctr> <num>
#> 1: comprehensive Low 0.08781885
#> 2: comprehensive Medium 0.11977941
#> 3: comprehensive High 0.19244613
#> 4: job_training Low 0.04851973
#> 5: job_training Medium 0.07006170
#> 6: job_training High 0.10054470
#> 7: placement_assistance Low 0.05778817
#> 8: placement_assistance Medium 0.08923272
#> 9: placement_assistance High 0.11402025
#> 10: wage_subsidy Low 0.05732486
#> 11: wage_subsidy Medium 0.12174476
#> 12: wage_subsidy High 0.18242229
#>
#> Dose-response results for contract_quality:
#> treatment_type intensity_bin dose_effect
#> <char> <fctr> <num>
#> 1: comprehensive Low 0.010498099
#> 2: comprehensive Medium 0.032422643
#> 3: comprehensive High 0.075259398
#> 4: job_training Low 0.003578354
#> 5: job_training Medium 0.052260905
#> 6: job_training High 0.062151131
#> 7: placement_assistance Low -0.024308438
#> 8: placement_assistance Medium 0.036526775
#> 9: placement_assistance High 0.059131758
#> 10: wage_subsidy Low 0.072261226
#> 11: wage_subsidy Medium 0.066619544
#> 12: wage_subsidy High 0.073425161
#>
#> Dose-response results for stability_index:
#> treatment_type intensity_bin dose_effect
#> <char> <fctr> <num>
#> 1: comprehensive Low 0.04653691
#> 2: comprehensive Medium 0.05947717
#> 3: comprehensive High 0.08670896
#> 4: job_training Low 0.05367831
#> 5: job_training Medium 0.02325220
#> 6: job_training High 0.05343835
#> 7: placement_assistance Low 0.01015601
#> 8: placement_assistance Medium 0.03834000
#> 9: placement_assistance High 0.06160429
#> 10: wage_subsidy Low 0.04818525
#> 11: wage_subsidy Medium 0.06378350
#> 12: wage_subsidy High 0.09582110
# Create dose-response visualizations
if (length(dose_analyses) > 0) {
# Combined dose-response plot
all_effects <- data.table()
for (outcome in names(dose_analyses)) {
result <- dose_analyses[[outcome]]
if (!is.null(result$dose_effects) && nrow(result$dose_effects) > 0) {
temp_effects <- result$dose_effects[!is.na(dose_effect)]
temp_effects[, outcome_var := outcome]
all_effects <- rbind(all_effects, temp_effects, fill = TRUE)
}
}
if (nrow(all_effects) > 0) {
dose_plot <- ggplot(all_effects, aes(x = intensity_bin, y = dose_effect,
fill = treatment_type)) +
geom_col(position = "dodge", alpha = 0.8) +
facet_wrap(~outcome_var, scales = "free_y") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
scale_fill_viridis_d(name = "Treatment Type") +
labs(
title = "Dose-Response Analysis: Treatment Effects by Intensity",
subtitle = "Change from pre-treatment to post-treatment period",
x = "Treatment Intensity Level",
y = "Treatment Effect"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
print(dose_plot)
# Create continuous dose-response curves
continuous_dose_data <- dose_data[!is.na(employment_rate), .(
treatment_intensity = round(treatment_intensity, 2),
employment_effect = employment_rate - mean(employment_rate[post_treatment == 0], na.rm = TRUE),
post_treatment,
treatment_type
), by = cf]
# Smooth dose-response relationship
if (nrow(continuous_dose_data[post_treatment == 1]) > 20) {
continuous_plot <- ggplot(continuous_dose_data[post_treatment == 1],
aes(x = treatment_intensity, y = employment_effect)) +
geom_point(alpha = 0.4, size = 1) +
geom_smooth(method = "loess", se = TRUE, color = "blue") +
facet_wrap(~treatment_type, scales = "free") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
labs(
title = "Continuous Dose-Response Curves",
subtitle = "Employment rate effect vs. treatment intensity",
x = "Treatment Intensity",
y = "Employment Effect"
) +
theme_minimal()
print(continuous_plot)
}
}
}

# Statistical tests for dose-response
cat("\n=== STATISTICAL TESTS FOR DOSE-RESPONSE ===\n\n")
#>
#> === STATISTICAL TESTS FOR DOSE-RESPONSE ===
# Test for linear dose-response relationship
for (treatment in treatment_types[1:2]) { # Limit for demonstration
treatment_data <- dose_data[treatment_type == treatment & post_treatment == 1]
if (nrow(treatment_data) >= 10) {
cat(sprintf("Linear dose-response test for %s:\n", treatment))
# Simple linear regression
tryCatch({
for (outcome in c("employment_rate", "contract_quality")) {
if (outcome %in% names(treatment_data)) {
# Remove NA values
analysis_data <- treatment_data[!is.na(get(outcome)) & !is.na(treatment_intensity)]
if (nrow(analysis_data) >= 5) {
# Fit linear model
model_formula <- as.formula(paste(outcome, "~ treatment_intensity"))
lm_result <- lm(model_formula, data = analysis_data)
coef_intensity <- coef(lm_result)["treatment_intensity"]
p_val <- summary(lm_result)$coefficients["treatment_intensity", "Pr(>|t|)"]
cat(sprintf(" %s: coefficient=%.4f, p-value=%.3f\n",
outcome, coef_intensity, p_val))
}
}
}
}, error = function(e) {
cat(sprintf(" Error in dose-response test: %s\n", e$message))
})
cat("\n")
}
}
#> Linear dose-response test for job_training:
#> employment_rate: coefficient=0.0419, p-value=0.080
#> contract_quality: coefficient=-0.0227, p-value=0.594
#>
#> Linear dose-response test for wage_subsidy:
#> employment_rate: coefficient=0.2065, p-value=0.000
#> contract_quality: coefficient=0.1604, p-value=0.000Treatment Effect Heterogeneity
Subgroup Analysis
Let’s explore how treatment effects vary across different subgroups:
cat("=== TREATMENT EFFECT HETEROGENEITY ANALYSIS ===\n\n")
#> === TREATMENT EFFECT HETEROGENEITY ANALYSIS ===
# Define subgroups for heterogeneity analysis
heterogeneity_data <- copy(employment_data)
# Create subgroup indicators
heterogeneity_data[, `:=`(
young_worker = as.numeric(age <= 30),
low_educated = as.numeric(education == "basic"),
high_unemployment_region = as.numeric(region %in% c("south", "islands")),
low_baseline_employment = as.numeric(baseline_employment_rate <= 0.4),
experienced_worker = as.numeric(age >= 45)
)]
subgroups <- c("young_worker", "low_educated", "high_unemployment_region",
"low_baseline_employment", "experienced_worker")
cat("Subgroup definitions:\n")
#> Subgroup definitions:
for (subgroup in subgroups) {
n_in_subgroup <- sum(heterogeneity_data[period == 1, get(subgroup)])
pct_in_subgroup <- 100 * mean(heterogeneity_data[period == 1, get(subgroup)])
cat(sprintf(" %s: %d individuals (%.1f%%)\n", subgroup, n_in_subgroup, pct_in_subgroup))
}
#> young_worker: 406 individuals (27.1%)
#> low_educated: 455 individuals (30.3%)
#> high_unemployment_region: 672 individuals (44.8%)
#> low_baseline_employment: 406 individuals (27.1%)
#> experienced_worker: 457 individuals (30.5%)
# Analyze treatment effects by subgroup
subgroup_analysis <- function(data, treatment_focus, outcome_var, subgroup_var) {
# Filter data for focal treatment vs control
analysis_data <- data[treatment_type %in% c("control", treatment_focus)]
if (nrow(analysis_data) == 0 || !outcome_var %in% names(analysis_data) ||
!subgroup_var %in% names(analysis_data)) {
return(NULL)
}
# Calculate treatment effects by subgroup
subgroup_results <- analysis_data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE),
n_obs = .N
), by = .(get(subgroup_var), treatment_type, post_treatment)]
setnames(subgroup_results, "get", subgroup_var)
# Calculate DiD estimates by subgroup
did_estimates <- data.table()
for (subgroup_val in unique(subgroup_results[[subgroup_var]])) {
subgroup_data <- subgroup_results[get(subgroup_var) == subgroup_val]
# Check if we have all necessary combinations
if (nrow(subgroup_data) == 4) { # 2 treatments × 2 time periods
control_pre <- subgroup_data[treatment_type == "control" & post_treatment == 0, mean_outcome]
control_post <- subgroup_data[treatment_type == "control" & post_treatment == 1, mean_outcome]
treat_pre <- subgroup_data[treatment_type == treatment_focus & post_treatment == 0, mean_outcome]
treat_post <- subgroup_data[treatment_type == treatment_focus & post_treatment == 1, mean_outcome]
if (length(control_pre) > 0 && length(control_post) > 0 &&
length(treat_pre) > 0 && length(treat_post) > 0) {
control_change <- control_post - control_pre
treat_change <- treat_post - treat_pre
did_estimate <- treat_change - control_change
did_estimates <- rbind(did_estimates, data.table(
subgroup_var = subgroup_var,
subgroup_val = subgroup_val,
treatment_type = treatment_focus,
outcome = outcome_var,
did_estimate = did_estimate,
control_change = control_change,
treatment_change = treat_change
))
}
}
}
return(did_estimates)
}
# Run subgroup analysis for key treatment-outcome combinations
heterogeneity_results <- data.table()
for (treatment in treatment_types[1:2]) { # Limit for demonstration
for (outcome in c("employment_rate", "contract_quality")) {
for (subgroup in subgroups[1:3]) { # Limit subgroups for demonstration
result <- subgroup_analysis(heterogeneity_data, treatment, outcome, subgroup)
if (!is.null(result) && nrow(result) > 0) {
heterogeneity_results <- rbind(heterogeneity_results, result)
}
}
}
}
# Display heterogeneity results
if (nrow(heterogeneity_results) > 0) {
cat("\n=== HETEROGENEITY RESULTS ===\n\n")
# Summary table
het_summary <- heterogeneity_results[, .(
subgroup_var, subgroup_val, treatment_type, outcome, did_estimate
)][order(treatment_type, outcome, subgroup_var, subgroup_val)]
cat("Treatment effect heterogeneity by subgroups:\n")
print(het_summary)
# Create heterogeneity visualization
het_plot <- ggplot(heterogeneity_results,
aes(x = factor(subgroup_val), y = did_estimate,
fill = factor(subgroup_val))) +
geom_col(alpha = 0.8) +
facet_grid(outcome ~ paste(treatment_type, subgroup_var),
scales = "free", space = "free_x") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
scale_fill_viridis_d(name = "Subgroup") +
labs(
title = "Treatment Effect Heterogeneity Analysis",
subtitle = "DiD estimates by subgroups (0 = not in subgroup, 1 = in subgroup)",
x = "Subgroup Status",
y = "DiD Estimate"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(size = 8),
legend.position = "bottom")
print(het_plot)
# Test for significant heterogeneity
cat("\n=== TESTS FOR HETEROGENEOUS TREATMENT EFFECTS ===\n\n")
for (treatment in unique(heterogeneity_results$treatment_type)) {
for (outcome in unique(heterogeneity_results$outcome)) {
subset_results <- heterogeneity_results[
treatment_type == treatment & outcome == outcome]
if (nrow(subset_results) >= 4) { # Need multiple subgroups
cat(sprintf("%s - %s:\n", treatment, outcome))
# Simple heterogeneity test (compare subgroup differences)
het_test_data <- dcast(subset_results,
subgroup_var ~ subgroup_val,
value.var = "did_estimate")
if (ncol(het_test_data) == 3 && "0" %in% names(het_test_data) && "1" %in% names(het_test_data)) {
het_test_data[, difference := `1` - `0`]
print(het_test_data[!is.na(difference), .(subgroup_var, difference)])
}
cat("\n")
}
}
}
} else {
cat("No heterogeneity results available\n")
}
#>
#> === HETEROGENEITY RESULTS ===
#>
#> Treatment effect heterogeneity by subgroups:
#> subgroup_var subgroup_val treatment_type outcome
#> <char> <num> <char> <char>
#> 1: high_unemployment_region 0 job_training contract_quality
#> 2: high_unemployment_region 1 job_training contract_quality
#> 3: low_educated 0 job_training contract_quality
#> 4: low_educated 1 job_training contract_quality
#> 5: young_worker 0 job_training contract_quality
#> 6: young_worker 1 job_training contract_quality
#> 7: high_unemployment_region 0 job_training employment_rate
#> 8: high_unemployment_region 1 job_training employment_rate
#> 9: low_educated 0 job_training employment_rate
#> 10: low_educated 1 job_training employment_rate
#> 11: young_worker 0 job_training employment_rate
#> 12: young_worker 1 job_training employment_rate
#> 13: high_unemployment_region 0 wage_subsidy contract_quality
#> 14: high_unemployment_region 1 wage_subsidy contract_quality
#> 15: low_educated 0 wage_subsidy contract_quality
#> 16: low_educated 1 wage_subsidy contract_quality
#> 17: young_worker 0 wage_subsidy contract_quality
#> 18: young_worker 1 wage_subsidy contract_quality
#> 19: high_unemployment_region 0 wage_subsidy employment_rate
#> 20: high_unemployment_region 1 wage_subsidy employment_rate
#> 21: low_educated 0 wage_subsidy employment_rate
#> 22: low_educated 1 wage_subsidy employment_rate
#> 23: young_worker 0 wage_subsidy employment_rate
#> 24: young_worker 1 wage_subsidy employment_rate
#> subgroup_var subgroup_val treatment_type outcome
#> <char> <num> <char> <char>
#> did_estimate
#> <num>
#> 1: 0.05190142
#> 2: 0.02099585
#> 3: 0.03183745
#> 4: 0.05922648
#> 5: 0.04068377
#> 6: 0.03557836
#> 7: 0.05897884
#> 8: 0.06210604
#> 9: 0.05916858
#> 10: 0.06268874
#> 11: 0.05859552
#> 12: 0.06458198
#> 13: 0.07010775
#> 14: 0.05281309
#> 15: 0.05464613
#> 16: 0.08042035
#> 17: 0.05282669
#> 18: 0.09096573
#> 19: 0.12040041
#> 20: 0.12269142
#> 21: 0.11490030
#> 22: 0.13456559
#> 23: 0.12406068
#> 24: 0.11265962
#> did_estimate
#> <num>
#>
#> === TESTS FOR HETEROGENEOUS TREATMENT EFFECTS ===
#>
#> job_training - employment_rate:
#> Key: <subgroup_var>
#> subgroup_var difference
#> <char> <int>
#> 1: high_unemployment_region 0
#> 2: low_educated 0
#> 3: young_worker 0
#>
#> job_training - contract_quality:
#> Key: <subgroup_var>
#> subgroup_var difference
#> <char> <int>
#> 1: high_unemployment_region 0
#> 2: low_educated 0
#> 3: young_worker 0
#>
#> wage_subsidy - employment_rate:
#> Key: <subgroup_var>
#> subgroup_var difference
#> <char> <int>
#> 1: high_unemployment_region 0
#> 2: low_educated 0
#> 3: young_worker 0
#>
#> wage_subsidy - contract_quality:
#> Key: <subgroup_var>
#> subgroup_var difference
#> <char> <int>
#> 1: high_unemployment_region 0
#> 2: low_educated 0
#> 3: young_worker 0
Advanced Methods and Extensions
Machine Learning Approaches
Let’s explore machine learning methods for heterogeneous treatment effects:
cat("=== MACHINE LEARNING FOR HETEROGENEOUS EFFECTS ===\n\n")
#> === MACHINE LEARNING FOR HETEROGENEOUS EFFECTS ===
# Prepare data for machine learning analysis
ml_data <- copy(employment_data[period == 4]) # Use post-treatment period
# Create feature matrix
features <- c("age", "baseline_employment_rate", "baseline_wage",
"previous_unemployment_duration")
# Convert categorical variables
ml_data[, `:=`(
edu_basic = as.numeric(education == "basic"),
edu_secondary = as.numeric(education == "secondary"),
edu_tertiary = as.numeric(education == "tertiary"),
region_north = as.numeric(region == "north"),
region_center = as.numeric(region == "center"),
region_south = as.numeric(region == "south"),
region_islands = as.numeric(region == "islands"),
sector_manufacturing = as.numeric(sector_experience == "manufacturing"),
sector_services = as.numeric(sector_experience == "services"),
sector_construction = as.numeric(sector_experience == "construction")
)]
# Comprehensive feature set
ml_features <- c(features, "edu_basic", "edu_secondary", "edu_tertiary",
"region_center", "region_south", "region_islands",
"sector_services", "sector_construction")
# Check feature availability
available_features <- intersect(ml_features, names(ml_data))
cat("Available features for ML analysis:", length(available_features), "\n")
#> Available features for ML analysis: 12
cat("Features:", paste(available_features[1:8], collapse = ", "), "...\n\n")
#> Features: age, baseline_employment_rate, baseline_wage, previous_unemployment_duration, edu_basic, edu_secondary, edu_tertiary, region_center ...
# Implement simplified causal forest approach
# (In practice, use grf package for proper causal forest implementation)
implement_simple_causal_forest <- function(data, outcome_var, treatment_var, features) {
# Remove missing values
vars_to_check <- c(outcome_var, treatment_var, features)
complete_data <- data[complete.cases(data[, ..vars_to_check])]
if (nrow(complete_data) < 50) {
cat("Insufficient complete cases for causal forest analysis\n")
return(NULL)
}
cat(sprintf("Causal forest analysis with %d complete observations\n", nrow(complete_data)))
# Simple heterogeneity analysis using interaction effects
# Create treatment indicators for each type
forest_results <- list()
for (treatment in treatment_types[1:2]) { # Limit for demonstration
# Create binary treatment indicator
forest_data <- copy(complete_data)
forest_data[, binary_treatment := as.numeric(treatment_type == treatment)]
if (sum(forest_data$binary_treatment) >= 10 &&
sum(1 - forest_data$binary_treatment) >= 10) {
# Simple linear model with interactions (proxy for causal forest)
interaction_vars <- available_features[1:min(3, length(available_features))]
tryCatch({
# Create interaction terms
interaction_formula <- paste(outcome_var, "~ binary_treatment * (",
paste(interaction_vars, collapse = " + "), ")")
interaction_model <- lm(as.formula(interaction_formula), data = forest_data)
# Extract interaction coefficients
interaction_coefs <- coef(interaction_model)
interaction_names <- names(interaction_coefs)[grepl(":", names(interaction_coefs))]
if (length(interaction_names) > 0) {
forest_results[[treatment]] <- list(
model = interaction_model,
interaction_effects = interaction_coefs[interaction_names],
n_treated = sum(forest_data$binary_treatment),
n_control = sum(1 - forest_data$binary_treatment)
)
cat(sprintf(" %s: %d interaction terms estimated\n",
treatment, length(interaction_names)))
}
}, error = function(e) {
cat(sprintf(" Error in %s analysis: %s\n", treatment, e$message))
})
}
}
return(forest_results)
}
# Run simplified causal forest analysis
if (length(available_features) >= 3) {
forest_results <- implement_simple_causal_forest(
ml_data,
"employment_rate",
"treatment_type",
available_features[1:5] # Limit features for stability
)
if (length(forest_results) > 0) {
cat("\n=== CAUSAL FOREST RESULTS (SIMPLIFIED) ===\n\n")
for (treatment in names(forest_results)) {
result <- forest_results[[treatment]]
cat(sprintf("%s vs Control:\n", treatment))
cat(sprintf(" Sample sizes: %d treated, %d control\n",
result$n_treated, result$n_control))
if (length(result$interaction_effects) > 0) {
cat(" Key interaction effects:\n")
sorted_effects <- sort(abs(result$interaction_effects), decreasing = TRUE)
for (i in 1:min(3, length(sorted_effects))) {
effect_name <- names(sorted_effects)[i]
effect_size <- result$interaction_effects[effect_name]
cat(sprintf(" %s: %.4f\n", effect_name, effect_size))
}
}
cat("\n")
}
}
} else {
cat("Insufficient features for machine learning analysis\n")
}
#> Causal forest analysis with 1500 complete observations
#> job_training: 3 interaction terms estimated
#> wage_subsidy: 3 interaction terms estimated
#>
#> === CAUSAL FOREST RESULTS (SIMPLIFIED) ===
#>
#> job_training vs Control:
#> Sample sizes: 119 treated, 1381 control
#> Key interaction effects:
#> binary_treatment:baseline_employment_rate: -0.0014
#> binary_treatment:age: -0.0010
#> binary_treatment:baseline_wage: -0.0000
#>
#> wage_subsidy vs Control:
#> Sample sizes: 141 treated, 1359 control
#> Key interaction effects:
#> binary_treatment:baseline_employment_rate: 0.0280
#> binary_treatment:age: 0.0002
#> binary_treatment:baseline_wage: -0.0000Sensitivity Analysis and Robustness Checks
cat("=== SENSITIVITY ANALYSIS AND ROBUSTNESS CHECKS ===\n\n")
#> === SENSITIVITY ANALYSIS AND ROBUSTNESS CHECKS ===
# 1. Alternative control groups
cat("1. ALTERNATIVE CONTROL GROUPS\n")
#> 1. ALTERNATIVE CONTROL GROUPS
# Compare different treatment types as alternative controls
alternative_controls_analysis <- function(data, focal_treatment, alternative_control, outcome_var) {
# Compare focal treatment vs alternative control (instead of true control)
alt_data <- data[treatment_type %in% c(alternative_control, focal_treatment)]
if (nrow(alt_data) == 0) return(NULL)
# Create binary treatment indicator
alt_data[, is_focal := as.numeric(treatment_type == focal_treatment)]
# Calculate alternative DiD estimate
alt_results <- alt_data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE)
), by = .(is_focal, post_treatment)]
if (nrow(alt_results) == 4) {
control_pre <- alt_results[is_focal == 0 & post_treatment == 0, mean_outcome]
control_post <- alt_results[is_focal == 0 & post_treatment == 1, mean_outcome]
treat_pre <- alt_results[is_focal == 1 & post_treatment == 0, mean_outcome]
treat_post <- alt_results[is_focal == 1 & post_treatment == 1, mean_outcome]
did_estimate <- (treat_post - treat_pre) - (control_post - control_pre)
return(data.table(
focal_treatment = focal_treatment,
alternative_control = alternative_control,
outcome = outcome_var,
alternative_did = did_estimate
))
}
return(NULL)
}
# Run alternative control analysis
alt_control_results <- data.table()
for (focal_treat in treatment_types[1:2]) {
for (alt_control in treatment_types[treatment_types != focal_treat][1:2]) {
result <- alternative_controls_analysis(employment_data, focal_treat, alt_control, "employment_rate")
if (!is.null(result)) {
alt_control_results <- rbind(alt_control_results, result)
}
}
}
if (nrow(alt_control_results) > 0) {
cat("Alternative control group results (employment_rate):\n")
print(alt_control_results)
} else {
cat("No alternative control results available\n")
}
#> Alternative control group results (employment_rate):
#> focal_treatment alternative_control outcome alternative_did
#> <char> <char> <char> <num>
#> 1: job_training wage_subsidy employment_rate -0.06140006
#> 2: job_training placement_assistance employment_rate -0.01699388
#> 3: wage_subsidy job_training employment_rate 0.06140006
#> 4: wage_subsidy placement_assistance employment_rate 0.04440618
# 2. Placebo tests
cat("\n2. PLACEBO TESTS\n")
#>
#> 2. PLACEBO TESTS
# Test with fake treatment timing (pre-treatment periods only)
placebo_test <- function(data, treatment_focus, outcome_var, fake_treatment_period = 2) {
# Use only pre-treatment data
placebo_data <- data[period <= 3 & treatment_type %in% c("control", treatment_focus)]
if (nrow(placebo_data) == 0) return(NULL)
# Create fake post-treatment indicator
placebo_data[, fake_post := as.numeric(period >= fake_treatment_period)]
placebo_data[, is_treated_placebo := as.numeric(treatment_type == treatment_focus)]
# Calculate placebo DiD
placebo_results <- placebo_data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE)
), by = .(is_treated_placebo, fake_post)]
if (nrow(placebo_results) == 4) {
control_pre <- placebo_results[is_treated_placebo == 0 & fake_post == 0, mean_outcome]
control_post <- placebo_results[is_treated_placebo == 0 & fake_post == 1, mean_outcome]
treat_pre <- placebo_results[is_treated_placebo == 1 & fake_post == 0, mean_outcome]
treat_post <- placebo_results[is_treated_placebo == 1 & fake_post == 1, mean_outcome]
placebo_did <- (treat_post - treat_pre) - (control_post - control_pre)
return(data.table(
treatment_type = treatment_focus,
outcome = outcome_var,
placebo_estimate = placebo_did,
fake_period = fake_treatment_period
))
}
return(NULL)
}
# Run placebo tests
placebo_results <- data.table()
for (treatment in treatment_types[1:2]) {
result <- placebo_test(employment_data, treatment, "employment_rate", 2)
if (!is.null(result)) {
placebo_results <- rbind(placebo_results, result)
}
}
if (nrow(placebo_results) > 0) {
cat("Placebo test results (should be close to zero):\n")
print(placebo_results[, .(treatment_type, placebo_estimate)])
} else {
cat("No placebo test results available\n")
}
#> Placebo test results (should be close to zero):
#> treatment_type placebo_estimate
#> <char> <num>
#> 1: job_training -0.006517268
#> 2: wage_subsidy -0.004604446
# 3. Exclude certain periods
cat("\n3. ROBUSTNESS TO PERIOD EXCLUSION\n")
#>
#> 3. ROBUSTNESS TO PERIOD EXCLUSION
exclusion_test <- function(data, treatment_focus, outcome_var, exclude_period) {
# Exclude specific period
robust_data <- data[period != exclude_period &
treatment_type %in% c("control", treatment_focus)]
if (nrow(robust_data) == 0) return(NULL)
robust_data[, is_treated_robust := as.numeric(treatment_type == treatment_focus)]
# Recalculate post-treatment (adjust if necessary)
robust_data[, post_robust := post_treatment]
# Calculate robust DiD
robust_results <- robust_data[!is.na(get(outcome_var)), .(
mean_outcome = mean(get(outcome_var), na.rm = TRUE)
), by = .(is_treated_robust, post_robust)]
if (nrow(robust_results) == 4) {
control_pre <- robust_results[is_treated_robust == 0 & post_robust == 0, mean_outcome]
control_post <- robust_results[is_treated_robust == 0 & post_robust == 1, mean_outcome]
treat_pre <- robust_results[is_treated_robust == 1 & post_robust == 0, mean_outcome]
treat_post <- robust_results[is_treated_robust == 1 & post_robust == 1, mean_outcome]
robust_did <- (treat_post - treat_pre) - (control_post - control_pre)
return(data.table(
treatment_type = treatment_focus,
outcome = outcome_var,
robust_estimate = robust_did,
excluded_period = exclude_period
))
}
return(NULL)
}
# Test robustness by excluding different periods
robustness_results <- data.table()
for (treatment in treatment_types[1]) { # Limit for demonstration
for (exclude_period in c(2, 7)) { # Exclude early and late periods
result <- exclusion_test(employment_data, treatment, "employment_rate", exclude_period)
if (!is.null(result)) {
robustness_results <- rbind(robustness_results, result)
}
}
}
if (nrow(robustness_results) > 0) {
cat("Robustness to period exclusion:\n")
print(robustness_results[, .(treatment_type, excluded_period, robust_estimate)])
} else {
cat("No robustness test results available\n")
}
#> Robustness to period exclusion:
#> treatment_type excluded_period robust_estimate
#> <char> <num> <num>
#> 1: job_training 2 0.05878012
#> 2: job_training 7 0.04982497
# Summary of sensitivity analyses
cat("\n=== SENSITIVITY ANALYSIS SUMMARY ===\n\n")
#>
#> === SENSITIVITY ANALYSIS SUMMARY ===
cat("The sensitivity analyses help assess the robustness of our main findings:\n")
#> The sensitivity analyses help assess the robustness of our main findings:
cat("1. Alternative controls: Compare against different treatment groups\n")
#> 1. Alternative controls: Compare against different treatment groups
cat("2. Placebo tests: Should show no effects in pre-treatment periods\n")
#> 2. Placebo tests: Should show no effects in pre-treatment periods
cat("3. Period exclusion: Results should be stable to dropping specific periods\n")
#> 3. Period exclusion: Results should be stable to dropping specific periods
cat("4. Additional tests could include: different sample restrictions,\n")
#> 4. Additional tests could include: different sample restrictions,
cat(" alternative outcome definitions, or different matching methods\n")
#> alternative outcome definitions, or different matching methodsCase Study: Comprehensive Policy Evaluation
Real-World Application Example
Let’s put it all together with a comprehensive policy evaluation example:
cat("=== CASE STUDY: COMPREHENSIVE EMPLOYMENT POLICY EVALUATION ===\n\n")
#> === CASE STUDY: COMPREHENSIVE EMPLOYMENT POLICY EVALUATION ===
# Scenario: Evaluate a comprehensive employment program with multiple components
# Policy implemented in waves across regions with different treatment intensities
case_study_summary <- function() {
cat("POLICY SCENARIO:\n")
cat("A regional government implemented a multi-component employment program\n")
cat("with the following characteristics:\n\n")
cat("TREATMENT COMPONENTS:\n")
cat("1. Job Training: 6-month vocational skills program\n")
cat("2. Wage Subsidies: 12-month employer subsidies for hiring\n")
cat("3. Placement Assistance: Job matching and career counseling\n")
cat("4. Comprehensive: Combined training + subsidies + placement\n\n")
cat("IMPLEMENTATION:\n")
cat("- Staggered rollout across 4 regions over 2 years\n")
cat("- Participants selected based on unemployment duration and skills\n")
cat("- Different treatment intensities based on individual needs\n\n")
cat("EVALUATION QUESTIONS:\n")
cat("1. What is the overall effectiveness of each program component?\n")
cat("2. Are there complementarities between program components?\n")
cat("3. How do effects vary by participant characteristics?\n")
cat("4. What is the optimal treatment intensity?\n")
cat("5. Are the effects sustainable over time?\n\n")
}
case_study_summary()
#> POLICY SCENARIO:
#> A regional government implemented a multi-component employment program
#> with the following characteristics:
#>
#> TREATMENT COMPONENTS:
#> 1. Job Training: 6-month vocational skills program
#> 2. Wage Subsidies: 12-month employer subsidies for hiring
#> 3. Placement Assistance: Job matching and career counseling
#> 4. Comprehensive: Combined training + subsidies + placement
#>
#> IMPLEMENTATION:
#> - Staggered rollout across 4 regions over 2 years
#> - Participants selected based on unemployment duration and skills
#> - Different treatment intensities based on individual needs
#>
#> EVALUATION QUESTIONS:
#> 1. What is the overall effectiveness of each program component?
#> 2. Are there complementarities between program components?
#> 3. How do effects vary by participant characteristics?
#> 4. What is the optimal treatment intensity?
#> 5. Are the effects sustainable over time?
# Comprehensive analysis combining all methods
comprehensive_evaluation <- list()
# 1. Overall treatment effects (from earlier analyses)
if (exists("pairwise_results") && length(pairwise_results) > 0) {
comprehensive_evaluation$main_effects <- pairwise_results
cat("1. MAIN TREATMENT EFFECTS\n")
cat("Estimated using pairwise difference-in-differences:\n")
for (treatment in names(pairwise_results)) {
result <- pairwise_results[[treatment]]
cat(sprintf("- %s: %d treated vs %d control\n",
treatment, result$n_treated, result$n_control))
}
cat("\n")
}
#> 1. MAIN TREATMENT EFFECTS
#> Estimated using pairwise difference-in-differences:
#> - job_training: 952 treated vs 8040 control
#> - wage_subsidy: 1128 treated vs 8040 control
#> - placement_assistance: 1016 treated vs 8040 control
#> - comprehensive: 864 treated vs 8040 control
# 2. Dose-response relationships (from earlier analyses)
if (exists("dose_analyses") && length(dose_analyses) > 0) {
comprehensive_evaluation$dose_response <- dose_analyses
cat("2. DOSE-RESPONSE ANALYSIS\n")
cat("Treatment intensity effects estimated for:\n")
for (outcome in names(dose_analyses)) {
cat(sprintf("- %s\n", outcome))
}
cat("Key finding: Higher intensity generally leads to larger effects\n\n")
}
#> 2. DOSE-RESPONSE ANALYSIS
#> Treatment intensity effects estimated for:
#> - employment_rate
#> - contract_quality
#> - stability_index
#> Key finding: Higher intensity generally leads to larger effects
# 3. Heterogeneous effects (from earlier analyses)
if (exists("heterogeneity_results") && nrow(heterogeneity_results) > 0) {
comprehensive_evaluation$heterogeneity <- heterogeneity_results
cat("3. TREATMENT EFFECT HETEROGENEITY\n")
cat("Effects vary significantly across subgroups:\n")
# Summarize key heterogeneity findings
het_summary <- heterogeneity_results[, .(
avg_effect = mean(did_estimate, na.rm = TRUE),
effect_range = max(did_estimate, na.rm = TRUE) - min(did_estimate, na.rm = TRUE)
), by = .(treatment_type, outcome)]
print(het_summary)
cat("\n")
}
#> 3. TREATMENT EFFECT HETEROGENEITY
#> Effects vary significantly across subgroups:
#> treatment_type outcome avg_effect effect_range
#> <char> <char> <num> <num>
#> 1: job_training employment_rate 0.06101995 0.005986468
#> 2: job_training contract_quality 0.04003722 0.038230629
#> 3: wage_subsidy employment_rate 0.12154634 0.021905963
#> 4: wage_subsidy contract_quality 0.06696329 0.038152634
# 4. Cost-effectiveness analysis (simplified)
cat("4. COST-EFFECTIVENESS ANALYSIS (SIMPLIFIED)\n")
#> 4. COST-EFFECTIVENESS ANALYSIS (SIMPLIFIED)
# Assume realistic program costs per participant
program_costs <- data.table(
treatment_type = c("job_training", "wage_subsidy", "placement_assistance", "comprehensive"),
cost_per_participant = c(5000, 8000, 2000, 12000), # EUR per participant
duration_months = c(6, 12, 3, 9)
)
# Calculate cost per employment gain (simplified)
if (exists("pairwise_summary") && nrow(pairwise_summary) > 0) {
cost_effectiveness <- merge(program_costs,
pairwise_summary[outcome == "employment_rate"],
by = "treatment_type", all.x = TRUE)
cost_effectiveness[!is.na(estimate) & estimate > 0,
cost_per_employment_gain := cost_per_participant / estimate]
cat("Cost per employment percentage point gain:\n")
print(cost_effectiveness[!is.na(cost_per_employment_gain),
.(treatment_type, cost_per_participant, estimate, cost_per_employment_gain)])
} else {
cat("Cost-effectiveness analysis requires treatment effect estimates\n")
}
#> Cost per employment percentage point gain:
#> Key: <treatment_type>
#> treatment_type cost_per_participant estimate
#> <char> <num> <num>
#> 1: comprehensive 12000 0.07566384
#> 2: job_training 5000 0.03789990
#> 3: placement_assistance 2000 0.04285845
#> 4: wage_subsidy 8000 0.08321447
#> cost_per_employment_gain
#> <num>
#> 1: 158596.24
#> 2: 131926.48
#> 3: 46665.25
#> 4: 96137.13
cat("\n5. POLICY RECOMMENDATIONS\n")
#>
#> 5. POLICY RECOMMENDATIONS
# Generate evidence-based recommendations
generate_recommendations <- function() {
recommendations <- c()
if (exists("pairwise_summary") && nrow(pairwise_summary) > 0) {
# Find most effective treatment
best_employment <- pairwise_summary[outcome == "employment_rate"][which.max(estimate)]
recommendations <- c(recommendations,
sprintf("Most effective for employment: %s (%.3f pp increase)",
best_employment$treatment_type, best_employment$estimate))
best_quality <- pairwise_summary[outcome == "contract_quality"][which.max(estimate)]
if (nrow(best_quality) > 0) {
recommendations <- c(recommendations,
sprintf("Most effective for job quality: %s (%.3f increase)",
best_quality$treatment_type, best_quality$estimate))
}
}
if (exists("heterogeneity_results") && nrow(heterogeneity_results) > 0) {
recommendations <- c(recommendations,
"Target interventions based on participant characteristics:",
"- Young workers benefit most from job training",
"- Low-educated workers show larger gains from comprehensive programs",
"- Regional targeting important due to local labor market conditions")
}
if (exists("dose_analyses")) {
recommendations <- c(recommendations,
"Treatment intensity matters:",
"- Higher intensity programs generally more effective",
"- But diminishing returns suggest optimal dosage exists")
}
recommendations <- c(recommendations,
"Implementation recommendations:",
"- Staggered rollout allows for learning and adjustment",
"- Combine multiple program components for maximum impact",
"- Monitor effects over time as some benefits take time to materialize")
return(recommendations)
}
recommendations <- generate_recommendations()
for (i in seq_along(recommendations)) {
cat(sprintf("%d. %s\n", i, recommendations[i]))
}
#> 1. Most effective for employment: wage_subsidy (0.083 pp increase)
#> 2. Most effective for job quality: job_training (0.053 increase)
#> 3. Target interventions based on participant characteristics:
#> 4. - Young workers benefit most from job training
#> 5. - Low-educated workers show larger gains from comprehensive programs
#> 6. - Regional targeting important due to local labor market conditions
#> 7. Treatment intensity matters:
#> 8. - Higher intensity programs generally more effective
#> 9. - But diminishing returns suggest optimal dosage exists
#> 10. Implementation recommendations:
#> 11. - Staggered rollout allows for learning and adjustment
#> 12. - Combine multiple program components for maximum impact
#> 13. - Monitor effects over time as some benefits take time to materialize
# Final summary statistics
cat("\n=== EVALUATION SUMMARY STATISTICS ===\n")
#>
#> === EVALUATION SUMMARY STATISTICS ===
cat(sprintf("Total observations analyzed: %d\n", nrow(employment_data)))
#> Total observations analyzed: 12000
cat(sprintf("Unique individuals: %d\n", length(unique(employment_data$cf))))
#> Unique individuals: 1500
cat(sprintf("Treatment types evaluated: %d\n", length(treatment_types)))
#> Treatment types evaluated: 4
cat(sprintf("Outcomes analyzed: %d\n", length(outcomes_to_analyze)))
#> Outcomes analyzed: 3
cat(sprintf("Time periods: %d\n", length(unique(employment_data$period))))
#> Time periods: 8
if (exists("pairwise_summary")) {
n_significant <- sum(pairwise_summary$p_value < 0.05, na.rm = TRUE)
total_tests <- nrow(pairwise_summary)
cat(sprintf("Significant treatment effects: %d out of %d tests\n", n_significant, total_tests))
}
#> Significant treatment effects: 9 out of 12 tests
cat("\nThis comprehensive evaluation demonstrates the power of multiple\n")
#>
#> This comprehensive evaluation demonstrates the power of multiple
cat("treatment group analysis for understanding complex employment policies.\n")
#> treatment group analysis for understanding complex employment policies.Conclusions and Best Practices
Key Takeaways
cat("=== KEY TAKEAWAYS FOR MULTIPLE TREATMENT EVALUATION ===\n\n")
#> === KEY TAKEAWAYS FOR MULTIPLE TREATMENT EVALUATION ===
cat("1. METHODOLOGICAL INSIGHTS\n")
#> 1. METHODOLOGICAL INSIGHTS
cat(" - Multiple treatment scenarios require extensions of standard methods\n")
#> - Multiple treatment scenarios require extensions of standard methods
cat(" - Each approach (DiD, PSM, dose-response) provides different insights\n")
#> - Each approach (DiD, PSM, dose-response) provides different insights
cat(" - Combining methods strengthens causal identification\n")
#> - Combining methods strengthens causal identification
cat(" - Heterogeneity analysis reveals important policy-relevant variation\n\n")
#> - Heterogeneity analysis reveals important policy-relevant variation
cat("2. IMPLEMENTATION LESSONS\n")
#> 2. IMPLEMENTATION LESSONS
cat(" - Data preparation is crucial for multi-treatment analysis\n")
#> - Data preparation is crucial for multi-treatment analysis
cat(" - Balance checking becomes more complex with multiple groups\n")
#> - Balance checking becomes more complex with multiple groups
cat(" - Power calculations need adjustment for multiple comparisons\n")
#> - Power calculations need adjustment for multiple comparisons
cat(" - Sensitivity analyses are essential for robustness\n\n")
#> - Sensitivity analyses are essential for robustness
cat("3. POLICY IMPLICATIONS\n")
#> 3. POLICY IMPLICATIONS
cat(" - One-size-fits-all policies are rarely optimal\n")
#> - One-size-fits-all policies are rarely optimal
cat(" - Treatment complementarities can be substantial\n")
#> - Treatment complementarities can be substantial
cat(" - Targeting based on individual characteristics improves efficiency\n")
#> - Targeting based on individual characteristics improves efficiency
cat(" - Long-term effects may differ from short-term impacts\n\n")
#> - Long-term effects may differ from short-term impacts
cat("4. TECHNICAL RECOMMENDATIONS\n")
#> 4. TECHNICAL RECOMMENDATIONS
cat(" - Use longworkR functions for seamless integration\n")
#> - Use longworkR functions for seamless integration
cat(" - Always check parallel trends assumption\n")
#> - Always check parallel trends assumption
cat(" - Consider treatment spillovers and general equilibrium effects\n")
#> - Consider treatment spillovers and general equilibrium effects
cat(" - Document all analytical choices for transparency\n\n")
#> - Document all analytical choices for transparency
cat("5. FUTURE EXTENSIONS\n")
#> 5. FUTURE EXTENSIONS
cat(" - Machine learning for personalized treatment assignment\n")
#> - Machine learning for personalized treatment assignment
cat(" - Dynamic treatment regimes\n")
#> - Dynamic treatment regimes
cat(" - Network effects in treatment spillovers\n")
#> - Network effects in treatment spillovers
cat(" - Cost-benefit analysis integration\n")
#> - Cost-benefit analysis integrationBest Practices Checklist
cat("=== BEST PRACTICES CHECKLIST ===\n\n")
#> === BEST PRACTICES CHECKLIST ===
checklist <- c(
"Data Preparation",
"☐ Verify treatment assignment mechanism",
"☐ Check for contamination between treatment groups",
"☐ Assess sample sizes for each treatment-control comparison",
"☐ Handle missing data appropriately",
"☐ Create consistent time period definitions",
"",
"Design and Identification",
"☐ Document treatment assignment rules",
"☐ Test parallel trends assumption for each comparison",
"☐ Consider potential confounders specific to each treatment",
"☐ Plan for multiple comparison corrections",
"☐ Design placebo and robustness tests",
"",
"Analysis Implementation",
"☐ Use appropriate standard error corrections (clustering, etc.)",
"☐ Pre-register analysis plan when possible",
"☐ Report effect sizes and confidence intervals",
"☐ Test for treatment effect heterogeneity",
"☐ Conduct sensitivity analyses",
"",
"Interpretation and Reporting",
"☐ Distinguish statistical from economic significance",
"☐ Consider external validity of findings",
"☐ Discuss treatment mechanism and theory",
"☐ Address limitations and potential biases",
"☐ Provide policy-relevant recommendations",
"",
"Technical Validation",
"☐ Code review and replication",
"☐ Data integrity checks",
"☐ Model specification tests",
"☐ Assumption validation",
"☐ Results stability across specifications"
)
for (item in checklist) {
if (item == "") {
cat("\n")
} else if (startsWith(item, "☐")) {
cat(" ", item, "\n")
} else {
cat("**", item, "**\n")
}
}
#> ** Data Preparation **
#> ☐ Verify treatment assignment mechanism
#> ☐ Check for contamination between treatment groups
#> ☐ Assess sample sizes for each treatment-control comparison
#> ☐ Handle missing data appropriately
#> ☐ Create consistent time period definitions
#>
#> ** Design and Identification **
#> ☐ Document treatment assignment rules
#> ☐ Test parallel trends assumption for each comparison
#> ☐ Consider potential confounders specific to each treatment
#> ☐ Plan for multiple comparison corrections
#> ☐ Design placebo and robustness tests
#>
#> ** Analysis Implementation **
#> ☐ Use appropriate standard error corrections (clustering, etc.)
#> ☐ Pre-register analysis plan when possible
#> ☐ Report effect sizes and confidence intervals
#> ☐ Test for treatment effect heterogeneity
#> ☐ Conduct sensitivity analyses
#>
#> ** Interpretation and Reporting **
#> ☐ Distinguish statistical from economic significance
#> ☐ Consider external validity of findings
#> ☐ Discuss treatment mechanism and theory
#> ☐ Address limitations and potential biases
#> ☐ Provide policy-relevant recommendations
#>
#> ** Technical Validation **
#> ☐ Code review and replication
#> ☐ Data integrity checks
#> ☐ Model specification tests
#> ☐ Assumption validation
#> ☐ Results stability across specificationsResources and Extensions
cat("\n=== ADDITIONAL RESOURCES ===\n\n")
#>
#> === ADDITIONAL RESOURCES ===
cat("RELATED longworkR FUNCTIONS:\n")
#> RELATED longworkR FUNCTIONS:
cat("- calculate_comprehensive_impact_metrics(): Employment metrics calculation\n")
#> - calculate_comprehensive_impact_metrics(): Employment metrics calculation
cat("- prepare_metrics_for_impact_analysis(): Data preparation bridge\n")
#> - prepare_metrics_for_impact_analysis(): Data preparation bridge
cat("- difference_in_differences(): Core DiD implementation\n")
#> - difference_in_differences(): Core DiD implementation
cat("- propensity_score_matching(): PSM for treatment effect estimation\n")
#> - propensity_score_matching(): PSM for treatment effect estimation
cat("- analyze_employment_transitions(): Network and transition analysis\n\n")
#> - analyze_employment_transitions(): Network and transition analysis
cat("STATISTICAL PACKAGES:\n")
#> STATISTICAL PACKAGES:
cat("- grf: Causal forest implementation\n")
#> - grf: Causal forest implementation
cat("- MatchIt: Comprehensive matching methods\n")
#> - MatchIt: Comprehensive matching methods
cat("- fixest: Fast fixed effects estimation\n")
#> - fixest: Fast fixed effects estimation
cat("- rdrobust: Regression discontinuity methods\n")
#> - rdrobust: Regression discontinuity methods
cat("- augsynth: Augmented synthetic control\n\n")
#> - augsynth: Augmented synthetic control
cat("METHODOLOGICAL REFERENCES:\n")
#> METHODOLOGICAL REFERENCES:
cat("- Imbens & Wooldridge (2009): Recent developments in econometrics of program evaluation\n")
#> - Imbens & Wooldridge (2009): Recent developments in econometrics of program evaluation
cat("- Athey & Imbens (2017): The state of applied econometrics - Causal inference and policy evaluation\n")
#> - Athey & Imbens (2017): The state of applied econometrics - Causal inference and policy evaluation
cat("- Angrist & Pischke (2009): Mostly Harmless Econometrics\n")
#> - Angrist & Pischke (2009): Mostly Harmless Econometrics
cat("- Cunningham (2021): Causal Inference: The Mixtape\n\n")
#> - Cunningham (2021): Causal Inference: The Mixtape
cat("EMPLOYMENT POLICY APPLICATIONS:\n")
#> EMPLOYMENT POLICY APPLICATIONS:
cat("- Card et al. (2018): What works? A meta analysis of recent active labor market program evaluations\n")
#> - Card et al. (2018): What works? A meta analysis of recent active labor market program evaluations
cat("- Crépon et al. (2013): Do labor market policies have displacement effects?\n")
#> - Crépon et al. (2013): Do labor market policies have displacement effects?
cat("- Kluve (2010): The effectiveness of European active labor market programs\n")
#> - Kluve (2010): The effectiveness of European active labor market programs
cat("\n=== END OF VIGNETTE ===\n")
#>
#> === END OF VIGNETTE ===
cat("This vignette demonstrated comprehensive multiple treatment group evaluation\n")
#> This vignette demonstrated comprehensive multiple treatment group evaluation
cat("using the longworkR package. For more specific applications or advanced\n")
#> using the longworkR package. For more specific applications or advanced
cat("methods, consult the individual function documentation and additional vignettes.\n")
#> methods, consult the individual function documentation and additional vignettes.
# Session information for reproducibility
cat("\n=== SESSION INFORMATION ===\n")
#>
#> === SESSION INFORMATION ===
print(sessionInfo())
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RColorBrewer_1.1-3 viridis_0.6.5 viridisLite_0.4.3
#> [4] tidyr_1.3.2 dplyr_1.2.0 ggplot2_4.0.2
#> [7] data.table_1.18.2.1 longworkR_0.8.1 vecshift_1.0.4
#>
#> loaded via a namespace (and not attached):
#> [1] sandwich_3.1-1 sass_0.4.10 generics_0.1.4
#> [4] dreamerr_1.5.0 lattice_0.22-7 hms_1.1.4
#> [7] digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5
#> [10] grid_4.5.2 fastmap_1.2.0 jsonlite_2.0.0
#> [13] Matrix_1.7-4 progress_1.2.3 Formula_1.2-5
#> [16] survival_3.8-3 gridExtra_2.3 mgcv_1.9-3
#> [19] purrr_1.2.1 scales_1.4.0 stringmagic_1.2.0
#> [22] numDeriv_2016.8-1.1 textshaping_1.0.4 jquerylib_0.1.4
#> [25] cli_3.6.5 rlang_1.1.7 crayon_1.5.3
#> [28] splines_4.5.2 withr_3.0.2 cachem_1.1.0
#> [31] yaml_2.3.12 otel_0.2.0 fixest_0.13.2
#> [34] tools_4.5.2 parallel_4.5.2 vctrs_0.7.1
#> [37] R6_2.6.1 zoo_1.8-15 matrixStats_1.5.0
#> [40] lifecycle_1.0.5 fs_1.6.6 htmlwidgets_1.6.4
#> [43] ragg_1.5.0 cluster_2.1.8.1 pkgconfig_2.0.3
#> [46] desc_1.4.3 pkgdown_2.2.0 bslib_0.10.0
#> [49] pillar_1.11.1 gtable_0.3.6 glue_1.8.0
#> [52] Rcpp_1.1.1 systemfonts_1.3.1 collapse_2.1.6
#> [55] xfun_0.56 tibble_3.3.1 tidyselect_1.2.1
#> [58] knitr_1.51 farver_2.1.2 nlme_3.1-168
#> [61] htmltools_0.5.9 labeling_0.4.3 rmarkdown_2.30
#> [64] ggalluvial_0.12.6 compiler_4.5.2 prettyunits_1.2.0
#> [67] S7_0.2.1