Prepare Metrics Output for Impact Analysis
Source: R/impact_integration.R

This function serves as a bridge between the output of
calculate_comprehensive_impact_metrics and impact evaluation
methods like difference_in_differences. It restructures the
metrics data to create a proper panel structure suitable for causal inference
methods.
Usage
prepare_metrics_for_impact_analysis(
metrics_output,
treatment_assignment,
impact_method = c("did", "event_study", "matching"),
id_column = "cf",
period_column = "period",
outcome_vars = NULL,
auto_detect_outcomes = TRUE,
verbose = TRUE
)

Arguments
- metrics_output
data.table or list. Output from
calculate_comprehensive_impact_metrics
- treatment_assignment
data.table. A data.table with columns for the individual identifier (default: "cf") and treatment indicator (default: "is_treated"). This maps individuals to their treatment status.
- impact_method
character. The intended impact evaluation method. One of "did" (difference-in-differences), "event_study", or "matching". This affects how the data is restructured. Default: "did"
- id_column
character. Name of the individual identifier column. Default: "cf"
- period_column
character. Name of the period column in metrics_output. Default: "period"
- outcome_vars
character vector or NULL. Names of outcome variables to include in the analysis. If NULL and auto_detect_outcomes is TRUE, numeric metric columns are detected automatically. Default: NULL
- auto_detect_outcomes
logical. Whether to automatically detect outcome variables from the metrics output. Default: TRUE
- verbose
logical. Whether to print diagnostic information. Default: TRUE
Value
data.table with the following structure:
Individual identifiers: Column specified by id_column
Treatment indicator: "is_treated" (0/1)
Time variables: Both original period column and binary "post" variable
Outcome variables: Selected metrics suitable for analysis
Panel structure: Proper pre/post observations for all units
The output is directly compatible with difference_in_differences
and other impact evaluation functions in the package.
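As an illustration of this structure, a returned panel might look like the following (the values and the employment_rate column are invented; only the column layout follows the description above):

```r
library(data.table)

# Illustrative shape of the returned panel; values are made up,
# but the columns mirror the documented structure
panel <- data.table(
  cf              = c("A", "A", "B", "B"),         # individual identifier
  is_treated      = c(1L, 1L, 0L, 0L),             # treatment indicator
  period          = c("pre", "post", "pre", "post"),
  post            = c(0L, 1L, 0L, 1L),             # binary time variable
  employment_rate = c(0.60, 0.75, 0.55, 0.56)      # example outcome
)

# Proper panel structure: every unit contributes one pre and one post row
counts <- panel[, .N, by = cf]
```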
Details
This function handles several key transformations needed for impact analysis:
Data Structure Transformations
Panel Creation: For treated units with "pre"/"post" periods, maintains both observations
Control Expansion: For control units (typically with "control" period), creates duplicate observations with "pre" and "post" labels to enable proper DiD estimation
Time Variable Creation: Creates binary time variables (post = 0/1) suitable for regression analysis
Treatment Interaction: Ensures proper interaction terms can be created for DiD estimation
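The panel-creation, control-expansion, and time-variable steps above can be sketched with data.table (a simplified illustration, not the package's actual implementation; employment_rate is a stand-in outcome):

```r
library(data.table)

# Toy metrics: treated unit "A" has pre/post rows,
# control unit "B" has a single "control" row
metrics <- data.table(
  cf              = c("A", "A", "B"),
  period          = c("pre", "post", "control"),
  employment_rate = c(0.60, 0.75, 0.55)
)

# Control expansion: duplicate each control row into "pre" and "post"
expanded <- rbind(
  metrics[period != "control"],
  metrics[period == "control",
          .(period = c("pre", "post"), employment_rate),
          by = cf]
)

# Time variable creation: binary post indicator for regression
expanded[, post := as.integer(period == "post")]
```

After expansion, every unit has both a pre and a post observation, which is what allows the treatment-by-post interaction to be estimated in a DiD regression.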
Outcome Variable Selection
The function automatically detects numeric metric columns as potential outcomes, excluding ID and time variables. Commonly detected outcomes include:
Employment stability metrics (employment_rate, employment_spells)
Contract quality metrics (permanent_contract_rate, avg_contract_quality)
Career complexity metrics (transition_complexity, career_entropy)
Transition metrics (avg_transition_duration, transition_success_rate)
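The detection rule can be sketched as a hypothetical helper, detect_outcomes (keep numeric columns, drop identifier, time, and treatment variables); the function's actual internals may differ:

```r
library(data.table)

# Hypothetical sketch of the auto-detection rule described above
detect_outcomes <- function(dt, id_column = "cf", period_column = "period") {
  reserved <- c(id_column, period_column, "is_treated", "post")
  numeric_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
  setdiff(numeric_cols, reserved)
}

panel <- data.table(
  cf = c("A", "A"),
  period = c("pre", "post"),
  post = c(0L, 1L),
  is_treated = c(1L, 1L),
  employment_rate = c(0.60, 0.75),
  permanent_contract_rate = c(0.40, 0.50)
)
outcomes <- detect_outcomes(panel)
```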
Examples
if (FALSE) { # \dontrun{
# ===== BASIC WORKFLOW EXAMPLE =====
library(data.table)

# Load sample data
sample_data <- readRDS(system.file("data", "sample.rds", package = "longworkR"))
# Step 1: Calculate comprehensive employment metrics
metrics_result <- calculate_comprehensive_impact_metrics(
data = sample_data,
metrics = c("stability", "quality", "complexity"),
output_format = "wide"
)
# Step 2: Define treatment assignment
treatment_data <- data.table(
cf = unique(sample_data$cf),
is_treated = sample(c(0, 1), length(unique(sample_data$cf)), replace = TRUE),
# Add baseline covariates for matching/controls
baseline_employment = runif(length(unique(sample_data$cf))),
region = sample(c("urban", "rural"), length(unique(sample_data$cf)), replace = TRUE)
)
# Step 3: Prepare for DiD analysis
did_data <- prepare_metrics_for_impact_analysis(
metrics_output = metrics_result,
treatment_assignment = treatment_data,
impact_method = "did",
verbose = TRUE
)
# Step 4: Run difference-in-differences
did_results <- difference_in_differences(
data = did_data,
outcome_vars = c("employment_rate", "permanent_contract_rate", "career_stability_score"),
treatment_var = "is_treated",
time_var = "post",
id_var = "cf",
control_vars = c("baseline_employment")
)
# ===== ADVANCED SCENARIOS =====
# Event Study Analysis
# First create event time structure in original data
sample_data[, event_time := ifelse(event_period == "pre", -1,
ifelse(event_period == "post", 1, 0))]
# Recalculate metrics with event time
metrics_event <- calculate_comprehensive_impact_metrics(
data = sample_data,
metrics = "all",
period_column = "event_time"
)
# Prepare for event study
event_data <- prepare_metrics_for_impact_analysis(
metrics_output = metrics_event,
treatment_assignment = treatment_data,
impact_method = "event_study"
)
# Multiple Treatment Groups
treatment_multi <- copy(treatment_data)
treatment_multi[, `:=`(
treatment_type = sample(c("none", "training", "subsidies", "both"), .N, replace = TRUE),
is_treated = as.numeric(treatment_type != "none")
)]
multi_data <- prepare_metrics_for_impact_analysis(
metrics_output = metrics_result,
treatment_assignment = treatment_multi,
impact_method = "did"
)
# Propensity Score Matching Setup
matching_data <- prepare_metrics_for_impact_analysis(
metrics_output = metrics_result,
treatment_assignment = treatment_data,
impact_method = "matching",
outcome_vars = c("employment_rate", "contract_quality_score"),
auto_detect_outcomes = FALSE # Specify outcomes explicitly
)
# ===== DATA VALIDATION =====
# Check prepared data structure
validation <- validate_integration_setup(
data = did_data,
impact_method = "did",
id_column = "cf",
verbose = TRUE
)
if (validation) {
cat("Data successfully prepared for impact analysis\n")
}
} # }