Impact Evaluation with Employment Metrics: A Complete Workflow

Overview

This vignette walks through the complete workflow for using employment metrics as outcomes in causal inference. The longworkR package provides a pipeline from raw employment data to impact evaluation results in three stages:

1. Metrics Calculation: computing comprehensive employment metrics with calculate_comprehensive_impact_metrics()
2. Data Integration: preparing the metrics for causal analysis with prepare_metrics_for_impact_analysis()
3. Impact Evaluation: running causal inference methods such as difference_in_differences()

This integrated approach allows researchers to use sophisticated employment metrics as outcomes in rigorous causal inference studies.

The Integration Architecture

The integration between metrics calculation and impact evaluation follows a three-step process:

[Figure: Integration Workflow]

Getting Started: Sample Data

Let’s begin with a practical example using the package’s sample employment data:

# Load the packages used throughout this vignette
library(data.table)
library(longworkR)

# Load sample employment data from package data directory
# Try multiple data file locations in order of preference
data_files <- c(
  "data/synthetic_sample.rds",
  "data/sample.rds", 
  system.file("extdata", "sample.rds", package = "longworkR")
)

sample_data <- NULL
for (file_path in data_files) {
  if (file.exists(file_path) && file.size(file_path) > 0) {
    tryCatch({
      sample_data <- readRDS(file_path)
      cat("Successfully loaded data from:", file_path, "\n")
      break
    }, error = function(e) {
      cat("Failed to load", file_path, ":", e$message, "\n")
    })
  }
}
#> Successfully loaded data from: /tmp/RtmpdJt0k5/temp_libpath20f13e635032/longworkR/extdata/sample.rds

# If no data file found, create synthetic example data
if (is.null(sample_data)) {
  cat("No data files found, creating synthetic example data\n")
  set.seed(123)  # For reproducibility
  n_spells <- 400
  # Draw start dates and durations first so that every spell ends after it starts
  spell_start <- as.Date("2020-01-01") + sample(0:730, n_spells, replace = TRUE)
  spell_days <- sample(30:365, n_spells, replace = TRUE)
  sample_data <- data.table(
    cf = rep(1:100, each = 4),
    inizio = spell_start,
    fine = spell_start + spell_days,  # end date derived from start + duration
    durata = spell_days,
    over_id = seq_len(n_spells),
    prior = sample(c(0, 1), n_spells, replace = TRUE),
    COD_TIPOLOGIA_CONTRATTUALE = sample(
      c("A.03.00", "A.03.01", "C.01.00", "A.07.00"), 
      n_spells, replace = TRUE
    ),
    event_start = as.Date("2021-06-01"),
    event_end = as.Date("2021-12-31"),
    retribuzione = rnorm(n_spells, 2500, 500)
  )
}

# Ensure data.table format
if (!inherits(sample_data, "data.table")) {
  sample_data <- as.data.table(sample_data)
}

# Create event_period column if it doesn't exist
if (!"event_period" %in% names(sample_data)) {
  if ("event_start" %in% names(sample_data) && "event_end" %in% names(sample_data)) {
    # Use event timing to create periods
    sample_data[, event_period := ifelse(
      fine < event_start, "pre",
      ifelse(inizio > event_end, "post", "during")
    )]
    cat("Created event_period column from event_start/event_end\n")
  } else {
    # Create random periods if no event timing available
    set.seed(456)
    sample_data[, event_period := sample(c("pre", "post"), .N, replace = TRUE, prob = c(0.6, 0.4))]
    cat("Created random event_period column\n")
  }
}
#> Created event_period column from event_start/event_end

# Examine the data structure
cat("Sample data dimensions:", nrow(sample_data), "rows,", ncol(sample_data), "columns\n")
#> Sample data dimensions: 476400 rows, 31 columns
cat("Unique individuals:", length(unique(sample_data$cf)), "\n")
#> Unique individuals: 4252
cat("Event periods:", paste(names(table(sample_data$event_period)), collapse = ", "), "\n")
#> Event periods: during, post, pre
cat("Available columns:", paste(names(sample_data)[1:10], collapse = ", "), "...\n")
#> Available columns: id, cf, inizio, fine, arco, prior, over_id, durata, stato, qualifica ...
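
The nested ifelse() above assigns a spell to "pre" when it ends before the event starts, to "post" when it starts after the event ends, and to "during" otherwise. The following is a minimal sketch of the same rule on toy data, written with data.table::fcase(). The table toy_spells and its dates are made up for illustration and are not part of the package data. A spell with missing event dates gets a missing period here, and the nested ifelse() propagates NA in the same situation.

```r
library(data.table)

# Toy spells (hypothetical data): three spells with event dates, one without
toy_spells <- data.table(
  inizio      = as.Date(c("2021-01-01", "2021-05-15", "2022-02-01", "2021-03-01")),
  fine        = as.Date(c("2021-03-31", "2021-08-31", "2022-06-30", "2021-04-30")),
  event_start = as.Date(c("2021-06-01", "2021-06-01", "2021-06-01", NA)),
  event_end   = as.Date(c("2021-12-31", "2021-12-31", "2021-12-31", NA))
)

# Conditions are checked in order; rows matching none of them stay NA
toy_spells[, event_period := fcase(
  fine < event_start, "pre",
  inizio > event_end, "post",
  !is.na(event_start) & !is.na(event_end), "during"
)]

print(toy_spells[, .(inizio, fine, event_period)])
```

Checking table(toy_spells$event_period, useNA = "ifany") before computing metrics makes missing periods visible early; names(table(...)) alone hides them.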

Step 1: Calculate Comprehensive Employment Metrics

The first step involves calculating comprehensive employment metrics for all individuals across different time periods. These metrics capture various aspects of employment quality and career trajectories.

Basic Metrics Calculation

# Calculate comprehensive employment metrics with error handling
metrics_result <- NULL

tryCatch({
  # Ensure data types are consistent for data.table operations
  sample_data[, cf := as.character(cf)]
  # Handle over_id conversion more carefully to avoid type inconsistency
  if (is.numeric(sample_data$over_id)) {
    sample_data[, over_id := as.integer(round(over_id))]
  } else {
    sample_data[, over_id := as.integer(over_id)]
  }
  # Ensure other numeric columns are properly typed
  if ("durata" %in% names(sample_data)) {
    sample_data[, durata := as.numeric(durata)]
  }
  if ("prior" %in% names(sample_data)) {
    sample_data[, prior := as.integer(prior)]
  }
  
  # Calculate metrics including complexity
  metrics_result <- calculate_comprehensive_impact_metrics(
    data = sample_data,
    metrics = c("stability", "quality", "complexity"),  # Include complexity metrics
    id_column = "cf",
    period_column = "event_period",
    output_format = "wide"
  )
  
  cat("Successfully calculated employment metrics\n")
  
}, error = function(e) {
  cat("Error in metrics calculation:", e$message, "\n")
  cat("Creating simplified demonstration data instead\n")
  
  # Create simplified metrics output for demonstration (at most 50 individuals)
  demo_ids <- head(unique(sample_data$cf), 50)
  periods <- c("pre", "post")
  n_demo <- length(demo_ids) * length(periods)
  
  metrics_result <<- data.table(
    cf = rep(demo_ids, each = length(periods)),
    period = rep(periods, times = length(demo_ids)),
    employment_rate = runif(n_demo, 0.3, 0.9),
    employment_stability_index = runif(n_demo, 0.4, 0.8),
    permanent_contract_rate = runif(n_demo, 0.2, 0.7),
    avg_contract_quality = runif(n_demo, 0.5, 1.0)
  )
})
#> Successfully calculated employment metrics

# Examine the output structure
if (!is.null(metrics_result)) {
  cat("Metrics result dimensions:", nrow(metrics_result), "rows,", ncol(metrics_result), "columns\n")
  cat("Available metrics:\n")
  metric_cols <- setdiff(names(metrics_result), c("cf", "period", "event_period"))
  for(col in metric_cols[1:min(10, length(metric_cols))]) {
    cat(paste0("  - ", col, "\n"))
  }
  
  # Display first few rows
  if (ncol(metrics_result) >= 6) {
    print(head(metrics_result[, 1:6], 3))
  } else {
    print(head(metrics_result, 3))
  }
} else {
  cat("No metrics results available\n")
}
#> Metrics result dimensions: 12689 rows, 39 columns
#> Available metrics:
#>   - days_employed
#>   - days_unemployed
#>   - total_days
#>   - total_observations
#>   - employment_rate
#>   - employment_spells
#>   - unemployment_spells
#>   - avg_employment_spell
#>   - avg_unemployment_spell
#>   - max_employment_spell
#> Key: <cf, period>
#>        cf period days_employed days_unemployed total_days total_observations
#>    <char> <char>         <num>           <num>      <num>              <num>
#> 1:      1   <NA>         16700            2574      19274                121
#> 2:      1 during           103               0        103                  1
#> 3:      1   post           377               0        377                  2

Understanding Metric Types

The comprehensive metrics function covers four main categories of metrics; the call above requests "stability", "quality", and "complexity":

Employment Stability Metrics: Employment rates, spell durations, turnover
Contract Quality Metrics: Contract type distributions, quality improvements
Career Complexity Metrics: Concurrent employment, employment diversity, career fragmentation
Transition Pattern Metrics: Success rates, duration patterns

# Group metrics by category for better understanding
# Note: the patterns overlap, so a column can appear in more than one group
# (e.g. contract_stability_trend matches both "stability" and "contract")
stability_metrics <- grep("employment|stability|spell|turnover", names(metrics_result), value = TRUE)
quality_metrics <- grep("contract|permanent|temporary|quality", names(metrics_result), value = TRUE)
complexity_metrics <- grep("complexity|concurrent|diversity|fragmentation", names(metrics_result), value = TRUE)

cat("Stability metrics (", length(stability_metrics), "):\n")
#> Stability metrics ( 15 ):
cat(paste(stability_metrics, collapse = ", "), "\n\n")
#> employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, total_employment_days.x, contract_stability_trend, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index

cat("Quality metrics (", length(quality_metrics), "):\n") 
#> Quality metrics ( 10 ):
cat(paste(quality_metrics, collapse = ", "), "\n\n")
#> permanent_contract_days, temporary_contract_days, internship_contract_days, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, contract_stability_trend, contract_improvement_rate

cat("Complexity metrics (", length(complexity_metrics), "):\n")
#> Complexity metrics ( 7 ):
cat(paste(complexity_metrics, collapse = ", "), "\n")
#> max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index
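
Because the three patterns are matched independently, the groups printed above are not mutually exclusive. The following sketch uses a handful of column names taken from that output to show where the overlap comes from and one way to derive a stability group that excludes the other two. The variable names (cols, stability_only, and so on) are illustrative.

```r
# A few column names taken from the metrics output above
cols <- c(
  "employment_rate", "job_turnover_rate", "permanent_contract_rate",
  "contract_stability_trend", "concurrent_employment_rate",
  "employment_diversity_index", "career_complexity_index"
)

stability  <- grep("employment|stability|spell|turnover", cols, value = TRUE)
quality    <- grep("contract|permanent|temporary|quality", cols, value = TRUE)
complexity <- grep("complexity|concurrent|diversity|fragmentation", cols, value = TRUE)

# Columns counted in more than one group
overlap_quality    <- intersect(stability, quality)
overlap_complexity <- intersect(stability, complexity)

# Stability columns that belong to no other group
stability_only <- setdiff(stability, c(quality, complexity))

print(overlap_quality)
print(overlap_complexity)
print(stability_only)
```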

Step 2: Prepare Data for Impact Analysis

The bridge function prepare_metrics_for_impact_analysis() transforms the metrics output into a format suitable for causal inference methods. This crucial step handles period structures, treatment assignments, and outcome variable selection.
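
To make the role of this step concrete, here is a minimal sketch of the kind of transformation such a bridge performs: align ID types, attach person-level treatment status to person-period metrics, and derive a 0/1 post indicator. It is written with plain data.table calls on toy data and is not the package's implementation; toy_metrics, toy_treatment, and toy_did are hypothetical names.

```r
library(data.table)

# Person-period metrics (IDs stored as character)
toy_metrics <- data.table(
  cf = c("1", "1", "2", "2"),
  period = c("pre", "post", "pre", "post"),
  employment_rate = c(0.5, 0.7, 0.6, 0.6)
)

# Person-level treatment assignment (IDs stored as integer)
toy_treatment <- data.table(cf = c(1L, 2L), is_treated = c(1L, 0L))

# Align ID types before merging; mismatched types cause failed or wrong joins
toy_treatment[, cf := as.character(cf)]

# Left join keeps every metrics row and adds the treatment indicator
toy_did <- merge(toy_metrics, toy_treatment, by = "cf", all.x = TRUE)

# Derive the time indicator; any other period value stays NA
toy_did[, post := fcase(period == "pre", 0L, period == "post", 1L)]

print(toy_did)
```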

Define Treatment Assignment

First, define which individuals belong to the treatment group and which to the control group:

# Create treatment assignment
# In real studies, this would come from your experimental design or policy assignment
set.seed(123)  # For reproducibility
unique_individuals <- unique(sample_data$cf)

treatment_data <- data.table(
  cf = unique_individuals,
  is_treated = rbinom(length(unique_individuals), 1, 0.4)  # 40% treatment rate
)

# Add additional covariates that might affect assignment
treatment_data[, `:=`(
  age_group = sample(c("young", "middle", "senior"), .N, replace = TRUE),
  region = sample(c("north", "center", "south"), .N, replace = TRUE),
  baseline_employment = runif(.N, 0, 1)
)]

cat("Treatment assignment summary:\n")
#> Treatment assignment summary:
cat("Total individuals:", nrow(treatment_data), "\n")
#> Total individuals: 4252
cat("Treated:", sum(treatment_data$is_treated), "(", 
    round(100 * mean(treatment_data$is_treated), 1), "%)\n")
#> Treated: 1700 ( 40 %)
cat("Control:", sum(1 - treatment_data$is_treated), "(", 
    round(100 * mean(1 - treatment_data$is_treated), 1), "%)\n")
#> Control: 2552 ( 60 %)
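
With a treatment indicator and covariates in one table, a common follow-up is to check whether the two groups look similar at baseline. The sketch below computes a standardized mean difference for one numeric covariate on simulated data. The helper standardized_mean_diff() is hypothetical and written here for illustration; it is not a longworkR function.

```r
# Hypothetical helper: standardized mean difference of a numeric covariate
standardized_mean_diff <- function(x, treated) {
  x_t <- x[treated == 1]
  x_c <- x[treated == 0]
  pooled_sd <- sqrt((var(x_t) + var(x_c)) / 2)
  (mean(x_t) - mean(x_c)) / pooled_sd
}

# Simulated assignment and covariate, mirroring the structure used above
set.seed(123)
toy <- data.frame(
  is_treated = rbinom(500, 1, 0.4),
  baseline_employment = runif(500)
)

smd <- standardized_mean_diff(toy$baseline_employment, toy$is_treated)
print(round(smd, 3))
```

Values close to zero indicate that the covariate is distributed similarly in both groups, which is what random assignment should produce on average.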

Bridge to Impact Analysis Format

# Prepare data for difference-in-differences analysis with error handling
did_data <- NULL

if (!is.null(metrics_result)) {
  tryCatch({
    # Ensure cf column is character in treatment data to match metrics
    treatment_data[, cf := as.character(cf)]
    
    did_data <- prepare_metrics_for_impact_analysis(
      metrics_output = metrics_result,
      treatment_assignment = treatment_data,
      impact_method = "did",
      id_column = "cf",
      period_column = "period",  # Note: metrics output uses "period" not "event_period"
      verbose = TRUE
    )
    
    cat("Successfully prepared data for impact analysis\n")
    
  }, error = function(e) {
    cat("Error in data preparation:", e$message, "\n")
    cat("Creating simplified DiD structure for demonstration\n")
    
    # Create a simple DiD structure manually: keep pre/post rows and attach
    # treatment status by person, so outcomes stay aligned with their IDs
    fallback <- merge(
      metrics_result[
        period %in% c("pre", "post"),
        .(cf, period, employment_rate, employment_stability_index)
      ],
      treatment_data[, .(cf, is_treated)],
      by = "cf"
    )
    fallback[, post := as.integer(period == "post")]
    did_data <<- fallback
  })
}
#> Auto-detected outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Using outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Treatment assignment data is already at person-level (no deduplication needed)
#> 
#> === INPUT DATA QUALITY VALIDATION ===
#> Metrics data: rows = 12689 persons = 4252 
#> Treatment assignment: rows = 4252 persons = 4252 
#> ID column types: metrics = character , treatment = character 
#> 
#> === MERGING DATA ===
#> Metrics data: 12689 rows, 4252 unique persons
#> Treatment assignment: 4252 rows, 4252 unique persons
#> VALIDATION: Original person count for tracking: 4252 
#> 
#> === ENHANCED MERGE OPERATION ===
#> ✓ Standard merge successful
#> Merge completed using: standard merge 
#> Merged data: 12689 rows, 4252 unique persons (excl. NA)
#> ✓ Merge successful - no cartesian join detected
#> ✓ Person count preserved during merge
#> ✓ All observations have treatment assignments
#> Found period values: NA, during, post, pre 
#> Warning: Some units have incomplete pre/post observations
#> 
#> === COLUMN SELECTION ===
#> Columns to select ( 44 ): cf, is_treated, period, post, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index, age_group, region, baseline_employment 
#> Available columns ( 44 ): cf, period, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell ... 
#> ✓ Column selection successful using ..final_cols syntax
#> Selected columns using method: ..final_cols 
#> Result dimensions: 12689 rows x 44 columns
#> ✓ Row count preserved during column selection
#> ✓ Person count preserved during column selection
#> 
#> === FINAL DATASET STRUCTURE ===
#> Observations: 12689 
#> Unique individuals: 4252 
#> 
#> Treatment distribution:
#> 
#>    0    1 
#> 7629 5060 
#> 
#> Time period distribution:
#> 
#>    0    1 <NA> 
#> 5157 3280 4252 
#> 
#> Cross-tabulation (treatment x period):
#>    
#>        0    1 <NA>
#>   0 3098 1979 2552
#>   1 2059 1301 1700
#> 
#> Outcome variables (37):
#> days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index
#> 
#> ⚠ Data quality issues:
#>   Missing time period: 4252 rows
#> 
#> === PERSON COUNT VALIDATION ===
#> Original metrics persons: 4252 
#> After processing persons: 4252 
#> Final result persons: 4252 
#> Successfully prepared data for impact analysis

# Examine the prepared data structure
if (!is.null(did_data)) {
  cat("\nPrepared data structure:\n")
  cat("Dimensions:", nrow(did_data), "rows,", ncol(did_data), "columns\n")
  cat("Panel structure check:\n")
  
  # Check panel balance with error handling
  tryCatch({
    panel_check <- did_data[, .(
      n_obs = .N,
      n_pre = sum(post == 0, na.rm = TRUE),
      n_post = sum(post == 1, na.rm = TRUE)
    ), by = .(cf, is_treated)][, .(
      complete_panels = sum(n_pre > 0 & n_post > 0, na.rm = TRUE),
      total_individuals = .N,
      avg_obs_per_person = mean(n_obs, na.rm = TRUE)
    ), by = is_treated]
    
    print(panel_check)
  }, error = function(e) {
    cat("Could not generate panel balance check:", e$message, "\n")
  })
} else {
  cat("No prepared data available for impact analysis\n")
}
#> 
#> Prepared data structure:
#> Dimensions: 12689 rows, 44 columns
#> Panel structure check:
#>    is_treated complete_panels total_individuals avg_obs_per_person
#>         <int>           <int>             <int>              <num>
#> 1:          0            1654              2552           2.989420
#> 2:          1            1090              1700           2.976471
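
The panel check shows that only part of each group has both a pre and a post observation. If an analysis should be restricted to those complete panels, one option is to filter by person before estimation. The following is a sketch on toy data; toy_panel and balanced_panel are hypothetical names, and whether such a restriction is appropriate depends on the study design.

```r
library(data.table)

# Toy panel: "a" has both periods, "b" only pre, "c" has an extra row without a period
toy_panel <- data.table(
  cf = c("a", "a", "b", "c", "c", "c"),
  post = c(0L, 1L, 0L, 0L, 1L, NA),
  employment_rate = c(0.5, 0.7, 0.4, 0.6, 0.8, 0.7)
)

# Drop rows without a period indicator, then keep persons observed in both periods
balanced_panel <- toy_panel[!is.na(post)][
  , if (any(post == 0L) && any(post == 1L)) .SD, by = cf
]

print(balanced_panel)
```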

Outcome Variable Selection

The bridge function automatically detects suitable outcome variables, but you can also specify them manually:

# Get automatically detected outcome variables with error handling
available_outcomes <- character(0)

if (!is.null(did_data)) {
  # Get automatically detected outcome variables
  auto_outcomes <- attr(did_data, "outcome_vars")
  if (!is.null(auto_outcomes) && length(auto_outcomes) > 0) {
    cat("Automatically detected outcomes (", length(auto_outcomes), "):\n")
    cat(paste(auto_outcomes, collapse = ", "), "\n\n")
    available_outcomes <- auto_outcomes
  } else {
    # Manually identify numeric outcome variables
    numeric_cols <- sapply(did_data, is.numeric)
    exclude_cols <- c("cf", "post", "is_treated", "period")
    potential_outcomes <- setdiff(names(did_data)[numeric_cols], exclude_cols)
    available_outcomes <- potential_outcomes
    cat("Detected potential outcome variables (", length(available_outcomes), "):\n")
    cat(paste(available_outcomes, collapse = ", "), "\n\n")
  }
  
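  # Select key outcomes for analysis. Names that are absent from did_data are
  # dropped by intersect() below; in the output shown here that applies to
  # "avg_contract_quality" (the column is named average_contract_quality)
  # and "transition_success_rate".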
  key_outcomes <- c(
    "employment_rate",
    "permanent_contract_rate", 
    "avg_contract_quality",
    "employment_stability_index",
    "transition_success_rate"
  )
  
  # Check which are available
  final_outcomes <- intersect(key_outcomes, names(did_data))
  if (length(final_outcomes) == 0) {
    final_outcomes <- available_outcomes[1:min(2, length(available_outcomes))]
  }
  available_outcomes <- final_outcomes
  
  cat("Selected outcomes for analysis:\n")
  cat(paste(available_outcomes, collapse = ", "), "\n")
  
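  # Show descriptive statistics for the first three selected outcomes
  # (result rows are, in order: Mean, SD, Min, Max; the vector names are
  # not kept as row labels in the printed data.table)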
  if (length(available_outcomes) > 0) {
    tryCatch({
      outcome_stats <- did_data[, lapply(.SD, function(x) {
        c(Mean = mean(x, na.rm = TRUE), 
          SD = sd(x, na.rm = TRUE),
          Min = min(x, na.rm = TRUE),
          Max = max(x, na.rm = TRUE))
      }), .SDcols = available_outcomes[1:min(3, length(available_outcomes))]]
      
      print(round(outcome_stats, 3))
    }, error = function(e) {
      cat("Could not generate outcome statistics:", e$message, "\n")
    })
  }
} else {
  cat("No prepared data available for outcome analysis\n")
}
#> Automatically detected outcomes ( 37 ):
#> days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> 
#> Selected outcomes for analysis:
#> employment_rate, permanent_contract_rate, employment_stability_index 
#>    employment_rate permanent_contract_rate employment_stability_index
#>              <num>                   <num>                      <num>
#> 1:           0.794                   0.070                      0.758
#> 2:           0.295                   0.235                      0.272
#> 3:           0.000                   0.000                      0.000
#> 4:           1.000                   1.000                      1.000
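
The four unlabeled rows above are the mean, standard deviation, minimum, and maximum of each outcome. If labeled rows are preferred, base R's sapply() keeps the statistic names as row names. The following is a small sketch on made-up values; toy_outcomes is a hypothetical stand-in for the selected outcome columns.

```r
# Made-up outcome values, including one missing observation
toy_outcomes <- data.frame(
  employment_rate = c(0.2, 0.8, 1.0, NA),
  permanent_contract_rate = c(0.0, 0.1, 0.5, 0.2)
)

# sapply() returns a matrix whose row names are the names of the returned vector
summary_stats <- sapply(toy_outcomes, function(x) {
  c(Mean = mean(x, na.rm = TRUE),
    SD   = sd(x, na.rm = TRUE),
    Min  = min(x, na.rm = TRUE),
    Max  = max(x, na.rm = TRUE))
})

print(round(summary_stats, 3))
```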

Step 3: Run Impact Evaluation

Now we can run difference-in-differences analysis using the prepared data:

Basic DiD Estimation

# Run difference-in-differences analysis with error handling
did_results <- NULL

if (!is.null(did_data) && length(available_outcomes) > 0) {
  tryCatch({
    did_results <- difference_in_differences(
      data = did_data,
      outcome_vars = available_outcomes[1:min(2, length(available_outcomes))],
      treatment_var = "is_treated",
      time_var = "post",
      id_var = "cf",
      control_vars = NULL,  # Could include baseline_employment, age_group, etc.
      fixed_effects = "both",
      verbose = TRUE
    )
    
    cat("Successfully completed DiD analysis\n")
    
    # Display results summary
    if (!is.null(did_results$summary_table)) {
      cat("DiD Results Summary:\n")
      print(did_results$summary_table)
    }
    
  }, error = function(e) {
    cat("Error in DiD analysis:", e$message, "\n")
    cat("Creating demonstration results instead\n")
    
    # Create simplified results for demonstration
    did_results <<- list(
      estimates = setNames(lapply(available_outcomes[1:min(2, length(available_outcomes))], function(outcome) {
        list(
          coefficient = runif(1, -0.1, 0.1),
          std_error = runif(1, 0.02, 0.05),
          p_value = runif(1, 0, 0.2)
        )
      }), available_outcomes[1:min(2, length(available_outcomes))]),
      summary_table = data.table(
        outcome = available_outcomes[1:min(2, length(available_outcomes))],
        estimate = runif(min(2, length(available_outcomes)), -0.1, 0.1),
        std_error = runif(min(2, length(available_outcomes)), 0.02, 0.05),
        p_value = runif(min(2, length(available_outcomes)), 0, 0.2)
      )
    )
    
    cat("Demonstration DiD Results:\n")
    print(did_results$summary_table)
  })
} else {
  cat("No suitable data or outcome variables found for DiD analysis\n")
}
#> Removed 5374 rows with missing outcome values
#> Successfully completed DiD analysis
#> DiD Results Summary:
#>                    outcome   estimate  std_error   p_value  conf_lower
#>                     <char>      <num>      <num>     <num>       <num>
#> 1:         employment_rate 0.01071150 0.02259331 0.6354283 -0.03357139
#> 2: permanent_contract_rate 0.00403744 0.01490267 0.7864521 -0.02517179
#>    conf_upper significant n_obs
#>         <num>      <lgcl> <int>
#> 1: 0.05499440       FALSE  5941
#> 2: 0.03324667       FALSE  5941
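
For intuition about what the estimate column measures, the basic two-group, two-period difference-in-differences can be computed by hand as the change in the treated group minus the change in the control group. The sketch below does this on toy data and shows that the interaction coefficient of a plain lm() gives the same number. This is the textbook 2x2 case, not the package's estimator, which is called above with fixed_effects = "both".

```r
# Toy data: two observations per group and period
toy <- data.frame(
  is_treated = rep(c(1, 0), each = 4),
  post = rep(c(0, 0, 1, 1), times = 2),
  employment_rate = c(0.5, 0.6, 0.7, 0.8,   # treated: pre, pre, post, post
                      0.4, 0.5, 0.5, 0.6)   # control: pre, pre, post, post
)

# Mean outcome in each treatment-by-period cell (rows: is_treated, columns: post)
cell_means <- with(toy, tapply(employment_rate, list(is_treated, post), mean))

# Change among treated minus change among controls
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# Same quantity as the interaction term of a linear model
fit <- lm(employment_rate ~ is_treated * post, data = toy)
did_by_lm <- unname(coef(fit)["is_treated:post"])

print(c(by_hand = did_by_hand, by_lm = did_by_lm))
```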

Results Interpretation

if (exists("did_results") && !is.null(did_results)) {
  tryCatch({
    # Extract treatment effects with safe access
    if (is.list(did_results) && "estimates" %in% names(did_results) && !is.null(did_results$estimates)) {
      cat("Treatment Effect Estimates:\n")
      
      estimates <- did_results$estimates
      if (is.list(estimates)) {
        for (outcome in names(estimates)) {
          effect <- estimates[[outcome]]
          if (is.list(effect) && "coefficient" %in% names(effect)) {
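            # Use NA rather than 0 for missing values so that an absent estimate
            # is not mistaken for a zero effect
            coef_val <- if (!is.null(effect$coefficient)) effect$coefficient else NA_real_
            se_val <- if (!is.null(effect$std_error)) effect$std_error else NA_real_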
            cat(sprintf("  %s: %.4f (SE: %.4f)\n", outcome, coef_val, se_val))
          }
        }
      } else {
        cat("Estimates not in expected list format\n")
      }
    }
    
    # Check parallel trends if available
    if (is.list(did_results) && "parallel_trends_test" %in% names(did_results) && 
        !is.null(did_results$parallel_trends_test)) {
      cat("\nParallel Trends Test:\n")
      pt_test <- did_results$parallel_trends_test
      if (is.list(pt_test) && length(pt_test) > 0) {
        cat("Test results available for", length(pt_test), "outcomes\n")
      } else {
        cat("Parallel trends test data available\n")
      }
    }
    
    # Display summary table if available
    if (is.list(did_results) && "summary_table" %in% names(did_results) && 
        !is.null(did_results$summary_table)) {
      cat("\nSummary Table:\n")
      print(did_results$summary_table)
    }
    
  }, error = function(e) {
    cat("Error interpreting results:", e$message, "\n")
    cat("Results structure:\n")
    str(did_results)  # str() prints directly and returns NULL, so it is not passed to cat()
  })
} else {
  cat("No DiD results available for interpretation\n")
}
#> Treatment Effect Estimates:
#>   employment_rate: 0.0107 (SE: 0.0226)
#>   permanent_contract_rate: 0.0040 (SE: 0.0149)
#> 
#> Parallel Trends Test:
#> Test results available for 2 outcomes
#> 
#> Summary Table:
#>                    outcome   estimate  std_error   p_value  conf_lower
#>                     <char>      <num>      <num>     <num>       <num>
#> 1:         employment_rate 0.01071150 0.02259331 0.6354283 -0.03357139
#> 2: permanent_contract_rate 0.00403744 0.01490267 0.7864521 -0.02517179
#>    conf_upper significant n_obs
#>         <num>      <lgcl> <int>
#> 1: 0.05499440       FALSE  5941
#> 2: 0.03324667       FALSE  5941
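
The columns of the summary table are linked by simple arithmetic. The reported intervals and p-values are consistent with a normal approximation, that is, estimate plus or minus 1.96 standard errors and a two-sided z-test. Whether difference_in_differences() computes them exactly this way is an assumption; the sketch below only checks that the printed numbers agree with that reading.

```r
# Estimates and standard errors copied from the summary table above
results <- data.frame(
  outcome = c("employment_rate", "permanent_contract_rate"),
  estimate = c(0.01071150, 0.00403744),
  std_error = c(0.02259331, 0.01490267)
)

# Assumed critical value for a 95% interval under a normal approximation
z_crit <- 1.96

results$conf_lower <- results$estimate - z_crit * results$std_error
results$conf_upper <- results$estimate + z_crit * results$std_error

# Two-sided p-value from the z-statistic
results$p_value <- 2 * pnorm(-abs(results$estimate / results$std_error))

print(results)
```

Both intervals contain zero, which matches significant = FALSE in the table. With treatment assigned at random in this vignette, the absence of a detectable effect is the expected outcome.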

Advanced Scenarios

Scenario 1: Event Study Analysis

To study how treatment effects evolve over time, use an event study design:

# Prepare data for event study (requires event_time variable)
# First, create event time structure in the original metrics data
if ("event_period" %in% names(sample_data)) {
  # Create a simplified event time: -1 for "pre", 1 for "during" and "post"
  # (rows with a missing event_period keep a missing event_time)
  sample_data_with_time <- copy(sample_data)
  sample_data_with_time[, event_time := ifelse(event_period == "pre", -1, 1)]
  
  # Recalculate metrics with event time
  metrics_event <- calculate_comprehensive_impact_metrics(
    data = sample_data_with_time,
    metrics = c("stability", "quality"),
    id_column = "cf",
    period_column = "event_time",
    output_format = "wide"
  )
  
  # Prepare for event study
  event_data <- prepare_metrics_for_impact_analysis(
    metrics_output = metrics_event,
    treatment_assignment = treatment_data,
    impact_method = "event_study",
    period_column = "period",
    verbose = TRUE
  )
  
  cat("Event study data prepared with", nrow(event_data), "observations\n")
}
#> Auto-detected outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate 
#> Using outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate 
#> Treatment assignment data is already at person-level (no deduplication needed)
#> 
#> === INPUT DATA QUALITY VALIDATION ===
#> Metrics data: rows = 10741 persons = 4252 
#> Treatment assignment: rows = 4252 persons = 4252 
#> ID column types: metrics = character , treatment = character 
#> 
#> === MERGING DATA ===
#> Metrics data: 10741 rows, 4252 unique persons
#> Treatment assignment: 4252 rows, 4252 unique persons
#> VALIDATION: Original person count for tracking: 4252 
#> 
#> === ENHANCED MERGE OPERATION ===
#> ✓ Standard merge successful
#> Merge completed using: standard merge 
#> Merged data: 10741 rows, 4252 unique persons (excl. NA)
#> ✓ Merge successful - no cartesian join detected
#> ✓ Person count preserved during merge
#> ✓ All observations have treatment assignments
#> 
#> === COLUMN SELECTION ===
#> Columns to select ( 37 ): cf, is_treated, period, post, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, event_time, age_group, region, baseline_employment 
#> Available columns ( 37 ): cf, period, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell ... 
#> ✓ Column selection successful using ..final_cols syntax
#> Selected columns using method: ..final_cols 
#> Result dimensions: 10741 rows x 37 columns
#> ✓ Row count preserved during column selection
#> ✓ Person count preserved during column selection
#> 
#> === FINAL DATASET STRUCTURE ===
#> Observations: 10741 
#> Unique individuals: 4252 
#> 
#> Treatment distribution:
#> 
#>    0    1 
#> 6453 4288 
#> 
#> Time period distribution:
#> 
#>    0 <NA> 
#> 6489 4252 
#> 
#> Cross-tabulation (treatment x period):
#>    
#>        0 <NA>
#>   0 3901 2552
#>   1 2588 1700
#> 
#> Outcome variables (29):
#> days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate
#> 
#> ⚠ Data quality issues:
#>   Missing time period: 4252 rows
#> 
#> === PERSON COUNT VALIDATION ===
#> Original metrics persons: 4252 
#> After processing persons: 4252 
#> Final result persons: 4252 
#> Event study data prepared with 10741 observations

Scenario 2: Propensity Score Matching Integration

Combine metrics with propensity score matching:

# Prepare data for matching (preserves original structure)
matching_data <- prepare_metrics_for_impact_analysis(
  metrics_output = metrics_result,
  treatment_assignment = treatment_data,
  impact_method = "matching",
  verbose = TRUE
)
#> Auto-detected outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Using outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Treatment assignment data is already at person-level (no deduplication needed)
#> 
#> === INPUT DATA QUALITY VALIDATION ===
#> Metrics data: rows = 12689 persons = 4252 
#> Treatment assignment: rows = 4252 persons = 4252 
#> ID column types: metrics = character , treatment = character 
#> 
#> === MERGING DATA ===
#> Metrics data: 12689 rows, 4252 unique persons
#> Treatment assignment: 4252 rows, 4252 unique persons
#> VALIDATION: Original person count for tracking: 4252 
#> 
#> === ENHANCED MERGE OPERATION ===
#> ✓ Standard merge successful
#> Merge completed using: standard merge 
#> Merged data: 12689 rows, 4252 unique persons (excl. NA)
#> ✓ Merge successful - no cartesian join detected
#> ✓ Person count preserved during merge
#> ✓ All observations have treatment assignments
#> 
#> === COLUMN SELECTION ===
#> Columns to select ( 44 ): cf, is_treated, period, post, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index, age_group, region, baseline_employment 
#> Available columns ( 44 ): cf, period, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell ... 
#> ✓ Column selection successful using ..final_cols syntax
#> Selected columns using method: ..final_cols 
#> Result dimensions: 12689 rows x 44 columns
#> ✓ Row count preserved during column selection
#> ✓ Person count preserved during column selection
#> 
#> === FINAL DATASET STRUCTURE ===
#> Observations: 12689 
#> Unique individuals: 4252 
#> 
#> Treatment distribution:
#> 
#>    0    1 
#> 7629 5060 
#> 
#> Time period distribution:
#> 
#>    0    1 <NA> 
#> 5157 3280 4252 
#> 
#> Cross-tabulation (treatment x period):
#>    
#>        0    1 <NA>
#>   0 3098 1979 2552
#>   1 2059 1301 1700
#> 
#> Outcome variables (37):
#> days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index
#> 
#> ⚠ Data quality issues:
#>   Missing time period: 4252 rows
#> 
#> === PERSON COUNT VALIDATION ===
#> Original metrics persons: 4252 
#> After processing persons: 4252 
#> Final result persons: 4252

# The data is now ready for propensity score matching
cat("Matching data prepared:\n")
#> Matching data prepared:
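# is_treated is repeated on every person-period row, so these sums count observations, not individuals
cat("Treated observations:", sum(matching_data$is_treated), "\n")
#> Treated observations: 5060
cat("Control observations:", sum(1 - matching_data$is_treated), "\n")
#> Control observations: 7629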

# Example of how you would proceed with matching
# (assuming propensity_score_matching function exists)
# matched_results <- propensity_score_matching(
#   data = matching_data,
#   treatment_var = "is_treated",
#   covariates = c("baseline_employment", "age_group"),
#   outcome_vars = available_outcomes
# )
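
The commented call above assumes a propensity_score_matching() function. As a rough illustration of the underlying idea, the sketch below estimates propensity scores with base R's glm() and performs 1-nearest-neighbour matching with replacement. The toy data, the id column, and all values are assumptions made for this example; this is not the package's matching routine. Matching usually works on one row per person, whereas matching_data above holds 12689 rows for 4252 persons, so it would need to be collapsed first.

```r
# Toy person-level data; every name and value here is illustrative only.
set.seed(42)
n <- 200
toy <- data.frame(
  id = seq_len(n),
  baseline_employment = runif(n),
  age_group = factor(sample(c("young", "mid", "senior"), n, replace = TRUE))
)
# Treatment probability rises with baseline employment
toy$is_treated <- rbinom(n, 1, plogis(-0.5 + 1.5 * toy$baseline_employment))

# Step 1: estimate propensity scores with a logistic regression
ps_model <- glm(is_treated ~ baseline_employment + age_group,
                data = toy, family = binomial())
toy$pscore <- fitted(ps_model)

# Step 2: for each treated unit, pick the control with the closest score
treated  <- toy[toy$is_treated == 1, ]
controls <- toy[toy$is_treated == 0, ]
match_idx <- vapply(treated$pscore,
                    function(p) which.min(abs(controls$pscore - p)),
                    integer(1))

matched <- data.frame(
  treated_id = treated$id,
  control_id = controls$id[match_idx],
  ps_gap     = abs(treated$pscore - controls$pscore[match_idx])
)
head(matched)
```

Inspecting ps_gap is a quick way to spot treated units without a close control; a caliper on that gap is a common refinement.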

Scenario 3: Multiple Treatment Groups

Handle complex treatment scenarios:

# Create multiple treatment groups
treatment_multi <- copy(treatment_data)
treatment_multi[, treatment_type := sample(c("none", "training", "subsidies", "both"), 
                                         .N, replace = TRUE, 
                                         prob = c(0.4, 0.3, 0.2, 0.1))]
treatment_multi[, is_treated := as.numeric(treatment_type != "none")]

cat("Multiple treatment groups:\n")
#> Multiple treatment groups:
print(table(treatment_multi$treatment_type))
#> 
#>      both      none subsidies  training 
#>       412      1688       938      1214

# Prepare data with additional treatment information
multi_data <- prepare_metrics_for_impact_analysis(
  metrics_output = metrics_result,
  treatment_assignment = treatment_multi,
  impact_method = "did",
  verbose = TRUE
)
#> Auto-detected outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Using outcome variables: days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index 
#> Treatment assignment data is already at person-level (no deduplication needed)
#> 
#> === INPUT DATA QUALITY VALIDATION ===
#> Metrics data: rows = 12689 persons = 4252 
#> Treatment assignment: rows = 4252 persons = 4252 
#> ID column types: metrics = character , treatment = character 
#> 
#> === MERGING DATA ===
#> Metrics data: 12689 rows, 4252 unique persons
#> Treatment assignment: 4252 rows, 4252 unique persons
#> VALIDATION: Original person count for tracking: 4252 
#> 
#> === ENHANCED MERGE OPERATION ===
#> ✓ Standard merge successful
#> Merge completed using: standard merge 
#> Merged data: 12689 rows, 4252 unique persons (excl. NA)
#> ✓ Merge successful - no cartesian join detected
#> ✓ Person count preserved during merge
#> ✓ All observations have treatment assignments
#> Found period values: NA, during, post, pre 
#> Warning: Some units have incomplete pre/post observations
#> 
#> === COLUMN SELECTION ===
#> Columns to select ( 45 ): cf, is_treated, period, post, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index, age_group, region, baseline_employment, treatment_type 
#> Available columns ( 45 ): cf, period, days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell ... 
#> ✓ Column selection successful using ..final_cols syntax
#> Selected columns using method: ..final_cols 
#> Result dimensions: 12689 rows x 45 columns
#> ✓ Row count preserved during column selection
#> ✓ Person count preserved during column selection
#> 
#> === FINAL DATASET STRUCTURE ===
#> Observations: 12689 
#> Unique individuals: 4252 
#> 
#> Treatment distribution:
#> 
#>    0    1 
#> 5011 7678 
#> 
#> Time period distribution:
#> 
#>    0    1 <NA> 
#> 5157 3280 4252 
#> 
#> Cross-tabulation (treatment x period):
#>    
#>        0    1 <NA>
#>   0 2053 1270 1688
#>   1 3104 2010 2564
#> 
#> Outcome variables (37):
#> days_employed, days_unemployed, total_days, total_observations, employment_rate, employment_spells, unemployment_spells, avg_employment_spell, avg_unemployment_spell, max_employment_spell, max_unemployment_spell, job_turnover_rate, employment_stability_index, career_success_index, permanent_contract_days, temporary_contract_days, internship_contract_days, total_employment_days.x, average_contract_quality, contract_observations, permanent_contract_rate, internship_contract_rate, temporary_contract_rate, temp_to_perm_transitions, perm_to_temp_transitions, temp_to_internship_transitions, internship_to_perm_transitions, contract_stability_trend, contract_improvement_rate, max_concurrent_jobs, avg_concurrent_jobs, concurrent_employment_days, total_employment_days.y, concurrent_employment_rate, employment_diversity_index, career_fragmentation_index, career_complexity_index
#> 
#> ⚠ Data quality issues:
#>   Missing time period: 4252 rows
#> 
#> === PERSON COUNT VALIDATION ===
#> Original metrics persons: 4252 
#> After processing persons: 4252 
#> Final result persons: 4252

# The treatment_type variable is automatically included for sub-group analysis
cat("Multi-treatment data includes treatment_type variable:", 
    "treatment_type" %in% names(multi_data), "\n")
#> Multi-treatment data includes treatment_type variable: TRUE
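
One way to use treatment_type for sub-group analysis is to compare the pre/post change of each treatment arm with the change in the untreated arm. The sketch below does this on a small made-up table standing in for multi_data; the numbers are invented and the raw contrast is only descriptive, not a substitute for a regression-based DiD estimate.

```r
library(data.table)

# Toy stand-in for multi_data; values are made up for illustration.
toy_multi <- data.table(
  cf = rep(1:8, each = 2),
  post = rep(c(0, 1), times = 8),
  treatment_type = rep(c("none", "training", "subsidies", "both"), each = 4),
  employment_rate = c(0.50, 0.52, 0.48, 0.50,   # none
                      0.45, 0.55, 0.47, 0.57,   # training
                      0.40, 0.46, 0.42, 0.48,   # subsidies
                      0.44, 0.58, 0.46, 0.60)   # both
)

# Mean outcome by arm and period, reshaped to one row per arm
cell_means <- toy_multi[, .(mean_rate = mean(employment_rate)),
                        by = .(treatment_type, post)]
arm_means <- dcast(cell_means, treatment_type ~ post, value.var = "mean_rate")
setnames(arm_means, c("0", "1"), c("pre", "post"))
arm_means[, change := post - pre]

# Raw DiD contrast of each arm against the untreated "none" arm
none_change <- arm_means[treatment_type == "none", change]
arm_means[, did_vs_none := change - none_change]
print(arm_means)
```

In this toy table the "training" arm improves by 0.10 against 0.02 for "none", giving a raw contrast of 0.08.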

Best Practices and Troubleshooting

Data Quality Checks

Always validate your data before analysis:

# Function to check data quality
validate_integration_data <- function(data, metrics_output, treatment_assignment) {
  issues <- character(0)
  
  # Check for missing values in key variables
  if (any(is.na(data$is_treated))) {
    issues <- c(issues, "Missing treatment assignments")
  }
  
  if (any(is.na(data$post))) {
    issues <- c(issues, "Missing time period information")
  }
  
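  # Check panel balance: expects exactly two rows (one pre, one post) per person;
  # rows with a missing period are counted too, so they also trigger this check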
  panel_balance <- data[, .(n_periods = .N), by = .(cf, is_treated)]
  unbalanced <- panel_balance[n_periods != 2]
  if (nrow(unbalanced) > 0) {
    issues <- c(issues, paste("Unbalanced panels for", nrow(unbalanced), "individuals"))
  }
  
  # Check outcome variables
  outcome_vars <- attr(data, "outcome_vars")
  if (length(outcome_vars) == 0) {
    issues <- c(issues, "No outcome variables detected")
  }
  
  return(list(
    is_valid = length(issues) == 0,
    issues = issues
  ))
}

# Validate our prepared data
validation <- validate_integration_data(did_data, metrics_result, treatment_data)
cat("Data validation results:\n")
#> Data validation results:
cat("Valid:", validation$is_valid, "\n")
#> Valid: FALSE
if (length(validation$issues) > 0) {
  cat("Issues:\n")
  for (issue in validation$issues) {
    cat("  -", issue, "\n")
  }
}
#> Issues:
#>   - Missing time period information 
#>   - Unbalanced panels for 3260 individuals
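
Both issues reported above can be narrowed down by dropping rows without a period flag and keeping only persons who have exactly one pre and one post row. The sketch below shows that filter on a toy panel. Whether dropping the NA-period rows is appropriate depends on what they represent in your data, so treat this as one possible approach rather than the recommended fix.

```r
library(data.table)

# Toy panel: "a" is balanced, "b" lacks a post row,
# "c" has an extra row with a missing period flag.
toy_panel <- data.table(
  cf   = c("a", "a", "b", "c", "c", "c"),
  post = c(0, 1, 0, 0, 1, NA),
  employment_rate = c(0.5, 0.6, 0.4, 0.7, 0.8, 0.75)
)

# Drop rows without a period flag
with_period <- toy_panel[!is.na(post)]

# Count pre and post rows per person and keep the balanced ones
person_counts <- with_period[, .(n_pre = sum(post == 0), n_post = sum(post == 1)),
                             by = cf]
balanced_ids <- person_counts[n_pre == 1 & n_post == 1, cf]
balanced <- with_period[cf %in% balanced_ids]
print(balanced)
```

Here persons "a" and "c" survive with two rows each, while "b" is excluded for lacking a post-period observation.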

Common Pitfalls

Period Structure Mismatch: Ensure the period columns produced by the metrics calculation match what the bridge function expects
Missing Treatment Assignments: Every individual in the metrics data should have a treatment assignment
Insufficient Panel Structure: DiD requires both pre- and post-period observations for every unit
Outcome Variable Selection: Not every metric is suitable as a causal outcome; columns such as total_observations or contract_observations appear to describe data coverage rather than employment results
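
The first two pitfalls can be checked with plain set operations before calling the bridge function. The sketch below uses made-up ID and period vectors; in practice they would come from the cf and period columns of your own metrics and treatment tables.

```r
# Toy IDs; replace with unique(metrics$cf) and unique(treatment$cf) in practice.
metrics_ids   <- c("p1", "p2", "p3", "p4")
treatment_ids <- c("p1", "p2", "p4", "p5")

missing_treatment <- setdiff(metrics_ids, treatment_ids)  # in metrics, no assignment
unused_treatment  <- setdiff(treatment_ids, metrics_ids)  # assigned, but no metrics

# Period labels: flag anything outside the expected pre/post pair
expected_periods <- c("pre", "post")
observed_periods <- c("pre", "during", "post", NA)
unexpected_periods <- setdiff(observed_periods, expected_periods)

cat("Without treatment assignment:", missing_treatment, "\n")
cat("Assigned but absent from metrics:", unused_treatment, "\n")
cat("Unexpected period labels:", unexpected_periods, "\n")
```

Non-empty results from any of the three checks point to a mismatch worth resolving before estimation.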

Performance Optimization

For large datasets:

# Use wide format for better memory efficiency
# (large_dataset is a placeholder for your own employment data)
metrics_wide <- calculate_comprehensive_impact_metrics(
  data = large_dataset,
  metrics = c("stability", "quality"),  # Select only needed metrics
  output_format = "wide"
)

# Prepare with specific outcome variables to reduce memory usage
impact_data <- prepare_metrics_for_impact_analysis(
  metrics_output = metrics_wide,
  treatment_assignment = treatment_data,
  outcome_vars = c("employment_rate", "average_contract_quality"),  # names must match columns in metrics_wide
  auto_detect_outcomes = FALSE  # Disable auto-detection for speed
)

Visualization and Reporting

Treatment Effect Visualization

if (exists("did_results") && !is.null(did_results)) {
  tryCatch({
    # Create a simple treatment effect plot
    plot_data <- NULL
    
    # Try to extract estimates safely
    if (is.list(did_results) && "estimates" %in% names(did_results) && 
        !is.null(did_results$estimates) && length(did_results$estimates) > 0) {
      
      plot_data <- data.table()
      estimates <- did_results$estimates
      
      if (is.list(estimates)) {
        for (outcome in names(estimates)) {
          est <- estimates[[outcome]]
          if (is.list(est) && "coefficient" %in% names(est) && !is.null(est$coefficient)) {
            coef_val <- est$coefficient
            # Leave the standard error missing rather than substituting a made-up value
            se_val <- if ("std_error" %in% names(est) && !is.null(est$std_error)) est$std_error else NA_real_
            
            plot_data <- rbind(plot_data, data.table(
              outcome = outcome,
              estimate = coef_val,
              std_error = se_val,
              ci_lower = coef_val - 1.96 * se_val,
              ci_upper = coef_val + 1.96 * se_val
            ))
          }
        }
      }
    } else if (is.list(did_results) && "summary_table" %in% names(did_results) && 
               !is.null(did_results$summary_table)) {
      # Use summary table as fallback
      summary_table <- did_results$summary_table
      if (is.data.frame(summary_table) || inherits(summary_table, "data.table")) {
        if ("outcome" %in% names(summary_table) && "estimate" %in% names(summary_table)) {
          plot_data <- as.data.table(summary_table)
          if (!"std_error" %in% names(plot_data)) plot_data[, std_error := NA_real_]
          if (!"ci_lower" %in% names(plot_data)) plot_data[, ci_lower := estimate - 1.96 * std_error]
          if (!"ci_upper" %in% names(plot_data)) plot_data[, ci_upper := estimate + 1.96 * std_error]
        }
      }
    }
    
    # Create plot if we have data
    if (!is.null(plot_data) && nrow(plot_data) > 0) {
      # Create coefficient plot
      p <- ggplot(plot_data, aes(x = outcome, y = estimate)) +
        geom_point(size = 3, color = "blue") +
        geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), 
                      width = 0.2, color = "blue") +
        geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
        coord_flip() +
        labs(
          title = "Treatment Effects on Employment Metrics",
          subtitle = "Point estimates with 95% confidence intervals",
          x = "Outcome Variables",
          y = "Treatment Effect Estimate"
        ) +
        theme_minimal()
      
      print(p)
      cat("Treatment effect visualization created successfully\n")
    } else {
      cat("No suitable data available for visualization\n")
    }
    
  }, error = function(e) {
    cat("Error creating visualization:", e$message, "\n")
    cat("Skipping visualization\n")
  })
} else {
  cat("No DiD results available for visualization\n")
}

#> Treatment effect visualization created successfully
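
The error bars in the plot use the normal approximation, estimate ± 1.96 × standard error. As a worked example, the sketch below applies that formula to the employment_rate estimate and standard error printed in the summary report of this vignette (0.0107 and 0.0226). It only illustrates the arithmetic and does not call any package function.

```r
# Estimate and standard error as printed in this vignette's summary report
estimate  <- 0.0107
std_error <- 0.0226

# Two-sided 95% interval under the normal approximation
z <- qnorm(0.975)  # about 1.96
ci <- c(lower = estimate - z * std_error,
        upper = estimate + z * std_error)
print(round(ci, 3))  # lower is about -0.034, upper about 0.055

# Corresponding two-sided p-value
p_value <- 2 * pnorm(-abs(estimate / std_error))
print(p_value)
```

The interval spans zero, which is consistent with the summary report printing no significance marker for employment_rate.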

Summary Report Generation

# Generate a comprehensive summary with error handling
generate_integration_summary <- function(metrics_result, did_data, did_results = NULL) {
  
  cat("=== EMPLOYMENT METRICS IMPACT EVALUATION SUMMARY ===\n\n")
  
  tryCatch({
    # Data Summary
    cat("1. DATA OVERVIEW\n")
    if (!is.null(did_data) && inherits(did_data, "data.table")) {
      cat(sprintf("   - Total individuals: %d\n", length(unique(did_data$cf))))
      cat(sprintf("   - Total observations: %d\n", nrow(did_data)))
      cat(sprintf("   - Treatment rate: %.1f%%\n", 100 * mean(did_data$is_treated, na.rm = TRUE)))
      
      # Safely check for outcome variables
      outcome_vars <- attr(did_data, "outcome_vars")
      if (!is.null(outcome_vars)) {
        cat(sprintf("   - Outcome variables: %d\n", length(outcome_vars)))
      } else {
        numeric_cols <- sapply(did_data, is.numeric)
        exclude_cols <- c("cf", "post", "is_treated", "period")
        potential_outcomes <- setdiff(names(did_data)[numeric_cols], exclude_cols)
        cat(sprintf("   - Potential outcome variables: %d\n", length(potential_outcomes)))
      }
    } else {
      cat("   - Data not available\n")
    }
    
    # Metrics Summary  
    cat("\n2. EMPLOYMENT METRICS\n")
    if (!is.null(did_data) && inherits(did_data, "data.table")) {
      outcome_vars <- attr(did_data, "outcome_vars")
      if (is.null(outcome_vars)) {
        numeric_cols <- sapply(did_data, is.numeric)
        exclude_cols <- c("cf", "post", "is_treated", "period")
        outcome_vars <- setdiff(names(did_data)[numeric_cols], exclude_cols)
      }
      
      if (length(outcome_vars) > 0) {
        tryCatch({
          outcome_summary <- did_data[, lapply(.SD, mean, na.rm = TRUE), 
                                     .SDcols = outcome_vars[1:min(3, length(outcome_vars))],
                                     by = .(is_treated, post)]
          print(outcome_summary)
        }, error = function(e) {
          cat("   - Could not generate metrics summary:", e$message, "\n")
        })
      } else {
        cat("   - No outcome variables available\n")
      }
    } else {
      cat("   - Metrics data not available\n")
    }
    
    # Impact Results
    cat("\n3. IMPACT EVALUATION RESULTS\n")
    if (!is.null(did_results) && is.list(did_results)) {
      if ("estimates" %in% names(did_results) && !is.null(did_results$estimates) && is.list(did_results$estimates)) {
        for (outcome in names(did_results$estimates)) {
          est <- did_results$estimates[[outcome]]
          if (is.list(est) && "coefficient" %in% names(est) && !is.null(est$coefficient)) {
            coef_val <- est$coefficient
            se_val <- if (!is.null(est$std_error)) est$std_error else NA
            p_val <- if (!is.null(est$p_value)) est$p_value else NA
            # Single star for p < 0.05; "***" conventionally marks p < 0.001
            significance <- if (!is.na(p_val) && p_val < 0.05) "*" else ""
            
            if (!is.na(se_val)) {
              cat(sprintf("   - %s: %.4f (%.4f) %s\n", outcome, coef_val, se_val, significance))
            } else {
              cat(sprintf("   - %s: %.4f %s\n", outcome, coef_val, significance))
            }
          }
        }
      } else {
        cat("   - Impact estimates not available\n")
      }
    } else {
      cat("   - Impact results not available\n")
    }
    
  }, error = function(e) {
    cat("Error generating summary:", e$message, "\n")
  })
  
  cat("\n=== END SUMMARY ===\n")
}

# Generate the summary with safe data checking
if (!is.null(metrics_result) && !is.null(did_data)) {
  generate_integration_summary(metrics_result, did_data, 
                             if(exists("did_results")) did_results else NULL)
} else {
  cat("=== EMPLOYMENT METRICS IMPACT EVALUATION SUMMARY ===\n\n")
  cat("Summary not available - insufficient data\n")
  cat("=== END SUMMARY ===\n")
}
#> === EMPLOYMENT METRICS IMPACT EVALUATION SUMMARY ===
#> 
#> 1. DATA OVERVIEW
#>    - Total individuals: 4252
#>    - Total observations: 12689
#>    - Treatment rate: 39.9%
#>    - Outcome variables: 37
#> 
#> 2. EMPLOYMENT METRICS
#>    is_treated  post days_employed days_unemployed total_days
#>         <int> <num>         <num>           <num>      <num>
#> 1:          0    NA    15151.4016      3980.68652 19132.0882
#> 2:          0     0      256.5011        64.13381   320.6350
#> 3:          0     1      236.5381        59.54373   296.0818
#> 4:          1    NA    15220.1765      4022.51706 19242.6935
#> 5:          1     0      242.1714        62.86453   305.0360
#> 6:          1     1      229.6002        59.23772   288.8379
#> 
#> 3. IMPACT EVALUATION RESULTS
#>    - employment_rate: 0.0107 (0.0226) 
#>    - permanent_contract_rate: 0.0040 (0.0149) 
#> 
#> === END SUMMARY ===
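
To connect the group means in section 2 of the report with the difference-in-differences logic, the sketch below computes the raw 2x2 contrast for days_employed from the four pre/post means printed above (the rows with a missing post value are left out). This is plain arithmetic on the printed numbers. It is not the estimate produced by difference_in_differences(), which may differ because of covariates, weighting, or sample restrictions.

```r
# Group means of days_employed copied from the summary table above
means <- c(control_pre = 256.5011, control_post = 236.5381,
           treated_pre = 242.1714, treated_post = 229.6002)

treated_change <- means[["treated_post"]] - means[["treated_pre"]]  # -12.5712
control_change <- means[["control_post"]] - means[["control_pre"]]  # -19.9630

# Difference-in-differences: treated change minus control change
raw_did <- treated_change - control_change                           # 7.3918
cat(sprintf("Raw DiD for days_employed: %.4f\n", raw_did))
```

Both groups lose employed days between periods, but the treated group loses about 7.4 fewer, which is what the raw contrast captures.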

Conclusion

This vignette has demonstrated the complete workflow for integrating employment metrics calculation with causal inference methods in the longworkR package. The key steps are:

Calculate comprehensive metrics using calculate_comprehensive_impact_metrics()
Bridge to impact analysis using prepare_metrics_for_impact_analysis()
Run causal inference using methods like difference_in_differences()

This integration enables researchers to:

Use employment metrics such as employment_rate, job_turnover_rate, and permanent_contract_rate as outcomes in causal studies
Maintain data consistency across analysis steps
Apply rigorous statistical methods to employment policy evaluation
Handle complex treatment scenarios and multiple outcome measures

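The bridge function handles the data transformation, allowing researchers to focus on the substantive questions of policy impact and causal identification. The quality warnings it prints (in the runs above, 4252 rows with a missing time period and incomplete pre/post observations for some units) should still be resolved before the estimates are interpreted.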

Next Steps

Explore additional impact evaluation methods (matching, regression discontinuity)
Apply to your own employment data with appropriate treatment definitions
Consider heterogeneous treatment effects across subgroups
Validate results with robustness checks and sensitivity analyses

For more information on specific functions, see their individual help pages and the package documentation.

longworkR Package

2026-02-25