Coarsened Exact Matching (CEM) with Person-Level Matching — coarsened_exact

Performs coarsened exact matching to create balanced treatment and control groups by automatically binning continuous variables and exactly matching on specified factors. The function works in three steps: it aggregates event-level data to one row of characteristics per person, matches at the person level, and then returns all events for the matched individuals. Two implementations are available: the native CEM package (used when it is available) and a fallback implementation that does not require X11/tcltk dependencies.
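The aggregate, match, expand flow described above can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy data, the age breaks, and the helper column names (age_bin, stratum) are assumptions made for illustration only.

```r
# Toy event-level data: several rows per person (hypothetical values)
events <- data.frame(
  cf         = c("A", "A", "B", "B", "C", "D", "D", "E"),
  is_treated = c(1, 1, 0, 0, 1, 0, 0, 0),
  age        = c(31, 32, 33, 34, 52, 29, 30, 58),
  sector     = c("mfg", "mfg", "mfg", "mfg", "svc", "svc", "svc", "svc"),
  stringsAsFactors = FALSE
)

# Step 1: aggregate to one row per person (here: the "first" row per cf)
persons <- events[!duplicated(events$cf), ]

# Step 2: coarsen age and combine it with the exact-match factor
persons$age_bin <- cut(persons$age, breaks = c(20, 40, 60))
persons$stratum <- paste(persons$age_bin, persons$sector, sep = "|")

# Step 3: keep only strata containing both treated and control persons
has_treated <- tapply(persons$is_treated == 1, persons$stratum, any)
has_control <- tapply(persons$is_treated == 0, persons$stratum, any)
ok_strata   <- names(has_treated)[has_treated & has_control]
matched_persons <- persons[persons$stratum %in% ok_strata, ]

# Step 4: return every event that belongs to a matched person
matched_events <- events[events$cf %in% matched_persons$cf, ]
```

In this toy example person D is the only member of its stratum, so D and both of D's events are dropped, while all events of A, B, C, and E are retained.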

Usage

coarsened_exact_matching(
  data,
  treatment_var = "is_treated",
  person_id_var = "cf",
  matching_variables,
  person_aggregation = "first",
  automatic_binning = TRUE,
  cutpoints = NULL,
  k2k = FALSE,
  control_ratio = 1,
  keep_all = TRUE,
  use_native_cem = TRUE,
  n_bins = 4,
  verbose = TRUE
)

Arguments

data: A data.table containing treatment and control observations (can be event-level)
treatment_var: Character. Name of treatment indicator variable. Default: "is_treated"
person_id_var: Character. Name of person identifier variable (e.g., "cf"). Default: "cf"
matching_variables: Character vector. Variables to include in matching
person_aggregation: Character. How to collapse each person's events into a single row of characteristics. One of "first", "last", "mode", "mean". Default: "first"
automatic_binning: Logical. Automatically bin continuous variables? Default: TRUE
cutpoints: Named list. Custom cutpoints for continuous variables. Default: NULL
k2k: Logical. Perform k-to-k matching (1:1 ratio)? Default: FALSE
control_ratio: Numeric. Ratio of control to treatment units (e.g., 2 for 2:1). Default: 1
keep_all: Logical. Keep all matched observations? Default: TRUE
use_native_cem: Logical. Try to use native CEM package? Default: TRUE
n_bins: Integer. Number of bins for automatic binning. Default: 4
verbose: Logical. Print detailed information? Default: TRUE
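To make the binning and ratio arguments more concrete, here is a hedged base R sketch. It assumes equal-width bins for n_bins and a "take the first eligible controls" rule for control_ratio; the package may use a different binning rule or random sampling. The helper cap_controls is hypothetical and not part of the package.

```r
# Hypothetical person-level ages
age <- c(22, 27, 31, 38, 44, 49, 53, 61)

# Automatic binning: n_bins equal-width intervals over the observed range
# (assumption: the real function may use quantiles or another rule)
n_bins    <- 4
auto_bins <- cut(age, breaks = n_bins, include.lowest = TRUE)

# Custom cutpoints: interior cut values, padded with -Inf and Inf
cutpoints   <- c(25, 35, 45, 55)
custom_bins <- cut(age, breaks = c(-Inf, cutpoints, Inf))

# Hypothetical helper: within each stratum keep all treated units and
# at most control_ratio controls per treated unit
cap_controls <- function(treated, stratum, control_ratio = 1) {
  keep <- logical(length(treated))
  for (s in unique(stratum)) {
    idx_t <- which(stratum == s & treated == 1)
    idx_c <- which(stratum == s & treated == 0)
    if (length(idx_t) == 0 || length(idx_c) == 0) next  # unmatched stratum
    n_keep <- min(length(idx_c), control_ratio * length(idx_t))
    keep[c(idx_t, idx_c[seq_len(n_keep)])] <- TRUE
  }
  keep
}

treated <- c(1, 0, 0, 0, 1, 0)
stratum <- c("s1", "s1", "s1", "s1", "s2", "s3")
keep    <- cap_controls(treated, stratum, control_ratio = 2)
```

Here stratum s1 keeps its treated unit and two of its three controls, while s2 (no controls) and s3 (no treated units) are dropped entirely.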

Value

A list containing:

matched_data: data.table with ALL events for matched individuals (both treated and control). Includes an event_time column whose value is "pre" (pre-treatment), "post" (post-treatment), "control" (control group), or NA
matched_persons: data.table with the person-level characteristics used for matching
match_summary: Summary of CEM procedure
imbalance_measures: L1 and other imbalance statistics
strata_info: Information about matching strata
aggregation_report: Report on person-level aggregation process
implementation_used: Which implementation was used: "native_cem" or "fallback"
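As background for imbalance_measures, the multivariate L1 statistic is half the sum of absolute differences between the treated and control relative frequencies across the cells of the coarsened covariates. It ranges from 0 (identical distributions) to 1 (no overlap). The sketch below computes it over a single stratum label in base R. The function name l1_imbalance is hypothetical, and the exact coarsening on which the package computes L1 is an assumption here.

```r
# Hypothetical helper: L1 = 0.5 * sum over cells of |f_treated - f_control|
l1_imbalance <- function(stratum, treated) {
  cells <- unique(stratum)
  # Relative frequency of each cell among treated and among control units
  f <- sapply(cells, function(s) mean(stratum[treated == 1] == s))
  g <- sapply(cells, function(s) mean(stratum[treated == 0] == s))
  0.5 * sum(abs(f - g))
}

treated <- c(1, 1, 0, 0)

# Complete separation: treated are all in "a", controls all in "b"
l1_separated <- l1_imbalance(c("a", "a", "b", "b"), treated)  # 1

# Perfect balance: both groups are split evenly across "a" and "b"
l1_balanced <- l1_imbalance(c("a", "b", "a", "b"), treated)   # 0
```

Comparing the statistic before and after matching gives a quick indication of how much the match improved balance.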

Examples

if (FALSE) { # \dontrun{
# Process employment data first
employment_data <- vecshift(raw_employment_data)

# Add treatment indicator (e.g., policy intervention);
# some_treatment_condition is a placeholder for your own logical expression
employment_data[, is_treated := some_treatment_condition]

# Basic CEM with person-level matching
cem_match <- coarsened_exact_matching(
  data = employment_data,
  person_id_var = "cf",
  matching_variables = c("age", "education", "sector"),
  person_aggregation = "first",
  automatic_binning = TRUE
)

# Result contains all events for matched persons
matched_employment_data <- cem_match$matched_data
person_characteristics <- cem_match$matched_persons

# CEM with 2:1 control-to-treatment ratio and custom cutpoints for age and wage
cem_match_2to1 <- coarsened_exact_matching(
  data = employment_data,
  person_id_var = "cf",
  matching_variables = c("age", "wage", "experience"),
  control_ratio = 2,  # 2 controls per treated unit
  cutpoints = list(
    age = c(25, 35, 45, 55),
    wage = c(1000, 2000, 3000, 4000)
  )
)

# Force the fallback implementation (skip the native CEM package) with a 2:1 ratio
cem_match_fallback <- coarsened_exact_matching(
  data = employment_data,
  person_id_var = "cf",
  matching_variables = c("age", "education", "sector"),
  control_ratio = 2,
  use_native_cem = FALSE
)
} # }