Consolidates contiguous employment periods with no gap or unemployment between them. Two employment periods are adjacent if the end of one is immediately followed by the start of the next (no days between). Unemployment periods act as barriers that prevent consolidation.
Arguments
- data
data.table with employment records. Must contain columns cf, inizio, fine, durata. The arco column is used if present to identify employment vs unemployment periods.
- variable_handling
Character string specifying the aggregation strategy for variables: "weight" uses weighted mean/mode (default), "first" takes the first non-NA value.
- engine
Character string specifying the consolidation engine: "v2" (default) uses the collapse-native engine for maximum performance, "v1" uses the original data.table J-expression engine for backward compatibility.
Value
data.table with adjacent employment periods consolidated. Includes all
original columns plus n_periods_consolidated indicating how many
periods were merged (1 means no consolidation occurred for that record).
Details
What makes periods "adjacent":
Two employment periods are adjacent if:
They belong to the same person (cf)
They are consecutive in time (no gap days between them)
Both are employment periods (arco > 0 or missing)
There is no unemployment period between them
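The conditions above can be sketched for a single pair of periods. This is only an illustration in base R, not the package's implementation: is_adjacent is a hypothetical helper, and it assumes the caller has already checked that both periods belong to the same cf and are sorted by inizio.

```r
# Hypothetical helper, not part of the package: TRUE when `next_start` falls on
# the day right after `prev_end` and both periods count as employment.
is_adjacent <- function(prev_end, next_start, prev_arco = NA, next_arco = NA) {
  # A missing arco is treated as employment, as described above
  is_employment <- function(arco) is.na(arco) | arco > 0
  # Number of days strictly between the two periods
  gap_days <- as.integer(next_start - prev_end) - 1L
  gap_days == 0L & is_employment(prev_arco) & is_employment(next_arco)
}

is_adjacent(as.Date("2023-01-31"), as.Date("2023-02-01"), 1, 1)  # TRUE
is_adjacent(as.Date("2023-02-28"), as.Date("2023-04-01"), 1, 1)  # FALSE: March is uncovered
is_adjacent(as.Date("2023-01-31"), as.Date("2023-02-01"), 1, 0)  # FALSE: second period is unemployment
```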
How unemployment acts as a barrier:
Unemployment periods (arco == 0) prevent consolidation. For example,
if you have Employment-Unemployment-Employment, these will NOT be consolidated
even if the dates are adjacent. Use consolidate_short_gaps if
you want to bridge unemployment gaps.
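One way to picture the barrier is to assign a run id that restarts whenever a row cannot continue the previous one. The sketch below uses data.table and assumes the arco column is present. The continues and run_id columns are hypothetical names introduced for illustration, and the package's engines may work differently.

```r
library(data.table)

# Employment - Unemployment - Employment, with dates that touch end to start
dt <- data.table(
  cf     = c(1, 1, 1),
  inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  fine   = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  arco   = c(1, 0, 1)
)
setorder(dt, cf, inizio)

# A row continues the previous run only if both rows are employment
# and it starts the day after the previous row ends.
dt[, continues := arco > 0 & shift(arco) > 0 & inizio == shift(fine) + 1L, by = cf]
dt[is.na(continues), continues := FALSE]  # first row of each person starts a new run

# Hypothetical run id: increments whenever a row does not continue the previous one
dt[, run_id := cumsum(!continues), by = cf]
dt$run_id  # 1 2 3: the unemployment row keeps the two employment rows apart
```

Rows sharing a run_id within a cf would be the candidates for merging. Here every row ends up in its own run, matching the Employment-Unemployment-Employment case described above.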
Difference from overlapping consolidation:
consolidate_overlapping: Merges concurrent employment (same over_id)
consolidate_adjacent: Merges sequential employment with no gap
Aggregation rules:
When consolidating periods, the function:
Uses min(inizio) and max(fine) for date range
Recalculates durata as the full span
Uses weighted mode for qualitative variables (e.g., contract type)
Uses weighted mean for quantitative variables (e.g., salary)
Weights are based on the durata of each period
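The rules above can be worked through by hand for three adjacent periods. This is a base R sketch, not the package's internal aggregation code: the salary and contract columns and the weighted_mode helper are made up for illustration, and durata is assumed to count both the first and the last day (consistent with a value of 31 for a period covering all of January).

```r
# Three adjacent periods of one person; `salary` and `contract` are made-up columns
inizio   <- as.Date(c("2023-01-01", "2023-02-01", "2023-03-01"))
fine     <- as.Date(c("2023-01-31", "2023-02-28", "2023-03-31"))
durata   <- c(31, 28, 31)
salary   <- c(1000, 1200, 1500)
contract <- c("A", "B", "A")

# Hypothetical helper: the category with the largest total weight
weighted_mode <- function(x, w) {
  totals <- tapply(w, x, sum)
  names(totals)[which.max(totals)]
}

merged <- list(
  inizio   = min(inizio),
  fine     = max(fine),
  durata   = as.integer(max(fine) - min(inizio)) + 1L,  # full span, both ends included
  salary   = weighted.mean(salary, durata),             # weights are the period durations
  contract = weighted_mode(contract, durata)
)

merged$durata    # 90
merged$contract  # "A" (62 days against 28 for "B")
```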
Performance:
Fully vectorized implementation:
Handles 10M+ employment records efficiently
9x faster than previous consolidation implementations (Phase 3)
Phase 4 optimization: 1.2-3x additional speedup via single-period worker bypass
Memory efficient: < 1x input data size
Base throughput: ~41,000 records/second (Phase 3)
Optimized throughput: ~50,000-120,000 records/second (Phase 4, dataset dependent)
Phase 4 automatically skips consolidation for single-period workers (no adjacent periods possible). Performance scales with percentage of single-period workers:
20% singles: ~1.2x speedup
40% singles: ~1.4x speedup
50% singles: ~1.7x speedup
70% singles: ~2.9x speedup
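The idea behind the single-period bypass can be sketched as a split on the number of records per person. This is an assumption about the general approach, not the package's actual Phase 4 code. The n_records column is a hypothetical name, and the share is computed over workers, which is one possible reading of the percentages above.

```r
library(data.table)

dt <- data.table(
  cf     = c(1, 2, 2, 3),
  inizio = as.Date(c("2023-01-01", "2023-01-01", "2023-02-01", "2023-05-01")),
  fine   = as.Date(c("2023-03-31", "2023-01-31", "2023-02-28", "2023-05-31"))
)

# Count records per person; a worker with one record cannot have adjacent periods
dt[, n_records := .N, by = cf]
singles <- dt[n_records == 1L]  # can be passed through untouched
multi   <- dt[n_records > 1L]   # only these rows need the consolidation step

# Share of single-period workers (assumed to be the quantity the speedup scales with)
share_singles <- uniqueN(singles$cf) / uniqueN(dt$cf)
```

In this toy table two of the three workers are singles, so only the two rows of the remaining worker would go through consolidation.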
Composability:
This function is designed to be chained with other consolidation functions:
data |>
  consolidate_overlapping() |>  # First merge concurrent
  consolidate_adjacent() |>     # Then merge adjacent
  consolidate_short_gaps(30)    # Finally bridge short gaps
The order matters: always consolidate overlapping employment first, then adjacent periods, and finally bridge gaps if needed.
See also
consolidate_overlapping for concurrent employment consolidation
consolidate_by_employer for same-employer consolidation
consolidate_short_gaps for gap-bridging consolidation
consolidation_helpers for internal aggregation functions
Examples
if (FALSE) { # \dontrun{
# Basic: Consolidate 3 consecutive employment periods
data <- data.table::data.table(
  cf = rep(1, 3),
  inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  fine = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  durata = c(31, 28, 31),
  arco = c(1, 1, 1)
)
result <- consolidate_adjacent(data)
nrow(result) # 1 (all three periods merged)
result$n_periods_consolidated # 3

# With gaps: periods separated by days won't consolidate
data_with_gap <- data.table::data.table(
  cf = rep(1, 3),
  inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-04-01")),
  fine = as.Date(c("2023-01-31", "2023-02-28", "2023-04-30")),
  durata = c(31, 28, 30),
  arco = c(1, 1, 1)
)
# Periods 1-2 are adjacent (Jan 31 → Feb 1)
# Period 3 follows a gap (Feb 28 → Apr 1: date difference of 32, 31 uncovered days)
result <- consolidate_adjacent(data_with_gap)
nrow(result) # 2 (first two merged, third separate)

# With unemployment barrier
data_barrier <- data.table::data.table(
  cf = rep(1, 3),
  inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  fine = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  durata = c(31, 28, 31),
  arco = c(1, 0, 1) # Middle period is unemployment
)
result <- consolidate_adjacent(data_barrier)
nrow(result) # 3 (unemployment blocks consolidation)

# Chaining: after consolidate_overlapping()
data <- readRDS("data/sample.rds")
result <- data |>
  consolidate_overlapping() |>
  consolidate_adjacent()
cat("Original records:", nrow(data), "\n")
cat("After consolidation:", nrow(result), "\n")

# Integration with transition analysis
consolidated <- data |>
  consolidate_overlapping() |>
  consolidate_adjacent()
transitions <- analyze_employment_transitions(consolidated)

# Edge case: empty data
empty_data <- data.table::data.table(
  cf = integer(),
  inizio = as.Date(character()),
  fine = as.Date(character()),
  durata = integer()
)
result_empty <- consolidate_adjacent(empty_data) # Returns empty data.table

# Edge case: single record (first row of the sample data loaded above)
single_record <- data[1]
result_single <- consolidate_adjacent(single_record) # Returns as-is
} # }