Employment Consolidation Strategies
Giampaolo Montaletti
2026-04-07
Source: vignettes/consolidation-strategies.Rmd
Introduction
Why Consolidate Employment Data?
Employment records from administrative sources often contain fragmentation that doesn’t reflect true labor market attachment. Workers may have:
- Multiple concurrent jobs (part-time + part-time, seasonal + permanent)
- Brief gaps between contracts (1-2 days due to administrative processing)
- Contract renewals (same employer, consecutive periods)
- Overlapping employment periods identified by vecshift's over_id column
Without consolidation, these patterns create noise in longitudinal employment analysis, making it difficult to:
- Identify true employment trajectories
- Calculate meaningful employment durations
- Analyze career stability and mobility
- Estimate transition probabilities accurately
The longworkR Consolidation Solution
The longworkR package provides four focused functions for employment consolidation, each with a single, clear purpose:
1. consolidate_overlapping() - Merges concurrent employment periods
2. consolidate_by_employer() - Merges same-employer periods within a gap threshold
3. consolidate_adjacent() - Merges touching employment periods (no gap)
4. consolidate_short_gaps() - Bridges short unemployment gaps
These functions are composable (chainable with pipes) and performant (9x faster than previous implementations).
The New Consolidation API
From Complex to Simple
Previous approach (longworkR < 0.6.0): Single function with complex parameters
# OLD API (removed in v0.6.0)
consolidated <- consolidate_employment(
data,
mode = "temporal", # Complex parameter
type = "both", # Complex parameter
employer_var = "employer_id", # Complex parameter
min_lag = 30 # Complex parameter
)

Problems with the old API:
- 6 different implementation files for variations
- Complex parameter interactions
- Difficult to understand what consolidation was performed
- Performance issues with large datasets
- Hard to compose different consolidation strategies
New approach (longworkR 0.6.0+): Four focused functions
# NEW API (v0.6.0+)
consolidated <- data |>
consolidate_overlapping() |> # Clear: merge concurrent jobs
consolidate_by_employer("datore") |> # Clear: merge same-employer periods
consolidate_adjacent() |> # Clear: merge touching periods
consolidate_short_gaps(30) # Clear: bridge 30-day gaps

Benefits of the new API:
- ✅ Composable: Chain functions as needed
- ✅ Focused: Each function does one thing well
- ✅ Fast: 9x performance improvement
- ✅ Clear: Intent is obvious from function names
- ✅ Flexible: Choose which consolidations to apply
Function Guide: consolidate_overlapping()
Purpose
Merges concurrent employment periods where a person holds multiple
jobs at the same time. Uses vecshift’s over_id column to
identify overlapping employment.
When to Use
Use this function when:
- Your data has been processed by vecshift (which creates the over_id column)
- You want to treat concurrent employment as a single employment period
- You need to simplify analysis by consolidating multiple simultaneous jobs
How It Works
The function:
1. Groups records by cf (person ID) and over_id > 0
2. Merges overlapping periods into single records
3. Uses min start date and max end date for the consolidated period
4. Aggregates other columns using weighted mode/mean by duration
5. Adds n_periods_consolidated to track how many periods were merged
Example: Basic Usage
# Load sample data
data <- readRDS("data/sample.rds")
# Check for overlapping employment
table(data$over_id > 0) # TRUE = overlapping periods
# Consolidate overlapping periods
consolidated <- consolidate_overlapping(data)
# Compare record counts
cat("Original records:", nrow(data), "\n")
cat("After consolidation:", nrow(consolidated), "\n")
cat("Reduction:", round((1 - nrow(consolidated)/nrow(data)) * 100, 1), "%\n")

Example: Understanding the Consolidation
# View person with overlapping employment
person_165 <- data[cf == 165 & over_id > 0]
print(person_165[, .(cf, inizio, fine, durata, over_id, COD_TIPOLOGIA_CONTRATTUALE)])
# After consolidation
person_165_consolidated <- consolidate_overlapping(person_165)
print(person_165_consolidated[, .(cf, inizio, fine, durata, n_periods_consolidated)])
# Note: The consolidated period spans from earliest inizio to latest fine
# The n_periods_consolidated shows how many concurrent jobs were merged

Aggregation Rules
When consolidating overlapping periods, columns are aggregated as:
| Column Type | Aggregation Method | Example |
|---|---|---|
| inizio | Minimum (earliest) | min(c("2023-01-01", "2023-01-15")) = "2023-01-01" |
| fine | Maximum (latest) | max(c("2023-02-15", "2023-02-28")) = "2023-02-28" |
| durata | Recalculated | fine - inizio + 1 |
| Numeric/Integer | Weighted mean | sum(value * durata) / sum(durata) |
| Character/Factor | Weighted mode | Most frequent value by total duration |
| Logical | Majority rule | mean(values) >= 0.5 |
| arco | Maximum | Indicates if any period was employment |
| stato | Employment state | Preferentially selects employment states |
Function Guide: consolidate_by_employer()
Purpose
Merges consecutive employment periods with the same employer, consolidating contracts separated by short gaps into single employment spells. This is useful when contract renewals or rollovers with the same employer should be treated as continuous employment.
When to Use
Use this function when:
- You have an employer identifier column in your data
- Contract renewals with the same employer should be treated as continuous employment
- You want to distinguish same-employer continuity from cross-employer transitions
- Short gaps between same-employer contracts are administrative artifacts
How It Works
The function:
1. Sorts records by cf, employer, and start date
2. Detects new group starts when:
   - First record for person
   - Employer changes
   - Gap > max_gap_days days between periods
   - Previous or current period is unemployment (arco == 0)
3. Consolidates periods within each group using the shared aggregation engine

Important: Unemployment periods (arco == 0) act as barriers that prevent consolidation, even between same-employer records.
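The group-start logic can be sketched in base R. This is a simplified illustration of the rules listed above, not the package implementation; consolidate_by_employer_sketch() is a hypothetical name, gaps are measured as inizio minus the previous fine (matching the 5-day example below), and the arco barrier is included.

```r
# Illustrative sketch of the grouping rules (not the package code):
# a new group starts on a person change, an employer change, a gap above
# max_gap_days, or an unemployment barrier (arco == 0).
consolidate_by_employer_sketch <- function(df, max_gap_days = 8) {
  df <- df[order(df$cf, df$datore, df$inizio), ]
  n <- nrow(df)
  prev_cf     <- c(NA, df$cf[-n])
  prev_datore <- c(NA, df$datore[-n])
  prev_fine   <- c(as.Date(NA), df$fine[-n])
  prev_arco   <- c(NA, df$arco[-n])
  gap <- as.integer(df$inizio - prev_fine)  # days from previous fine to inizio
  new_group <- is.na(prev_cf) | df$cf != prev_cf | df$datore != prev_datore |
    gap > max_gap_days | df$arco == 0 | prev_arco == 0
  grp <- cumsum(new_group)
  do.call(rbind, lapply(split(seq_len(n), grp), function(i) {
    data.frame(cf = df$cf[i[1]], datore = df$datore[i[1]],
               inizio = min(df$inizio[i]), fine = max(df$fine[i]),
               n_periods_consolidated = length(i))
  }))
}

renewals <- data.frame(
  cf     = rep(1, 4),
  inizio = as.Date(c("2023-01-01", "2023-04-05", "2023-07-01", "2023-04-01")),
  fine   = as.Date(c("2023-03-31", "2023-06-30", "2023-09-30", "2023-06-30")),
  arco   = c(1, 1, 1, 1),
  datore = c("Employer_A", "Employer_A", "Employer_A", "Employer_B")
)
consolidate_by_employer_sketch(renewals)
# Two rows: Employer_A (3 periods merged), Employer_B (1 period)
```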
Example: Basic Usage
# Data with same-employer renewals
employer_data <- data.table(
cf = rep(1, 4),
inizio = as.Date(c("2023-01-01", "2023-04-05", "2023-07-01", "2023-04-01")),
fine = as.Date(c("2023-03-31", "2023-06-30", "2023-09-30", "2023-06-30")),
durata = c(90, 87, 92, 91),
arco = c(1, 1, 1, 1),
datore = c("Employer_A", "Employer_A", "Employer_A", "Employer_B")
)
# Consolidate same-employer periods (gap between first two = 5 days <= 8 default)
result <- consolidate_by_employer(employer_data, employer_var = "datore")
nrow(result) # 2 (Employer_A merged, Employer_B separate)

Example: Adjusting Gap Threshold
# Strict: only merge if gap <= 3 days
strict <- consolidate_by_employer(employer_data, "datore", max_gap_days = 3)
nrow(strict) # 3 (5-day gap too large for strict threshold)
# Lenient: merge if gap <= 30 days
lenient <- consolidate_by_employer(employer_data, "datore", max_gap_days = 30)
nrow(lenient) # 2 (all Employer_A periods merged)

Difference from Other Functions
| Aspect | consolidate_overlapping() | consolidate_by_employer() | consolidate_adjacent() | consolidate_short_gaps() |
|---|---|---|---|---|
| What it merges | Concurrent employment | Same-employer sequential | All sequential (no gap) | Sequential across gaps |
| Uses over_id | Yes | No | No | No |
| Employer-aware | No | Yes (required) | No | No |
| Gap tolerance | N/A | Configurable (max_gap_days) | Exactly 0 days | Configurable |
| Typical use case | Multiple simultaneous jobs | Contract renewals | Administrative boundaries | Short unemployment |
Function Guide: consolidate_adjacent()
Purpose
Merges employment periods that are contiguous in time (touching, with no gap between them). This consolidates contract renewals and consecutive employment spells with the same or different employers.
When to Use
Use this function when:
- You want to treat consecutive employment as continuous
- Contract renewals should be viewed as single employment spells
- You're analyzing employment stability and don't want administrative contract boundaries to fragment true employment periods
How It Works
The function:
1. Sorts records by cf and date
2. Calculates the gap in days between consecutive periods
3. Detects new group starts when:
   - First record for person
   - Gap > 0 days between periods
   - Previous period was unemployment (arco == 0)
   - Current period is unemployment
4. Consolidates periods within each group

Important: Unemployment periods (arco == 0) act as barriers that prevent consolidation.
Example: Basic Usage
# Create test data with adjacent periods
test_data <- data.table(
cf = rep(1, 3),
inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
fine = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
durata = c(31, 28, 31),
arco = c(1, 1, 1),
COD_TIPOLOGIA_CONTRATTUALE = c("A.03.00", "A.03.00", "A.01.00")
)
# Note: Jan 31 → Feb 1 and Feb 28 → Mar 1 are adjacent (no gap)
result <- consolidate_adjacent(test_data)
print(result)
# Result: 1 record spanning Jan 1 - Mar 31
# n_periods_consolidated = 3
# Contract type = "A.03.00" (weighted mode, 59 days vs 31 days)

Example: Unemployment as Barrier
# Data with unemployment between employment
barrier_data <- data.table(
cf = rep(1, 3),
inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
fine = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
durata = c(31, 28, 31),
arco = c(1, 0, 1), # Middle period is unemployment
stato = c("occupato", "disoccupato", "occupato")
)
result <- consolidate_adjacent(barrier_data)
print(result)
# Result: 3 records (unemployment blocks consolidation)
# Even though dates are adjacent, unemployment creates a break

Difference from consolidate_overlapping()
| Aspect | consolidate_overlapping() | consolidate_adjacent() |
|---|---|---|
| What it merges | Concurrent employment (same time) | Sequential employment (no gap) |
| Uses over_id | Yes (required) | No |
| Gap tolerance | N/A (overlapping by definition) | Exactly 0 days |
| Unemployment | Kept separate | Acts as barrier |
| Typical use case | Multiple simultaneous jobs | Contract renewals |
Function Guide: consolidate_short_gaps()
Purpose
Bridges short unemployment gaps between employment periods, consolidating employment spells separated by brief periods of inactivity. This is useful for analyzing labor market attachment when short gaps don’t represent true labor market exit.
When to Use
Use this function when:
- You want to treat brief unemployment (1-30 days) as part of continuous employment
- You're analyzing labor market attachment over time
- Administrative gaps between contracts shouldn't fragment employment trajectories
How It Works
The function:
1. Calculates the gap in days between consecutive periods
2. Creates consolidation groups where gap <= max_gap_days
3. Consolidates periods within each group
4. Adds a non_working_days column tracking the total unemployment days bridged

Important: Unlike consolidate_adjacent(), this function includes unemployment periods in the consolidation and tracks them.
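The bridging logic can be sketched in base R. This is a simplified illustration under stated assumptions, not the package code: bridge_short_gaps_sketch() is a hypothetical name, the gap is counted as full calendar days between periods (so touching periods have gap 0), and for brevity the input holds only employment spells, whereas the real function also folds in the intervening unemployment records.

```r
# Illustrative sketch of short-gap bridging (not the package code):
# consecutive periods whose gap is <= max_gap_days share a group, and the
# bridged days are accumulated into non_working_days.
bridge_short_gaps_sketch <- function(df, max_gap_days = 30) {
  df <- df[order(df$cf, df$inizio), ]
  n <- nrow(df)
  prev_cf   <- c(NA, df$cf[-n])
  prev_fine <- c(as.Date(NA), df$fine[-n])
  gap <- as.integer(df$inizio - prev_fine) - 1L  # full days between periods
  new_group <- is.na(prev_cf) | df$cf != prev_cf | gap > max_gap_days
  grp <- cumsum(new_group)
  bridged <- ifelse(new_group, 0L, pmax(gap, 0L))  # days bridged within a group
  do.call(rbind, lapply(split(seq_len(n), grp), function(i) {
    data.frame(cf = df$cf[i[1]],
               inizio = min(df$inizio[i]), fine = max(df$fine[i]),
               non_working_days = sum(bridged[i]))
  }))
}

spells <- data.frame(
  cf     = rep(1, 3),
  inizio = as.Date(c("2023-01-01", "2023-02-01", "2023-05-01")),
  fine   = as.Date(c("2023-01-20", "2023-02-28", "2023-05-31"))
)
bridge_short_gaps_sketch(spells, max_gap_days = 30)
# Row 1: Jan 1 - Feb 28 with non_working_days = 11 (the 11-day January gap
# is bridged); row 2: the May spell stays separate (61-day gap > 30)
```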
Choosing the Right Threshold
Different max_gap_days values suit different analyses:
| Threshold | Use Case | Interpretation |
|---|---|---|
| 7-14 days | Strict continuity | Very short breaks, sick leave |
| 30 days | Monthly analysis (default) | Standard employment continuity |
| 60-90 days | Seasonal work | Quarterly employment patterns |
| 180+ days | Long-term attachment | Persistent labor market presence |
Example: Basic Gap Bridging
# Employment with 15-day gap
gap_data <- data.table(
cf = rep(1, 5),
inizio = as.Date(c("2023-01-01", "2023-01-20", "2023-02-01",
"2023-03-01", "2023-04-01")),
fine = as.Date(c("2023-01-15", "2023-01-25", "2023-02-15",
"2023-03-15", "2023-04-15")),
durata = c(15, 6, 15, 15, 15),
arco = c(1, 0, 1, 0, 1) # Alternating employment/unemployment
)
# Bridge gaps up to 30 days
result_30 <- consolidate_short_gaps(gap_data, max_gap_days = 30)
cat("Records with 30-day threshold:", nrow(result_30), "\n")
cat("Non-working days bridged:", result_30$non_working_days, "\n")
# Compare with stricter threshold
result_10 <- consolidate_short_gaps(gap_data, max_gap_days = 10)
cat("Records with 10-day threshold:", nrow(result_10), "\n")

Example: Analyzing Employment Quality
# After full consolidation chain
data <- readRDS("data/sample.rds")
final <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
# Analyze employment quality by non_working_days
summary(final$non_working_days)
# Employment quality categories
final[, quality_category := fcase(
non_working_days == 0, "Continuous",
non_working_days <= 7, "High quality",
non_working_days <= 30, "Good quality",
non_working_days <= 90, "Moderate quality",
default = "Low quality"
)]
table(final$quality_category)

The non_working_days Column
This column is critical for understanding consolidation quality:
- 0 days: Truly continuous employment (no gaps bridged)
- 1-30 days: Brief gaps, likely administrative or short unemployment
- 31-90 days: Longer gaps, seasonal patterns or search periods
- 90+ days: Extended unemployment bridged (use with caution)
Progressive Consolidation: Chaining Functions
The Recommended Workflow
The four consolidation functions are designed to be used in sequence, creating a progressive consolidation chain:
# RECOMMENDED: Full consolidation chain
consolidated_data <- raw_data |>
consolidate_overlapping() |> # Step 1: Merge concurrent employment
consolidate_by_employer("datore") |> # Step 2: Merge same-employer renewals
consolidate_adjacent() |> # Step 3: Merge touching periods
consolidate_short_gaps(30) # Step 4: Bridge short gaps

Why Order Matters
Always consolidate in this order:
- Overlapping first: Simplifies concurrent employment before analyzing sequences
- Employer second: Consolidates same-employer renewals while employer identity is still available
- Adjacent third: Removes administrative contract boundaries
- Short gaps last: Most aggressive, should work on already-simplified data
Wrong order can produce suboptimal results:
# WRONG: Bridging gaps before merging adjacent periods
wrong_order <- data |>
consolidate_short_gaps(30) |> # Gap calculation includes noise
consolidate_adjacent() # Less effective on noisy data
# RIGHT: Clean up first, then bridge gaps
right_order <- data |>
consolidate_adjacent() |> # Remove noise first
consolidate_short_gaps(30) # Gap calculation more accurate

Selective Consolidation
You don’t have to use all four functions. Choose based on your analysis needs:
# Minimal: Only merge concurrent employment
minimal <- data |> consolidate_overlapping()
# Conservative: Merge concurrent and adjacent, no gap bridging
conservative <- data |>
consolidate_overlapping() |>
consolidate_adjacent()
# Aggressive: Full chain with generous gap threshold
aggressive <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(90) # Bridge up to 3 months

Example: Progressive Analysis
# Load data
data <- readRDS("data/sample.rds")
# Track consolidation impact at each step
cat("Original data:", nrow(data), "records\n\n")
step1 <- consolidate_overlapping(data)
cat("After overlapping consolidation:", nrow(step1), "records\n")
cat("Reduction:", nrow(data) - nrow(step1), "records\n")
cat("Percentage:", round((1 - nrow(step1)/nrow(data)) * 100, 1), "%\n\n")
step2 <- consolidate_adjacent(step1)
cat("After adjacent consolidation:", nrow(step2), "records\n")
cat("Additional reduction:", nrow(step1) - nrow(step2), "records\n")
cat("Cumulative:", round((1 - nrow(step2)/nrow(data)) * 100, 1), "%\n\n")
step3 <- consolidate_short_gaps(step2, max_gap_days = 30)
cat("After gap bridging (30d):", nrow(step3), "records\n")
cat("Additional reduction:", nrow(step2) - nrow(step3), "records\n")
cat("Total reduction:", round((1 - nrow(step3)/nrow(data)) * 100, 1), "%\n")

Integration with analyze_employment_transitions()
The New Workflow
In longworkR 0.6.0+, consolidation is decoupled from transition analysis. You consolidate data first, then analyze transitions:
# NEW WORKFLOW (v0.6.0+)
# Step 1: Consolidate employment data
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
# Step 2: Analyze transitions on consolidated data
transitions <- analyze_employment_transitions(consolidated)
# Step 3: Visualize
plot_transitions_network(transitions)

Why This Is Better
The decoupled approach provides:
- Flexibility: Choose consolidation strategy independent of analysis
- Clarity: Explicit about what consolidation was performed
- Reusability: Consolidate once, use for multiple analyses
- Performance: Consolidation optimized separately from transition analysis
Example: Multiple Analyses on Same Consolidation
# Consolidate once
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
# Use for different analyses
transitions <- analyze_employment_transitions(consolidated)
survival_results <- estimate_contract_survival(consolidated)
impact_events <- identify_treatment_events(consolidated, ...)
# Or analyze with different parameters
transitions_first <- analyze_employment_transitions(consolidated, eval_chain = "first")
transitions_last <- analyze_employment_transitions(consolidated, eval_chain = "last")

Comparing Consolidation Strategies
# Compare impact of different consolidation strategies
# No consolidation
trans_none <- analyze_employment_transitions(data)
# Minimal consolidation
data_minimal <- consolidate_overlapping(data)
trans_minimal <- analyze_employment_transitions(data_minimal)
# Full consolidation
data_full <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
trans_full <- analyze_employment_transitions(data_full)
# Compare transition counts
cat("No consolidation:", nrow(trans_none$transitions), "transitions\n")
cat("Minimal:", nrow(trans_minimal$transitions), "transitions\n")
cat("Full:", nrow(trans_full$transitions), "transitions\n")

Performance Guide
Performance Achievements
The new consolidation functions deliver exceptional performance:
| Metric | Value | Comparison |
|---|---|---|
| Overall speedup | 9x faster | vs. previous implementation |
| Throughput | ~41,000 records/sec | Full consolidation chain |
| Memory efficiency | < 1x input size | No memory bloat |
| Scalability | 10M+ records | Production-ready |
Benchmark: Real-World Dataset
# Load large dataset (434,103 employment records)
large_data <- readRDS("data/large_sample.rds")
# Benchmark each function
system.time({
result1 <- consolidate_overlapping(large_data)
}) # ~14 seconds (9.1x faster than baseline)
system.time({
result2 <- consolidate_adjacent(result1)
}) # ~13 seconds (9.2x faster than baseline)
system.time({
result3 <- consolidate_short_gaps(result2, max_gap_days = 30)
}) # ~11 seconds (3.6x faster than baseline)
# Full chain
system.time({
final <- large_data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
}) # ~38 seconds total (7.6x faster, was 287 seconds)
# Record reduction
cat("Original:", nrow(large_data), "records\n")
cat("Final:", nrow(final), "records\n")
cat("Reduction:", round((1 - nrow(final)/nrow(large_data)) * 100, 1), "%\n")

Performance Best Practices
1. Use data.table format:

# Good: data.table (fast)
dt <- data.table::as.data.table(data)
result <- consolidate_overlapping(dt)
# Bad: data.frame
df <- as.data.frame(data)
result <- consolidate_overlapping(df) # Throws error

2. Set key for better performance (optional, done automatically):

data.table::setkey(data, cf, inizio, fine)
result <- consolidate_adjacent(data) # Slightly faster

3. Use pipe chains for clarity:

# Good: Clear pipeline
result <- data |>
  consolidate_overlapping() |>
  consolidate_adjacent()
# Less clear: Nested calls
result <- consolidate_adjacent(consolidate_overlapping(data))

4. Consolidate before analysis, not during:

# Good: Consolidate once, use many times
consolidated <- data |> consolidate_overlapping()
trans1 <- analyze_employment_transitions(consolidated, eval_chain = "first")
trans2 <- analyze_employment_transitions(consolidated, eval_chain = "last")
# Bad: Consolidate repeatedly
trans1 <- analyze_employment_transitions(
  consolidate_overlapping(data), eval_chain = "first"
)
trans2 <- analyze_employment_transitions(
  consolidate_overlapping(data), eval_chain = "last"
)
Memory Considerations
The consolidation functions are memory-efficient:
- No data copying: Uses data.table’s modify-by-reference where safe
- < 1x memory overhead: Peak memory ≈ input data size
- Vectorized operations: No loop-based memory buildup
For extremely large datasets (100M+ records), consider processing in chunks:
# Process by person ID ranges
chunk_size <- 1000000 # 1M person IDs per chunk
person_ids <- unique(data$cf)
chunks <- split(person_ids, ceiling(seq_along(person_ids) / chunk_size))
results <- lapply(chunks, function(ids) {
chunk_data <- data[cf %in% ids]
chunk_data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30)
})
final_result <- rbindlist(results)

Migration Guide: Upgrading from < 0.6.0
Breaking Changes in v0.6.0
longworkR 0.6.0 introduces breaking changes by
removing the old consolidate_employment() function and its
variants. If you’re upgrading from an earlier version, you’ll need to
update your code.
Removed Functions
The following functions are no longer available:
- consolidate_employment()
- consolidate_employment_fast()
- consolidate_employment_robust()
- consolidate_employment_safe()
- consolidate_employment_ultra_fast()
Migration Examples
Example 1: Basic Temporal Consolidation
# OLD CODE (< 0.6.0) - NO LONGER WORKS
consolidated <- consolidate_employment(
data,
mode = "temporal",
type = "both"
)
# NEW CODE (0.6.0+)
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent()

Example 2: With Gap Bridging
# OLD CODE (< 0.6.0) - NO LONGER WORKS
consolidated <- consolidate_employment(
data,
mode = "temporal",
type = "both",
min_lag = 30
)
# NEW CODE (0.6.0+)
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent() |>
consolidate_short_gaps(30) # min_lag → max_gap_days

Example 3: Within analyze_employment_transitions()
# OLD CODE (< 0.6.0) - NO LONGER WORKS
result <- analyze_employment_transitions(
data,
consolidation_mode = "temporal",
consolidation_type = "both"
)
# NEW CODE (0.6.0+)
# Consolidate first, then analyze
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent()
result <- analyze_employment_transitions(consolidated)

Example 4: Only Overlapping Consolidation
# OLD CODE (< 0.6.0)
consolidated <- consolidate_employment(
data,
mode = "temporal",
type = "overlapping"
)
# NEW CODE (0.6.0+)
consolidated <- consolidate_overlapping(data)

Example 5: Only Adjacent Consolidation
# OLD CODE (< 0.6.0)
consolidated <- consolidate_employment(
data,
mode = "temporal",
type = "adjacent"
)
# NEW CODE (0.6.0+)
consolidated <- consolidate_adjacent(data)

Parameter Mapping
| Old Parameter | New Approach |
|---|---|
| mode = "temporal" | Use consolidate_overlapping() and/or consolidate_adjacent() |
| type = "both" | Chain consolidate_overlapping() then consolidate_adjacent() |
| type = "overlapping" | Use consolidate_overlapping() only |
| type = "adjacent" | Use consolidate_adjacent() only |
| min_lag = N | Use consolidate_short_gaps(max_gap_days = N) |
| employer_var = "..." | Use consolidate_by_employer(employer_var = "...") |
Employer-Based Consolidation
The old mode = "employer" parameter maps directly to the
new consolidate_by_employer() function:
# OLD CODE (< 0.6.0) - NO LONGER WORKS
consolidated <- consolidate_employment(
data,
mode = "employer",
employer_var = "employer_id",
min_lag = 30
)
# NEW CODE (0.6.0+)
consolidated <- consolidate_by_employer(
data,
employer_var = "employer_id",
max_gap_days = 30 # min_lag → max_gap_days
)
# Or as part of a pipeline
consolidated <- data |>
consolidate_overlapping() |>
consolidate_by_employer("employer_id", max_gap_days = 30) |>
consolidate_adjacent()

Updated Workflow Pattern
The new API separates consolidation from analysis:
# OLD PATTERN (< 0.6.0)
result <- analyze_employment_transitions(
data,
consolidation_mode = "temporal",
consolidation_type = "both",
eval_chain = "first"
)
# NEW PATTERN (0.6.0+)
# Step 1: Consolidate
consolidated <- data |>
consolidate_overlapping() |>
consolidate_adjacent()
# Step 2: Analyze
result <- analyze_employment_transitions(
consolidated,
eval_chain = "first"
)

Benefits of Migration
Migrating to the new API provides:
- 9x performance improvement: Much faster consolidation
- Clearer code: Intent obvious from function names
- More flexible: Mix and match consolidation strategies
- Better maintained: Single, optimized implementation
- Composable: Easy to chain with pipes
Summary
Key Takeaways
1. Four focused functions replace the complex old API:
   - consolidate_overlapping() - Concurrent employment
   - consolidate_by_employer() - Same-employer renewals
   - consolidate_adjacent() - Touching periods
   - consolidate_short_gaps() - Bridge unemployment gaps
2. Composable design: Chain functions with pipes in logical order
3. 9x performance improvement: Handles 10M+ records efficiently
4. Decoupled from analysis: Consolidate first, then analyze
5. Breaking changes: Old consolidate_employment() removed in v0.6.0
Recommended Workflow
# 1. Load data
data <- readRDS("data/sample.rds")
# 2. Progressive consolidation
consolidated <- data |>
consolidate_overlapping() |> # Merge concurrent jobs
consolidate_by_employer("datore") |> # Merge same-employer renewals
consolidate_adjacent() |> # Merge touching periods
consolidate_short_gaps(30) # Bridge 30-day gaps
# 3. Analyze
transitions <- analyze_employment_transitions(consolidated)
# 4. Visualize
plot_transitions_network(transitions)

Further Reading
- Function documentation: ?consolidate_overlapping, ?consolidate_by_employer, ?consolidate_adjacent, ?consolidate_short_gaps
- Performance vignette: vignette("performance-optimizations", package = "longworkR")
- Transition analysis: ?analyze_employment_transitions
- vecshift package: Understanding the over_id column
Session Info
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.18.2.1 longworkR_0.8.2 vecshift_1.0.5
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 lattice_0.22-9 hms_1.1.4
#> [5] digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5 grid_4.5.3
#> [9] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0 Matrix_1.7-4
#> [13] progress_1.2.3 survival_3.8-6 scales_1.4.0 textshaping_1.0.5
#> [17] jquerylib_0.1.4 cli_3.6.5 rlang_1.2.0 crayon_1.5.3
#> [21] splines_4.5.3 cachem_1.1.0 yaml_2.3.12 otel_0.2.0
#> [25] tools_4.5.3 parallel_4.5.3 dplyr_1.2.1 ggplot2_4.0.2
#> [29] vctrs_0.7.2 R6_2.6.1 matrixStats_1.5.0 lifecycle_1.0.5
#> [33] fs_2.0.1 htmlwidgets_1.6.4 ragg_1.5.2 cluster_2.1.8.2
#> [37] pkgconfig_2.0.3 desc_1.4.3 pkgdown_2.2.0 bslib_0.10.0
#> [41] pillar_1.11.1 gtable_0.3.6 glue_1.8.0 Rcpp_1.1.1
#> [45] systemfonts_1.3.2 collapse_2.1.6 xfun_0.57 tibble_3.3.1
#> [49] tidyselect_1.2.1 knitr_1.51 farver_2.1.2 htmltools_0.5.9
#> [53] rmarkdown_2.31 ggalluvial_0.12.6 compiler_4.5.3 prettyunits_1.2.0
#> [57] S7_0.2.1