Frequently Asked Questions (FAQ)

This document covers common questions about the GeoLift-SDID package and its applications in marketing measurement.

Multi-Cell Test Strategies

Test Length Handling

Q1: Do the power calculations have a “test length” parameter, or is the length of the test accounted for in a different way?

A1: Test length is inherently accounted for through the implementation. In GeoLift-SDID, the test length is determined by:

  1. The intervention_date parameter, which marks the start of treatment

  2. The end_date parameter (if specified) or the end of available data

The framework automatically calculates the appropriate pre-treatment and post-treatment periods based on these dates:

# From GeoLiftMultiCell class
def _load_geolift_data(self):
    # Extract dates and convert to time indices
    intervention_date = self.params.get('intervention_date')
    end_date = self.params.get('end_date')
    
    # Calculate treatment start/end indices
    self.params['treatment_start'] = dates.index(intervention_date)
    if end_date is not None:
        self.params['treatment_end'] = dates.index(end_date)
    else:
        self.params['treatment_end'] = len(dates) - 1
        print(f"WARN: 'end_date' not specified or null, using end of data: {dates[-1]} (time={len(dates)-1})")

Implementation details:

  • Test length is automatically determined from temporal data

  • Effects are normalized by post-treatment period length

  • Statistical inference adjusts for test duration

  • Time-weighted estimates account for temporal dynamics

  • No need for explicit “test length” parameter in power calculations
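
The date-to-index logic shown above is simple enough to sanity-check by hand. Here is a minimal, self-contained sketch that mirrors the _load_geolift_data excerpt (illustrative only; it does not call the package):

# Mirror of the date-to-index logic from _load_geolift_data (illustrative sketch)
dates = ['2024-10-01', '2024-11-01', '2024-12-01', '2025-01-01', '2025-02-01', '2025-03-01']
intervention_date = '2025-01-01'
end_date = None  # falls back to the end of the available data

treatment_start = dates.index(intervention_date)
treatment_end = dates.index(end_date) if end_date is not None else len(dates) - 1

# The effective test length is implied by the two indices
test_length = treatment_end - treatment_start + 1
print(f"Test runs from t={treatment_start} to t={treatment_end} ({test_length} post-treatment periods)")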

Treatment and Control Group Selection

Q2: Let’s say I have identified treatment and control groups. How can I validate that I’ve selected the locations properly? Is there a specific metric to use?

A2: Yes, the GeoLift-SDID package provides several methods to validate your treatment and control group selection.

Pre-Treatment Fit Quality

The most important validation is to check how well your synthetic control matches the treatment unit(s) in the pre-treatment period.

  1. Visual Inspection: Run your model and examine the resulting plots. The package automatically generates a time series visualization showing how closely the synthetic control line tracks the treatment line before the intervention date. A tight fit in the pre-treatment period suggests well-selected control markets.

  2. Pre-Treatment RMSE: The diagnostics output includes the Root Mean Squared Error (RMSE) between your treatment and synthetic control units in the pre-treatment period. Lower values indicate better matching.

    from recipes.geolift_single_cell import GeoLiftSingleCell
    
    # Configure and run your model
    config = {
        'data_path': 'data/GeoLift_Data.csv',
        'treatment_units': [501],
        'intervention_date': '2025-01-01',
        'output_dir': 'outputs/analysis'
    }
    
    # Initialize and run analysis
    analyzer = GeoLiftSingleCell(config)
    analyzer.run_analysis()
    
    # Access diagnostics to check fit quality
    pre_treatment_rmse = analyzer.diagnostics.get('pre_treatment_rmse')
    print(f"Pre-treatment RMSE: {pre_treatment_rmse}")
    
  3. Examine Unit Weights: The package automatically saves weight diagnostics that show which control units contribute most to your synthetic control. You can find these in the output directory and visualize them through the built-in plots.

Very concentrated weights (e.g., one control unit getting 90% weight) might indicate poor diversity in your control pool.
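
If you want a quick numeric check on concentration, you can summarize the unit weights directly. A rough sketch, assuming the weights are available as a DataFrame with an sdid_weight column (as returned by estimated_params() in the examples later in this FAQ):

import pandas as pd

def weight_concentration(omega: pd.DataFrame, weight_col: str = 'sdid_weight') -> dict:
    """Summarize how concentrated the donor weights are."""
    w = omega[weight_col].clip(lower=0)
    w = w / w.sum()                      # normalize to shares
    hhi = float((w ** 2).sum())          # Herfindahl index of the weight shares
    return {
        'max_weight_share': float(w.max()),
        'effective_num_donors': 1.0 / hhi if hhi > 0 else float('nan'),
    }

# Example usage (omega from: omega, lambda_weights = model.estimated_params()):
# summary = weight_concentration(omega)
# if summary['max_weight_share'] > 0.7:
#     print("Warning: synthetic control leans heavily on a single donor market")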

Placebo Tests

The package supports inference methods including placebo tests to validate your model’s robustness.

  1. Implementation with GeoLift-SDID:

    # Run analysis with placebo inference
    config = {
        'data_path': 'data/GeoLift_Data.csv',
        'treatment_units': [501],
        'intervention_date': '2025-01-01',
        'inference_method': 'placebo',  # Use placebo test for inference
        'inference_samples': 50,        # Number of placebo samples
        'output_dir': 'outputs/analysis'
    }
    
    analyzer = GeoLiftSingleCell(config)
    analyzer.run_analysis()
    
    # Access results including p-values from placebo tests
    p_value = analyzer.results.get('p_value')
    print(f"Placebo test p-value: {p_value}")
    
  2. Interpretation: The package automatically calculates and reports the empirical p-value from placebo tests. Your actual effect should be larger (in absolute terms) than most placebo effects for statistical significance.
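
For intuition, the empirical p-value the package reports is conceptually just the share of placebo effects that are at least as extreme as your observed effect. A hand-rolled sketch with hypothetical numbers:

# Hypothetical observed effect and placebo effects (the package computes these for you)
observed_effect = 0.12
placebo_effects = [0.03, -0.05, 0.02, 0.15, -0.01, 0.04, -0.08, 0.06, 0.01, -0.02]

# Two-sided empirical p-value: share of placebos as extreme as the observed effect
p_value = sum(abs(p) >= abs(observed_effect) for p in placebo_effects) / len(placebo_effects)
print(f"Empirical placebo p-value: {p_value:.2f}")  # 0.10 here: only one placebo is as extreme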

Potential Caveats

  • Control and treatment markets with very different scales may lead to biased estimates.

  • Consider normalizing your data (e.g., per capita or log transformations) when markets vary greatly in size.

  • For extremely small markets, consider exclusion or special handling to prevent undue influence.

  • Check if large markets dominate the weights by artificially scaling up smaller markets and observing how weights shift.
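
When markets differ greatly in scale, a quick normalization pass before modeling often helps. A sketch, assuming a wide DataFrame of outcomes by market (as used elsewhere in this FAQ) and a hypothetical population lookup you supply yourself:

import numpy as np
import pandas as pd

def normalize_panel(df: pd.DataFrame, population: pd.Series = None, log: bool = False) -> pd.DataFrame:
    """Optionally convert outcomes to per-1,000-population and/or log scale."""
    out = df.copy()
    if population is not None:
        out = out.div(population.reindex(out.columns), axis=1) * 1000
    if log:
        out = np.log1p(out)  # log(1 + x) keeps zero observations usable
    return out

# Example usage (population figures are placeholders):
# population = pd.Series({'New York': 19_500_000, 'California': 39_000_000})
# df_normalized = normalize_panel(df, population=population, log=True)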

Pre-Test Power Analysis

For prospective studies, run a power analysis to check whether your market selection can detect the effect sizes you care about. You can use the PowerCalculator from the pre_test_power module (shown later in this FAQ) or the power_analysis method of GeoLiftMultiCell:

from recipes.geolift_multi_cell import GeoLiftMultiCell

# Configure power analysis
config = {
    'data_path': 'data/GeoLift_Multicell.csv',
    'treatment_units': [501, 502, 503],
    'estimator': 'sdid',
    'inference_method': 'jackknife',
    'output_dir': 'outputs/power_analysis'
}

# Initialize and run power analysis
analyzer = GeoLiftMultiCell(config)
analyzer.power_analysis()

# Access power analysis results
print(f"Minimum detectable effect: {analyzer.power_results['mde']}")
print(f"Statistical power: {analyzer.power_results['power']}")

Effect Size and Investment Considerations

Q3: What is the relationship between marketing investment levels and detectable effect sizes? How does this impact experimental design when working with different investment scenarios?

A3: This question addresses a fundamental consideration in marketing measurement: the relationship between investment level and measurable effect size.

Effect Size vs. Investment Relationship

In marketing measurement, we typically observe these patterns:

  • Higher investment generally produces larger effects: More advertising spend or deeper discounts typically create larger lift, making effects easier to detect with the same sample size.

  • Diminishing returns apply: As investment increases, the incremental effect often diminishes. Doubling your ad spend rarely doubles your effect size.

  • Threshold effects exist: Some marketing activities require minimum investment levels before producing measurable effects (e.g., a certain level of frequency or reach).

Statistical Power Implications

When designing geo experiments with different investment levels:

  • Small effects require more statistical power: Detecting a 1% lift at a fixed power level requires substantially more data than detecting a 10% lift. The additional power can come from:

    • More test markets

    • Longer measurement periods

    • More precise matching between treatment and control

    • Higher data granularity

  • Low investment tests present challenges: When investment levels are low:

    • Expected effect sizes are smaller

    • Measurement noise may overwhelm the signal

    • Required sample sizes increase dramatically

    • Cost of measurement may exceed value of insights

  • Example calculation: To detect an effect half the size, you typically need approximately four times the sample size to maintain the same statistical power.
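
The example calculation follows from the usual power formula: holding power and significance level fixed, the required sample size scales roughly with the inverse square of the effect size. A back-of-the-envelope sketch:

# Required sample scales roughly with 1 / effect_size**2 (power and alpha held fixed)
baseline_markets = 8    # hypothetical: markets needed to detect a 10% lift
baseline_effect = 0.10
target_effect = 0.05    # now we want to detect a 5% lift

required_markets = baseline_markets * (baseline_effect / target_effect) ** 2
print(f"Approximate markets needed: {required_markets:.0f}")  # ~32, i.e. four times as many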

Scenarios Where Lower Investment Might Be Valid

Despite these challenges, testing lower investment levels can be valid in certain scenarios:

  • Efficiency optimization: Finding the minimum effective investment level before diminishing returns

  • New channel exploration: Initial small tests to validate a channel’s potential before larger investment

  • Budget constraint testing: Determining if limited budgets can still produce measurable results in priority markets

Best Practices for Small Effect Detection

To effectively measure small effects from lower investment tests:

  • Use more granular data: Daily or hourly data instead of weekly

  • Extend measurement periods: Longer post-periods allow small effects to accumulate

  • Reduce noise through better controls: More sophisticated matching methods

  • Consider pooled analysis: Aggregate results across multiple similar tests

  • Focus on higher-signal metrics: Conversion metrics often show larger effects than awareness metrics

  • Use the PowerCalculator: Assess minimum detectable effect (MDE) before running the experiment

from src.pre_test_power import PowerCalculator

# Create a power calculator with your data
power_calc = PowerCalculator(df, treatment_units, control_candidates)

# Estimate power for detecting a specific effect size
power = power_calc.power_given_effect(min_effect_size=0.05, confidence_level=0.9)
print(f"Statistical power: {power}")

A power of at least 0.8 (80%) is typically considered sufficient for marketing measurement studies.

Remember that understanding the expected relationship between investment and effect size for your specific business context is crucial for proper experimental design in marketing measurement.

Implementation and Methodology

Q4: How does the GeoLift-SDID implementation differ from the original academic papers?

A4: The GeoLift-SDID package is based on the Arkhangelsky et al. (2019) paper “Synthetic Difference in Differences,” not the original Abadie et al. (2007) Synthetic Control paper. There are several key implementation differences to be aware of:

Theoretical Foundation

  • SDID vs. Original Synthetic Control: The traditional Synthetic Control (SC) approach from Abadie et al. constructs a weighted combination of control units using only pre-treatment data. In contrast, SDID (Synthetic Difference in Differences) incorporates elements of both synthetic control and difference-in-differences methodologies, using weights for both time periods and units.

  • Time Weights: GeoLift-SDID estimates lambda (time weights) in addition to omega (unit weights), which helps control for time-specific shocks that might affect all units similarly.

  • Optimization Objective: The package minimizes the mean squared prediction error between the treatment units and a synthetic control constructed using both unit and time weights.
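
For reference, the SDID estimator in Arkhangelsky et al. chooses the treatment effect and fixed effects that minimize a doubly weighted two-way fixed-effects objective (shown here in the paper’s notation; the package’s internal parameterization may differ slightly):

(\hat{\tau}, \hat{\mu}, \hat{\alpha}, \hat{\beta}) = \arg\min_{\tau, \mu, \alpha, \beta} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \right)^2 \hat{\omega}_i \, \hat{\lambda}_t

where W_{it} is the treatment indicator, \hat{\omega}_i are the unit weights, and \hat{\lambda}_t are the time weights.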

Implementation Specifics

  • Grid Search for Zeta: GeoLift-SDID implements a grid search to find the optimal zeta parameter (balancing between SC and DiD approaches). The original paper provides guidance on zeta selection, but our implementation makes this process more automated.

  • Variance Estimation: The variance calculation in GeoLift-SDID follows the jackknife procedure described in the SDID paper but has some minor computational differences to improve numerical stability.

  • Treatment Effect Calculation: Our implementation uses hat_tau() to calculate the average treatment effect on the treated (ATT), which follows the SDID paper’s methodology but with matrix operations optimized for the Python environment.

Notable Adaptations for Marketing Applications

  • Multiple Treatment Units: GeoLift-SDID extends the original methodology to handle multiple treatment units simultaneously, which is particularly useful for marketing campaigns that span multiple regions.

  • PowerCalculator: We’ve added pre-test power calculation capabilities through the PowerCalculator class, which isn’t part of the original SDID methodology but is critical for marketing experimentation.

  • Market Scale Adjustments: The package includes tools to assess the impact of market scale disparities on estimation (see scaletesting_of_donorpools.py), addressing a common challenge in marketing applications.

  • Enhanced Visualization: GeoLift-SDID integrates plotting capabilities that are tailored to marketing use cases, though some plotting functions may require customization for specific marketing contexts.

Practical Implications of These Differences

  • Weight Interpretation: Due to the dual weighting system (unit and time weights), interpreting the control unit contributions is slightly more complex than in traditional SC.

  • Test Reference Values: If you’re comparing results to academic papers, note that test values may differ slightly due to implementation variations and numerical precision differences.

  • Sensitivity to Input Parameters: Our implementation may show different sensitivity to input parameters like pre/post periods than implementations strictly following the original papers.

If you’re familiar with traditional Synthetic Control or DiD methods, keep these differences in mind when interpreting your GeoLift-SDID results.

Study Design Considerations

Q5: How should I select pre-treatment and post-treatment periods for my analysis?

A5: Selecting appropriate pre-treatment and post-treatment periods is crucial for accurate causal inference. Here are guidelines for both periods:

Pre-Treatment Period Selection

The pre-treatment period establishes the relationship between your treatment and control units before intervention. Consider these factors:

  1. Length Requirements:

    • Minimum recommendation: At least 3-4 times the length of your expected post-period

    • Ideal scenario: 10+ time periods (e.g., weeks or months) to establish stable patterns

  2. Stability Considerations:

    • Avoid including major events in the pre-period that affected only some markets

    • Check for structural breaks or trend changes that might complicate modeling

    • Ensure the relationship between units is consistent throughout the pre-period

  3. Seasonality Coverage:

    • Ideally include full seasonal cycles in your pre-period

    • For annual seasonality, use at least 12 months of pre-data

    • For weekly patterns, include multiple weeks of each season

  4. Data Granularity Trade-offs:

    • Daily data provides more pre-period observations but introduces more noise

    • Weekly data reduces noise but requires more calendar time to get sufficient observations

    • Monthly data is more stable but requires years of historical data
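
If daily data turns out to be too noisy, aggregation is a one-liner with pandas. A sketch, assuming a date-indexed outcome panel like the one loaded in the workflow section later in this FAQ:

import pandas as pd

# df: outcomes by market with a daily DatetimeIndex
df = pd.read_csv('geo_data.csv', index_col='date', parse_dates=True)

# Weekly totals reduce noise at the cost of fewer pre-period observations
df_weekly = df.resample('W').sum()

# Monthly totals are more stable still but require a longer history
df_monthly = df.resample('MS').sum()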

Post-Treatment Period Selection

The post-treatment period captures the effect of your intervention:

  1. Duration Considerations:

    • Immediate effects: 1-4 weeks may be sufficient for short-term campaign impacts

    • Carryover effects: 8-12+ weeks better captures lasting impact

    • Investment level: Lower investment typically requires longer measurement to detect effects

  2. Avoiding Contamination:

    • End post-period before any changes to the treatment (e.g., campaign expansions)

    • Exclude periods when control markets receive similar treatments

    • Be cautious of competitive reactions that might affect treatment or control markets

  3. Effect Dynamics:

    • Consider whether your treatment effect is expected to grow, decay, or pulse over time

    • For media campaigns with diminishing returns, focus on early post-period

    • For interventions with network effects, allow sufficient time for effects to materialize

Practical Implementation

In the GeoLift-SDID package, pre and post periods are defined when initializing the SynthDID class:

from src.model import SynthDID

# Define periods (using time indices or dates if your DataFrame has a DateTimeIndex)
pre_period = [0, 50]  # First 51 time periods as pre-period
post_period = [51, 70]  # Next 20 time periods as post-period

# Initialize model
sdid = SynthDID(df, pre_period, post_period, treatment_units)

Sensitivity Testing for Period Selection

Test the robustness of your results to different period selections:

  1. Vary pre-period start: Exclude some early pre-period data and check if results remain stable

  2. Vary treatment start date: If there’s uncertainty about when the treatment truly began, test different treatment start dates

  3. Examine rolling post-periods: Calculate cumulative treatment effects for different post-period lengths

# Example of sensitivity test with different pre-periods
results = []
for pre_start in range(0, 30, 5):  # Try different pre-period starts
    test_pre = [pre_start, 50]
    test_model = SynthDID(df, test_pre, post_period, treatment_units)
    test_model.fit()
    results.append({
        'pre_start': pre_start,
        'effect': test_model.hat_tau(),
        'se': test_model.cal_se()
    })
    
# Examine stability of results
import pandas as pd
sensitivity_df = pd.DataFrame(results)
print(sensitivity_df)
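
The third check, rolling post-periods, can be sketched the same way by refitting with progressively longer post-windows (using the same SynthDID interface; effects that swing wildly or flip sign as the window grows warrant caution):

# Example of sensitivity test with different post-period lengths
rolling_results = []
for post_end in range(post_period[0] + 4, post_period[1] + 1, 5):  # grow the post-window
    test_post = [post_period[0], post_end]
    test_model = SynthDID(df, pre_period, test_post, treatment_units)
    test_model.fit()
    rolling_results.append({
        'post_end': post_end,
        'effect': test_model.hat_tau()
    })

rolling_df = pd.DataFrame(rolling_results)
print(rolling_df)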

Remember that period selection should be justified by your understanding of the business context and treatment mechanism, not by what produces the largest effect. Documenting your period selection criteria before analysis helps prevent inadvertent p-hacking.

Diagnostic and Validation Considerations

Q6: What diagnostics are most important when evaluating GeoLift-SDID results? How should I balance statistical rigor with practical considerations?

A6: Evaluating the quality and reliability of GeoLift-SDID results requires a thoughtful approach to diagnostics. Here’s a practical guide to what matters most:

Essential Diagnostics (High Priority)

  1. Pre-Period Fit Quality

    • What to look for: Close tracking between treatment and synthetic control lines before intervention

    • Why it matters: Poor pre-period fit suggests the model can’t reliably predict the counterfactual

    • Statistical reality: Even a model with good overall fit metrics may miss crucial trend changes or level shifts

  2. Weight Distribution

    • What to look for: How concentrated the control unit weights (omega values) are

    • Red flags: One control unit getting >70% of weight suggests over-reliance on a single market

    • Practical approach: There’s no “perfect” distribution, but extreme concentration increases risk

  3. Placebo Test Percentile Rank

    • What to look for: Where your treatment effect falls in the distribution of placebo effects

    • Rule of thumb: Effect should be in the top/bottom 10% of placebo distribution for reliable inference

    • Measurement reality: This is often more informative than p-values for small sample sizes

  4. Robustness to Period Selection

    • What to look for: Stability of effect estimates when varying pre/post periods slightly

    • Why it matters: Effects that appear/disappear with small period changes suggest fragile results

    • Practical check: Try 3-4 reasonable period variations; substantial shifts warrant caution

Secondary Diagnostics (Helpful but Not Always Critical)

  1. Time Weight Analysis

    • What to look for: The distribution of lambda (time weights)

    • Contextual importance: More important when there are sharp period-specific shocks

    • Practical reality: Often less intuitive to interpret than unit weights

  2. Individual Unit Contributions

    • What to look for: How each control unit’s weighted outcome contributes to the synthetic control

    • Measurement folklore vs. reality: Control units with negative weights aren’t necessarily problematic

    • Practical approach: Focus on the largest contributors to understand what drives your estimate

  3. Pre-Period RMSE Compared to Magnitude of Effect

    • What to look for: Effect size should be substantially larger than pre-period RMSE

    • Rule of thumb: Effect > 2x pre-period RMSE suggests signal over noise

    • Statistical caveat: This isn’t a formal test but a practical diagnostic (a quick sketch of this check appears after this list)

  4. Variance Stability

    • What to look for: Consistency of outcome variable variance across time and units

    • Edge case awareness: High variance periods/units may disproportionately influence results

    • Practical fix: Consider variance-stabilizing transformations (log, square root) for highly variable data
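
The effect-to-RMSE comparison from item 3 above takes only a couple of lines once you have the fitted model’s outputs (a rough heuristic, not a formal test; the numbers below are illustrative):

# Rule-of-thumb check: is the effect clearly larger than pre-period noise?
effect = 0.12     # estimated treatment effect, e.g. from sdid.hat_tau()
pre_rmse = 0.04   # pre-period RMSE between treated and synthetic outcomes

ratio = abs(effect) / pre_rmse
print(f"Effect-to-RMSE ratio: {ratio:.1f}")
if ratio < 2:
    print("Effect is not clearly larger than pre-period noise; interpret with caution")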

Rigor vs. Practicality Tradeoffs

  1. Sample Size Realities

    • Statistical ideal: Large samples with many control units (20+)

    • Marketing reality: Often limited to 50 US states or fewer geographic units

    • Practical approach: With small samples, rely more on domain knowledge and multiple validation approaches

  2. Placebo Testing Scope

    • Statistical ideal: Run all possible in-time and in-space placebos

    • Practical reality: Complete placebo testing can be computationally expensive

    • Balanced approach: Focus on in-space placebos for your most important outcomes

  3. P-value Interpretation

    • Measurement folklore: “p < 0.05 means the effect is real”

    • Statistical reality: p-values measure evidence against null, not effect size or importance

    • Better approach: Report effect sizes with confidence intervals and practical significance

  4. Pre-specification vs. Exploration

    • Statistical ideal: All analysis decisions pre-specified before seeing results

    • Marketing reality: Exploratory analysis often uncovers valuable insights

    • Ethical approach: Clearly distinguish confirmatory from exploratory findings

Edge Cases to Watch For

  1. Heterogeneous Treatment Units

    • Warning sign: Large differences in effect estimates across treatment units

    • Potential issue: Treatment may work differently in different contexts

    • Practical approach: Consider splitting analysis by unit types

  2. Pre-Period Structural Breaks

    • Warning sign: Clear change in relationship between treatment and controls pre-intervention

    • Fix: Either exclude data before the break or model the break explicitly

  3. Extreme Seasonality

    • Challenge: Seasonal patterns dominate underlying trends

    • Practical approach: Ensure adequate seasonal representation in pre-period or consider seasonal adjustment

  4. Low Signal-to-Noise Ratio

    • Warning sign: High week-to-week variation relative to expected effect size

    • Practical approach: Consider aggregating to higher time granularity or longer post-periods

Remember that diagnostics should be interpreted holistically: no single metric determines validity. The most important diagnostic is whether results make sense given your domain knowledge and business context.

Troubleshooting Statistical Issues

Q7: What are common statistical issues when using GeoLift-SDID, and why is it insufficient to look at a single metric like RMSE?

A7: Several statistical challenges can arise when working with GeoLift-SDID, and relying on a single metric for validation can lead to misleading conclusions. Here’s a guide to common issues and more robust evaluation approaches:

Why Single Metrics Are Insufficient

  1. The RMSE Limitation

    • What it measures: Average squared deviation between actual and predicted values

    • What it misses: Patterns in residuals, outlier influence, and temporal dependencies

    • Example scenario: A model with good overall RMSE but systematically underpredicting during key business periods

    # Beyond just calculating RMSE
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Get residuals from the pre-period (pre_Y_treated and pre_Y_synthetic are the
    # actual and synthetic treated outcomes from your fitted model)
    residuals = pre_Y_treated - pre_Y_synthetic
    
    # Plot residuals over time
    plt.figure(figsize=(10, 6))
    plt.plot(residuals.index, residuals)
    plt.axhline(y=0, color='r', linestyle='-')
    plt.title('Residuals Over Time')
    plt.show()
    
    # Check for autocorrelation in residuals
    from statsmodels.graphics.tsaplots import plot_acf
    plot_acf(residuals, lags=20)
    plt.show()
    
  2. The p-value Trap

    • Common misinterpretation: Small p-value = large effect

    • Statistical reality: p-values measure evidence against null, not effect size or importance

    • Better approach: Report effect sizes with confidence intervals and practical significance

  3. R-squared Misconceptions

    • What it measures: Proportion of variance explained by the model

    • Why it’s insufficient: High R-squared can occur with biased predictions

    • Example scenario: A model with 99% R-squared that still misses key turning points

Common Statistical Issues and Solutions

  1. Autocorrelated Errors

    • Symptoms: Residuals show clear patterns over time

    • Why it matters: Leads to underestimated standard errors and overconfident conclusions

    • Diagnosis: Use autocorrelation plots of residuals

    • Solution approaches:

      • Aggregate to lower time granularity (weekly instead of daily)

      • Use Newey-West standard errors for inference

      • Include time-series features in matching process

    # Testing for autocorrelation in residuals
    from statsmodels.stats.diagnostic import acorr_ljungbox
    
    # Ljung-Box test for autocorrelation (recent statsmodels versions return a DataFrame)
    lb_test = acorr_ljungbox(residuals, lags=[10])
    print(f"Ljung-Box test p-value: {float(lb_test['lb_pvalue'].iloc[0]):.4f}")
    
  2. Overfitting to Pre-Period

    • Symptoms: Extremely good pre-period fit but implausible control weights

    • Why it matters: Model may be capturing noise rather than signal

    • Diagnosis: Compare in-sample fit to out-of-sample forecasts in pre-period

    • Solution: Cross-validation within pre-period or regularization

    # Cross-validation approach (simplified example)
    # Split pre-period into training and validation
    train_end = pre_period[0] + (pre_period[1] - pre_period[0]) * 2 // 3
    validation_start = train_end + 1
    
    # Fit on training portion
    train_model = SynthDID(df, [pre_period[0], train_end], post_period, treatment_units)
    train_model.fit()
    
    # Get forecast for validation period
    omega, lambda_weights = train_model.estimated_params()
    # Check performance on validation period
    # ... (implementation details depend on internal SynthDID code)
    
  3. Heterogeneity Across Treatment Units

    • Symptoms: Large differences in unit-specific treatment effects

    • Why it matters: Average effect may mask important differences or be driven by outliers

    • Diagnosis: Examine the distribution of unit-specific effects

    • Solution: Consider stratified analysis or reporting distributional information

    # Examining treatment effect heterogeneity
    unit_effects = []
    for unit in treatment_units:
        # Fit model for single treatment unit
        unit_model = SynthDID(df, pre_period, post_period, [unit])
        unit_model.fit()
        unit_effects.append({
            'unit': unit,
            'effect': unit_model.hat_tau(),
            'se': unit_model.cal_se()
        })
    
    # Analyze distribution of effects
    unit_effects_df = pd.DataFrame(unit_effects)
    print(unit_effects_df.describe())
    
  4. Non-Parallel Pre-Period Trends

    • Symptoms: Treatment and control trends diverge before intervention

    • Why it matters: Suggests fundamental violation of model assumptions

    • Diagnosis: Visual inspection and formal trend comparison

    • Solution: Consider alternative specifications, shorter pre-period, or trend adjustments

    # Testing for parallel trends in pre-period
    import statsmodels.formula.api as smf
    
    # Create panel dataframe
    panel_data = []
    for t in range(pre_period[0], pre_period[1]+1):
        for unit in df.columns:
            is_treated = unit in treatment_units
            panel_data.append({
                'time': t,
                'unit': unit,
                'outcome': df.loc[t, unit],
                'treated': is_treated
            })
    
    panel_df = pd.DataFrame(panel_data)
    
    # Test for differential pre-trends: the time:treated interaction coefficient
    # should be close to zero if pre-period trends are parallel
    model = smf.ols('outcome ~ time * treated', data=panel_df)
    results = model.fit()
    print(results.summary())
    
  5. Scale Sensitivity

    • Symptoms: Results change dramatically with log or other transformations

    • Why it matters: Suggests model may be overly influenced by units with largest values

    • Diagnosis: Compare results with different outcome transformations

    • Solution: Consider appropriate transformations or weighted approaches
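
    A quick way to diagnose scale sensitivity is to refit on a transformed outcome and compare (a sketch using the SynthDID interface from the snippets above; the transformation changes the effect’s units, so compare relative lift rather than raw effects):

    # Comparing raw vs. log-transformed results (sketch)
    import numpy as np
    
    raw_model = SynthDID(df, pre_period, post_period, treatment_units)
    raw_model.fit()
    
    log_model = SynthDID(np.log1p(df), pre_period, post_period, treatment_units)
    log_model.fit()
    
    # On the log scale, the effect is approximately a relative lift
    print(f"Raw effect: {raw_model.hat_tau():.4f}")
    print(f"Approximate relative lift (log scale): {np.expm1(log_model.hat_tau()):.2%}")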

Comprehensive Evaluation Framework

Instead of relying on a single metric, use this multi-faceted framework:

  1. Fit Quality Assessment:

    • Pre-period RMSE (overall fit)

    • Residual patterns analysis (systematic errors)

    • Cross-validation performance (robustness)

  2. Causal Inference Validation:

    • Placebo tests (statistical significance)

    • Robustness to specification changes (stability)

    • Covariate balance assessment (comparability)

  3. Assumption Testing:

    • Parallel trends tests (for DiD component)

    • No interference between units (spillover assessment)

    • No anticipation effects (pre-treatment effect check)

  4. Sensitivity Analysis:

    • Donor pool variations (leave-one-out tests)

    • Outcome transformations (log, per capita)

    • Period definition adjustments (intervention timing)

This combined approach provides a more complete picture of result validity than any single metric can provide.

# Example of comprehensive diagnostic approach
# (assumes numpy as np, pandas as pd, SynthDID, and acorr_ljungbox are already
# imported, as in the earlier snippets in this answer)
def comprehensive_diagnostics(df, pre_period, post_period, treatment_units):
    # 1. Basic model fit
    sdid = SynthDID(df, pre_period, post_period, treatment_units)
    sdid.fit()
    effect = sdid.hat_tau()
    se = sdid.cal_se()
    
    # 2. Pre-period fit quality
    synth_outcome = sdid.sdid_potential_outcome()
    pre_Y_treated = df.loc[pre_period[0]:pre_period[1], treatment_units].mean(axis=1)
    pre_Y_synthetic = synth_outcome.loc[pre_period[0]:pre_period[1]]
    pre_rmse = np.sqrt(((pre_Y_treated - pre_Y_synthetic) ** 2).mean())
    
    # 3. Residual analysis
    residuals = pre_Y_treated - pre_Y_synthetic
    ljung_box = acorr_ljungbox(residuals, lags=[10])
    has_autocorrelation = float(ljung_box['lb_pvalue'].iloc[0]) < 0.05
    
    # 4. Placebo tests (simplified)
    control_units = [c for c in df.columns if c not in treatment_units]
    placebo_effects = []
    for unit in np.random.choice(control_units, min(10, len(control_units)), replace=False):
        placebo = SynthDID(df, pre_period, post_period, [unit])
        placebo.fit()
        placebo_effects.append(placebo.hat_tau())
    percentile_rank = sum(abs(effect) > abs(p) for p in placebo_effects) / len(placebo_effects)
    
    # 5. Sensitivity to pre-period
    alt_pre = [pre_period[0] + (pre_period[1] - pre_period[0])//4, pre_period[1]]
    alt_model = SynthDID(df, alt_pre, post_period, treatment_units)
    alt_model.fit()
    alt_effect = alt_model.hat_tau()
    effect_stability = abs(effect - alt_effect) / abs(effect) if effect != 0 else float('inf')
    
    # Return comprehensive diagnostics
    return {
        'effect': effect,
        'se': se,
        'pre_rmse': pre_rmse,
        'rmse_to_effect_ratio': pre_rmse / abs(effect) if effect != 0 else float('inf'),
        'has_autocorrelation': has_autocorrelation,
        'placebo_percentile': percentile_rank,
        'effect_stability': effect_stability
    }

Remember that statistical issues should be evaluated in context of your specific application. Marketing measurement often involves unique challenges like seasonality, competitive activity, and carryover effects that require domain-specific knowledge to properly address.

Workflow and Implementation

Q8: What are the key stages in GeoLift modeling that a marketing analyst should follow?

A8: The GeoLift-SDID modeling process can be broken down into four essential stages that guide you from planning through implementation to final insights. Each stage builds upon the previous one and contributes to producing reliable causal estimates of marketing effects.

Stage 1: Study Design and Data Preparation

The foundation of any successful GeoLift analysis begins with thoughtful planning and data organization:

Start by clearly defining your research question and hypotheses about the marketing intervention’s effect. Specify exactly what intervention you’re measuring, which markets received it, and when it occurred. This clarity helps avoid post-hoc explanations that can undermine causal inference.

Next, identify your outcome metrics. Select metrics that your marketing activity can plausibly influence and that are measurable at the geographic level of your analysis. For media campaigns, this might be website visits, store traffic, or sales; for pricing tests, this could be unit sales or revenue.

Data preparation involves collecting time series data for both treatment and potential control markets. Ensure your data:

  • Covers sufficient pre-treatment history (ideally at least 3-4 times the length of your post-period)

  • Has consistent measurement across all geographic units

  • Is at an appropriate time granularity (daily, weekly, or monthly)

  • Includes any relevant covariates that might improve matching

Finally, inspect your data for quality issues, extreme outliers, or structural breaks that might need addressing before modeling. Consider whether transformations (log, per-capita normalization) are appropriate given the distribution of your outcome variable.

# Example of data preparation
import pandas as pd
import matplotlib.pyplot as plt

# Load and inspect data
df = pd.read_csv('geo_data.csv', index_col='date', parse_dates=True)

treatment_units = ['New York', 'California']  # markets that received treatment
intervention_date = '2022-07-01'              # treatment start date

# Visualize all markets to identify potential issues
plt.figure(figsize=(12, 8))
for col in df.columns:
    plt.plot(df.index, df[col], alpha=0.5, color='grey')

# Highlight treatment markets so they stand out in the legend
for unit in treatment_units:
    plt.plot(df.index, df[unit], linewidth=2, label=f"{unit} (Treatment)")
    
plt.axvline(x=pd.to_datetime(intervention_date), color='red', linestyle='--')
plt.legend(loc='best')
plt.title('Pre-Modeling Data Inspection')
plt.show()

Stage 2: Model Specification and Estimation

With your data prepared, you’re ready to specify and estimate your GeoLift-SDID model:

Begin by defining your pre-treatment and post-treatment periods. The pre-period should end just before your intervention began, while the post-period starts when the intervention begins and continues as long as you want to measure its effect.

Next, initialize the SynthDID model with your data, periods, and treatment units. The model will estimate two sets of weights: omega (unit weights) that determine the contribution of each control market to your synthetic control, and lambda (time weights) that account for time-specific shocks.

The model estimation process involves:

  1. Selecting the zeta parameter that balances between synthetic control and difference-in-differences approaches

  2. Optimizing the weights to minimize prediction error in the pre-treatment period

  3. Calculating the treatment effect by comparing actual outcomes to the counterfactual

from src.model import SynthDID

# Define periods
pre_period = ['2022-01-01', '2022-06-30']  # Pre-intervention period
post_period = ['2022-07-01', '2022-09-30']  # Post-intervention period
treatment_units = ['New York', 'California']  # Markets that received treatment

# Initialize and fit model
model = SynthDID(df, pre_period, post_period, treatment_units)
model.fit()  # This estimates the weights and prepares for effect estimation

# Check model fit
omega, lambda_weights = model.estimated_params()
print("Top control markets by weight:")
print(omega.sort_values('sdid_weight', ascending=False).head(5))

Stage 3: Validation and Diagnostics

Before trusting your results, validate the model through multiple diagnostic approaches:

First, evaluate the pre-treatment fit quality. The package automatically generates visualizations showing how well the synthetic control tracks the treatment unit before intervention. Examine the pre-treatment RMSE in the diagnostics output and compare it to the magnitude of your estimated effect.

Review the weight distributions to understand which control markets are driving your synthetic control. Markets with similar characteristics to your treatment markets should generally receive higher weights. The package creates visualizations showing these weights.

Leverage the built-in inference methods (jackknife, bootstrap, or placebo) to establish the statistical significance of your findings. The package handles complex resampling calculations automatically and provides confidence intervals and p-values.

Test the robustness of your results through sensitivity analyses: vary the treatment date, exclude influential control units, or try different outcome transformations. Results that remain stable across these variations inspire more confidence.

# Configure and run GeoLift analysis with inference
config = {
    'data_path': 'data/GeoLift_Data.csv',
    'treatment_units': [501],
    'intervention_date': '2025-01-01',
    'inference_method': 'jackknife',
    'output_dir': 'outputs/analysis'
}

analyzer = GeoLiftSingleCell(config)
analyzer.run_analysis()

# Access and display results
effect = analyzer.results['att']
se = analyzer.results['se']
print(f"Estimated effect: {effect:.4f} (SE: {se:.4f})")
print(f"95% Confidence Interval: [{effect - 1.96*se:.4f}, {effect + 1.96*se:.4f}]")
print(f"p-value: {analyzer.results['p_value']:.3f}")

# Multiple runs with different parameters for sensitivity analysis
date_ranges = [
    ('2024-12-01', None),  # Earlier intervention date
    ('2025-02-01', None)    # Later intervention date
]

for intervention_date, end_date in date_ranges:
    config['intervention_date'] = intervention_date
    config['end_date'] = end_date
    config['output_dir'] = f"outputs/sensitivity_{intervention_date}"
    
    sensitivity_analyzer = GeoLiftSingleCell(config)
    sensitivity_analyzer.run_analysis()
    
    print(f"Sensitivity with {intervention_date}: {sensitivity_analyzer.results['att']:.4f}")

Stage 4: Interpretation and Business Translation

The final stage transforms statistical results into actionable business insights:

Quantify the impact of your marketing intervention in business terms. Convert the estimated effect into metrics that stakeholders understand: incremental sales, revenue lift, return on ad spend, or cost per acquisition.

Contextualize the effect size relative to:

  • Pre-treatment baseline (percentage lift)

  • Investment level (efficiency metrics)

  • Business goals (contribution to objectives)

  • Industry benchmarks (comparative performance)

Address potential alternative explanations and limitations of your analysis. Acknowledge factors that couldn’t be controlled for and discuss their potential impact on your conclusions.

Finally, develop clear recommendations based on your findings. These might include scaling successful interventions, adjusting underperforming ones, or designing follow-up experiments to answer new questions raised by your analysis.

# Business translation example
baseline = df.loc[pre_period[0]:pre_period[1], treatment_units].mean().mean()
percentage_lift = effect / baseline * 100

# Calculate ROI (assuming you have cost and revenue-per-unit data; values below are examples)
total_investment = 1000000            # Example marketing spend
price_per_unit = 25.0                 # Example revenue per incremental unit
num_periods = 13                      # Example number of post-treatment periods (e.g., weeks)
num_markets = len(treatment_units)    # Number of treated markets
total_incremental_revenue = effect * price_per_unit * num_periods * num_markets
roi = (total_incremental_revenue - total_investment) / total_investment

print(f"Marketing Intervention Results:")
print(f"Percentage Lift: {percentage_lift:.2f}%")
print(f"Total Incremental Revenue: ${total_incremental_revenue:,.2f}")
print(f"Return on Investment: {roi:.2f}x")

These four stages provide a structured approach to GeoLift-SDID modeling that balances statistical rigor with practical business application. Each stage builds upon the previous one, creating a workflow that moves from careful planning through technical implementation to meaningful business insights.