GeoLift-SDID Command Line Interface Documentation

This document provides a comprehensive reference for the command-line interfaces available in the GeoLift-SDID toolset. Each tool is designed for a specific stage in the GeoLift-SDID workflow.

Core Command-Line Tools

1. donor_evaluator.py

Evaluates potential donor locations (control units) for Synthetic Difference-in-Differences analysis based on pre-treatment similarity metrics.

python recipes/donor_evaluator.py [OPTIONS]

OPTIONS:
  --config CONFIG         Path to configuration YAML file (REQUIRED)
  --output-dir OUTPUT_DIR Output directory for results (overrides config)

The configuration file should include:

  • data_path: Path to input data CSV file (REQUIRED)

  • treatment_locations: List of treatment location IDs (REQUIRED)

  • treatment_date: Intervention date (YYYY-MM-DD) (REQUIRED)

  • pre_treatment_periods: Number of pre-treatment periods to analyze

  • min_donors: Minimum number of donors to recommend (default: 5)

  • max_donors: Maximum number of donors to recommend (default: 10)

  • min_correlation_threshold: Minimum correlation threshold for donors (default: 0.7)

  • max_rmse_threshold: Maximum RMSE threshold for donors (default: 5000)

  • shapemap_file: Path to GeoJSON file for map visualization (optional)
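
How the two thresholds and the donor cap might interact can be sketched as follows. This is a minimal illustration, not the logic of donor_evaluator.py itself: the function names (pearson, rmse, screen_donors), the ranking rule, and the sample series are all assumptions made for the example. Only the default values (0.7, 5000, 10) come from the list above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def screen_donors(treated, candidates, min_corr=0.7, max_rmse=5000, max_donors=10):
    """Keep candidates that pass both thresholds, best correlation first (assumed ranking)."""
    scored = []
    for name, series in candidates.items():
        r, e = pearson(treated, series), rmse(treated, series)
        if r >= min_corr and e <= max_rmse:
            scored.append((name, r, e))
    scored.sort(key=lambda t: (-t[1], t[2]))  # high correlation, then low RMSE
    return scored[:max_donors]

# Hypothetical pre-treatment series for one treated location and three candidates.
treated = [100.0, 110.0, 120.0, 130.0]
candidates = {
    "A": [102.0, 111.0, 119.0, 131.0],      # tracks the treated series closely
    "B": [130.0, 120.0, 110.0, 100.0],      # moves in the opposite direction
    "C": [9000.0, 9010.0, 9020.0, 9030.0],  # correlated, but far away in level
}
selected = [name for name, _, _ in screen_donors(treated, candidates)]
```

In this toy data, candidate B fails the correlation threshold and candidate C fails the RMSE threshold, which shows why the two metrics are useful together: correlation captures shared movement, RMSE captures closeness in level.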

2. power_calculator.py

Calculates the statistical power of a GeoLift-SDID analysis for different combinations of effect sizes and post-treatment durations.

python recipes/power_calculator.py [OPTIONS]

OPTIONS:
  --mode {power,selection}  Analysis mode: 'power' to calculate statistical power,
                           or 'selection' to select optimal markets (REQUIRED)
  --config CONFIG          Path to configuration YAML file (REQUIRED)
  --output OUTPUT          Output directory path (overrides config)

The configuration file should include:

  • data_path: Path to input data CSV file

  • treatment_units: List of treatment unit IDs

  • donor_units: List of control unit IDs (recommended donors)

  • intervention_date: Intervention date (YYYY-MM-DD)

  • min_expected_lift: Minimum expected lift percentage (default: 5)

  • max_expected_lift: Maximum expected lift percentage (default: 20)

  • lift_step: Step size for lift percentage calculations (default: 5)

  • min_post_periods: Minimum number of post-treatment periods (default: 7)

  • max_post_periods: Maximum number of post-treatment periods (default: 42)

  • post_periods_step: Step size for post-treatment period calculations (default: 7)

  • n_sims: Number of simulations to run (default: 30)

  • confidence_level: Confidence level for power calculation (default: 0.9)
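
To get a feel for the size of the search implied by the default values, the grid of lift and duration combinations can be enumerated as below. This sketch assumes both ranges include their endpoints and that every lift is paired with every duration. The helper name inclusive_range is hypothetical, and the actual script may build its grid differently.

```python
from itertools import product

def inclusive_range(start, stop, step):
    """Integers from start to stop, including stop when it falls on a step."""
    return list(range(start, stop + 1, step))

# Defaults taken from the configuration list above.
lifts = inclusive_range(5, 20, 5)          # min_expected_lift, max_expected_lift, lift_step
post_periods = inclusive_range(7, 42, 7)   # min_post_periods, max_post_periods, post_periods_step

# Every (lift, duration) pair that would be evaluated under the stated assumption.
grid = list(product(lifts, post_periods))

n_sims = 30
total_simulations = len(grid) * n_sims     # assumes n_sims runs per combination
```

Under these assumptions the defaults yield 4 lift values and 6 durations, so widening either range or shrinking a step size multiplies the number of simulations accordingly.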

3. geolift_single_cell.py

Runs a GeoLift-SDID analysis for a single treatment unit (location).

python recipes/geolift_single_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG                   Path to configuration YAML file
  --data DATA                       Path to the input data file (overrides config)
  --treatment TREATMENT             Treatment unit ID (overrides config)
  --intervention-date DATE          Intervention date YYYY-MM-DD (overrides config)
  --end-date DATE                   End date for analysis YYYY-MM-DD (overrides config)
  --output OUTPUT                   Output directory (overrides config)
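
The "overrides config" behavior described above can be pictured as a simple merge in which any flag given on the command line replaces the value loaded from the YAML file. The following is an illustrative sketch only: the config keys used here (data, treatment, output) and the merge_settings helper are hypothetical, and the real script may use different key names and merge rules.

```python
import argparse

def build_parser():
    """Parser mirroring the options listed above (all optional in this sketch)."""
    p = argparse.ArgumentParser(description="Sketch of config-plus-override handling")
    p.add_argument("--config")
    p.add_argument("--data")
    p.add_argument("--treatment")
    p.add_argument("--intervention-date", dest="intervention_date")
    p.add_argument("--end-date", dest="end_date")
    p.add_argument("--output")
    return p

def merge_settings(config, args):
    """Start from config values; replace any that were given on the command line."""
    merged = dict(config)
    for key, value in vars(args).items():
        if key != "config" and value is not None:
            merged[key] = value
    return merged

# Stand-in for values that would normally be loaded from the YAML file.
config = {"data": "data/GeoLift_Singlecell.csv", "treatment": "501", "output": "outputs/default"}

args = build_parser().parse_args(["--treatment", "502", "--intervention-date", "2024-03-01"])
settings = merge_settings(config, args)
```

Here the treatment unit comes from the command line, while data and output fall back to the config values because no override was supplied.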

4. geolift_multi_cell.py

Runs a GeoLift-SDID analysis for multiple treatment units (locations) with heterogeneous effects.

python recipes/geolift_multi_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG                   Path to analysis config file (default: configs/geolift_analysis_config.yaml)
  --viz-config CONFIG               Path to visualization config file (default: configs/visuals_config.yaml)
  --shapemap SHAPEMAP               Path to shapemap file for location visualization
  --data DATA                       Path to input data file (overrides config)
  --treatments TREATMENTS [...]     Names of treatment locations
  --donor-recommendations FILE      Path to donor_recommendations.yaml file to use for control units
  --intervention-date DATE          Start date of intervention (YYYY-MM-DD)
  --end-date DATE                   End date of intervention (YYYY-MM-DD)
  --output OUTPUT                   Output directory (overrides config)
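
Because both date options expect YYYY-MM-DD, it can be worth checking the pair before launching a long run. The sketch below is a hypothetical pre-flight check, not part of the tool: parse_day and check_window are made-up names, and treating the end date as inclusive is an assumption.

```python
from datetime import datetime

def parse_day(text):
    """Parse a YYYY-MM-DD string; raises ValueError for other layouts such as 03/01/2024."""
    return datetime.strptime(text, "%Y-%m-%d").date()

def check_window(intervention, end):
    """Return the number of days in the window, counting both endpoints (assumed)."""
    start, stop = parse_day(intervention), parse_day(end)
    if stop < start:
        raise ValueError(f"end date {end} is before intervention date {intervention}")
    return (stop - start).days + 1

window_days = check_window("2024-03-01", "2024-03-31")
```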

5. generate_analysis_report.py

Generates an AI-powered interpretation of GeoLift-SDID analysis results using large language models.

python recipes/generate_analysis_report.py [OPTIONS]

OPTIONS:
  --outputs OUTPUTS               Path to the outputs directory containing analysis results (REQUIRED)
  --api-key API_KEY               API key for the model (optional, can also be set via environment variables)
  --model MODEL                   Model to use for interpretation (default: "deepseek-r1")
  --verbose                       Enable verbose logging

Supported models include: deepseek-r1, deepseek-coder, llama3, mistral, and gpt-4. Each model can use an API key from the corresponding environment variable (e.g., DEEPSEEK_API_KEY, OPENAI_API_KEY).
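
One plausible way to resolve the key is sketched below, assuming that an explicit --api-key takes precedence over the environment. The model-to-variable mapping is an assumption inferred from the two variable names mentioned above. Models whose variable names are not given in this document (llama3, mistral) are deliberately left out, and resolve_api_key is a hypothetical helper, not a function of the tool.

```python
import os

# Assumed mapping; only DEEPSEEK_API_KEY and OPENAI_API_KEY are named in this document.
ENV_VAR_BY_MODEL = {
    "deepseek-r1": "DEEPSEEK_API_KEY",
    "deepseek-coder": "DEEPSEEK_API_KEY",
    "gpt-4": "OPENAI_API_KEY",
}

def resolve_api_key(model, cli_key=None, environ=None):
    """Return the CLI key if given, else the value of the model's environment variable."""
    environ = os.environ if environ is None else environ
    if cli_key:
        return cli_key
    env_name = ENV_VAR_BY_MODEL.get(model)
    return environ.get(env_name) if env_name else None
```

Passing environ explicitly keeps the function easy to test. In normal use it would be called as resolve_api_key("deepseek-r1") and read from os.environ.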

Example Usage

Complete Analysis Workflow

# 1. Evaluate donors for treatment locations
# Single-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_singlecell.yaml

# Multi-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_multicell.yaml

# 2. Run power analysis with recommended donors
# Using directly configured files
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_singlecell.yaml
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_multicell.yaml

# Alternatively, using donor evaluator output
python recipes/power_calculator.py --mode power --config outputs/donor_eval_YYYYMMDD_HHMMSS/power_analysis_config.yaml

# 3. Perform GeoLift-SDID analysis
python recipes/geolift_single_cell.py --data data/GeoLift_Singlecell.csv --treatment 501 --intervention-date 2024-03-01 --output outputs/singlecell_analysis

# 4. Generate AI-powered report
python recipes/generate_analysis_report.py --outputs outputs/singlecell_analysis
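
The donor_eval_YYYYMMDD_HHMMSS placeholder in step 2 stands for a timestamped directory. A small helper such as the one below could locate the most recent run instead of typing the timestamp by hand. It is a sketch under two assumptions: donor evaluator runs are written under a single outputs directory, and each contains power_analysis_config.yaml as the example path suggests. The function name latest_donor_eval_config is hypothetical.

```python
from pathlib import Path

def latest_donor_eval_config(outputs_dir="outputs"):
    """Return the power analysis config path inside the newest donor_eval_* directory.

    Names of the form donor_eval_YYYYMMDD_HHMMSS sort chronologically when
    sorted as plain strings, so the last entry is the most recent run.
    Returns None when no matching directory exists.
    """
    runs = sorted(p for p in Path(outputs_dir).glob("donor_eval_*") if p.is_dir())
    if not runs:
        return None
    return runs[-1] / "power_analysis_config.yaml"
```

The returned path can then be passed to power_calculator.py through --config.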

Notes

  • File paths can be relative or absolute

  • For date parameters, use YYYY-MM-DD format

  • The donor_evaluator.py script generates a recommended donors file and a power analysis configuration (power_analysis_config.yaml) in its output directory

  • The GeoLift-SDID implementation is in the synthdid package directory (not src)

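  • With matplotlib 3.6 and later, the 'seaborn-whitegrid' style name is deprecated (and removed in 3.8), so you may encounter style compatibility issues. On those versions, use 'seaborn-v0_8-whitegrid' instead; older matplotlib versions only recognize 'seaborn-whitegrid'.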

  • Different configuration files exist for single-cell and multi-cell analyses

  • Make sure all required parameters (especially data_path) are included in your configuration files
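
For the matplotlib style note above, one way to stay compatible with both old and new matplotlib versions is to choose the style name from whatever the installed version reports as available. The helper below is a hypothetical sketch and is not part of the toolset. It takes the list of available styles as an argument so that it can run without matplotlib installed.

```python
def pick_whitegrid_style(available_styles):
    """Prefer the newer style name, fall back to the older one, then to 'default'."""
    for name in ("seaborn-v0_8-whitegrid", "seaborn-whitegrid"):
        if name in available_styles:
            return name
    return "default"

# Typical use (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.style.use(pick_whitegrid_style(plt.style.available))
```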