GeoLift-SDID Command Line Interface Documentation
This document provides a comprehensive reference for the command-line interfaces available in the GeoLift-SDID toolset. Each tool is designed for a specific stage in the GeoLift-SDID workflow.
Core Command-Line Tools
1. donor_evaluator.py
Evaluates potential donor locations (control units) for Synthetic Difference-in-Differences analysis based on pre-treatment similarity metrics.
python recipes/donor_evaluator.py [OPTIONS]
OPTIONS:
--config CONFIG Path to configuration YAML file (REQUIRED)
--output-dir OUTPUT_DIR Output directory for results (overrides config)
The configuration file should include:
data_path: Path to input data CSV file (REQUIRED)
treatment_locations: List of treatment location IDs (REQUIRED)
treatment_date: Intervention date (YYYY-MM-DD) (REQUIRED)
pre_treatment_periods: Number of pre-treatment periods to analyze
min_donors: Minimum number of donors to recommend (default: 5)
max_donors: Maximum number of donors to recommend (default: 10)
min_correlation_threshold: Minimum correlation threshold for donors (default: 0.7)
max_rmse_threshold: Maximum RMSE threshold for donors (default: 5000)
shapemap_file: Path to GeoJSON file for map visualization (optional)
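As a rough illustration of the required keys and defaults listed above, the following sketch checks a configuration dictionary before a run. The helper name `check_donor_config` is hypothetical and is not part of `donor_evaluator.py`; it assumes the YAML file has already been loaded into a plain `dict`.

```python
from datetime import datetime

# Hypothetical helper for illustration; donor_evaluator.py may validate differently.
REQUIRED_KEYS = ("data_path", "treatment_locations", "treatment_date")

# Default values as listed in the documentation above.
DEFAULTS = {
    "min_donors": 5,
    "max_donors": 10,
    "min_correlation_threshold": 0.7,
    "max_rmse_threshold": 5000,
}


def check_donor_config(config: dict) -> dict:
    """Return the config with documented defaults filled in, or raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing required keys: {', '.join(missing)}")
    # strptime raises ValueError if the date cannot be parsed as YYYY-MM-DD.
    datetime.strptime(str(config["treatment_date"]), "%Y-%m-%d")
    # Values supplied by the user take precedence over the defaults.
    return {**DEFAULTS, **config}


config = check_donor_config({
    "data_path": "data/GeoLift_Singlecell.csv",
    "treatment_locations": [501],
    "treatment_date": "2024-03-01",
    "max_donors": 8,
})
print(config["min_donors"], config["max_donors"])
```

Running a check like this before invoking the script surfaces a missing `data_path` early rather than partway through an analysis.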
2. power_calculator.py
Calculates the statistical power of a GeoLift-SDID analysis for different combinations of effect sizes and post-treatment durations.
python recipes/power_calculator.py [OPTIONS]
OPTIONS:
--mode {power,selection} Analysis mode: 'power' to calculate statistical power,
or 'selection' to select optimal markets (REQUIRED)
--config CONFIG Path to configuration YAML file (REQUIRED)
--output OUTPUT Output directory path (overrides config)
The configuration file should include:
data_path: Path to input data CSV file
treatment_units: List of treatment unit IDs
donor_units: List of control unit IDs (recommended donors)
intervention_date: Intervention date (YYYY-MM-DD)
min_expected_lift: Minimum expected lift percentage (default: 5)
max_expected_lift: Maximum expected lift percentage (default: 20)
lift_step: Step size for lift percentage calculations (default: 5)
min_post_periods: Minimum number of post-treatment periods (default: 7)
max_post_periods: Maximum number of post-treatment periods (default: 42)
post_periods_step: Step size for post-treatment period calculations (default: 7)
n_sims: Number of simulations to run (default: 30)
confidence_level: Confidence level for power calculation (default: 0.9)
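To get a feel for how many scenarios the default lift and duration settings imply, the sketch below enumerates the combinations. It assumes the minimum and maximum values are both inclusive and that steps are whole numbers; the function `power_grid` is hypothetical, and the actual script may build its grid differently.

```python
from itertools import product


def power_grid(min_lift=5, max_lift=20, lift_step=5,
               min_post=7, max_post=42, post_step=7):
    """List every (lift %, post-treatment periods) pair, endpoints inclusive."""
    lifts = range(min_lift, max_lift + 1, lift_step)
    durations = range(min_post, max_post + 1, post_step)
    return list(product(lifts, durations))


grid = power_grid()
print(len(grid))       # 4 lift levels x 6 durations = 24
print(len(grid) * 30)  # 720 runs, if n_sims=30 applies to every pair
```

Under these assumptions the defaults produce 24 scenarios, so widening the ranges or shrinking the steps increases runtime roughly in proportion to the grid size.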
3. geolift_single_cell.py
Runs a GeoLift-SDID analysis for a single treatment unit (location).
python recipes/geolift_single_cell.py [OPTIONS]
OPTIONS:
--config CONFIG Path to configuration YAML file
--data DATA Path to the input data file (overrides config)
--treatment TREATMENT Treatment unit ID (overrides config)
--intervention-date DATE Intervention date YYYY-MM-DD (overrides config)
--end-date DATE End date for analysis YYYY-MM-DD (overrides config)
--output OUTPUT Output directory (overrides config)
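Several options above are described as overriding the configuration file. One common way to get that behavior is sketched below with `argparse`. This is an assumption about the general pattern, not the script's actual implementation: the configuration key names (`data`, `treatment`, and so on) are hypothetical, and `--config` loading is left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the override flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of config/flag merging")
    parser.add_argument("--data")
    parser.add_argument("--treatment")
    parser.add_argument("--intervention-date", dest="intervention_date")
    parser.add_argument("--end-date", dest="end_date")
    parser.add_argument("--output")
    return parser


def merge_settings(config: dict, args: argparse.Namespace) -> dict:
    """Start from the config and let any flag that was actually given win."""
    merged = dict(config)
    for key, value in vars(args).items():
        if value is not None:  # flags left off the command line stay None
            merged[key] = value
    return merged


# Hypothetical config contents; real key names may differ.
config = {
    "data": "data/GeoLift_Singlecell.csv",
    "treatment": "501",
    "intervention_date": "2024-03-01",
}
args = build_parser().parse_args(
    ["--treatment", "502", "--output", "outputs/singlecell_analysis"]
)
settings = merge_settings(config, args)
print(settings)
```

Here `treatment` comes from the command line, while `data` and `intervention_date` fall back to the configuration values.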
4. geolift_multi_cell.py
Runs a GeoLift-SDID analysis for multiple treatment units (locations) with heterogeneous effects.
python recipes/geolift_multi_cell.py [OPTIONS]
OPTIONS:
--config CONFIG Path to analysis config file (default: configs/geolift_analysis_config.yaml)
--viz-config CONFIG Path to visualization config file (default: configs/visuals_config.yaml)
--shapemap SHAPEMAP Path to shapemap file for location visualization
--data DATA Path to input data file (overrides config)
--treatments TREATMENTS [...] Names of treatment locations
--donor-recommendations FILE Path to donor_recommendations.yaml file to use for control units
--intervention-date DATE Start date of intervention (YYYY-MM-DD)
--end-date DATE End date of intervention (YYYY-MM-DD)
--output OUTPUT Output directory (overrides config)
5. generate_analysis_report.py
Generates an AI-powered interpretation of GeoLift-SDID analysis results using large language models.
python recipes/generate_analysis_report.py [OPTIONS]
OPTIONS:
--outputs OUTPUTS Path to the outputs directory containing analysis results (REQUIRED)
--api-key API_KEY API key for the model (optional, can also be set via environment variables)
--model MODEL Model to use for interpretation (default: "deepseek-r1")
--verbose Enable verbose logging
Supported models include: deepseek-r1, deepseek-coder, llama3, mistral, and gpt-4. Each model can use an API key from the corresponding environment variable (e.g., DEEPSEEK_API_KEY, OPENAI_API_KEY).
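The lookup of an API key could work along the following lines. This is a sketch under stated assumptions: that `--api-key` takes precedence over the environment, and that both DeepSeek models read `DEEPSEEK_API_KEY`. Only the two variable names given above are mapped; the variables for the other models are not documented here, so they are omitted. The function `resolve_api_key` is hypothetical.

```python
import os

# Only the variable names mentioned in the documentation are mapped.
# Assumption: both DeepSeek models share DEEPSEEK_API_KEY.
ENV_VAR_BY_MODEL = {
    "deepseek-r1": "DEEPSEEK_API_KEY",
    "deepseek-coder": "DEEPSEEK_API_KEY",
    "gpt-4": "OPENAI_API_KEY",
}


def resolve_api_key(model, cli_key=None, environ=None):
    """Return the key from --api-key if given, else from the model's env var."""
    environ = os.environ if environ is None else environ
    if cli_key:
        return cli_key  # assumed precedence: explicit flag beats environment
    env_var = ENV_VAR_BY_MODEL.get(model)
    return environ.get(env_var) if env_var else None


# Passing a dict instead of os.environ keeps the example reproducible.
fake_env = {"OPENAI_API_KEY": "sk-example"}
print(resolve_api_key("gpt-4", environ=fake_env))
print(resolve_api_key("deepseek-r1", environ=fake_env))
```

A `None` result signals that no key was found, which a caller can turn into a clear error message before any model request is made.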
Example Usage
Complete Analysis Workflow
# 1. Evaluate donors for treatment locations
# Single-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_singlecell.yaml
# Multi-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_multicell.yaml
# 2. Run power analysis with recommended donors
# Using directly configured files
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_singlecell.yaml
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_multicell.yaml
# Alternatively, using donor evaluator output
python recipes/power_calculator.py --mode power --config outputs/donor_eval_YYYYMMDD_HHMMSS/power_analysis_config.yaml
# 3. Perform GeoLift-SDID analysis
python recipes/geolift_single_cell.py --data data/GeoLift_Singlecell.csv --treatment 501 --intervention-date 2024-03-01 --output outputs/singlecell_analysis
# 4. Generate AI-powered report
python recipes/generate_analysis_report.py --outputs outputs/singlecell_analysis
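Step 2 of the workflow refers to a timestamped `donor_eval_YYYYMMDD_HHMMSS` directory. A small helper such as the one below can locate the most recent one. It assumes the directories live directly under `outputs/` and follow that naming pattern, so that sorting by name is also sorting by time; `latest_donor_eval_config` is a hypothetical name, not part of the toolset.

```python
from pathlib import Path


def latest_donor_eval_config(outputs_dir="outputs"):
    """Return the power_analysis_config.yaml path of the newest donor_eval run.

    Returns None when no donor_eval_* directory exists.
    """
    runs = sorted(
        path for path in Path(outputs_dir).glob("donor_eval_*") if path.is_dir()
    )
    if not runs:
        return None
    # YYYYMMDD_HHMMSS names sort chronologically, so the last one is the newest.
    return runs[-1] / "power_analysis_config.yaml"


config_path = latest_donor_eval_config()
if config_path is None:
    print("No donor evaluation output found; run donor_evaluator.py first.")
else:
    print(f"python recipes/power_calculator.py --mode power --config {config_path}")
```

The helper only builds the path; it does not check that the YAML file itself exists inside the directory.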
Notes
File paths can be relative or absolute
For date parameters, use YYYY-MM-DD format
The donor_evaluator.py script will generate a recommended donors file and a power analysis configuration
The GeoLift-SDID implementation is in the synthdid package directory (not src)
You may encounter matplotlib style compatibility issues with 'seaborn-whitegrid'. Older versions of matplotlib accept that name; from matplotlib 3.6 onward it is deprecated, so use 'seaborn-v0_8-whitegrid' instead.
Different configuration files exist for single-cell and multi-cell analyses
Make sure all required parameters (especially data_path) are included in your configuration files
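The matplotlib style-name note can be handled at runtime by choosing whichever whitegrid style the installed version offers. The function below is a hypothetical sketch that takes the list of available style names as input, so it runs without matplotlib installed; the fallback to `default` is an assumption about reasonable behavior, not something the toolset specifies.

```python
def pick_whitegrid_style(available_styles):
    """Return the first whitegrid style name present, else matplotlib's 'default'."""
    for name in ("seaborn-v0_8-whitegrid", "seaborn-whitegrid"):
        if name in available_styles:
            return name
    return "default"


# Typical use (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.style.use(pick_whitegrid_style(plt.style.available))

print(pick_whitegrid_style(["classic", "seaborn-whitegrid"]))
print(pick_whitegrid_style(["classic", "seaborn-v0_8-whitegrid"]))
```

Checking `plt.style.available` avoids hard-coding a matplotlib version number into the plotting code.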