# GeoLift-SDID Command Line Interface Documentation

This document is a reference for the command-line interfaces available in the GeoLift-SDID toolset. Each tool is designed for a specific stage in the GeoLift-SDID workflow.

## Core Command-Line Tools

### 1. `donor_evaluator.py`

Evaluates potential donor locations (control units) for Synthetic Difference-in-Differences analysis based on pre-treatment similarity metrics.

```bash
python recipes/donor_evaluator.py [OPTIONS]

OPTIONS:
  --config CONFIG            Path to configuration YAML file (REQUIRED)
  --output-dir OUTPUT_DIR    Output directory for results (overrides config)
```

The configuration file should include:

- `data_path`: Path to input data CSV file (REQUIRED)
- `treatment_locations`: List of treatment location IDs (REQUIRED)
- `treatment_date`: Intervention date (YYYY-MM-DD) (REQUIRED)
- `pre_treatment_periods`: Number of pre-treatment periods to analyze
- `min_donors`: Minimum number of donors to recommend (default: 5)
- `max_donors`: Maximum number of donors to recommend (default: 10)
- `min_correlation_threshold`: Minimum correlation threshold for donors (default: 0.7)
- `max_rmse_threshold`: Maximum RMSE threshold for donors (default: 5000)
- `shapemap_file`: Path to GeoJSON file for map visualization (optional)

### 2. `power_calculator.py`

Calculates the statistical power of a GeoLift-SDID analysis for different combinations of effect sizes and post-treatment durations.
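Each pairing of an expected lift with a post-treatment duration is one scenario. The sketch below only counts the scenarios implied by the configuration defaults listed for this tool (`min_expected_lift`, `max_expected_lift`, `lift_step`, `min_post_periods`, `max_post_periods`, `post_periods_step`, `n_sims`). It assumes the grid includes both endpoints and that `n_sims` simulations run per scenario; it does not call `power_calculator.py`.

```bash
# Default grid values taken from the power_calculator.py configuration defaults.
MIN_LIFT=5;  MAX_LIFT=20; LIFT_STEP=5
MIN_POST=7;  MAX_POST=42; POST_STEP=7
N_SIMS=30

# Assumption: both endpoints are part of the grid and n_sims applies to every scenario.
n_scenarios=0
for lift in $(seq "$MIN_LIFT" "$LIFT_STEP" "$MAX_LIFT"); do
  for post in $(seq "$MIN_POST" "$POST_STEP" "$MAX_POST"); do
    echo "lift=${lift}% post_periods=${post}"
    n_scenarios=$((n_scenarios + 1))
  done
done

echo "scenarios: ${n_scenarios}, simulation runs: $((n_scenarios * N_SIMS))"
```

With these defaults the grid has 4 lift values and 6 durations, giving 24 scenarios and 720 simulation runs. This is a quick way to gauge runtime before widening the ranges or raising `n_sims`.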
```bash
python recipes/power_calculator.py [OPTIONS]

OPTIONS:
  --mode {power,selection}   Analysis mode: 'power' to calculate statistical power,
                             or 'selection' to select optimal markets (REQUIRED)
  --config CONFIG            Path to configuration YAML file (REQUIRED)
  --output OUTPUT            Output directory path (overrides config)
```

The configuration file should include:

- `data_path`: Path to input data CSV file
- `treatment_units`: List of treatment unit IDs
- `donor_units`: List of control unit IDs (recommended donors)
- `intervention_date`: Intervention date (YYYY-MM-DD)
- `min_expected_lift`: Minimum expected lift percentage (default: 5)
- `max_expected_lift`: Maximum expected lift percentage (default: 20)
- `lift_step`: Step size for lift percentage calculations (default: 5)
- `min_post_periods`: Minimum number of post-treatment periods (default: 7)
- `max_post_periods`: Maximum number of post-treatment periods (default: 42)
- `post_periods_step`: Step size for post-treatment period calculations (default: 7)
- `n_sims`: Number of simulations to run (default: 30)
- `confidence_level`: Confidence level for power calculation (default: 0.9)

### 3. `geolift_single_cell.py`

Runs a GeoLift-SDID analysis for a single treatment unit (location).

```bash
python recipes/geolift_single_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG             Path to configuration YAML file
  --data DATA                 Path to the input data file (overrides config)
  --treatment TREATMENT       Treatment unit ID (overrides config)
  --intervention-date DATE    Intervention date YYYY-MM-DD (overrides config)
  --end-date DATE             End date for analysis YYYY-MM-DD (overrides config)
  --output OUTPUT             Output directory (overrides config)
```

### 4. `geolift_multi_cell.py`

Runs a GeoLift-SDID analysis for multiple treatment units (locations) with heterogeneous effects.
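A multi-cell run passes several locations to `--treatments` and can reuse the donor evaluator's recommendations through `--donor-recommendations`. This is a sketch only: the location identifiers, the end date, and the location of `donor_recommendations.yaml` (assumed here to sit next to `power_analysis_config.yaml` in the donor evaluator's timestamped output directory) are hypothetical placeholders to replace with your own values.

```bash
# Hypothetical treatment locations; use the names/IDs exactly as they appear in your data.
TREATMENTS=(501 502)
# Hypothetical path: replace the YYYYMMDD_HHMMSS placeholder with a real donor_eval run.
DONOR_FILE="outputs/donor_eval_YYYYMMDD_HHMMSS/donor_recommendations.yaml"

# Build the arguments as an array so each treatment stays a separate word.
ARGS=(
  --treatments "${TREATMENTS[@]}"
  --donor-recommendations "$DONOR_FILE"
  --intervention-date 2024-03-01
  --end-date 2024-04-12
  --output outputs/multicell_analysis
)

# Only launch the analysis when run from the repository root; otherwise print the command.
if [ -f recipes/geolift_multi_cell.py ]; then
  python recipes/geolift_multi_cell.py "${ARGS[@]}"
else
  echo "python recipes/geolift_multi_cell.py ${ARGS[*]}"
fi
```

Since `--config` is omitted, the script falls back to its default `configs/geolift_analysis_config.yaml`. The array form also keeps location names containing spaces intact.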
```bash
python recipes/geolift_multi_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG                  Path to analysis config file (default: configs/geolift_analysis_config.yaml)
  --viz-config CONFIG              Path to visualization config file (default: configs/visuals_config.yaml)
  --shapemap SHAPEMAP              Path to shapemap file for location visualization
  --data DATA                      Path to input data file (overrides config)
  --treatments TREATMENTS [...]    Names of treatment locations
  --donor-recommendations FILE     Path to donor_recommendations.yaml file to use for control units
  --intervention-date DATE         Start date of intervention (YYYY-MM-DD)
  --end-date DATE                  End date of intervention (YYYY-MM-DD)
  --output OUTPUT                  Output directory (overrides config)
```

### 5. `generate_analysis_report.py`

Generates an AI-powered interpretation of GeoLift-SDID analysis results using large language models.

```bash
python recipes/generate_analysis_report.py [OPTIONS]

OPTIONS:
  --outputs OUTPUTS    Path to the outputs directory containing analysis results (REQUIRED)
  --api-key API_KEY    API key for the model (optional; can also be set via environment variables)
  --model MODEL        Model to use for interpretation (default: "deepseek-r1")
  --verbose            Enable verbose logging
```

Supported models include `deepseek-r1`, `deepseek-coder`, `llama3`, `mistral`, and `gpt-4`. Instead of passing `--api-key`, each model can read its key from the corresponding environment variable (e.g., `DEEPSEEK_API_KEY`, `OPENAI_API_KEY`).

## Example Usage

### Complete Analysis Workflow

```bash
# 1. Evaluate donors for treatment locations
# Single-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_singlecell.yaml
# Multi-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_multicell.yaml

# 2. Run power analysis with recommended donors
# Using directly configured files
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_singlecell.yaml
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_multicell.yaml
# Alternatively, using the donor evaluator output (replace the timestamp placeholder)
python recipes/power_calculator.py --mode power --config outputs/donor_eval_YYYYMMDD_HHMMSS/power_analysis_config.yaml

# 3. Perform GeoLift-SDID analysis
python recipes/geolift_single_cell.py \
  --data data/GeoLift_Singlecell.csv \
  --treatment 501 \
  --intervention-date 2024-03-01 \
  --output outputs/singlecell_analysis

# 4. Generate AI-powered report
python recipes/generate_analysis_report.py --outputs outputs/singlecell_analysis
```

## Notes

- File paths can be relative or absolute
- For date parameters, use YYYY-MM-DD format
- The `donor_evaluator.py` script generates a recommended donors file and a power analysis configuration
- The GeoLift-SDID implementation is in the `synthdid` package directory (not `src`)
- Matplotlib 3.6 deprecated the `seaborn-*` style names (they were removed in 3.8) in favor of versioned names. If you get a style error for `seaborn-whitegrid` on a recent Matplotlib, use `seaborn-v0_8-whitegrid` instead; versions older than 3.6 only recognize `seaborn-whitegrid`.
- Different configuration files exist for single-cell and multi-cell analyses
- Make sure all required parameters (especially `data_path`) are included in your configuration files
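The alternative power-analysis step in the workflow needs the timestamped `outputs/donor_eval_YYYYMMDD_HHMMSS/` directory written by `donor_evaluator.py`. A minimal sketch that picks the newest one, assuming every run follows that naming pattern (so alphabetical order equals chronological order). `latest_donor_eval_dir` is a hypothetical helper, not part of the toolset.

```bash
# Print the newest <outputs_dir>/donor_eval_* directory, or an empty line if none exist.
latest_donor_eval_dir() {
  local outputs_dir="${1:-outputs}" dir latest=""
  # The trailing slash restricts the glob to directories; bash sorts glob matches,
  # so with fixed-width timestamps the last match is the most recent run.
  for dir in "$outputs_dir"/donor_eval_*/; do
    [ -d "$dir" ] || continue   # skip the literal pattern when nothing matched
    latest="${dir%/}"
  done
  printf '%s\n' "$latest"
}

latest="$(latest_donor_eval_dir outputs)"
if [ -n "$latest" ] && [ -f recipes/power_calculator.py ]; then
  python recipes/power_calculator.py --mode power --config "$latest/power_analysis_config.yaml"
else
  echo "No donor evaluation output found (or not run from the repository root)." >&2
fi
```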