# GeoLift-SDID Command Line Interface Documentation

This document provides a comprehensive reference for the command-line interfaces available in the GeoLift-SDID toolset. Each tool is designed for a specific stage in the GeoLift-SDID workflow.

## Core Command-Line Tools

### 1. `donor_evaluator.py`

Evaluates potential donor locations (control units) for Synthetic Difference-in-Differences analysis based on pre-treatment similarity metrics.

```bash
python recipes/donor_evaluator.py [OPTIONS]

OPTIONS:
  --config CONFIG         Path to configuration YAML file (REQUIRED)
  --output-dir OUTPUT_DIR Output directory for results (overrides config)
```

The configuration file should include:
- `data_path`: Path to input data CSV file (REQUIRED)
- `treatment_locations`: List of treatment location IDs (REQUIRED)
- `treatment_date`: Intervention date (YYYY-MM-DD) (REQUIRED)
- `pre_treatment_periods`: Number of pre-treatment periods to analyze
- `min_donors`: Minimum number of donors to recommend (default: 5)
- `max_donors`: Maximum number of donors to recommend (default: 10)
- `min_correlation_threshold`: Minimum correlation threshold for donors (default: 0.7)
- `max_rmse_threshold`: Maximum RMSE threshold for donors (default: 5000)
- `shapemap_file`: Path to GeoJSON file for map visualization (optional)
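As a rough illustration of the keys above, the following sketch checks a configuration dictionary (as it might look after loading the YAML file) for the three required keys and fills in the documented defaults. `check_donor_config` is a hypothetical helper, not part of `donor_evaluator.py`, and the example values are placeholders.

```python
from datetime import datetime

# Keys marked REQUIRED in the list above.
REQUIRED_KEYS = ("data_path", "treatment_locations", "treatment_date")

# Documented defaults for the optional keys that state one.
DEFAULTS = {
    "min_donors": 5,
    "max_donors": 10,
    "min_correlation_threshold": 0.7,
    "max_rmse_threshold": 5000,
}


def check_donor_config(config: dict) -> dict:
    """Return a copy of config with documented defaults filled in.

    Hypothetical helper: raises ValueError on a missing required key
    or a treatment_date that is not in YYYY-MM-DD format.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    # strptime raises ValueError if the date is malformed.
    datetime.strptime(str(config["treatment_date"]), "%Y-%m-%d")
    return {**DEFAULTS, **config}


# Placeholder values for illustration only.
example = {
    "data_path": "data/GeoLift_Singlecell.csv",
    "treatment_locations": [501],
    "treatment_date": "2024-03-01",
}
resolved = check_donor_config(example)
print(resolved["min_donors"], resolved["max_donors"])
```

Running a check like this before launching the evaluator can surface a missing `data_path` early, although the script may perform its own validation.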

### 2. `power_calculator.py`

Calculates the statistical power of a GeoLift-SDID analysis for different combinations of effect sizes and post-treatment durations (`--mode power`), or selects optimal markets (`--mode selection`).

```bash
python recipes/power_calculator.py [OPTIONS]

OPTIONS:
  --mode {power,selection}  Analysis mode: 'power' to calculate statistical power,
                           or 'selection' to select optimal markets (REQUIRED)
  --config CONFIG          Path to configuration YAML file (REQUIRED)
  --output OUTPUT          Output directory path (overrides config)
```

The configuration file should include:
- `data_path`: Path to input data CSV file
- `treatment_units`: List of treatment unit IDs
- `donor_units`: List of control unit IDs (recommended donors)
- `intervention_date`: Intervention date (YYYY-MM-DD)
- `min_expected_lift`: Minimum expected lift percentage (default: 5)
- `max_expected_lift`: Maximum expected lift percentage (default: 20)
- `lift_step`: Step size for lift percentage calculations (default: 5)
- `min_post_periods`: Minimum number of post-treatment periods (default: 7)
- `max_post_periods`: Maximum number of post-treatment periods (default: 42)
- `post_periods_step`: Step size for post-treatment period calculations (default: 7)
- `n_sims`: Number of simulations to run (default: 30)
- `confidence_level`: Confidence level for power calculation (default: 0.9)
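To get a feel for how much work the default grid implies, the sketch below enumerates the lift and post-period combinations from the documented defaults. It assumes that both ranges include their endpoints and that every combination is simulated `n_sims` times; the script itself may build the grid differently.

```python
from itertools import product


def inclusive_range(start: int, stop: int, step: int) -> list[int]:
    """Integers from start to stop (inclusive) in increments of step."""
    return list(range(start, stop + 1, step))


# Documented defaults from the list above.
lifts = inclusive_range(5, 20, 5)          # min/max_expected_lift, lift_step
post_periods = inclusive_range(7, 42, 7)   # min/max_post_periods, post_periods_step
n_sims = 30

# Every (lift, duration) pair is one power scenario (assumption).
scenarios = list(product(lifts, post_periods))
print(len(scenarios))            # 24 scenarios (4 lifts x 6 durations)
print(len(scenarios) * n_sims)   # 720 simulated runs in total
```

Under these assumptions, halving `lift_step` or `post_periods_step` roughly doubles the number of scenarios, which is worth keeping in mind when tuning run time.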

### 3. `geolift_single_cell.py`

Runs a GeoLift-SDID analysis for a single treatment unit (location).

```bash
python recipes/geolift_single_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG                   Path to configuration YAML file
  --data DATA                       Path to the input data file (overrides config)
  --treatment TREATMENT             Treatment unit ID (overrides config)
  --intervention-date DATE          Intervention date YYYY-MM-DD (overrides config)
  --end-date DATE                   End date for analysis YYYY-MM-DD (overrides config)
  --output OUTPUT                   Output directory (overrides config)
```
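Several options above are described as overriding the config file. One common way to implement that precedence is sketched below: values from the config are used unless the matching flag was actually passed. This is an illustrative assumption, not the script's own code, and the config key names in `FLAG_TO_KEY` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the options listed above (sketch, not the script's own)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    parser.add_argument("--data")
    parser.add_argument("--treatment")
    parser.add_argument("--intervention-date")
    parser.add_argument("--end-date")
    parser.add_argument("--output")
    return parser


def merge_settings(config: dict, args: argparse.Namespace, flag_to_key: dict) -> dict:
    """Overlay CLI values that were actually supplied on top of the config."""
    merged = dict(config)
    for flag, key in flag_to_key.items():
        value = getattr(args, flag)
        if value is not None:  # flag omitted on the command line -> keep config value
            merged[key] = value
    return merged


# Hypothetical config-key names; the real keys may differ.
FLAG_TO_KEY = {
    "data": "data_path",
    "treatment": "treatment_unit",
    "intervention_date": "intervention_date",
    "end_date": "end_date",
    "output": "output_dir",
}

config = {"data_path": "data/from_config.csv", "intervention_date": "2024-03-01"}
args = build_parser().parse_args(
    ["--data", "data/GeoLift_Singlecell.csv", "--treatment", "501"]
)
settings = merge_settings(config, args, FLAG_TO_KEY)
print(settings)
```

Here `--data` replaces the config's `data_path`, while `intervention_date` is kept from the config because no flag was given for it.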

### 4. `geolift_multi_cell.py`

Runs a GeoLift-SDID analysis for multiple treatment units (locations) with heterogeneous effects.

```bash
python recipes/geolift_multi_cell.py [OPTIONS]

OPTIONS:
  --config CONFIG                   Path to analysis config file (default: configs/geolift_analysis_config.yaml)
  --viz-config CONFIG               Path to visualization config file (default: configs/visuals_config.yaml)
  --shapemap SHAPEMAP               Path to shapemap file for location visualization
  --data DATA                       Path to input data file (overrides config)
  --treatments TREATMENTS [...]     Names of treatment locations
  --donor-recommendations FILE      Path to donor_recommendations.yaml file to use for control units
  --intervention-date DATE          Start date of intervention (YYYY-MM-DD)
  --end-date DATE                   End date of intervention (YYYY-MM-DD)
  --output OUTPUT                   Output directory (overrides config)
```

### 5. `generate_analysis_report.py`

Generates an AI-powered interpretation of GeoLift-SDID analysis results using large language models.

```bash
python recipes/generate_analysis_report.py [OPTIONS]

OPTIONS:
  --outputs OUTPUTS               Path to the outputs directory containing analysis results (REQUIRED)
  --api-key API_KEY               API key for the model (optional, can also be set via environment variables)
  --model MODEL                   Model to use for interpretation (default: "deepseek-r1")
  --verbose                       Enable verbose logging
```

Supported models include `deepseek-r1`, `deepseek-coder`, `llama3`, `mistral`, and `gpt-4`. When `--api-key` is not supplied, each model can read its API key from the corresponding environment variable (e.g., `DEEPSEEK_API_KEY`, `OPENAI_API_KEY`).
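The key lookup described above might be sketched as follows. The precedence (explicit `--api-key` first, then the environment) and the model-to-variable mapping are assumptions; only `DEEPSEEK_API_KEY` and `OPENAI_API_KEY` are named in this document, so the other models are left unmapped here.

```python
import os
from typing import Optional

# Only these two variable names appear in the text above; which models map
# to which variable is an assumption made for this sketch.
ENV_VAR_BY_MODEL = {
    "deepseek-r1": "DEEPSEEK_API_KEY",
    "deepseek-coder": "DEEPSEEK_API_KEY",
    "gpt-4": "OPENAI_API_KEY",
}


def resolve_api_key(model: str, cli_key: Optional[str] = None, env=None) -> Optional[str]:
    """Prefer an explicit --api-key value, then the model's environment variable."""
    env = os.environ if env is None else env
    if cli_key:
        return cli_key
    var_name = ENV_VAR_BY_MODEL.get(model)
    return env.get(var_name) if var_name else None


# A stand-in environment so the example does not depend on real credentials.
fake_env = {"OPENAI_API_KEY": "sk-from-env"}
print(resolve_api_key("gpt-4", env=fake_env))                    # sk-from-env
print(resolve_api_key("gpt-4", cli_key="sk-cli", env=fake_env))  # sk-cli
```

Keeping the key in an environment variable rather than on the command line avoids leaving it in shell history.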

## Example Usage

### Complete Analysis Workflow

```bash
# 1. Evaluate donors for treatment locations
# Single-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_singlecell.yaml

# Multi-cell donor evaluation
python recipes/donor_evaluator.py --config configs/donor_eval_config_multicell.yaml

# 2. Run power analysis with recommended donors
# Using directly configured files
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_singlecell.yaml
python recipes/power_calculator.py --mode power --config configs/power_analysis_config_multicell.yaml

# Alternatively, using donor evaluator output
python recipes/power_calculator.py --mode power --config outputs/donor_eval_YYYYMMDD_HHMMSS/power_analysis_config.yaml

# 3. Perform GeoLift-SDID analysis
python recipes/geolift_single_cell.py \
  --data data/GeoLift_Singlecell.csv \
  --treatment 501 \
  --intervention-date 2024-03-01 \
  --output outputs/singlecell_analysis

# 4. Generate AI-powered report
python recipes/generate_analysis_report.py --outputs outputs/singlecell_analysis
```
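The `YYYYMMDD_HHMMSS` part of the donor evaluator output path in the workflow above is a timestamp placeholder. A small helper such as the one below could locate the most recent run instead of typing the directory by hand. It assumes the runs live under `outputs/` and follow the `donor_eval_<timestamp>` naming shown above; `latest_donor_eval_config` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def latest_donor_eval_config(outputs_root: str = "outputs") -> Optional[Path]:
    """Return power_analysis_config.yaml from the newest donor_eval_* run, if any."""
    # Zero-padded YYYYMMDD_HHMMSS names sort chronologically as plain strings.
    run_dirs = sorted(p for p in Path(outputs_root).glob("donor_eval_*") if p.is_dir())
    for run_dir in reversed(run_dirs):
        candidate = run_dir / "power_analysis_config.yaml"
        if candidate.is_file():
            return candidate
    return None


config_path = latest_donor_eval_config()
if config_path is None:
    print("No donor evaluation output found")
else:
    print(f"python recipes/power_calculator.py --mode power --config {config_path}")
```

The helper only prints the command to run; it does not invoke the power calculator itself.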

## Notes

- File paths can be relative or absolute
- For date parameters, use YYYY-MM-DD format
- The `donor_evaluator.py` script generates a recommended donors file and a power analysis configuration (`power_analysis_config.yaml`) in its output directory
- The GeoLift-SDID implementation is in the `synthdid` package directory (not `src`)
- Matplotlib 3.6 deprecated the `seaborn-whitegrid` style name, and later releases removed it, so newer installations may report a style error. Use `seaborn-v0_8-whitegrid` on those versions; older versions only recognize `seaborn-whitegrid`.
- Different configuration files exist for single-cell and multi-cell analyses
- Make sure all required parameters (especially `data_path`) are included in your configuration files
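Because the accepted whitegrid style name depends on the installed matplotlib version (see the note above), plotting code can choose whichever name is available at run time. The following is a minimal sketch; `pick_whitegrid_style` is a hypothetical helper and is not part of the GeoLift-SDID scripts.

```python
def pick_whitegrid_style(available_styles) -> str:
    """Choose a whitegrid style name that the installed matplotlib recognizes."""
    for name in ("seaborn-v0_8-whitegrid", "seaborn-whitegrid"):
        if name in available_styles:
            return name
    return "default"  # matplotlib's built-in fallback style


# With matplotlib installed, the typical call would be:
#   import matplotlib.pyplot as plt
#   plt.style.use(pick_whitegrid_style(plt.style.available))
print(pick_whitegrid_style(["classic", "seaborn-v0_8-whitegrid"]))
```

The function takes the list of style names as an argument so it can be exercised without importing matplotlib.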