Renewable Energy ROI Analysis Across U.S. States
Large-Scale Data Integration & Statistical Inference
I led a technical data analysis project evaluating the long-term economic return on investment (ROI) of transitioning U.S. states from fossil-fuel electricity to locally available renewable energy. The pipeline integrates multiple federal datasets and computes three complementary ROI metrics: per-MWh efficiency, total state economic impact, and per-capita equity.
The Challenge: Integrating Disparate Data Sources
Evaluating state-level renewable transitions requires combining:
- EPA eGRID: Current electricity generation mix by state
- EIA Data: Historical electricity prices and consumption patterns
- NREL Renewable Potential: Solar, wind, geothermal, hydro resource quality
- DOE USEER: Energy sector employment and economic data
- Census Data: Population, land area, demographic information
Each dataset uses different state identifiers, units, temporal resolutions, and reporting standards. The first technical challenge was building a robust data integration pipeline that handled these inconsistencies without losing information.
System Architecture
1. Data Ingestion & Cleaning
- Automated download and parsing of federal datasets
- Standardized state identifiers (FIPS codes, names, abbreviations)
- Unit conversions and temporal alignment
- Missing data imputation using domain-appropriate methods
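The identifier and unit steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `STATE_LOOKUP`, `normalize_state_key`, the column names, and the sample values are all hypothetical, and the lookup table is truncated to three states.

```python
import pandas as pd

# Hypothetical lookup table; a real pipeline would cover all states plus DC.
STATE_LOOKUP = pd.DataFrame({
    "fips": ["06", "48", "56"],
    "abbr": ["CA", "TX", "WY"],
    "name": ["California", "Texas", "Wyoming"],
})


def normalize_state_key(value) -> str:
    """Map a FIPS code, abbreviation, or full name to a two-letter abbreviation."""
    text = str(value).strip()
    if text.isdigit():
        # FIPS codes are sometimes parsed as integers, which drops the leading zero.
        match = STATE_LOOKUP.loc[STATE_LOOKUP["fips"] == text.zfill(2), "abbr"]
    elif len(text) == 2:
        match = STATE_LOOKUP.loc[STATE_LOOKUP["abbr"] == text.upper(), "abbr"]
    else:
        match = STATE_LOOKUP.loc[STATE_LOOKUP["name"].str.lower() == text.lower(), "abbr"]
    if match.empty:
        raise ValueError(f"Unrecognized state identifier: {value!r}")
    return match.iloc[0]


# Made-up sample frames that key states differently and use different units.
generation = pd.DataFrame({"state": ["06", "48"], "generation_gwh": [200.0, 480.0]})
prices = pd.DataFrame({"state": ["California", "Texas"], "price_cents_per_kwh": [22.0, 12.0]})

for frame in (generation, prices):
    frame["state"] = frame["state"].map(normalize_state_key)

generation["generation_mwh"] = generation["generation_gwh"] * 1_000  # GWh -> MWh
prices["price_usd_per_mwh"] = prices["price_cents_per_kwh"] * 10     # cents/kWh -> USD/MWh

# An outer join keeps states that appear in only one source, and validate=
# raises if a source unexpectedly has duplicate rows per state.
merged = generation.merge(prices, on="state", how="outer", validate="one_to_one")
```

Raising on unrecognized identifiers, instead of silently dropping rows, is one way to keep the "without losing information" goal checkable.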
2. Feature Engineering
Built composite features from raw data:
- Current Fossil Fuel Dependence: % of electricity from coal, natural gas, petroleum
- Renewable Resource Quality: Weighted average of solar, wind, hydro potential
- Grid Infrastructure Readiness: Transmission capacity, interconnection density
- Economic Baseline: Current energy costs, employment, state GDP contribution
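As one example of these features, fossil fuel dependence reduces to a share calculation over the generation mix. The sketch below assumes a table with one row per state and hypothetical column names (`coal_mwh`, `gas_mwh`, `petroleum_mwh`, `total_mwh`); the numbers are made up.

```python
import pandas as pd

FOSSIL_COLUMNS = ["coal_mwh", "gas_mwh", "petroleum_mwh"]  # assumed column names


def fossil_dependence_pct(generation: pd.DataFrame) -> pd.Series:
    """Percent of total generation that comes from coal, natural gas, and petroleum."""
    fossil_mwh = generation[FOSSIL_COLUMNS].sum(axis=1)
    return 100.0 * fossil_mwh / generation["total_mwh"]


# Made-up numbers for two placeholder states, for illustration only.
generation = pd.DataFrame(
    {
        "coal_mwh": [30.0, 0.0],
        "gas_mwh": [40.0, 20.0],
        "petroleum_mwh": [5.0, 0.0],
        "total_mwh": [100.0, 80.0],
    },
    index=["state_a", "state_b"],
)
generation["fossil_dependence_pct"] = fossil_dependence_pct(generation)
```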
3. State Feasibility Factor (SFF)
To ground technical potential in operational reality, I introduced a State Feasibility Factor that incorporates:
- Renewable resource quality (physics constraint)
- Grid readiness (infrastructure constraint)
- Current renewable adoption (momentum indicator)
- Population density (distribution efficiency)
The SFF weights technical ROI by how achievable it actually is for each state.
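One plausible way to build such a factor is a weighted sum of min-max-scaled components, multiplied into the technical ROI. The write-up does not specify the actual formula, so the weights, the scaling choice, the column names, and the assumption that higher population density means better distribution efficiency are all illustrative.

```python
import pandas as pd

# Hypothetical weights; how the four components are really combined is not stated.
SFF_WEIGHTS = {
    "resource_quality": 0.4,
    "grid_readiness": 0.3,
    "renewable_share": 0.2,
    "population_density": 0.1,
}


def min_max(series: pd.Series) -> pd.Series:
    """Scale a column to [0, 1]; a constant column maps to 0.5."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.5, index=series.index)
    return (series - series.min()) / span


def state_feasibility_factor(features: pd.DataFrame) -> pd.Series:
    """Weighted sum of scaled components, giving a score between 0 and 1."""
    scaled = features[list(SFF_WEIGHTS)].apply(min_max)
    weights = pd.Series(SFF_WEIGHTS)
    return scaled.mul(weights, axis=1).sum(axis=1)


# Made-up inputs for three placeholder states.
features = pd.DataFrame(
    {
        "resource_quality": [0.9, 0.5, 0.2],
        "grid_readiness": [0.8, 0.6, 0.1],
        "renewable_share": [0.40, 0.20, 0.05],
        "population_density": [100.0, 50.0, 10.0],
    },
    index=["state_a", "state_b", "state_c"],
)
technical_roi = pd.Series([10.0, 12.0, 15.0], index=features.index)

sff = state_feasibility_factor(features)
feasibility_adjusted_roi = technical_roi * sff
```

With relative scaling like this, the lowest-ranked state on every component gets a factor of zero, which may be too harsh in practice; a floor value or absolute thresholds would be alternative design choices.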
4. ROI Metric Computation
Three complementary perspectives:
A. Per-MWh Efficiency
- Cost savings per unit energy transitioned
- Rewards resource quality and low implementation costs
- Use Case: Identifies states with best marginal returns
B. Total State Economic Impact
- Absolute dollar value of statewide transition
- Accounts for state size and consumption
- Use Case: Prioritizes large-scale economic benefits
C. Per-Capita Equity
- Economic benefit per resident
- Normalizes by population
- Use Case: Ensures small states aren’t overlooked in national policy
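The three metrics share one numerator and differ only in the normalizer, which is why they can rank states differently. A minimal sketch, assuming hypothetical columns `net_benefit_usd`, `transitioned_mwh`, and `population`, with made-up values:

```python
import pandas as pd


def roi_metrics(states: pd.DataFrame) -> pd.DataFrame:
    """Compute the three ROI views from one net-benefit column."""
    metrics = pd.DataFrame(index=states.index)
    metrics["per_mwh"] = states["net_benefit_usd"] / states["transitioned_mwh"]
    metrics["total_impact"] = states["net_benefit_usd"]
    metrics["per_capita"] = states["net_benefit_usd"] / states["population"]
    return metrics


# Made-up values: one large and one small placeholder state.
states = pd.DataFrame(
    {
        "net_benefit_usd": [1.0e9, 1.0e8],
        "transitioned_mwh": [5.0e7, 2.5e6],
        "population": [2.0e7, 5.0e5],
    },
    index=["large_state", "small_state"],
)
metrics = roi_metrics(states)
```

In this toy example the large state leads on total impact while the small state leads on both per-MWh and per-capita returns, the same kind of split the three-metric design is meant to expose.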
Statistical Validation & Robustness
To make the results defensible, I applied four validation steps:
1. Correlation Significance Testing
- Identified which factors most strongly predict ROI
- Controlled for multiple comparisons (Bonferroni correction)
- Reported confidence intervals, not just point estimates
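A sketch of what this testing step could look like with SciPy is shown below. It assumes Pearson correlation and a Fisher z-transform interval, neither of which the write-up confirms; `correlation_report` and the feature names are hypothetical, and the data is synthetic.

```python
import numpy as np
from scipy import stats


def correlation_report(features: dict, roi, alpha: float = 0.05) -> dict:
    """Pearson r per feature with Bonferroni-adjusted p-values and intervals."""
    n_tests = len(features)
    n_obs = len(roi)
    adjusted_alpha = alpha / n_tests  # Bonferroni: split alpha across all tests
    z_crit = stats.norm.ppf(1 - adjusted_alpha / 2)
    report = {}
    for name, values in features.items():
        r, p_value = stats.pearsonr(values, roi)
        # Fisher z-transform interval at the Bonferroni-adjusted level.
        z = np.arctanh(r)
        half_width = z_crit / np.sqrt(n_obs - 3)
        report[name] = {
            "r": float(r),
            "p_bonferroni": float(min(1.0, p_value * n_tests)),
            "ci": (float(np.tanh(z - half_width)), float(np.tanh(z + half_width))),
            "significant": bool(p_value < adjusted_alpha),
        }
    return report


# Synthetic data: ROI is driven by the first feature and unrelated to the second.
rng = np.random.default_rng(0)
wind = rng.normal(size=50)
solar = rng.normal(size=50)
roi = 2.0 * wind + rng.normal(scale=0.1, size=50)

report = correlation_report({"wind_potential": wind, "solar_potential": solar}, roi)
```

Bonferroni is conservative when many correlated features are tested; the interval here is widened to the same adjusted level so the p-value and the interval tell a consistent story.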
2. Outlier Detection & Analysis
- Flagged statistical outliers using robust z-scores
- Investigated physical causes (e.g., Hawaii’s unique energy economics)
- Reported results with and without outliers for transparency
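Robust z-scores are commonly computed from the median and the median absolute deviation (MAD), with 3.5 as a conventional cutoff. The sketch below assumes that variant; the function name, threshold, and sample values are illustrative rather than taken from the project.

```python
import numpy as np


def robust_z_scores(values) -> np.ndarray:
    """Modified z-scores based on the median and median absolute deviation."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # More than half the values are identical; no meaningful scale exists.
        return np.zeros_like(x)
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    # when the data is normally distributed.
    return 0.6745 * (x - median) / mad


# Made-up metric values; the last one is far from the rest.
metric = [10.0, 11.0, 9.0, 10.0, 12.0, 40.0]
scores = robust_z_scores(metric)
is_outlier = np.abs(scores) > 3.5  # assumed cutoff
```

Unlike mean-and-standard-deviation z-scores, the median and MAD are barely moved by the extreme value itself, so a single unusual state cannot mask its own outlier status.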
3. Sensitivity Analysis
Tested robustness to modeling assumptions:
- Cost Assumptions: Varied solar/wind installation costs by ±30%
- Resource Quality Weighting: Tested different aggregation methods
- Discount Rates: Evaluated long-term ROI under different economic scenarios
Result: Core rankings remained stable across reasonable assumption ranges, indicating robust conclusions.
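Ranking stability under a cost sweep can be quantified with a rank correlation between the baseline and each perturbed scenario. The sketch below assumes a simple "savings minus cost" ROI and uses Spearman's rho; the column names, the ROI formula, and the data are hypothetical.

```python
import pandas as pd
from scipy import stats


def net_benefit_per_mwh(states: pd.DataFrame, cost_multiplier: float = 1.0) -> pd.Series:
    """Toy ROI: savings minus installation cost scaled by a multiplier."""
    return states["savings_usd_per_mwh"] - cost_multiplier * states["install_cost_usd_per_mwh"]


def rank_stability(states: pd.DataFrame, multipliers=(0.7, 0.85, 1.0, 1.15, 1.3)) -> dict:
    """Spearman rank correlation between baseline and each perturbed ranking."""
    baseline = net_benefit_per_mwh(states)
    stability = {}
    for multiplier in multipliers:
        perturbed = net_benefit_per_mwh(states, multiplier)
        rho, _ = stats.spearmanr(baseline, perturbed)
        stability[multiplier] = float(rho)
    return stability


# Made-up inputs for four placeholder states.
states = pd.DataFrame(
    {
        "savings_usd_per_mwh": [50.0, 45.0, 40.0, 30.0],
        "install_cost_usd_per_mwh": [20.0, 10.0, 15.0, 25.0],
    },
    index=["state_a", "state_b", "state_c", "state_d"],
)
stability = rank_stability(states)
```

A rho near 1 across the whole ±30% sweep means the ordering of states does not depend on the cost assumption; values that drop noticeably would point to the states whose ranks are sensitive.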
4. Cross-Validation Against External Benchmarks
Compared results to:
- Actual state renewable adoption rates (correlation = 0.71)
- Independent economic analyses from NREL and EIA
- State energy policy rankings from external organizations
Key Results
Top ROI States (Per-MWh Efficiency)
- Wyoming: Exceptional wind resources, low population density
- New Mexico: High solar potential, low installation costs
- Texas: Massive scale, diverse renewables, existing infrastructure
Highest Total Economic Impact
- California: Largest energy consumption, strong solar/wind
- Texas: Scale + resource quality
- Florida: High electricity demand, excellent solar potential
Best Per-Capita Returns
- North Dakota: Wind-rich, low population
- Wyoming: Similar profile to ND
- Montana: Hydro + wind potential
Statistical Insights
- Correlation Analysis: Wind potential is the strongest predictor of per-MWh ROI (r = 0.68)
- Regional Patterns: Southwest states dominate solar ROI, Great Plains lead in wind
- Grid Readiness: States with existing renewable infrastructure show 2.3x higher near-term ROI
Why This Matters for ML/Space Roles
This project demonstrates skills directly transferable to space industry work:
- Multi-Source Data Integration: Space missions combine telemetry, ground observations, and simulation output, which raises the same integration challenges
- Statistical Rigor: Mission trade studies require defensible analysis under uncertainty
- Decision Support Modeling: Translating technical potential into operational recommendations
- Reproducibility: All analysis is version-controlled, documented, and replicable
- Systems Thinking: Understanding constraints beyond the technical (cost, infrastructure, policy)
These are the same patterns used in:
- Mission feasibility analysis
- Spacecraft design trade studies
- Orbital mechanics optimization
- Ground system capacity planning
Technical Stack
- Python: pandas, NumPy, SciPy (core analysis)
- Matplotlib / Seaborn: Visualization
- Statsmodels: Statistical testing, regression analysis
- Geopandas: Spatial analysis and mapping
- Jupyter: Reproducible analysis notebooks
- Git: Version control for data + code
Reproducibility & Open Science
The entire analysis is:
- Version-controlled: Git repository with full commit history
- Documented: Markdown documentation for every decision
- Reproducible: Jupyter notebooks with step-by-step execution
- Transparent: Assumptions, limitations, and uncertainties explicitly stated
While originally developed in an academic setting, the project is structured and presented as a standalone technical analysis, not coursework.
Current Status
Complete: Fully Reproducible Analysis
All code, data, and documentation are available. The methodology is extensible to:
- Updated federal datasets (annual releases)
- International comparisons
- More granular regional analysis
- Integration with climate models
Code Repository
View on GitHub (link to be added)
Key Insight: Good data science isn’t just about getting an answer; it’s about building confidence that the answer is right. Statistical validation, sensitivity analysis, and reproducibility are how you earn that confidence.