Renewable Energy ROI Analysis Across U.S. States


Large-Scale Data Integration & Statistical Inference

I led a technical data analysis project evaluating the long-term economic return on investment (ROI) of transitioning U.S. states from fossil-fuel electricity to locally available renewable energy. The pipeline integrates multiple federal datasets and computes three complementary ROI metrics: per-MWh efficiency, total state economic impact, and per-capita equity.

The Challenge: Integrating Disparate Data Sources

Evaluating state-level renewable transitions requires combining:

EPA eGRID: Current electricity generation mix by state
EIA Data: Historical electricity prices and consumption patterns
NREL Renewable Potential: Solar, wind, geothermal, hydro resource quality
DOE USEER: Energy sector employment and economic data
Census Data: Population, land area, demographic information

Each dataset uses different state identifiers, units, temporal resolutions, and reporting standards. The first technical challenge was building a robust data integration pipeline that handled these inconsistencies without losing information.

System Architecture

1. Data Ingestion & Cleaning

Automated download and parsing of federal datasets
Standardized state identifiers (FIPS codes, names, abbreviations)
Unit conversions and temporal alignment
Missing data imputation using domain-appropriate methods
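As a rough illustration of the identifier standardization step, the sketch below maps FIPS codes, postal abbreviations, and full names onto a single two-digit FIPS key. The lookup table is truncated to four states, and the names `STATE_TABLE` and `to_fips` are hypothetical, not taken from the project code.

```python
# Hypothetical, truncated lookup table: (FIPS, postal abbreviation, name).
# A real pipeline would cover every state and any territories in the data.
STATE_TABLE = [
    ("56", "WY", "Wyoming"),
    ("35", "NM", "New Mexico"),
    ("48", "TX", "Texas"),
    ("06", "CA", "California"),
]


def _build_index(table):
    """Index every known spelling of a state under its FIPS code."""
    index = {}
    for fips, abbr, name in table:
        for key in (fips, abbr, name):
            index[key.strip().lower()] = fips
    return index


_INDEX = _build_index(STATE_TABLE)


def to_fips(identifier):
    """Map a FIPS code, abbreviation, or state name to a 2-digit FIPS string.

    Returns None for unknown identifiers so the caller can log them
    instead of silently dropping rows during a join.
    """
    key = str(identifier).strip().lower()
    if key.isdigit():
        # Integer-typed FIPS columns lose the leading zero (6 instead of "06").
        key = key.zfill(2)
    return _INDEX.get(key)
```

Joining every dataset on one canonical key, rather than on whatever identifier each agency happens to publish, is what keeps the later merges from quietly losing states.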

2. Feature Engineering

Built composite features from raw data:

Current Fossil Fuel Dependence: % of electricity from coal, natural gas, petroleum
Renewable Resource Quality: Weighted average of solar, wind, hydro potential
Grid Infrastructure Readiness: Transmission capacity, interconnection density
Economic Baseline: Current energy costs, employment, state GDP contribution
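The fossil fuel dependence feature is the simplest of these to sketch. The fuel keys and the dictionary layout below are assumptions for illustration; the actual eGRID column names differ.

```python
# Assumed fuel category keys; real eGRID fields use different names.
FOSSIL_FUELS = ("coal", "natural_gas", "petroleum")


def fossil_dependence_pct(generation_mwh):
    """Percent of a state's net generation that comes from fossil fuels.

    generation_mwh maps a fuel category to annual generation in MWh.
    """
    total = sum(generation_mwh.values())
    if total <= 0:
        raise ValueError("total generation must be positive")
    fossil = sum(generation_mwh.get(fuel, 0.0) for fuel in FOSSIL_FUELS)
    return 100.0 * fossil / total


# Made-up generation mix, for illustration only.
example_mix = {
    "coal": 30.0,
    "natural_gas": 40.0,
    "petroleum": 5.0,
    "wind": 15.0,
    "solar": 10.0,
}
example_pct = fossil_dependence_pct(example_mix)
```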

3. State Feasibility Factor (SFF)

To ground technical potential in operational reality, I introduced a State Feasibility Factor that incorporates:

Renewable resource quality (physics constraint)
Grid readiness (infrastructure constraint)
Current renewable adoption (momentum indicator)
Population density (distribution efficiency)

The SFF weights technical ROI by how achievable it actually is for each state.
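One plausible form for such a factor is a weighted average of the four components, each scaled to the range 0 to 1, multiplied into the technical ROI. The sketch below assumes that form. The weights are placeholders, since the write-up does not state the actual values or the exact functional form.

```python
# Placeholder weights for illustration; the real SFF weights are not
# given in this write-up.
SFF_WEIGHTS = {
    "resource_quality": 0.4,
    "grid_readiness": 0.3,
    "renewable_share": 0.2,
    "population_density": 0.1,
}


def state_feasibility_factor(components, weights=SFF_WEIGHTS):
    """Weighted average of components that are already scaled to [0, 1]."""
    for name in weights:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * components[name] for name in weights)
    return weighted / total_weight


def feasibility_adjusted_roi(technical_roi, sff):
    """Scale a technical ROI estimate by how achievable it is."""
    return technical_roi * sff
```

Because the factor stays between 0 and 1, it can only discount a technical ROI estimate, never inflate it.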

4. ROI Metric Computation

Three complementary perspectives:

A. Per-MWh Efficiency

Cost savings per unit energy transitioned
Rewards resource quality and low implementation costs
Use Case: Identifies states with best marginal returns

B. Total State Economic Impact

Absolute dollar value of statewide transition
Accounts for state size and consumption
Use Case: Prioritizes large-scale economic benefits

C. Per-Capita Equity

Economic benefit per resident
Normalizes by population
Use Case: Ensures small states aren’t overlooked in national policy
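The three metrics are views of the same underlying savings estimate. A minimal sketch follows, assuming savings are simply the fossil-versus-renewable cost gap per MWh; the field names are hypothetical, and the real pipeline's cost model is more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class StateInputs:
    """Hypothetical per-state inputs; field names are illustrative."""
    mwh_transitioned: float        # annual MWh moved from fossil to renewable
    fossil_cost_per_mwh: float     # $/MWh for current fossil generation
    renewable_cost_per_mwh: float  # $/MWh for the renewable replacement
    population: int


def roi_metrics(state):
    """Return the per-MWh, total, and per-capita views of annual savings."""
    per_mwh = state.fossil_cost_per_mwh - state.renewable_cost_per_mwh
    total = per_mwh * state.mwh_transitioned
    return {
        "per_mwh": per_mwh,
        "total": total,
        "per_capita": total / state.population,
    }
```

A small, wind-rich state can lead on per-MWh and per-capita returns while ranking low on total impact, which is why the three views are reported side by side.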

Statistical Validation & Robustness

To ensure results were defensible, I applied rigorous statistical testing:

1. Correlation Significance Testing

Identified which factors most strongly predict ROI
Controlled for multiple comparisons (Bonferroni correction)
Reported confidence intervals, not just point estimates
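The two ideas above can be sketched briefly. The Bonferroni step multiplies each raw p-value by the number of tests. The confidence interval uses the Fisher z-transform, which is a standard choice for Pearson correlations but an assumption here, since the write-up does not say which interval method was used. Raw p-values are taken as inputs (for example from `scipy.stats.pearsonr`).

```python
import math
from statistics import NormalDist


def bonferroni(p_values, alpha=0.05):
    """Return {name: (adjusted_p, is_significant)} after Bonferroni correction."""
    m = len(p_values)
    results = {}
    for name, p in p_values.items():
        adjusted = min(1.0, p * m)  # cap at 1 so it remains a probability
        results[name] = (adjusted, adjusted < alpha)
    return results


def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

With only about 50 states as observations, the interval around any single correlation is wide, which is the practical reason to report it alongside the point estimate.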

2. Outlier Detection & Analysis

Flagged statistical outliers using robust z-scores
Investigated physical causes (e.g., Hawaii’s unique energy economics)
Reported results with and without outliers for transparency
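A robust z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD), so a single extreme state cannot mask itself by inflating the spread. The sketch below uses the common modified z-score with the 0.6745 scaling constant and a 3.5 cutoff. Both are conventional defaults and are assumptions here, not values confirmed by the project.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the MAD."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    # 0.6745 makes the MAD comparable to a standard deviation under normality.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True where |robust z| exceeds threshold."""
    return [abs(z) > threshold for z in robust_z_scores(values)]
```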

3. Sensitivity Analysis

Tested robustness to modeling assumptions:

Cost Assumptions: Varied solar/wind installation costs by ±30%
Resource Quality Weighting: Tested different aggregation methods
Discount Rates: Evaluated long-term ROI under different economic scenarios

Result: Core rankings remained stable across the tested assumption ranges, which suggests the conclusions are robust to these inputs.
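A ranking-stability check of this kind can be sketched as a small scenario grid. The example below scores each state by net present value, scales installation cost by ±30%, and varies the discount rate. The state names, cost figures, discount rates, and 25-year horizon are made up for illustration and are not the project's inputs.

```python
from itertools import product


def npv(upfront_cost, annual_savings, rate, years=25):
    """Net present value of constant annual savings against an upfront cost."""
    discounted = sum(annual_savings / (1 + rate) ** t for t in range(1, years + 1))
    return discounted - upfront_cost


def rank_states(states, cost_scale, rate):
    """Order state names from highest to lowest NPV for one scenario."""
    scores = {
        name: npv(cost * cost_scale, savings, rate)
        for name, (cost, savings) in states.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rankings_by_scenario(states, cost_scales=(0.7, 1.0, 1.3), rates=(0.03, 0.05, 0.08)):
    """Rank states under every combination of cost scale and discount rate."""
    return {
        (scale, rate): rank_states(states, scale, rate)
        for scale, rate in product(cost_scales, rates)
    }


# Hypothetical inputs: name -> (upfront cost, annual savings), arbitrary units.
example_states = {"State A": (120.0, 20.0), "State B": (100.0, 15.0), "State C": (80.0, 9.0)}
scenarios = rankings_by_scenario(example_states)
is_stable = len({tuple(order) for order in scenarios.values()}) == 1
```

If `is_stable` is false, the scenarios where the order flips are the ones worth reporting, since they show which assumption the ranking actually depends on.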

4. Cross-Validation Against External Benchmarks

Compared results to:

Actual state renewable adoption rates (correlation = 0.71)
Independent economic analyses from NREL and EIA
State energy policy rankings from external organizations
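A benchmark comparison like the first item reduces to correlating model scores with an observed series, state by state. The write-up does not say which correlation coefficient was used, so the sketch below assumes a plain Pearson correlation, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, `xs` would hold the model's ROI scores and `ys` the observed adoption rates, aligned on the same state key.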

Key Results

Top ROI States (Per-MWh Efficiency)

Wyoming: Exceptional wind resources, low population density
New Mexico: High solar potential, low installation costs
Texas: Massive scale, diverse renewables, existing infrastructure

Highest Total Economic Impact

California: Largest energy consumption, strong solar/wind
Texas: Scale + resource quality
Florida: High electricity demand, excellent solar potential

Best Per-Capita Returns

North Dakota: Wind-rich, low population
Wyoming: Similar profile to North Dakota
Montana: Hydro + wind potential

Statistical Insights

Correlation Analysis: Wind potential is the strongest predictor of per-MWh ROI (r = 0.68)
Regional Patterns: Southwest states dominate solar ROI, Great Plains lead in wind
Grid Readiness: States with existing renewable infrastructure show 2.3x higher near-term ROI

Why This Matters for ML/Space Roles

This project demonstrates skills directly transferable to space industry work:

Multi-Source Data Integration: Space missions combine telemetry, ground observations, and simulation output, which raises the same integration challenges
Statistical Rigor: Mission trade studies require defensible analysis under uncertainty
Decision Support Modeling: Translating technical potential into operational recommendations
Reproducibility: All analysis is version-controlled, documented, and replicable
Systems Thinking: Understanding constraints beyond the technical (cost, infrastructure, policy)

These are the same patterns used in:

Mission feasibility analysis
Spacecraft design trade studies
Orbital mechanics optimization
Ground system capacity planning

Technical Stack

Python: pandas, NumPy, SciPy (core analysis)
Matplotlib / Seaborn: Visualization
Statsmodels: Statistical testing, regression analysis
Geopandas: Spatial analysis and mapping
Jupyter: Reproducible analysis notebooks
Git: Version control for data + code

Reproducibility & Open Science

The entire analysis is:

Version-controlled: Git repository with full commit history
Documented: Markdown documentation for every decision
Reproducible: Jupyter notebooks with step-by-step execution
Transparent: Assumptions, limitations, and uncertainties explicitly stated

While originally developed in an academic setting, the project is structured and presented as a standalone technical analysis, not coursework.

Current Status

Complete, Fully Reproducible Analysis

All code, data, and documentation are available. The methodology is extensible to:

Updated federal datasets (annual releases)
International comparisons
More granular regional analysis
Integration with climate models

Code Repository

View on GitHub (link to be added)

Key Insight: Good data science isn’t just about getting an answer; it’s about building confidence that the answer is right. Statistical validation, sensitivity analysis, and reproducibility are how you earn that confidence.


Joseph Rodriguez