Startup Funding Analysis 2015–2023¶
Understanding where venture capital flows and why¶
Author: Eric Esley
Tools: Python, Pandas, Plotly
Data source: synthetic dataset modeled after Kaggle's Startup Investments (Crunchbase)
Objective¶
Analyze global startup funding trends to identify which sectors, geographies and funding stages attract the most capital — and what that tells us about the direction of the tech economy.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
# --- Dataset generation (synthetic) ---
n = 2000
sectors = ['Fintech', 'Healthtech', 'AI/ML', 'E-commerce', 'SaaS',
           'Cleantech', 'Edtech', 'Cybersecurity', 'Web3', 'Logistics']
countries = ['USA', 'China', 'India', 'UK', 'Germany',
             'France', 'Brazil', 'Israel', 'Canada', 'Singapore']
stages = ['Seed', 'Series A', 'Series B', 'Series C', 'Series D+']
# Sampling probabilities; each list sums to 1 and lines up with its labels above.
stage_weights = [0.35, 0.28, 0.20, 0.11, 0.06]
sector_weights = [0.18, 0.14, 0.20, 0.12, 0.15, 0.06, 0.05, 0.05, 0.03, 0.02]
country_weights = [0.40, 0.18, 0.12, 0.08, 0.05, 0.04, 0.04, 0.03, 0.03, 0.03]
# Base deal-size range per stage as (low, high), in $M.
stage_funding = {
    'Seed': (0.5, 3),
    'Series A': (3, 15),
    'Series B': (15, 50),
    'Series C': (50, 150),
    'Series D+': (150, 500)
}
years = np.random.randint(2015, 2024, n)
sectors_col = np.random.choice(sectors, n, p=sector_weights)
countries_col = np.random.choice(countries, n, p=country_weights)
stages_col = np.random.choice(stages, n, p=stage_weights)
# Draw each deal's base size uniformly within its stage's range.
funding = []
for stage in stages_col:
    low, high = stage_funding[stage]
    funding.append(round(np.random.uniform(low, high), 1))
# Scale deal sizes by a linear uplift of 0.08 per year relative to 2015.
growth_factor = {year: 1 + (year - 2015) * 0.08 for year in range(2015, 2024)}
funding = [f * growth_factor[y] for f, y in zip(funding, years)]
funding = [round(f, 1) for f in funding]
df = pd.DataFrame({
    'year': years,
    'sector': sectors_col,
    'country': countries_col,
    'stage': stages_col,
    'funding_usd_m': funding
})
print(f"Dataset shape: {df.shape}")
print(f"Years: {df['year'].min()} - {df['year'].max()}")
print(f"Total funding: ${df['funding_usd_m'].sum():,.0f}M")
df.head(10)
Dataset shape: (2000, 5)
Years: 2015 - 2023
Total funding: $105,515M
| | year | sector | country | stage | funding_usd_m |
|---|---|---|---|---|---|
| 0 | 2021 | Healthtech | France | Seed | 2.8 |
| 1 | 2018 | SaaS | Germany | Series A | 4.0 |
| 2 | 2022 | AI/ML | USA | Seed | 1.1 |
| 3 | 2019 | Healthtech | China | Series A | 13.1 |
| 4 | 2021 | Fintech | USA | Series D+ | 707.0 |
| 5 | 2017 | Fintech | India | Series D+ | 525.8 |
| 6 | 2021 | Edtech | Canada | Seed | 4.4 |
| 7 | 2022 | Healthtech | USA | Series A | 14.5 |
| 8 | 2019 | SaaS | UK | Series A | 16.5 |
| 9 | 2018 | Cleantech | India | Series C | 166.2 |
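As a sanity check on the printed total, the expected value can be worked out from the generator's parameters alone. The sketch below assumes deal sizes are uniform within each stage range and that stages and years are drawn independently, as in the generation cell; it ignores rounding. The names `weights_by_stage`, `range_by_stage`, `n_deals`, `expected_base` and `mean_growth` are introduced here for illustration only.

```python
# Generator parameters copied from the dataset-generation cell.
weights_by_stage = {"Seed": 0.35, "Series A": 0.28, "Series B": 0.20,
                    "Series C": 0.11, "Series D+": 0.06}
range_by_stage = {"Seed": (0.5, 3), "Series A": (3, 15), "Series B": (15, 50),
                  "Series C": (50, 150), "Series D+": (150, 500)}
n_deals = 2000

# The mean of a uniform draw on [low, high] is the midpoint of the range.
expected_base = sum(
    weight * (range_by_stage[stage][0] + range_by_stage[stage][1]) / 2
    for stage, weight in weights_by_stage.items()
)

# Years are uniform over 2015..2023, so the growth factor is averaged over them.
growth = [1 + (year - 2015) * 0.08 for year in range(2015, 2024)]
mean_growth = sum(growth) / len(growth)

expected_total = n_deals * expected_base * mean_growth
print(f"Expected deal size before growth: ${expected_base:.2f}M")
print(f"Mean growth factor: {mean_growth:.2f}")
print(f"Expected total funding: ${expected_total:,.0f}M")
```

The expected total comes out at roughly $105,950M, within about half a percent of the $105,515M printed above, so the sample behaves as the parameters predict.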
yearly = df.groupby('year')['funding_usd_m'].sum().reset_index()
yearly.columns = ['Year', 'Total Funding ($M)']
fig = px.bar(
    yearly,
    x='Year',
    y='Total Funding ($M)',
    title='Total Startup Funding by Year (2015–2023)',
    color='Total Funding ($M)',
    color_continuous_scale='teal',
    text='Total Funding ($M)'
)
fig.update_traces(texttemplate='$%{text:,.0f}M', textposition='outside')
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False,
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=True, gridcolor='#1a2a3a')
)
fig.show()
Key insight¶
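Funding trends upward from 2015. In this synthetic dataset the trend comes from `growth_factor`, a linear uplift of 0.08 per year applied to every deal, so individual peak years reflect sampling noise rather than market events. The real-world pattern it is modeled on, venture funding peaking around 2021–2022, reflects the post-COVID tech investment boom driven by low interest rates and accelerated digital transformation.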
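To put numbers on the yearly trend rather than reading it off the bars, year-over-year growth and the peak year can be computed directly. This is a minimal sketch: `yearly_growth` is a hypothetical helper, and the `toy` frame stands in for `df` so the block runs on its own.

```python
import pandas as pd

def yearly_growth(frame):
    """Total funding per year plus year-over-year change in percent."""
    totals = frame.groupby("year")["funding_usd_m"].sum().sort_index()
    out = totals.to_frame("total_funding_usd_m")
    out["yoy_pct"] = totals.pct_change() * 100  # first year has no prior year
    return out

# Toy stand-in for the notebook's df (same column names).
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2017],
    "funding_usd_m": [10.0, 30.0, 50.0, 45.0],
})

growth = yearly_growth(toy)
peak_year = growth["total_funding_usd_m"].idxmax()
print(growth)
print(f"Peak year: {peak_year}")
```

In the notebook, `yearly_growth(df)` would give the same table for the full dataset.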
sector_funding = df.groupby('sector')['funding_usd_m'].sum().reset_index()
sector_funding = sector_funding.sort_values('funding_usd_m', ascending=True)
fig = px.bar(
    sector_funding,
    x='funding_usd_m',
    y='sector',
    orientation='h',
    title='Total Funding by Sector ($M)',
    color='funding_usd_m',
    color_continuous_scale='teal',
    text='funding_usd_m'
)
fig.update_traces(texttemplate='$%{text:,.0f}M', textposition='outside')
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False,
    xaxis=dict(showgrid=True, gridcolor='#1a2a3a'),
    yaxis=dict(showgrid=False)
)
fig.show()
Key insight¶
AI/ML and Fintech dominate funding, reflecting investor appetite for scalable, high-margin technology businesses. Cleantech and Edtech remain underfunded relative to their market potential. Whether that gap is closing cannot be read from this chart, which aggregates all years.
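One way to quantify "dominate" is the share of total funding held by the top sectors. The sketch below assumes the same column names as `df`; `sector_concentration` and its `top_k` parameter are hypothetical, and the toy numbers are made up for the example.

```python
import pandas as pd

def sector_concentration(frame, top_k=2):
    """Return each sector's share of funding (%) and the combined top-k share."""
    totals = frame.groupby("sector")["funding_usd_m"].sum()
    shares = (totals / totals.sum() * 100).sort_values(ascending=False)
    return shares, shares.head(top_k).sum()

# Toy stand-in for the notebook's df.
toy = pd.DataFrame({
    "sector": ["AI/ML", "AI/ML", "Fintech", "SaaS", "Edtech"],
    "funding_usd_m": [40.0, 20.0, 25.0, 10.0, 5.0],
})

shares, top2_share = sector_concentration(toy, top_k=2)
print(shares.round(1))
print(f"Top 2 sectors hold {top2_share:.0f}% of funding")
```

Calling `sector_concentration(df)` in the notebook would report how much of the total the two largest sectors actually hold.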
country_funding = df.groupby('country')['funding_usd_m'].sum().reset_index()
country_funding = country_funding.sort_values('funding_usd_m', ascending=False)
fig = px.treemap(
    country_funding,
    path=['country'],
    values='funding_usd_m',
    title='Funding Distribution by Country',
    color='funding_usd_m',
    color_continuous_scale='teal'
)
fig.update_layout(
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False
)
fig.show()
Key insight¶
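The USA captures nearly 40% of the capital in this dataset, followed by China and India. European funding is split across the UK, Germany and France, none of which approaches the US share on its own. That fragmentation may partially explain why fewer European startups reach Series C and beyond than their US counterparts, although this dataset does not test that link.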
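The country shares behind the treemap, and a combined European share, can be computed as follows. This is an illustrative sketch: `country_shares` is a hypothetical helper, and the `EUROPE` set simply groups the three European countries that appear in the dataset's country list.

```python
import pandas as pd

# Assumption: "Europe" here means only the European countries in the dataset.
EUROPE = {"UK", "Germany", "France"}

def country_shares(frame):
    """Each country's share of total funding, in percent, largest first."""
    totals = frame.groupby("country")["funding_usd_m"].sum()
    return (totals / totals.sum() * 100).sort_values(ascending=False)

# Toy stand-in for the notebook's df.
toy = pd.DataFrame({
    "country": ["USA", "USA", "China", "UK", "Germany", "France"],
    "funding_usd_m": [30.0, 10.0, 20.0, 20.0, 10.0, 10.0],
})

shares = country_shares(toy)
europe_share = shares[shares.index.isin(EUROPE)].sum()
print(shares.round(1))
print(f"Combined European share: {europe_share:.0f}%")
```

Comparing the combined European share with the largest single European country makes the fragmentation argument concrete.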
stage_order = ['Seed', 'Series A', 'Series B', 'Series C', 'Series D+']
stage_data = df.groupby('stage').agg(
    total_funding=('funding_usd_m', 'sum'),
    deal_count=('funding_usd_m', 'count'),
    avg_deal=('funding_usd_m', 'mean')
).reindex(stage_order).reset_index()
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Number of Deals by Stage', 'Average Deal Size by Stage ($M)')
)
fig.add_trace(
    go.Bar(
        x=stage_data['stage'],
        y=stage_data['deal_count'],
        marker_color='#00c8f0',
        name='Deals'
    ),
    row=1, col=1
)
fig.add_trace(
    go.Bar(
        x=stage_data['stage'],
        y=stage_data['avg_deal'].round(1),
        marker_color='#0a6a7a',
        name='Avg Deal ($M)'
    ),
    row=1, col=2
)
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_text='Funding Stage Analysis',
    title_font_size=16,
    showlegend=False
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(gridcolor='#1a2a3a')
fig.show()
Key insight¶
Most deals happen at Seed and Series A, so the funnel is steep. Only about 6% of deals are Series D+ rounds, but those rounds account for a disproportionate share of total capital deployed. This is the "power law" of venture capital in action.
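The "disproportionate share" claim can be checked by putting each stage's share of deals next to its share of capital. From the generator's parameters alone, Series D+ is 6% of deals but roughly 49% of expected capital (0.06 × 325 against a weighted mean deal size of about 40.1). The sketch below uses a hypothetical helper, `stage_shares`, and made-up toy deals so that it runs standalone.

```python
import pandas as pd

STAGE_ORDER = ["Seed", "Series A", "Series B", "Series C", "Series D+"]

def stage_shares(frame):
    """Share of deals and share of capital per stage, in percent."""
    grouped = frame.groupby("stage")["funding_usd_m"].agg(deals="count", capital="sum")
    grouped = grouped.reindex(STAGE_ORDER).fillna(0)  # stages with no deals become 0
    return pd.DataFrame({
        "deal_share_pct": grouped["deals"] / grouped["deals"].sum() * 100,
        "capital_share_pct": grouped["capital"] / grouped["capital"].sum() * 100,
    })

# Toy stand-in for the notebook's df: 10 deals, one very large late-stage round.
toy = pd.DataFrame({
    "stage": ["Seed"] * 5 + ["Series A"] * 3 + ["Series B", "Series D+"],
    "funding_usd_m": [2.0] * 5 + [10.0] * 3 + [40.0, 320.0],
})

shares = stage_shares(toy)
print(shares.round(1))
```

In the toy data the single Series D+ round is 10% of deals and 80% of capital; `stage_shares(df)` would report the corresponding figures for the full dataset.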
top_sectors = df.groupby('sector')['funding_usd_m'].sum().nlargest(5).index.tolist()
sector_year = df[df['sector'].isin(top_sectors)].groupby(
    ['year', 'sector']
)['funding_usd_m'].sum().reset_index()
fig = px.line(
    sector_year,
    x='year',
    y='funding_usd_m',
    color='sector',
    title='Funding Trends: Top 5 Sectors (2015–2023)',
    markers=True,
    color_discrete_sequence=px.colors.sequential.Teal
)
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    legend_title='Sector',
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=True, gridcolor='#1a2a3a', title='Funding ($M)')
)
fig.show()
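Key insight¶
AI/ML funding shows the steepest growth curve, overtaking Fintech around 2020 and accelerating sharply through 2023. SaaS remains consistently strong. Healthtech spiked during 2020–2021, driven by COVID-19 tailwinds, then normalized: a classic event-driven funding cycle. Because the generator applies the same growth factor to every sector, these sector-specific moves illustrate the real-world pattern rather than a signal built into the data.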
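Growth curves are easier to compare when every sector is indexed to a common base year, so that a steeper line means faster growth regardless of absolute size. The following is a sketch under that assumption; `indexed_sector_growth` and `base_year` are hypothetical names, and the toy frame stands in for `df`.

```python
import pandas as pd

def indexed_sector_growth(frame, base_year):
    """Funding per sector and year, rescaled so the base year equals 100."""
    pivot = frame.pivot_table(
        index="year", columns="sector", values="funding_usd_m", aggfunc="sum"
    ).sort_index()
    # Dividing by the base-year row aligns on sector columns.
    return pivot / pivot.loc[base_year] * 100

# Toy stand-in for the notebook's df.
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016],
    "sector": ["AI/ML", "AI/ML", "Fintech", "AI/ML", "Fintech"],
    "funding_usd_m": [4.0, 6.0, 20.0, 30.0, 30.0],
})

indexed = indexed_sector_growth(toy, base_year=2015)
print(indexed.round(0))
```

Here AI/ML triples (index 300) while Fintech grows by half (index 150), even though both reach the same absolute level in 2016.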
Conclusions¶
This analysis reveals four structural patterns in global startup funding:
1. The AI/ML wave is real and accelerating, not just hype. Capital allocation points to a fundamental shift in where investors see long-term value creation.
2. The funding funnel is brutally steep: 35% of deals are Seed rounds, but only 6% are Series D+ rounds. Most startups don't scale, and the data reflects that clearly.
3. Geography still matters: despite globalization, about 40% of venture capital flows to US-based startups. Building in the right ecosystem still provides a structural advantage.
4. Macro cycles drive funding: the 2021 peak followed by normalization in 2022–2023 mirrors interest rate cycles. Venture capital is not immune to macroeconomics.
Analysis by Eric Esley · ericesley.com · Data: synthetic dataset modeled after Crunchbase