Startup Funding Analysis 2015–2023¶

Understanding where venture capital flows and why¶

Author: Eric Esley
Tools: Python, Pandas, Plotly
Data source: synthetic dataset modeled after Kaggle's Startup Investments (Crunchbase) data


Objective¶

Analyze global startup funding trends to identify which sectors, geographies and funding stages attract the most capital — and what that tells us about the direction of the tech economy.

In [4]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

# --- Synthetic dataset generation (no external file is loaded) ---
n = 2000  # number of simulated funding rounds

sectors = ['Fintech', 'Healthtech', 'AI/ML', 'E-commerce', 'SaaS', 
           'Cleantech', 'Edtech', 'Cybersecurity', 'Web3', 'Logistics']

countries = ['USA', 'China', 'India', 'UK', 'Germany', 
             'France', 'Brazil', 'Israel', 'Canada', 'Singapore']

stages = ['Seed', 'Series A', 'Series B', 'Series C', 'Series D+']

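# Sampling probabilities, aligned by position with the lists above (each sums to 1)
stage_weights = [0.35, 0.28, 0.20, 0.11, 0.06]

sector_weights = [0.18, 0.14, 0.20, 0.12, 0.15, 0.06, 0.05, 0.05, 0.03, 0.02]

country_weights = [0.40, 0.18, 0.12, 0.08, 0.05, 0.04, 0.04, 0.03, 0.03, 0.03]

# (min, max) round size in $M for each stage, before the yearly growth factor
stage_funding = {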
    'Seed':     (0.5, 3),
    'Series A': (3, 15),
    'Series B': (15, 50),
    'Series C': (50, 150),
    'Series D+': (150, 500)
}

years = np.random.randint(2015, 2024, n)  # upper bound is exclusive: 2015-2023, uniform
sectors_col = np.random.choice(sectors, n, p=sector_weights)
countries_col = np.random.choice(countries, n, p=country_weights)
stages_col = np.random.choice(stages, n, p=stage_weights)

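# Draw a base round size ($M) uniformly within each deal's stage range
funding = []
for stage in stages_col:
    low, high = stage_funding[stage]
    funding.append(round(np.random.uniform(low, high), 1))

# Scale by a linear growth factor: 1.00 in 2015, rising 0.08 per year to 1.64 in 2023
growth_factor = {year: 1 + (year - 2015) * 0.08 for year in range(2015, 2024)}
funding = [f * growth_factor[y] for f, y in zip(funding, years)]
funding = [round(f, 1) for f in funding]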

df = pd.DataFrame({
    'year': years,
    'sector': sectors_col,
    'country': countries_col,
    'stage': stages_col,
    'funding_usd_m': funding
})

print(f"Dataset shape: {df.shape}")
print(f"Years: {df['year'].min()} - {df['year'].max()}")
print(f"Total funding: ${df['funding_usd_m'].sum():,.0f}M")
df.head(10)
Dataset shape: (2000, 5)
Years: 2015 - 2023
Total funding: $105,515M
Out[4]:
   year      sector  country      stage  funding_usd_m
0  2021  Healthtech   France       Seed            2.8
1  2018        SaaS  Germany   Series A            4.0
2  2022       AI/ML      USA       Seed            1.1
3  2019  Healthtech    China   Series A           13.1
4  2021     Fintech      USA  Series D+          707.0
5  2017     Fintech    India  Series D+          525.8
6  2021      Edtech   Canada       Seed            4.4
7  2022  Healthtech      USA   Series A           14.5
8  2019        SaaS       UK   Series A           16.5
9  2018   Cleantech    India   Series C          166.2
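
As a rough cross-check on the printed total, the expected value can be worked out directly from the generator's parameters. The sketch below assumes each round size is uniform within its stage range (so its mean is the midpoint) and that years are uniform over 2015–2023. The names `contrib`, `expected_base`, `mean_growth` and `late_share` are illustrative and not part of the notebook.

```python
# Expected total funding implied by the generator's parameters
# (values copied from the generation cell; helper names are illustrative).
stages = ['Seed', 'Series A', 'Series B', 'Series C', 'Series D+']
stage_weights = [0.35, 0.28, 0.20, 0.11, 0.06]
stage_funding = {
    'Seed':     (0.5, 3),
    'Series A': (3, 15),
    'Series B': (15, 50),
    'Series C': (50, 150),
    'Series D+': (150, 500),
}
n = 2000

# The mean of a uniform(low, high) draw is the midpoint; weight it by stage probability.
contrib = {s: w * sum(stage_funding[s]) / 2 for s, w in zip(stages, stage_weights)}
expected_base = sum(contrib.values())

# Years are uniform over 2015..2023, so average the nine growth factors.
factors = [1 + (y - 2015) * 0.08 for y in range(2015, 2024)]
mean_growth = sum(factors) / len(factors)

expected_total = n * expected_base * mean_growth
late_share = contrib['Series D+'] / expected_base  # expected share of capital in Series D+

print(f"Expected average base round: ${expected_base:.2f}M")
print(f"Mean growth factor: {mean_growth:.2f}")
print(f"Expected total: ${expected_total:,.0f}M")
print(f"Expected Series D+ share of capital: {late_share:.1%}")
```

Under these assumptions the expected total is about $105,950M, within half a percent of the $105,515M printed above, and Series D+ rounds are expected to carry a little under half of all capital.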


In [6]:
yearly = df.groupby('year')['funding_usd_m'].sum().reset_index()
yearly.columns = ['Year', 'Total Funding ($M)']

fig = px.bar(
    yearly,
    x='Year',
    y='Total Funding ($M)',
    title='Total Startup Funding by Year (2015–2023)',
    color='Total Funding ($M)',
    color_continuous_scale='teal',
    text='Total Funding ($M)'
)

fig.update_traces(texttemplate='$%{text:,.0f}M', textposition='outside')
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False,
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=True, gridcolor='#1a2a3a')
)

fig.show()

Key insight¶

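Funding trends upward from 2015 and, in this run, peaks around 2021–2022. Because the dataset is synthetic, the upward trend comes from the 8%-per-year growth factor in the generator, and the exact peak year depends on sampling noise. The shape is meant to resemble the post-COVID tech investment boom driven by low interest rates and accelerated digital transformation, not to measure it.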
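
One way to make the trend and the peak year explicit is to tabulate year-over-year change. This is a minimal sketch: `yearly_growth` is a hypothetical helper, and the `toy` frame holds made-up values standing in for the notebook's `df`.

```python
import pandas as pd

def yearly_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Total funding per year plus year-over-year percentage change."""
    out = (
        df.groupby("year", as_index=False)["funding_usd_m"].sum()
          .sort_values("year")
          .reset_index(drop=True)
    )
    out["yoy_pct"] = out["funding_usd_m"].pct_change() * 100
    return out

# Tiny hand-made frame (hypothetical values) with the same column names as `df`.
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2017],
    "funding_usd_m": [10.0, 10.0, 30.0, 15.0],
})

growth = yearly_growth(toy)
peak_year = int(growth.loc[growth["funding_usd_m"].idxmax(), "year"])
print(growth)
print("Peak year:", peak_year)
```

Calling `yearly_growth(df)` on the full dataset should show whether the increase is steady or whether individual years swing more than the built-in 8% step.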

In [7]:
sector_funding = df.groupby('sector')['funding_usd_m'].sum().reset_index()
sector_funding = sector_funding.sort_values('funding_usd_m', ascending=True)

fig = px.bar(
    sector_funding,
    x='funding_usd_m',
    y='sector',
    orientation='h',
    title='Total Funding by Sector ($M)',
    color='funding_usd_m',
    color_continuous_scale='teal',
    text='funding_usd_m'
)

fig.update_traces(texttemplate='$%{text:,.0f}M', textposition='outside')
fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False,
    xaxis=dict(showgrid=True, gridcolor='#1a2a3a'),
    yaxis=dict(showgrid=False)
)

fig.show()

Key insight¶

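AI/ML and Fintech attract the most funding, in line with investor appetite for scalable, high-margin technology businesses. Cleantech and Edtech sit well below the leaders. In this synthetic dataset the ranking mostly follows the sector sampling weights (0.20 for AI/ML, 0.18 for Fintech), and sector is sampled independently of year, so the data cannot show whether that gap is closing.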
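
To separate "more deals" from "bigger deals", a sector's share of capital can be compared with its share of deal count. The sketch below is illustrative: `sector_shares` is a hypothetical helper and `toy` uses made-up numbers with the notebook's column names.

```python
import pandas as pd

def sector_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Per-sector share of total funding, share of deals, and average deal size."""
    g = df.groupby("sector")["funding_usd_m"].agg(total="sum", deals="count")
    g["funding_share_pct"] = g["total"] / g["total"].sum() * 100
    g["deal_share_pct"] = g["deals"] / g["deals"].sum() * 100
    g["avg_deal"] = g["total"] / g["deals"]
    return g.sort_values("funding_share_pct", ascending=False)

# Hypothetical data: sector A has more deals, sector B has one large deal.
toy = pd.DataFrame({
    "sector": ["A", "A", "B"],
    "funding_usd_m": [10.0, 30.0, 60.0],
})

shares = sector_shares(toy)
print(shares)
```

If a sector's funding share tracks its deal share closely, its ranking is explained by deal volume rather than by larger rounds.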

In [8]:
country_funding = df.groupby('country')['funding_usd_m'].sum().reset_index()
country_funding = country_funding.sort_values('funding_usd_m', ascending=False)

fig = px.treemap(
    country_funding,
    path=['country'],
    values='funding_usd_m',
    title='Funding Distribution by Country',
    color='funding_usd_m',
    color_continuous_scale='teal'
)

fig.update_layout(
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    coloraxis_showscale=False
)

fig.show()

Key insight¶

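The USA captures roughly 40% of funding, followed by China and India, matching the country sampling weights (0.40, 0.18, 0.12). European capital is split across the UK, Germany and France. That fragmentation may partly explain why fewer European startups reach Series C+ than their US counterparts, though this dataset cannot confirm it because stage is sampled independently of country.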
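
Both parts of this insight can be checked with a per-country summary: share of total funding, and the fraction of deals that are late stage. The following is a sketch under stated assumptions: `country_summary` and `LATE_STAGES` are hypothetical names, "late stage" is taken to mean Series C or later, and `toy` contains made-up rows.

```python
import pandas as pd

# Assumption: "late stage" means Series C or later.
LATE_STAGES = ("Series C", "Series D+")

def country_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-country share of total funding and share of deals that are late stage."""
    is_late = df["stage"].isin(LATE_STAGES)
    funding_share = (
        df.groupby("country")["funding_usd_m"].sum() / df["funding_usd_m"].sum() * 100
    )
    late_pct = is_late.groupby(df["country"]).mean() * 100
    summary = pd.DataFrame({
        "funding_share_pct": funding_share,
        "late_stage_pct": late_pct,
    })
    return summary.sort_values("funding_share_pct", ascending=False)

# Hypothetical rows using the notebook's column names.
toy = pd.DataFrame({
    "country": ["USA", "USA", "UK", "UK", "UK", "UK"],
    "stage": ["Seed", "Series C", "Seed", "Seed", "Series A", "Series D+"],
    "funding_usd_m": [2.0, 148.0, 1.0, 1.0, 8.0, 40.0],
})

summary = country_summary(toy)
print(summary)
```

On the notebook's data, `late_stage_pct` should come out similar for every country, since stage and country are drawn independently.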

In [9]:
stage_order = ['Seed', 'Series A', 'Series B', 'Series C', 'Series D+']

stage_data = df.groupby('stage').agg(
    total_funding=('funding_usd_m', 'sum'),
    deal_count=('funding_usd_m', 'count'),
    avg_deal=('funding_usd_m', 'mean')
).reindex(stage_order).reset_index()

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Number of Deals by Stage', 'Average Deal Size by Stage ($M)')
)

fig.add_trace(
    go.Bar(
        x=stage_data['stage'],
        y=stage_data['deal_count'],
        marker_color='#00c8f0',
        name='Deals'
    ),
    row=1, col=1
)

fig.add_trace(
    go.Bar(
        x=stage_data['stage'],
        y=stage_data['avg_deal'].round(1),
        marker_color='#0a6a7a',
        name='Avg Deal ($M)'
    ),
    row=1, col=2
)

fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_text='Funding Stage Analysis',
    title_font_size=16,
    showlegend=False
)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(gridcolor='#1a2a3a')

fig.show()

Key insight¶

Most deals happen at Seed and Series A, so the funnel is steep. Only about 6% of deals are Series D+, but because those rounds are far larger they account for a disproportionate share of total capital deployed, roughly half in expectation given the generator's ranges. This concentration echoes the "power law" of venture capital.
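
The concentration claim can be quantified in two ways: each stage's share of deals versus its share of capital, and the share of capital held by the largest deals. This is a minimal sketch with hypothetical helper names (`stage_concentration`, `top_share`) and a made-up `toy` frame, not the author's method.

```python
import pandas as pd

def stage_concentration(df: pd.DataFrame) -> pd.DataFrame:
    """Share of deals and share of capital contributed by each stage."""
    g = df.groupby("stage")["funding_usd_m"].agg(total="sum", deals="count")
    g["deal_share_pct"] = g["deals"] / g["deals"].sum() * 100
    g["capital_share_pct"] = g["total"] / g["total"].sum() * 100
    return g

def top_share(amounts, top_frac: float = 0.10) -> float:
    """Fraction of total capital held by the largest `top_frac` of deals."""
    s = pd.Series(amounts, dtype=float).sort_values(ascending=False)
    k = max(1, round(len(s) * top_frac))  # always include at least one deal
    return float(s.iloc[:k].sum() / s.sum())

# Hypothetical data: nine small Seed rounds and one large Series D+ round.
toy = pd.DataFrame({
    "stage": ["Seed"] * 9 + ["Series D+"],
    "funding_usd_m": [1.0] * 9 + [91.0],
})

by_stage = stage_concentration(toy)
top_decile = top_share(toy["funding_usd_m"])
print(by_stage)
print(f"Top 10% of deals hold {top_decile:.0%} of capital")
```

A large gap between `deal_share_pct` and `capital_share_pct` for Series D+ is the pattern the paragraph above describes.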

In [10]:
top_sectors = df.groupby('sector')['funding_usd_m'].sum().nlargest(5).index.tolist()

sector_year = df[df['sector'].isin(top_sectors)].groupby(
    ['year', 'sector']
)['funding_usd_m'].sum().reset_index()

fig = px.line(
    sector_year,
    x='year',
    y='funding_usd_m',
    color='sector',
    title='Funding Trends: Top 5 Sectors (2015–2023)',
    markers=True,
    color_discrete_sequence=px.colors.sequential.Teal
)

fig.update_layout(
    plot_bgcolor='#080c10',
    paper_bgcolor='#080c10',
    font_color='#c8d8e8',
    title_font_size=16,
    legend_title='Sector',
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=True, gridcolor='#1a2a3a', title='Funding ($M)')
)

fig.show()

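Key insight¶

In this run, AI/ML funding shows the steepest growth curve, overtaking Fintech around 2020 and accelerating through 2023. SaaS remains consistently strong. Healthtech spikes during 2020–2021 and then normalizes, the shape of a classic event-driven funding cycle such as one driven by COVID-19 tailwinds. The generator applies the same growth factor to every sector and samples sector independently of year, so these crossovers and spikes are sampling noise that illustrates such patterns rather than evidence of them.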
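
A crossover such as "AI/ML overtakes Fintech" can be located programmatically instead of read off the chart. The sketch below finds the first year in which one sector's total exceeds another's. `first_overtake_year` is a hypothetical helper, and the `toy` values are invented for illustration.

```python
import pandas as pd

def first_overtake_year(df: pd.DataFrame, leader: str, challenger: str):
    """Return the first year in which `challenger` out-raises `leader`, or None."""
    pivot = df.pivot_table(
        index="year", columns="sector", values="funding_usd_m",
        aggfunc="sum", fill_value=0,
    ).sort_index()
    ahead = pivot[challenger] > pivot[leader]
    # idxmax on a boolean Series returns the label of the first True value.
    return int(ahead.idxmax()) if ahead.any() else None

# Hypothetical yearly totals for two sectors.
toy = pd.DataFrame({
    "year":   [2019, 2019, 2020, 2020, 2021, 2021],
    "sector": ["Fintech", "AI/ML", "Fintech", "AI/ML", "Fintech", "AI/ML"],
    "funding_usd_m": [10.0, 5.0, 10.0, 12.0, 8.0, 20.0],
})

crossover = first_overtake_year(toy, leader="Fintech", challenger="AI/ML")
print("First year AI/ML exceeds Fintech:", crossover)
```

This reports only the first year the challenger is ahead; it does not check that the lead persists in later years.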

Conclusions¶

This analysis illustrates four structural patterns in global startup funding. Because the dataset is synthetic, each pattern follows from the generator's assumptions rather than from observed market data:

  1. AI/ML leads capital allocation. It is the largest sector by funding here, consistent with the view that investors see long-term value creation in AI/ML rather than just hype.

  2. The funding funnel is steep. About 35% of deals are Seed rounds and only about 6% are Series D+. These are shares of deals at each stage, not tracked startup outcomes, but they are consistent with most startups never reaching late-stage rounds.

  3. Geography still matters. Despite globalization, roughly 40% of venture capital in the dataset flows to US-based startups, which suggests that building in the right ecosystem still provides a structural advantage.

  4. Macro cycles drive funding. The 2021 peak followed by normalization in 2022–2023 mirrors the shape of an interest rate cycle, though in synthetic data that resemblance is illustrative rather than causal. Venture capital is not immune to macroeconomics.


Analysis by Eric Esley · ericesley.com · Data: synthetic dataset modeled after Crunchbase
