A/B Testing Analysis for Conversion Rate Optimization¶
Introduction¶
In the highly competitive e-commerce industry, optimizing conversion rates is a key driver of revenue growth. Even small improvements in user experience can generate significant financial impact.
This project analyzes the results of an A/B test conducted on a global e-commerce platform. The goal is to evaluate whether a new version of the website (Variant B), focused on localized user experience, improves conversion rates and revenue generation.
The analysis also explores how the impact varies across different countries and devices, providing insights for targeted business strategies.
Business Context¶
The company identified a decline in conversion rates in certain international markets. To address this issue, a new localized version of the website was developed, adapting content and design to specific regions.
An A/B test was conducted to evaluate the effectiveness of this new version.
Key business questions:
- Does the new version improve conversion rates?
- Is the impact consistent across all countries?
- What is the potential revenue impact of implementing the new version?
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(42)
n = 5000
data = pd.DataFrame({
"user_id": range(n),
"country": np.random.choice(["Spain", "Mexico", "USA", "France", "Germany"], n),
"variant": np.random.choice(["A", "B"], n),
"device": np.random.choice(["mobile", "desktop"], n),
"session_time": np.random.exponential(60, n)
})
# Simulate conversion behavior (localized effect)
data["converted"] = np.where(
(data["variant"] == "B") & (data["country"].isin(["Spain", "Mexico"])),
np.random.binomial(1, 0.12, n),
np.random.binomial(1, 0.08, n)
)
# Simulate revenue
data["revenue"] = np.where(
data["converted"] == 1,
np.random.uniform(20, 200, len(data)),
0
)
# Introduce missing values
data.loc[np.random.choice(data.index, 200), "session_time"] = np.nan
data.head()
| | user_id | country | variant | device | session_time | converted | revenue |
|---|---|---|---|---|---|---|---|
| 0 | 0 | France | B | desktop | 17.435737 | 0 | 0.0 |
| 1 | 1 | Germany | A | mobile | NaN | 0 | 0.0 |
| 2 | 2 | USA | B | desktop | 24.965922 | 0 | 0.0 |
| 3 | 3 | Germany | B | mobile | 38.840883 | 0 | 0.0 |
| 4 | 4 | Germany | B | desktop | 21.064199 | 0 | 0.0 |
Data Cleaning¶
Before performing the analysis, we handle missing values to ensure data quality.
data["session_time"] = data["session_time"].fillna(data["session_time"].median())
data
| | user_id | country | variant | device | session_time | converted | revenue |
|---|---|---|---|---|---|---|---|
| 0 | 0 | France | B | desktop | 17.435737 | 0 | 0.0 |
| 1 | 1 | Germany | A | mobile | 40.860436 | 0 | 0.0 |
| 2 | 2 | USA | B | desktop | 24.965922 | 0 | 0.0 |
| 3 | 3 | Germany | B | mobile | 38.840883 | 0 | 0.0 |
| 4 | 4 | Germany | B | desktop | 21.064199 | 0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 4995 | France | A | desktop | 28.593831 | 0 | 0.0 |
| 4996 | 4996 | France | A | desktop | 0.577913 | 0 | 0.0 |
| 4997 | 4997 | Germany | B | mobile | 20.031616 | 0 | 0.0 |
| 4998 | 4998 | France | A | mobile | 7.622265 | 0 | 0.0 |
| 4999 | 4999 | Germany | A | mobile | 6.509161 | 0 | 0.0 |
5000 rows × 7 columns
Data Overview¶
The dataset contains user-level data collected during the experiment:
- variant: A (control) or B (treatment)
- converted: Whether the user made a purchase
- revenue: Revenue generated per user
- country: User location
- device: Device used
- session_time: Time spent on the website
The dataset is simulated but designed to reflect realistic user behavior.
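Before comparing metrics, it is good practice to verify that the random assignment produced roughly balanced groups (a basic sample-ratio check). A minimal sketch, regenerating only the column it needs from the same kind of simulation used above (the exact counts will differ from the notebook's run):

```python
import numpy as np
import pandas as pd

# Recreate a variant assignment shaped like the notebook's data
np.random.seed(42)
n = 5000
variant = pd.Series(np.random.choice(["A", "B"], n), name="variant")

# Group sizes should be close to a 50/50 split before metrics are compared
counts = variant.value_counts()
print(counts)

# Relative imbalance between arms; a large value would suggest a
# sample-ratio mismatch worth investigating before trusting the test
imbalance = abs(counts["A"] - counts["B"]) / n
print(f"Imbalance: {imbalance:.3%}")
```

A severe imbalance would hint at a bug in the assignment or logging pipeline, which invalidates downstream comparisons.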
Exploratory Data Analysis¶
We begin by analyzing overall performance differences between variants.
conversion_rate = data.groupby("variant")["converted"].mean().reset_index()
sns.barplot(data=conversion_rate, x="variant", y="converted")
plt.title("Conversion Rate by Variant")
plt.ylabel("Conversion Rate")
plt.show()
revenue_per_user = data.groupby("variant")["revenue"].mean().reset_index()
sns.barplot(data=revenue_per_user, x="variant", y="revenue")
plt.title("Revenue per User by Variant")
plt.ylabel("Average Revenue")
plt.show()
country_conversion = data.groupby(["country", "variant"])["converted"].mean().reset_index()
plt.figure(figsize=(10,6))
sns.barplot(data=country_conversion, x="country", y="converted", hue="variant")
plt.title("Conversion Rate by Country and Variant")
plt.show()
sns.barplot(data=data, x="device", y="converted", hue="variant")
plt.title("Conversion Rate by Device")
plt.show()
plt.figure(figsize=(10,6))
sns.barplot(data=data, x="country", y="converted", hue="device")
plt.title("Conversion by Country and Device")
plt.show()
Key Insights¶
- Variant B shows a slight improvement in overall conversion rate.
- Revenue per user also increases under Variant B, suggesting a positive business impact.
- The effect is not uniform across all regions:
- Spain and Mexico show a clear uplift in conversion.
- Other markets show minimal or no improvement.
- Device-level analysis indicates that user behavior differs between mobile and desktop users.
These findings highlight the importance of segmentation in evaluating experiment results.
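The segment-level uplift can be quantified directly with a pivot table rather than read off the charts. A sketch on synthetic data shaped like the notebook's (three countries for brevity; the conversion probabilities mirror the simulation assumptions above):

```python
import numpy as np
import pandas as pd

# Synthetic data with the same structure as the notebook's DataFrame
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "country": rng.choice(["Spain", "Mexico", "USA"], n),
    "variant": rng.choice(["A", "B"], n),
})
p = np.where((df["variant"] == "B") & df["country"].isin(["Spain", "Mexico"]),
             0.12, 0.08)
df["converted"] = rng.binomial(1, p)

# Conversion rate per country/variant, plus the absolute uplift of B over A
pivot = df.pivot_table(index="country", columns="variant",
                       values="converted", aggfunc="mean")
pivot["uplift"] = pivot["B"] - pivot["A"]
print(pivot.round(3))
```

The `uplift` column makes the localized effect explicit: it should be clearly positive for Spain and Mexico and near zero elsewhere.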
Statistical Testing¶
To validate whether the observed differences are statistically significant, we perform a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest
count = data.groupby("variant")["converted"].sum()
nobs = data.groupby("variant")["converted"].count()
z_stat, p_val = proportions_ztest(count, nobs)
print("Z-statistic:", z_stat)
print("P-value:", p_val)
Z-statistic: -1.7487255639453765
P-value: 0.08033846893044576
Interpretation of Results¶
The statistical test evaluates whether the difference in conversion rates between variants is statistically significant.
- A p-value below 0.05 indicates a significant difference.
- A higher p-value suggests that the observed difference may be due to random variation.
In this case, the overall difference is not significant at the conventional 5% level (p ≈ 0.08), but segmented analysis reveals meaningful differences in specific markets.
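The segmented claim can be checked by repeating the same z-test within each country. A sketch using the same `proportions_ztest` helper, with the data regenerated so the snippet runs standalone (the specific z-scores and p-values will differ from the notebook's run):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Regenerate data shaped like the notebook's DataFrame
rng = np.random.default_rng(42)
n = 5000
data = pd.DataFrame({
    "country": rng.choice(["Spain", "Mexico", "USA", "France", "Germany"], n),
    "variant": rng.choice(["A", "B"], n),
})
p = np.where((data["variant"] == "B") & data["country"].isin(["Spain", "Mexico"]),
             0.12, 0.08)
data["converted"] = rng.binomial(1, p)

# Two-proportion z-test per country segment
for country, grp in data.groupby("country"):
    count = grp.groupby("variant")["converted"].sum()
    nobs = grp.groupby("variant")["converted"].count()
    z, pval = proportions_ztest(count, nobs)
    print(f"{country:8s} z = {z:6.2f}  p = {pval:.4f}")
```

Note that testing many segments inflates the chance of false positives, so a multiple-comparisons correction (e.g. Bonferroni) is advisable when more than a few segments are tested.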
Sample Size Consideration¶
The reliability of A/B testing results depends on having a sufficient sample size.
Smaller sample sizes can lead to inconclusive or misleading results. Further data collection may be required to confirm findings, especially in segmented analyses.
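The required sample size can be estimated up front with statsmodels' power tools. A sketch, assuming we want to detect the simulated 8% → 12% lift at α = 0.05 with 80% power (these baseline and effect values mirror the simulation above; real planning would use observed rates):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h effect size for two proportions (0.08 baseline vs 0.12 treatment)
effect = proportion_effectsize(0.12, 0.08)

# Users needed per arm for a two-sided test at alpha=0.05 and 80% power
analysis = NormalIndPower()
n_per_arm = analysis.solve_power(effect_size=effect, alpha=0.05,
                                 power=0.8, alternative="two-sided")
print(f"Required sample size per variant: {n_per_arm:.0f}")
```

Smaller effects shrink the effect size quadratically in the required sample, which is why per-country subgroups (with only a fraction of the 5,000 users each) can easily be underpowered.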
Estimated Business Impact¶
If Variant B is implemented in high-performing markets, the increase in conversion rate could lead to a significant revenue uplift.
Even a small improvement in conversion rate (e.g., 1–2%) can generate substantial additional revenue at scale in large e-commerce platforms.
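A back-of-envelope projection makes this concrete. All the inputs below are illustrative assumptions, not measured values: traffic and average order value are placeholders to be replaced with real figures, and the uplift mirrors the simulated Spain/Mexico effect.

```python
# Illustrative revenue projection (all inputs are assumptions)
monthly_visitors = 100_000   # assumed traffic in the target markets
uplift = 0.04                # absolute conversion uplift seen in Spain/Mexico
avg_order_value = 110.0      # midpoint of the simulated 20-200 revenue range

extra_orders = monthly_visitors * uplift
extra_revenue = extra_orders * avg_order_value
print(f"Additional orders/month:  {extra_orders:,.0f}")
print(f"Additional revenue/month: ${extra_revenue:,.0f}")
```

Even under conservative assumptions, the projection scales linearly with traffic, which is why localized rollouts in the responsive markets are attractive.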
Limitations¶
- The dataset is simulated and may not capture all real-world complexities.
- External factors such as seasonality or promotions were not included.
- The experiment duration and sample size may limit statistical power.
Further experimentation is recommended before full deployment.
Final Conclusion¶
While the global impact of Variant B is not statistically significant, segmented analysis reveals meaningful improvements in specific markets such as Spain and Mexico.
This suggests that a localized rollout strategy would be more effective than a global implementation.
By targeting high-performing segments and continuing experimentation, the company can maximize revenue while minimizing risk.