Determining the Optimal Target Market for a Video Game Company Using Machine Learning
Problem Statement
A video game company needs to determine the most suitable global region to market its products based on past video game sales data. This article outlines how I solved this problem using machine learning.
Dataset
The dataset comes from Kaggle. It contains sales figures for video games since 1985, broken down into four regions: North America, Europe, Japan, and the rest of the world.
Methodology
My approach was to fit a model that predicts each game's global sales, using its sales in each of the four regions as the features.
Exploratory Data Analysis
A brief analysis of the dataset.
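As one illustration of what such an analysis might include, the sketch below totals sales per region and expresses each total as a share of global sales. The `games_df` here is a tiny made-up frame: the values are hypothetical and not taken from the Kaggle data, `Global_Sales` is assumed to be the sum of the regional columns, and only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook would load the Kaggle CSV instead.
games_df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "NA_Sales": [4.0, 1.0, 0.5],
    "EU_Sales": [2.0, 1.5, 0.5],
    "JP_Sales": [1.0, 2.0, 0.25],
    "Other_Sales": [1.0, 0.5, 0.25],
})
region_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Assumption for this toy frame: global sales is the sum of the regions.
games_df["Global_Sales"] = games_df[region_cols].sum(axis=1)

# Total sales per region and each region's share of the global total.
region_totals = games_df[region_cols].sum()
region_share = region_totals / games_df["Global_Sales"].sum()

print(games_df.describe())
print(region_share.sort_values(ascending=False))
```

A quick summary like this gives a baseline to compare the model-based ranking against later on.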
Model Fitting
The first task is to drop every column except the regional sales columns and the global sales column.
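A minimal sketch of this step, assuming the frame also holds descriptive columns such as `Name`, `Platform`, and `Genre` (hypothetical here, since the dropped columns are not listed above):

```python
import pandas as pd

# Hypothetical toy frame with the same sales column names as the dataset.
games_df = pd.DataFrame({
    "Name": ["Game A", "Game B"],
    "Platform": ["X", "Y"],
    "Genre": ["Action", "Puzzle"],
    "NA_Sales": [4.0, 1.0],
    "EU_Sales": [2.0, 1.5],
    "JP_Sales": [1.0, 2.0],
    "Other_Sales": [1.0, 0.5],
    "Global_Sales": [8.0, 5.0],
})

cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]

# Selecting the wanted columns is equivalent to dropping every other column.
# .copy() gives an independent frame that is safe to modify afterwards.
games_df = games_df[cols].copy()
print(games_df.columns.tolist())
```

Selecting by name rather than dropping by name means the step still works if the source file gains extra columns.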
Next, scale the dataset using MinMaxScaler from scikit-learn.
from sklearn.preprocessing import MinMaxScaler
# Scale each sales column to the [0, 1] range
scaler = MinMaxScaler()
cols = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
games_df[cols] = scaler.fit_transform(games_df[cols])
games_df.head()
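For reference, min-max scaling maps each column independently to the range [0, 1] via (x - min) / (max - min). The sketch below checks this by hand against `MinMaxScaler` on a hypothetical two-column frame whose values were chosen only to make the arithmetic easy to follow:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values, not taken from the dataset.
toy = pd.DataFrame({"NA_Sales": [0.0, 2.0, 4.0], "JP_Sales": [1.0, 1.5, 2.0]})

# Manual min-max scaling, column by column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

# The same transformation via scikit-learn.
scaled = MinMaxScaler().fit_transform(toy)

print(manual)
print(np.allclose(manual.to_numpy(), scaled))
```

Because each column is divided by its own range, a coefficient fitted on scaled data reflects that range as well as the underlying relationship, which is worth keeping in mind when comparing coefficients across features.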
Next, split the data into training and test sets.
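from sklearn.model_selection import train_test_split
# Regional sales are the features; global sales is the target
X = games_df.drop('Global_Sales', axis=1)
y = games_df['Global_Sales']
# shuffle=False keeps the original row order: first 80% train, last 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)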
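With `shuffle=False`, `train_test_split` keeps the rows in their original order: the first 80% become the training set and the last 20% the test set. A small sketch on a hypothetical ten-row frame makes this visible:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with ten rows, indexed 0..9.
toy = pd.DataFrame({"NA_Sales": range(10), "Global_Sales": range(10, 20)})
X = toy.drop("Global_Sales", axis=1)
y = toy["Global_Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(X_train.index.tolist())  # the first eight rows
print(X_test.index.tolist())   # the last two rows
```

If the source file happens to be sorted (for example by sales rank), an unshuffled split evaluates only on the tail of that ordering. Passing `shuffle=True` with a fixed `random_state` is a common alternative in that situation.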
Fitting the models. I tried a couple of models, and linear regression performed best, with a Mean Squared Error of 2.2 and an R-squared of 0.86.
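from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Fit an ordinary least-squares model (both arguments are the defaults)
linreg = LinearRegression(fit_intercept=True, positive=False)
linreg.fit(X_train, y_train)
# Evaluate on the held-out test set
y_pred = linreg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R2) Score:", r2)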
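The comparison between models could look roughly like the sketch below. The candidate models (`Ridge`, `DecisionTreeRegressor`) and the synthetic data are assumptions for illustration, since the other models tried are not named above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: four regional features and a target that is
# their sum plus a little noise (an assumption, not the Kaggle data).
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = X.sum(axis=1) + rng.normal(0, 0.01, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical candidates; swap in whichever regressors you want to compare.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }

# Pick the model with the lowest test MSE.
best = min(results, key=lambda name: results[name]["mse"])
print(results)
print("Best model:", best)
```

Evaluating every candidate on the same split keeps the scores directly comparable.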
Feature Importances
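To find the region that consumes the most video game products, I plot the feature importances and look for the feature that affects the model the most. For a linear regression, I take the absolute value of each coefficient as that feature's importance.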
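import numpy as np
import matplotlib.pyplot as plt
# Use the absolute value of each coefficient as the feature's importance
importances = np.abs(linreg.coef_)
# Get the feature names
feature_names = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# Sort the features from most to least important
indices = np.argsort(importances)[::-1]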
# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.bar(range(len(importances)), importances[indices])
plt.xticks(range(len(importances)), [feature_names[i] for i in indices], rotation='vertical')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.title('Feature Importances')
plt.show()
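The same ranking can also be read off as a table rather than a chart. The sketch below fits a linear regression on synthetic data with known weights (an assumption for illustration, not the article's data) and sorts the absolute coefficients with a pandas Series:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_names = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Synthetic data with known weights, so the expected ranking is obvious.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 4)), columns=feature_names)
true_weights = np.array([4.0, 3.0, 2.0, 1.0])
y = X.to_numpy() @ true_weights

linreg = LinearRegression().fit(X, y)

# Absolute coefficient per feature, largest first.
importances = pd.Series(np.abs(linreg.coef_), index=X.columns)
ranking = importances.sort_values(ascending=False)
print(ranking)
```

Taking the labels from `X.columns` instead of a hand-written list keeps them aligned with the coefficients if the column order ever changes.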
Conclusion
Based on the feature importances, the optimal target market for a video game company in this case would be North America.
You can find the notebook here.