Determining the Optimal Target Market for a Video Game Company Using Machine Learning

Kevin Kibe
Jun 1, 2023

Problem Statement

A video game company needs to determine the most suitable global region to market its products based on past video game sales data. This article outlines how I solved this problem using machine learning.

Dataset

The dataset comes from Kaggle. It contains sales figures for individual video games released since 1985, broken down into four regions (North America, Europe, Japan, and the rest of the world) plus a global total.

Methodology

My approach was to fit a regression model that predicts each game's global sales from its sales in each of those regions, and then check which regional feature the model relies on most.

Exploratory Data Analysis

A brief analysis of the dataset.

Distribution of video game sales over the years
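The yearly distribution can be reproduced with a pandas group-by. This is a minimal sketch, assuming the raw Kaggle table has a `Year` column next to `Global_Sales`; the column name `Year` and the toy values below are illustrative assumptions, not figures from the dataset.

```python
import pandas as pd

# Toy stand-in for the raw Kaggle table; values are illustrative only.
games_raw = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [1985, 1985, 1990, 1990],  # assumed column name
    "Global_Sales": [1.5, 0.5, 2.0, 1.0],
})

def sales_by_year(df: pd.DataFrame) -> pd.Series:
    """Sum global sales per release year, ordered by year."""
    return df.groupby("Year")["Global_Sales"].sum().sort_index()

yearly = sales_by_year(games_raw)
print(yearly)
```

In a notebook, `yearly.plot(kind="bar")` draws the distribution (this requires matplotlib). Rows with a missing year are dropped by `groupby` by default.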

Model Fitting

The first step is to drop every column except the four regional sales columns and the global sales column.
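Keeping a list of columns is equivalent to dropping all the others. This is a minimal sketch on a toy table; the non-sales columns (`Name`, `Platform`, `Year`) and all values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the raw table; the non-sales columns are assumed.
games_df = pd.DataFrame({
    "Name": ["Game A", "Game B"],
    "Platform": ["PS2", "Wii"],
    "Year": [2004, 2008],
    "NA_Sales": [1.2, 0.8],
    "EU_Sales": [0.6, 0.9],
    "JP_Sales": [0.1, 0.3],
    "Other_Sales": [0.2, 0.1],
    "Global_Sales": [2.1, 2.1],
})

# Keep only the four regional columns and the target.
cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]
# .copy() gives an independent frame, so later column assignments do not warn.
games_df = games_df[cols].copy()
games_df.info()
```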

The dataset information

Next, scale the data with scikit-learn's MinMaxScaler.

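from sklearn.preprocessing import MinMaxScaler

# Rescale every sales column to the [0, 1] range
scaler = MinMaxScaler()
cols = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
games_df[cols] = scaler.fit_transform(games_df[cols])
games_df.head()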
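MinMaxScaler rescales each column independently with x' = (x - min) / (max - min), using the minimum and maximum seen during `fit`. A small check on toy numbers (not from the dataset) confirms that formula.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column of sales values; illustrative only.
x = np.array([[0.5], [2.0], [4.0], [10.0]])

scaled = MinMaxScaler().fit_transform(x)

# The same transformation written out by hand, column by column.
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

print(scaled.ravel())
```

Because the scaler above is fit on the full table before the train/test split, the test rows influence the learned minimum and maximum. Fitting the scaler on the training rows only and then transforming both sets is a common way to avoid that leakage.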

Next, split the data into training and test sets, holding out 20% of the rows for testing.

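from sklearn.model_selection import train_test_split

# Regional sales are the features; global sales is the target
X = games_df.drop('Global_Sales', axis=1)
y = games_df['Global_Sales']

# Hold out the last 20% of rows (no shuffling) as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)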
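With `shuffle=False`, `train_test_split` does not randomize. The first 80% of rows become the training set and the last 20% the test set. Ten numbered toy rows make the ordering visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy rows numbered 0..9 so the split is easy to read.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```

If the table happens to be sorted (for example by sales or rank), the held-out tail will not resemble the training rows; passing `shuffle=True` with a fixed `random_state` is a common alternative.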

Next, fit the models. I tried a few models, and linear regression performed best, with a mean squared error (MSE) of 2.2 and an R-squared (R²) of 0.86.
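The article does not list the other models it tried, so the candidates below are hypothetical. This is a sketch of how several regressors can be compared on the same held-out split, using synthetic data from `make_regression` as a stand-in for the four regional features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for four regional features and a global target.
X, y = make_regression(n_samples=500, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical candidate models, not necessarily the ones used in the article.
candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record (MSE, R-squared) on the test set.
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = (mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred))

# Print models from lowest to highest MSE (lower MSE is better).
for name, (mse, r2) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:>16}: MSE={mse:.3f}  R2={r2:.3f}")
```

Because this synthetic target is linear by construction, linear regression wins here; on real data the ranking has to be measured, not assumed.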

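from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Fit an ordinary least-squares model (these are scikit-learn's default settings)
linreg = LinearRegression(fit_intercept=True, positive=False)
linreg.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = linreg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R2) Score:", r2)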
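The three metrics can be written out directly. MSE averages the squared errors, MAE averages the absolute errors, and R² compares the model's squared error with that of always predicting the mean. This sketch computes them with NumPy on toy values (not the article's data) and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy targets and predictions; illustrative only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
mse = np.mean(errors ** 2)                      # mean squared error
mae = np.mean(np.abs(errors))                   # mean absolute error
ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination

print(mse, mae, r2)  # 0.375 0.5 0.948...
```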

Feature Importances

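To find the region that contributes most to global sales, I plotted the feature importances, measured here as the absolute values of the linear regression coefficients. Because every feature was min-max scaled to the same [0, 1] range, these magnitudes can be compared, and the largest one marks the feature that affects the model's predictions the most.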
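Treating coefficient magnitudes as importances only works when the features share a scale. The sketch below uses synthetic data (not the sales dataset) where the target is exactly 3·a + 1·b. Shrinking b by a factor of 1000 keeps the same information but inflates its coefficient 1000-fold, which flips the ranking.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two synthetic features; the target depends more strongly on a than on b.
rng = np.random.default_rng(0)
a = rng.uniform(0, 1, 200)
b = rng.uniform(0, 1, 200)
y = 3.0 * a + 1.0 * b

# Same information, but b is expressed in units 1000 times larger.
X_same_scale = np.column_stack([a, b])
X_b_rescaled = np.column_stack([a, b / 1000.0])

coef_same = LinearRegression().fit(X_same_scale, y).coef_
coef_rescaled = LinearRegression().fit(X_b_rescaled, y).coef_

print(np.abs(coef_same))      # approximately [3. 1.]
print(np.abs(coef_rescaled))  # approximately [3. 1000.]
```

Min-max scaling every column to [0, 1] removes this unit effect, although features with different spreads inside that range can still skew the comparison somewhat.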


import matplotlib.pyplot as plt
import numpy as np

# Use the absolute value of each coefficient as its importance
importances = np.abs(linreg.coef_)
# Feature names, in the same order as the columns of X
feature_names = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

# Sort features from most to least important
indices = np.argsort(importances)[::-1]

# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.bar(range(len(importances)), importances[indices])
plt.xticks(range(len(importances)), [feature_names[i] for i in indices], rotation='vertical')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.title('Feature Importances')
plt.show()
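As a cross-check that does not depend on coefficient scale, scikit-learn's `permutation_importance` shuffles one feature at a time and measures how much the model's score drops. This is a different technique from the coefficient-based ranking above. The sketch uses synthetic features with made-up coefficients, labelled with the regional column names purely for readability, and scores on the training data for brevity (held-out data is the more usual choice).

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

feature_names = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Synthetic features in [0, 1]; the true coefficients are invented for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = X @ np.array([4.0, 2.0, 1.0, 0.5]) + rng.normal(0, 0.05, size=300)

model = LinearRegression().fit(X, y)

# Shuffle each column 10 times and record the mean drop in the R-squared score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Order feature names from largest to smallest mean importance.
ranking = [feature_names[i] for i in np.argsort(result.importances_mean)[::-1]]
print(ranking)
```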

Conclusion

Based on these feature importances, North America is the optimal target market for the video game company.

You can find the notebook here.