Product Placement Recommendation
Exploring Machine Learning approaches to decide how to position slot games on a website to maximize performance
Product placement is becoming an increasingly important way for brands to reach their target audience in subtle ways. Businesses are using product placement to increase their sales, brand awareness, and draw in customers — all without “traditional” ads. It’s the art of determining where your products appear within a store or website through planning, negotiation, and design.
Behavioural science studies show that product placement does have a net positive impact on product performance and brand awareness. But can we find the best position a product should be placed at to attract more customers?
In this project, I explored various machine learning approaches for identifying the best position of a slot game on a given website, in order to maximize turnover, number of bets, and player loyalty.
To accomplish this, a three-step approach was implemented:
- A regression task that predicts the daily turnover of a game based on its unique characteristics and its position on the site.
- An interpretability pipeline that identifies the key features influencing the algorithm's decisions and outputs a detailed breakdown of feature importance.
- A recommendation step that combines the regression model with counterfactual explanations to suggest the ideal position for each game given a specific goal.
The process
The first step was to deploy an algorithm to predict the turnover of a game on a given day. This process involved data wrangling and feature engineering in order to find the combination of features that best predicts the outcome. After the feature engineering, an ensemble technique was used in which Extreme Gradient Boosting and Deep Neural Networks were combined into a single predictor. Iterating through feature engineering, modeling, and parameter experimentation yielded the best possible results.
The next step was to develop a model explainability pipeline that outputs the feature importance behind each decision the algorithm makes. This step was vital: it guided us through data issues, helped us understand what our predictor was actually learning, and informed the data engineering that changed how it learned.
The final step was to recommend a game's position, which was made possible by combining the algorithm from the first step with counterfactual explanations.
Let's explore the data we had at our disposal
We were given tabular data from two different sources containing performance data, positional data, and game metadata. The challenges we faced were:
- First, data quality issues that we had to resolve, and
- Second, difficulties merging the data, since the two sources used incompatible IDs. The resulting dataset contained roughly 14 thousand rows of combined performance and positional data.
Most important features:
Turnover — which is the measure of a game’s performance and the target variable in our implementation.
Positional data, Column and Row, which represent the column and row of the grid where a game is located. As we saw earlier, games are hosted on websites like elements in a matrix, or components on a grid.
The remaining features explain the physical characteristics of a game like the theme, the volatility, whether or not a game rewards a player for achievements and more.
Data Wrangling
Clean, clean, clean. Yes, you guessed it: the data was not ready to use. No worries though, we have a strategy to take it easy and show it some love.
Before beginning our research and implementation we had to unify our data. As mentioned earlier, the company gave us 7 different data tables in CSV format, generated from two different databases and with completely different formatting from one another. We had over 400 thousand rows of positional data, but we had to keep only those records for which we also had performance data. Since the records did not share the same IDs, we had to find another way to filter and combine the data. We chose to use the Levenshtein distance: we computed the character-level difference between fields such as the name of the game and the name of its creator, and cross-referenced that with the exact date of the record. Whenever the similarity score was above 85%, we were confident we were matching the records correctly.
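To make the matching step concrete, here is a minimal sketch of that idea in Python. Only the Levenshtein distance, the cross-reference on the exact date, and the 85% threshold come from the description above; the column names (`game_name`, `provider`, `date`), the equal weighting of the two similarity scores, and the overall data layout are illustrative assumptions rather than the project's actual code.

```python
import pandas as pd

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein (edit) distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,             # insertion
                               previous[j] + 1,                # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1] derived from the edit distance."""
    a, b = a.lower().strip(), b.lower().strip()
    return 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)

def match_records(perf: pd.DataFrame, pos: pd.DataFrame,
                  threshold: float = 0.85) -> pd.DataFrame:
    """Cross-reference records from the two sources on the exact date, then
    keep the pairs whose game name and creator name are similar enough."""
    candidates = perf.merge(pos, on="date", suffixes=("_perf", "_pos"))
    score = candidates.apply(
        lambda r: 0.5 * similarity(r["game_name_perf"], r["game_name_pos"])
                + 0.5 * similarity(r["provider_perf"], r["provider_pos"]),
        axis=1,
    )
    return candidates[score >= threshold]
```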
Data Exploration
After completing all the data wrangling tasks and bringing the data into one unified format, we proceeded to the exploratory data analysis, where we encountered the following issues:
The issue that troubled us the most was the fact that at first there was no direct correlation between turnover and position.
The second issue was that turnover was heavily influenced by players' favorite games. We did not have an exact metric for this in our data, but from our understanding of the business we are confident that the largest share of turnover came not from good positioning but from customers going straight to their favorite games via the search bar. Since we aggregated turnover per game on a daily basis to preserve customer anonymity, we could not exclude the customers whose favorite games generated turnover regardless of position. This is why our implementation did not use a conventional recommendation engine, but rather a new approach to recommendations that could bypass these issues.
Finally, we had to compensate for imbalanced data: some games appeared for more than 500 days while others appeared for only 5, which added a lot of noise to our data.
Given all of the above issues, our estimator would at first treat the game itself as the deciding factor of turnover and could not see the effect of position on performance through the layers of noise in the data.
We will see later how we cleaned our data of all that noise and, through proper feature engineering, shed light on the correlation between position and turnover.
Tackling the problem
Now let’s look into the approach we took to tackle this problem.
As a reminder, we have tabular data with turnover as our target variable. The first step of our approach was to create a model that can estimate turnover given a game's characteristics and position. We conducted experiments with and without the positional data to see how that affects the decision making. Our main goal was to create a robust estimator that, given a specific game on a specific date, can tell us its expected turnover within a small margin of error.
In the next step, we attempted to understand how and why our estimator makes its decisions. We used various machine learning techniques to show whether or not the position of a game plays an important role in turnover. Based on these findings, we then had to make a decision.
If our estimator did not consider position an important factor for turnover, we would have to go back to feature engineering, remove more noise from the data, and repeat the process from step one. If we could show that the estimator is influenced by position when estimating turnover, we could proceed to the third step of the approach.
The third and final step is the recommendation of the ideal game positioning, with the purpose of maximizing performance as measured in turnover. For this we used a machine learning technique called counterfactual explanations, which we will present later.
Now that we have stated our problem and agreed on the methodology to solve it, we can move on to the system pipeline.
In this figure we can see our system pipeline from start to finish. We have the Data Engineering and Feature Engineering steps regarding our data, followed by the modelling step, the explanations step, and finally the recommendations with counterfactuals. The pipeline contains two iterative processes whose purpose is to find the best combination of feature engineering and modelling.
The first iterative process, which takes place between the Data Engineering and Modelling steps, finds the best estimator. This kind of iteration is quite common in machine learning and involves experimenting with different ways of cleaning, scaling, and transforming the data, feature engineering, and modelling, until we find the estimator that best fits our data.
Once the first iterative process is complete and we have settled on the best estimator, we proceed to the second iterative process, which starts at the feature engineering step and, using mostly that best estimator, explores the most important features in the decision making. Through this process we conduct multiple feature engineering experiments to shed as much light as possible on the relation between position and turnover. After that, we can move on to the counterfactuals step and generate recommendations.
Let’s now explore each step of the pipeline
Data Engineering
The first step of the implementation is the data engineering phase. After evaluating our data, we found disproportionate turnover values, outliers, duplicated data, inconsistencies, and missing data. Some of these were easy to remedy; others had to be addressed carefully. We also had to choose the features we would feed the model carefully, since we had limited data at our disposal.
For feature selection we used either manual selection or multivariate selection with backward elimination. Finally, we treated the remaining data quality issues by imputing missing values, scaling and normalizing disproportionate values, encoding features, and removing duplicated rows.
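As a rough illustration of what this clean-up could look like, below is a hedged sketch using scikit-learn. The column names, the choice of `LinearRegression` as the backing estimator for the backward elimination, and the number of features to keep are all placeholders; the actual project may have used different tooling and settings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder column names; adjust to the real schema.
numeric_cols = ["row", "column", "volatility"]
categorical_cols = ["theme", "provider"]

preprocess = ColumnTransformer([
    # Impute and scale numeric features with disproportionate values.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Impute and one-hot encode the categorical game characteristics.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

def prepare(df: pd.DataFrame):
    df = df.drop_duplicates()  # remove duplicated rows
    X, y = df[numeric_cols + categorical_cols], df["turnover"]
    # Multivariate selection with backward elimination: RFE repeatedly drops
    # the weakest (transformed) feature until the requested number remains.
    selector = RFE(LinearRegression(), n_features_to_select=8, step=1)
    selector.fit(preprocess.fit_transform(X), y)
    return preprocess, selector
```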
Feature Engineering
The second part of the implementation was to construct and utilize the best possible features, to remove the noisy layers from our data and shed light on the relationship between position and turnover.
Two of the most effective feature engineering strategies were to:
- scale all games to the same degree and remove some characteristics from the equation (for example mute famous or favorite games) and
- widen our position perspective.
When we talk about muting the games, we mean putting them all on one scale where all games are equal. We want to see the effect of position without the bias of the particular game sitting in each position. If a famous game has high turnover in a specific position of the matrix, the estimator may conclude that this position is important just because of the game. We instead want to see the effect of each position as is, without the contribution of whichever game happens to be hosted there each day. By applying this transformation to our data, we make sure that our estimator is unbiased with respect to the game in each position.
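One simple way to implement this "muting", sketched below, is to standardize turnover within each game, so that only a game's day-to-day variation (and hence its positional signal) remains. The `game_id` and `turnover` column names are assumptions, and z-scoring is just one of several possible per-game scalings.

```python
import pandas as pd

def mute_games(df: pd.DataFrame) -> pd.DataFrame:
    """Put all games on the same scale: standardize turnover within each game
    so a famous title's raw turnover no longer dominates, leaving only its
    day-to-day (and hence positional) variation as signal."""
    out = df.copy()
    per_game = out.groupby("game_id")["turnover"]
    std = per_game.transform("std").replace(0, 1.0).fillna(1.0)
    out["turnover_scaled"] = (out["turnover"] - per_game.transform("mean")) / std
    return out
```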
The second most effective feature engineering approach was to widen our perspective on position. Since the tile-level effect did not give us enough insight into the relation between position and turnover, we decided to explore the field-level effect, which in theory gives more information about the relation, but with less detail. Our goal was therefore to find the best trade-off between field size and level of detail when splitting the grid into smaller sections.
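As an illustration of this widening, the sketch below maps each (row, column) tile to a coarser field of the grid. The assumption of 1-indexed coordinates and the default 2-by-3 field size are illustrative; the field dimensions are exactly the parameter we tuned.

```python
import pandas as pd

def add_field_feature(df: pd.DataFrame,
                      field_rows: int = 2, field_cols: int = 3) -> pd.DataFrame:
    """Map each (row, column) tile to a coarser 'field' of the grid.
    Bigger fields mean more records per section but less positional detail,
    so field_rows and field_cols are the knobs to tune."""
    out = df.copy()
    out["field_row"] = (out["row"] - 1) // field_rows
    out["field_col"] = (out["column"] - 1) // field_cols
    out["field_id"] = out["field_row"].astype(str) + "_" + out["field_col"].astype(str)
    return out
```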
Modelling
Moving on to the next part of the implementation, we are at the stage of constructing the estimator. We tried many different algorithms, and the best results came from an ensemble of the Extreme Gradient Boosting algorithm combined with deep neural networks.
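A minimal sketch of such an ensemble is shown below, with scikit-learn's `MLPRegressor` standing in for the deep neural network and a plain 50/50 average of the two predictions; the actual architecture, hyperparameters, and blending weights used in the project came out of the experimentation described earlier and are not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

class TurnoverEnsemble:
    """Average the predictions of a gradient-boosted tree model and a
    neural network regressor (weights and hyperparameters are illustrative)."""

    def __init__(self):
        self.xgb = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
        self.dnn = MLPRegressor(hidden_layer_sizes=(128, 64, 32), max_iter=500)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "TurnoverEnsemble":
        self.xgb.fit(X, y)
        self.dnn.fit(X, y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Simple 50/50 blend of the two predictors.
        return 0.5 * self.xgb.predict(X) + 0.5 * self.dnn.predict(X)
```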
Evaluation
Model evaluation is just as important as model selection, so we had to find ways to better interpret the estimator's results. When estimating turnover we care about the monetary value, so we chose a metric that is easy to understand and measured in dollars: Mean Absolute Error for the final evaluation, with RMSE used to compare the algorithms with each other.
The test set was divided into buckets, based on the distribution of turnover values, for a more precise evaluation. Our goal was to evaluate each bucket separately and to examine the error as a percentage of the turnover value.
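A hedged sketch of that bucketed evaluation could look like the following, assuming arrays of true and predicted turnover values; the number of quantile buckets is illustrative.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

def bucketed_evaluation(y_true, y_pred, n_buckets: int = 5) -> pd.DataFrame:
    """Split the test set into turnover buckets (by quantile) and report the
    MAE of each bucket in dollars and as a percentage of its mean turnover."""
    results = pd.DataFrame({"y_true": y_true, "y_pred": y_pred})
    results["bucket"] = pd.qcut(results["y_true"], q=n_buckets, duplicates="drop")
    rows = []
    for bucket, grp in results.groupby("bucket", observed=True):
        mae = mean_absolute_error(grp["y_true"], grp["y_pred"])
        rows.append({"bucket": str(bucket),
                     "mae_dollars": mae,
                     "mae_pct_of_mean_turnover": 100 * mae / grp["y_true"].mean()})
    return pd.DataFrame(rows)
```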
Finally, for better supervision, we used one more machine learning technique, called Anchors, to detect anomalies in our estimator's decision making. It is a deep model evaluation technique based on the 2018 paper by Ribeiro, Singh, and Guestrin.
Following that paper, we turned our problem into a classification one and monitored whether any of the algorithm's decisions were heavily over- or underestimated. By pinpointing these mistakes, we could better understand the decision making of our algorithm and make sure it did not fall into any major pitfalls.
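To give an idea of that reframing, the sketch below labels each prediction as within tolerance, heavily overestimated, or heavily underestimated; a classifier trained on these labels can then be explained with an Anchors implementation such as `AnchorTabular` from the `alibi` package. The 30% tolerance is an illustrative threshold, not the one used in the project.

```python
import numpy as np

def error_class(y_true: np.ndarray, y_pred: np.ndarray, tol: float = 0.30) -> np.ndarray:
    """Recast the regression output as three classes based on relative error:
    0 = within tolerance, 1 = heavily overestimated, 2 = heavily underestimated."""
    rel_err = (y_pred - y_true) / np.maximum(np.abs(y_true), 1e-9)
    labels = np.zeros(len(y_true), dtype=int)
    labels[rel_err > tol] = 1
    labels[rel_err < -tol] = 2
    return labels

# A classifier predicting these labels from the input features can then be
# explained with Anchors (e.g. alibi.explainers.AnchorTabular) to surface the
# feature rules under which the estimator over- or underestimates turnover.
```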
The results of the Anchors analysis showed that our algorithm did not make any grave errors, and knowing that we had a robust estimator, we were ready to continue with the next step.
Explanations
The next step of the implementation was the model explainability phase. As mentioned earlier, we conducted many experiments to find the combination of feature engineering that would give us the most insight into the relation between position and turnover. All of these experiments were evaluated with Shapley explanations, the technique we used to monitor feature importance in the model's decision making.
Want to learn more about shapley values? (https://shap.readthedocs.io/en/latest/)
What are Shapley explanations, though, and how did we use them in our implementation? The concept is as follows:
Imagine a team game: Shapley values tell us how much each player contributed to the result.
In our implementation, in the team effort of estimating turnover, they tell us how much each feature contributed to the estimate. Our goal was to find out whether the positional data contributed the most to the algorithm's decision making, which is the first question of this capstone project. To make sure all requirements were satisfied, we were guided by the 2022 paper The Shapley Value in Machine Learning by Rozemberczki et al., which was a catalyst for implementing this technique in a statistically sound way.
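In code, this boils down to something like the sketch below, which assumes the tree-based part of the ensemble from the earlier modelling sketch (`ensemble.xgb`), a held-out feature matrix `X_test`, and a `feature_names` list; these names are placeholders rather than the project's actual objects.

```python
import shap

# TreeExplainer is exact and fast for gradient-boosted tree models, so we
# explain the XGBoost part of the ensemble on a held-out feature matrix.
explainer = shap.TreeExplainer(ensemble.xgb)
shap_values = explainer.shap_values(X_test)

# Global view: how strongly each feature (including the position columns)
# pushes the turnover estimate up or down across the test set.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
```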
Recommendations
Using our best estimator and the knowledge of its decision making, we continue the implementation with a machine learning technique called Counterfactual Explanations, proposed by a Harvard machine learning team in 2018 as a model explainability method. To explain what it does, consider the following example.
Say I feed one record into my system as input; counterfactuals will suggest a slightly modified version of that input which results in a different, user-defined outcome.
For our particular system, we feed in a record of a game and define our goal in terms of turnover (say, a 25% increase), and the counterfactuals suggest modifications to that record that would lead to that increase in turnover.
DiCE
For our implementation we used DiCE, a Python package that implements counterfactual explanations for regression tasks and lets us dictate our own constraints, which makes it ideal for our problem.
Want to try it out for yourself? (https://github.com/interpretml/DiCE)
The results of the counterfactual explanations can then be used to form a general optimized solution for all games on the grid.
For each record, it suggests the smallest modification that will result in the desired turnover. The modifications obey the rules we set: for example, we can allow changes to the position of the game only, and then specify an accepted range for how far the position may move.
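A rough sketch of how this can be expressed with DiCE is shown below. The feature names, grid ranges, and the 25% turnover target are illustrative, `train_df` and `ensemble` refer to the earlier sketches, and the exact DiCE behaviour (especially regression support via `desired_range`) may vary between versions.

```python
import dice_ml

# Wrap the training data and the trained regressor for DiCE.
data = dice_ml.Data(dataframe=train_df,
                    continuous_features=["row", "column", "volatility"],
                    outcome_name="turnover")
model = dice_ml.Model(model=ensemble, backend="sklearn", model_type="regressor")
explainer = dice_ml.Dice(data, model, method="genetic")

# Take one game record and ask for counterfactuals that lift its turnover by
# roughly 25%, while only allowing the position to change within a set range.
query = train_df.drop(columns=["turnover"]).iloc[[0]]
target = 1.25 * float(train_df["turnover"].iloc[0])
cf = explainer.generate_counterfactuals(
    query,
    total_CFs=3,
    desired_range=[target, 1.1 * target],
    features_to_vary=["row", "column"],
    permitted_range={"row": [1, 10], "column": [1, 6]},
)
cf.visualize_as_dataframe(show_only_changes=True)
```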
After implementing this counterfactual generation in our system, we found that it indeed works as promised and finds the best trade-off between the amount of modification and the increase in turnover. If the system decides that a game needs a better position to increase turnover, it assigns the game to that position; if it decides that a game does not need a better position, it may assign the game to an even worse position to make room for other games that need better positioning.
That makes counterfactuals very appealing for our problem, since the websites that host the games may have arrangements for specific games to sit in specific locations; we therefore need a system that, given those rules, can find the best positioning for all games regardless of the constraints we impose.
Concluding
To conclude, this project has designed and developed a strategy that utilizes machine learning models and explanation techniques to recommend the ideal positioning of a game in order to maximize its performance. Overall, the results of this approach demonstrate the ability of machine learning models to predict an outcome from a set of features, or to alter features so that they produce a desired outcome.
The predictor from step one managed to estimate the turnover of each game with satisfactory results, given the complexity of the problem and the number of records. The Shapley values in step two worked as expected and gave valuable insight into the algorithm's decision making. That insight was instrumental for this approach and allowed the system to proceed to step three and propose ideal positioning with counterfactual explanations, a technique that, although unconventional, proved effective and computationally efficient.
Being able to deploy a recommendation system from very little data, by combining a relatively simple algorithm with counterfactual explanations, is very satisfying and exhibits the value that data-driven methods and machine learning models can provide.
Thank you!