Customer segmentation is a powerful tool that helps you get to know your clients, treat them differently, and better fulfil their various needs, which is a key factor of success in the retail industry. In this blog post, I want to dive into the data science behind customer segmentation and explain one of the most widely used segmentation techniques: RFM analysis.
What is RFM analysis?
RFM stands for Recency, Frequency, Monetary Value. RFM analysis is a technique for segmenting customers based on their transaction history. It allows us to collect insights about consumer behaviour and optimize the marketing strategy accordingly. In particular, one can leverage RFM to create personalized special offers that improve sales and increase customer retention.
RFM analysis is based on three metrics, which measure different (but equally important) customer characteristics: How much time has passed since the last purchase? (Recency), How many transactions were made? (Frequency), and How much money was spent? (Monetary Value).
Without further ado, let me show you how to conduct the RFM analysis.
Datasets used for RFM analysis
For our RFM analysis, we used the ‘Online Retail II UCI’ dataset from Kaggle (available at https://www.kaggle.com/mashlyn/online-retail-ii-uci/data).
This Kaggle dataset contains information about each transaction: its date (‘trans_date’), the amount of money spent (‘trans_amount’) and the customer id (‘customer_id’).
Customer_id | Trans_date | Trans_amount |
CS5295 | 2013-02-11 | 35 |
CS4768 | 2015-03-15 | 39 |
CS2122 | 2013-02-26 | 52 |
CS1217 | 2011-11-16 | 99 |
CS1850 | 2013-11-20 | 78 |
To perform the analysis, we need to transform the dataset in such a way that each row contains data regarding one customer:
- number of months since the last purchase (Recency),
- number of purchases made (Frequency),
- the total amount of money spent (Monetary Value).
Below, you can see the fragment of the transformed data frame.
Customer_id | Recency (months) | Frequency | Monetary value |
CS1112 | 2.004 | 15 | 1012 |
CS1113 | 1.150 | 20 | 1490 |
CS1114 | 1.051 | 19 | 1432 |
CS1115 | 0.361 | 22 | 1659 |
CS1116 | 6.670 | 13 | 857 |
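Here is a minimal pandas sketch of this transformation. It reuses the column names from the sample above, but the transactions themselves are made up for illustration. Two choices are my assumptions, since they are not spelled out in the text: recency is measured from the latest date in the data, and a month is taken as 30.44 days on average.

```python
import pandas as pd

# Toy transactions with the column names from the sample above (values are made up).
transactions = pd.DataFrame({
    "customer_id": ["CS1", "CS1", "CS2", "CS2", "CS2", "CS3"],
    "trans_date": pd.to_datetime([
        "2015-01-10", "2015-03-01", "2014-06-15",
        "2014-09-20", "2015-02-01", "2013-12-05",
    ]),
    "trans_amount": [35, 39, 52, 99, 78, 60],
})

# Assumption: recency is counted back from the most recent transaction in the data.
snapshot_date = transactions["trans_date"].max()
DAYS_PER_MONTH = 30.44  # assumption: average month length in days

# One row per customer: last purchase date, number of purchases, total spend.
rfm = transactions.groupby("customer_id").agg(
    last_purchase=("trans_date", "max"),
    frequency=("trans_date", "count"),
    monetary_value=("trans_amount", "sum"),
)

# Convert the gap since the last purchase from days to (fractional) months.
rfm["recency"] = (snapshot_date - rfm["last_purchase"]).dt.days / DAYS_PER_MONTH
rfm = rfm[["recency", "frequency", "monetary_value"]]
print(rfm)
```

In a real project the snapshot date is often fixed to the day the analysis is run, so that recency values stay comparable between runs.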
Assigning RFM scores
The first step of the RFM analysis is to score each customer based on transaction characteristics.
There are several ways of doing this:
- by using quantiles: we rank the customers using the chosen metric from the best to the worst one, then divide ranked customers into groups of equal sizes and assign each group a score.
- by using predefined boundaries: we decide in advance, based on business knowledge, which score is assigned to which range of a metric. For example, for frequency, customers who made 0-10 purchases get a score of 1, those who made 11-20 get a score of 2, and so on.
- by using machine learning: we will cover this case in the next post.
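To make the predefined-boundaries option more concrete, here is a small sketch using `pd.cut`. Only the first two cut points (10 and 20) come from the example above. The remaining boundaries, and the rule that a value lying exactly on a boundary falls into the lower score, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up purchase counts for a handful of customers.
frequency = pd.Series([0, 5, 10, 11, 20, 21, 45])

# Hypothetical boundaries: only 10 and 20 follow the example in the text.
bins = [0, 10, 20, 30, 40, np.inf]

# Intervals are closed on the right, so 10 purchases still score 1.
# include_lowest=True makes the first interval also contain 0.
f_score = pd.cut(
    frequency, bins=bins, labels=[1, 2, 3, 4, 5], include_lowest=True
).astype(int)

print(pd.DataFrame({"frequency": frequency, "f_score": f_score}))
```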
In our case, we used the quantiles method. For each metric, we divided customers into five groups and assigned each group a score from 1 (the “worst”) to 5 (the “best”).
A recency score of 1 is given to customers who made their last purchase a long time ago, and 5 to those who bought something recently. For frequency and monetary value, a score of 1 means the lowest number of transactions or amount of money spent, whereas 5 means the highest.
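The quantile scoring described here can be sketched with `pd.qcut`. The data and the helper `quintile_score` are made up for illustration and are not the exact code used for this analysis. Ranking the values first (with ties broken by order of appearance) is my own choice, made so that the five groups always have equal sizes.

```python
import pandas as pd

# Made-up RFM table: recency in months, frequency as a count, monetary value as total spend.
rfm = pd.DataFrame(
    {
        "recency": [2.0, 1.2, 1.1, 0.4, 6.7, 3.5, 0.9, 4.8, 2.6, 5.3],
        "frequency": [15, 20, 19, 22, 13, 14, 21, 12, 16, 11],
        "monetary_value": [1012, 1490, 1432, 1659, 857, 930, 1500, 700, 1100, 650],
    },
    index=[f"C{i}" for i in range(10)],
)

def quintile_score(values, higher_is_better=True):
    """Hypothetical helper: split customers into five equal groups, scored 1 to 5."""
    # Ranking first breaks ties, so qcut never sees duplicate bin edges.
    ranks = values.rank(method="first")
    labels = [1, 2, 3, 4, 5] if higher_is_better else [5, 4, 3, 2, 1]
    return pd.qcut(ranks, q=5, labels=labels).astype(int)

# A low recency (recent purchase) is good, so its scale is reversed.
rfm["r_score"] = quintile_score(rfm["recency"], higher_is_better=False)
rfm["f_score"] = quintile_score(rfm["frequency"])
rfm["m_score"] = quintile_score(rfm["monetary_value"])
print(rfm)
```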
Creating segments for the RFM analysis
There are a few approaches to creating segments for the RFM analysis. I want to focus on the following two:
- based on a 3-digit code created by concatenating the scores,
- based on the sum of scores.
Concat scores
In this method, all scores are concatenated into a 3-digit code. The “worst” customers will have the code 111 and the “best” ones 555. These codes are then used to assign customers to the segments.
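As a small illustration, the code can be built by joining the three scores as strings. The column names and the digit order (recency, then frequency, then monetary value) are assumptions, and the scores are made up.

```python
import pandas as pd

# Made-up scores for three customers; column names are hypothetical.
scores = pd.DataFrame(
    {"r_score": [5, 1, 3], "f_score": [5, 1, 4], "m_score": [5, 1, 2]},
    index=["best", "worst", "mixed"],
)

# Join the three scores as strings, in the assumed order R, F, M.
scores["rfm_code"] = (
    scores["r_score"].astype(str)
    + scores["f_score"].astype(str)
    + scores["m_score"].astype(str)
)
print(scores)
```

Keeping the code as a string rather than a number makes it easy to match groups of codes against patterns when assigning segments.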
In the table below you can find a description of each segment.
Segment | Characteristic |
Champions | The best customers, they bought and spent a lot and made the latest purchase recently. |
Loyal Customers | Very good customers, they spent a lot. |
Potential Loyalist | They are recent customers, but they already spent a lot. |
New Customers | Recent customers, who made only some purchases. |
Promising | Bought often and spent quite a lot, but made their last purchase some time ago. |
Need Attention | Recency and monetary value above average. |
About To Sleep | Below average recency and monetary value. |
At Risk | They bought frequently but didn’t make any purchase for a long time. |
Cannot Lose Them | The customers who spent a lot, but have been inactive for a while. |
Hibernating | Customers with low frequency and monetary value, who have not bought anything for a long time. |
Lost | The worst customers, they didn’t make any purchase for a long time and they have never spent a lot. |
Now, we can look at our segments.
Sum scores
A simpler approach is to divide customers into groups based on the sum of their scores (S), which in our case varies between 3 and 15. The choice of segment boundaries is arbitrary. In this analysis, we divide customers into 3 groups:
- bronze: S < 5
- silver: 5 <= S < 10
- gold: S >= 10
This method gives a quick insight into which customers are more valuable. A drawback, however, is that customers with different buying behaviours can be assigned to the same segment.
Segment | Mean recency | Mean frequency | Mean monetary value | Number of users |
Gold | 1.6 | 21.8 | 1510.9 | 3675 |
Silver | 2.8 | 14.9 | 889.6 | 2352 |
Bronze | 6.7 | 11.3 | 556.0 | 862 |
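A summary like the one above could be produced along these lines. This is a sketch on made-up data, with hypothetical column names and a hypothetical helper `tier`. Only the bronze, silver and gold boundaries are taken from the text.

```python
import pandas as pd

# Made-up scored customers; column names are hypothetical.
scores = pd.DataFrame({
    "r_score": [5, 4, 2, 1, 3, 1],
    "f_score": [5, 3, 2, 1, 4, 2],
    "m_score": [5, 3, 3, 2, 2, 1],
    "recency": [0.4, 1.5, 3.0, 7.2, 2.1, 6.5],
    "frequency": [22, 17, 14, 10, 19, 12],
    "monetary_value": [1659, 1200, 900, 500, 950, 600],
})

# S is the sum of the three scores, between 3 and 15.
scores["rfm_sum"] = scores[["r_score", "f_score", "m_score"]].sum(axis=1)

def tier(total):
    """Map the score sum S to a segment using the boundaries from the text."""
    if total < 5:
        return "bronze"
    if total < 10:
        return "silver"
    return "gold"

scores["segment"] = scores["rfm_sum"].map(tier)

# Per-segment means and customer counts.
summary = scores.groupby("segment").agg(
    mean_recency=("recency", "mean"),
    mean_frequency=("frequency", "mean"),
    mean_monetary_value=("monetary_value", "mean"),
    customers=("rfm_sum", "size"),
)
print(summary)
```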
Closing notes
The final step of the RFM analysis is to use knowledge about customers in practice by creating a marketing strategy and personalized offers.
Stay tuned and find out how to use machine learning in customer analysis in my next article very soon.