Customer segmentation helps you get to know your clients, so you can treat each group differently and better meet its needs, which is a key success factor in the retail industry. In this blog post, I want to dive into the data science behind customer segmentation and explain one of the most widely used segmentation techniques: RFM analysis.

What is RFM analysis?

RFM stands for Recency, Frequency, Monetary Value. RFM analysis is a technique for segmenting customers based on their transaction history. It lets us gather insights about consumer behaviour and optimize the marketing strategy accordingly. In particular, one can use RFM to create personalized special offers that improve sales and increase customer retention.

The RFM analysis is based on three metrics, which measure different (but equally important) customer characteristics: How much time has passed since the last purchase? (Recency), How many transactions were made? (Frequency), and How much money was spent? (Monetary Value).

Without further ado, let me show you how to conduct an RFM analysis.

Datasets used for RFM analysis

For our RFM analysis, we used the ‘Online Retail II UCI’ dataset from Kaggle (available at https://www.kaggle.com/mashlyn/online-retail-ii-uci/data).

The data contains one row per transaction, with its date (‘trans_date’), the amount of money spent (‘trans_amount’) and the customer id (‘customer_id’).

Customer_id  Trans_date  Trans_amount
CS5295       2013-02-11  35
CS4768       2015-03-15  39
CS2122       2013-02-26  52
CS1217       2011-11-16  99
CS1850       2013-11-20  78

To perform the analysis, we need to transform the dataset so that each row describes one customer:

  • number of months since the last purchase (Recency),
  • number of purchases made (Frequency),
  • total amount of money spent (Monetary Value).
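
The transformation above can be sketched with pandas. This is a minimal illustration rather than the exact code used for the post: the transactions are made up, the reference ("snapshot") date is assumed to be the day after the latest transaction, and a month is approximated as 30.44 days.

```python
import pandas as pd

# Made-up transactions, using the column names described above.
transactions = pd.DataFrame({
    "customer_id": ["CS1", "CS1", "CS2", "CS2", "CS2", "CS3"],
    "trans_date": pd.to_datetime([
        "2015-01-10", "2015-03-01", "2014-11-20",
        "2014-12-05", "2015-02-14", "2014-06-30",
    ]),
    "trans_amount": [35, 39, 52, 99, 78, 20],
})

# Assumption: recency is measured from the day after the latest transaction.
snapshot = transactions["trans_date"].max() + pd.Timedelta(days=1)
DAYS_PER_MONTH = 30.44  # assumption: average month length

# One row per customer: last purchase date, purchase count, total spend.
rfm = transactions.groupby("customer_id").agg(
    last_purchase=("trans_date", "max"),
    frequency=("trans_date", "count"),
    monetary=("trans_amount", "sum"),
)

# Convert the gap since the last purchase into (fractional) months.
rfm["recency"] = (snapshot - rfm["last_purchase"]).dt.days / DAYS_PER_MONTH
rfm = rfm[["recency", "frequency", "monetary"]]

print(rfm.round(3))
```

In a real project the snapshot date would usually be the date the analysis is run, so that recency reflects how long each customer has actually been inactive.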

Below, you can see a fragment of the transformed data frame.

Customer_id  Recency  Frequency  Monetary value
CS1112       2.004    15         1012
CS1113       1.150    20         1490
CS1114       1.051    19         1432
CS1115       0.361    22         1659
CS1116       6.670    13         857

Assigning RFM scores

The first step of the RFM analysis is to score each customer based on transaction characteristics.

There are several ways of doing this:

  • by using quantiles: we rank the customers on the chosen metric from best to worst, then divide the ranked customers into groups of equal size and assign each group a score.
  • by using predefined boundaries: we decide in advance, based on business knowledge, which score is assigned to which range of a metric. For example, for frequency, customers who made 0-10 purchases get score 1, those with 11-20 purchases get score 2, and so on.
  • by using machine learning: we will cover this case in the next post.
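
As a rough illustration of the predefined-boundaries option, the sketch below scores frequency with `pandas.cut`. The bin edges are hypothetical and chosen only to show the mechanics (up to 10 purchases scores 1, 11 to 20 scores 2, and so on, with everything above 40 scoring 5); in practice they would come from business knowledge.

```python
import pandas as pd

# Made-up purchase counts for six customers.
frequency = pd.Series([3, 10, 11, 20, 27, 45], name="frequency")

# Hypothetical boundaries. Intervals are right-closed:
# (-1, 10] -> 1, (10, 20] -> 2, (20, 30] -> 3, (30, 40] -> 4, (40, inf) -> 5
bins = [-1, 10, 20, 30, 40, float("inf")]
f_score = pd.cut(frequency, bins=bins, labels=[1, 2, 3, 4, 5]).astype(int)

print(f_score.tolist())  # [1, 1, 2, 2, 3, 5]
```

Unlike quantiles, fixed boundaries keep a score's meaning stable over time, but the groups can end up with very different sizes.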

In our case, we used the quantiles method. For each metric, we divided customers into five groups and assigned each group a score from 1 (the “worst”) to 5 (the “best”).

A recency score of 1 is given to customers who made their last purchase a long time ago, and 5 to those who bought something recently. For frequency and monetary value, a score of 1 means the lowest number of transactions or the lowest amount spent, whereas 5 means the highest.
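
The quantile scoring described above can be sketched as follows. This is one possible implementation, not necessarily the one used for the post: the data is made up, the helper `quintile_score` is a hypothetical name, and ties are broken by ranking with `method="first"` so that `pandas.qcut` always gets five equally sized groups.

```python
import pandas as pd

# Made-up per-customer RFM values (recency in months).
rfm = pd.DataFrame(
    {
        "recency": [0.4, 1.1, 1.2, 2.0, 3.5, 4.0, 5.2, 6.7, 8.0, 9.5],
        "frequency": [22, 20, 19, 15, 14, 13, 12, 9, 7, 4],
        "monetary": [1659, 1490, 1432, 1012, 950, 857, 700, 520, 300, 120],
    },
    index=[f"C{i}" for i in range(10)],
)


def quintile_score(values, higher_is_better=True):
    """Score 1 (worst) to 5 (best) by splitting customers into five equal groups."""
    # Ranking first breaks ties, so the quantile edges are always unique.
    # With ascending=False the largest value gets rank 1, hence score 1.
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5]).astype(int)


# Low recency is good; high frequency and monetary value are good.
rfm["r_score"] = quintile_score(rfm["recency"], higher_is_better=False)
rfm["f_score"] = quintile_score(rfm["frequency"])
rfm["m_score"] = quintile_score(rfm["monetary"])

print(rfm[["r_score", "f_score", "m_score"]])
```

Note that recency is the only metric where a smaller value earns a higher score, which is why the helper takes a direction flag.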

Creating segments for the RFM analysis

There are a few approaches to creating segments for the RFM analysis. I want to focus on the following two:

  1. based on a 3-digit code created by concatenating the scores,
  2. based on the sum of the scores.

Concat scores

In this method, the three scores are concatenated into a 3-digit code. The “worst” customers will have the code 111 and the “best” ones 555. These codes are then used to assign customers to segments.
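
A minimal sketch of building the 3-digit code with pandas is shown below. The column names `r_score`, `f_score` and `m_score` and the three customers are assumptions made for the example.

```python
import pandas as pd

# Made-up scores for three customers (column names are assumptions).
scores = pd.DataFrame(
    {"r_score": [5, 1, 3], "f_score": [5, 1, 4], "m_score": [5, 1, 2]},
    index=["best", "worst", "mixed"],
)

# Concatenate the digits in R, F, M order to get a code such as "342".
scores["rfm_code"] = (
    scores["r_score"].astype(str)
    + scores["f_score"].astype(str)
    + scores["m_score"].astype(str)
)

print(scores["rfm_code"].tolist())  # ['555', '111', '342']
```

Keeping the code as a string makes it easy to inspect each digit separately later on.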

In the table below you can find a description of each segment.

Segment             Characteristic
Champions           The best customers: they bought and spent a lot and made their latest purchase recently.
Loyal Customers     Very good customers who spent a lot.
Potential Loyalist  Recent customers who have already spent a lot.
New Customers       Recent customers who have made only a few purchases.
Promising           Bought often and spent quite a lot, but made their last purchase some time ago.
Need Attention      Recency and monetary value above average.
About To Sleep      Recency and monetary value below average.
At Risk             Bought frequently but have not made any purchase for a long time.
Cannot Lose Them    Customers who spent a lot but have been inactive for a while.
Hibernating         Customers with low frequency and monetary value who have not bought anything for a long time.
Lost                The worst customers: they have not made any purchase for a long time and have never spent a lot.
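
The post does not list which codes belong to which segment, so the sketch below is purely illustrative: the function `segment` and all of its thresholds are hypothetical, it covers only a handful of the segments, and every other code falls into a catch-all "Other". It only shows how a code-to-segment rule set could be written.

```python
def segment(code: str) -> str:
    """Map a 3-digit RFM code to a segment using hypothetical thresholds.

    Rules are checked in order and the first match wins.
    """
    r, f, m = (int(ch) for ch in code)
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"          # recent, frequent, high spend
    if r >= 4 and f <= 2:
        return "New Customers"      # recent, only a few purchases
    if r <= 2 and m >= 4:
        return "Cannot Lose Them"   # high spend, but inactive
    if r <= 2 and f >= 4:
        return "At Risk"            # frequent, but inactive
    if r == 1 and f <= 2 and m <= 2:
        return "Lost"               # inactive, rare, low spend
    return "Other"                  # not covered by this sketch


for code in ["555", "511", "125", "152", "111", "333"]:
    print(code, segment(code))
```

Because the rules overlap (a code such as 155 satisfies both the "Cannot Lose Them" and the "At Risk" conditions), the order of the checks is itself a business decision.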

Now, we can look at our segments.

Sum scores

A simpler approach is to divide customers into groups based on the sum of their scores (S), which in our case ranges from 3 to 15. The choice of segment boundaries is arbitrary. In this analysis, we divide customers into 3 groups:

  • bronze: S < 5
  • silver: 5 <= S < 10
  • gold: S >= 10

This method gives a quick insight into which customers are more valuable. Its drawback is that customers with different buying behaviours can end up in the same segment: for example, codes 511 and 115 both sum to 7 and fall into silver.
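
The sum-based grouping can be sketched as below, using the boundaries listed above. The score column names, the helper `tier` and the five customers are made up for the example.

```python
import pandas as pd

# Made-up scores (column names are assumptions).
scores = pd.DataFrame(
    {
        "r_score": [1, 5, 1, 3, 5],
        "f_score": [1, 1, 1, 4, 5],
        "m_score": [2, 1, 5, 3, 5],
    },
    index=["A", "B", "C", "D", "E"],
)

# S is the sum of the three scores, so it lies between 3 and 15.
scores["rfm_sum"] = scores[["r_score", "f_score", "m_score"]].sum(axis=1)


def tier(s: int) -> str:
    """bronze: S < 5, silver: 5 <= S < 10, gold: S >= 10."""
    if s < 5:
        return "bronze"
    if s < 10:
        return "silver"
    return "gold"


scores["tier"] = scores["rfm_sum"].map(tier)

print(scores[["rfm_sum", "tier"]])
```

Customers B and C get the same tier even though one bought recently and spent little while the other spent a lot long ago, which is exactly the loss of detail this method trades for simplicity.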

Segment  Mean recency  Mean frequency  Mean monetary value  Number of users
Gold     1.6           21.8            1510.9               3675
Silver   2.8           14.9            889.6                2352
Bronze   6.7           11.3            556.0                862
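
A per-segment summary like the one above could be produced with a pandas `groupby`. The sketch below uses a few made-up customers with an already assigned `tier` column (an assumed name), so its numbers do not match the table.

```python
import pandas as pd

# Made-up customers with an already assigned tier.
customers = pd.DataFrame({
    "recency": [1.2, 2.0, 3.1, 2.5, 7.0],
    "frequency": [22, 20, 15, 14, 10],
    "monetary": [1600, 1400, 900, 850, 500],
    "tier": ["gold", "gold", "silver", "silver", "bronze"],
})

# Mean of each metric plus the number of customers per tier.
summary = customers.groupby("tier").agg(
    mean_recency=("recency", "mean"),
    mean_frequency=("frequency", "mean"),
    mean_monetary=("monetary", "mean"),
    n_customers=("recency", "count"),
).round(1)

print(summary)
```

Comparing the means across tiers is a quick sanity check that the segments really separate recent, frequent, high-spending customers from the rest.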

Closing notes

The final step of the RFM analysis is to put this knowledge about customers into practice by creating a marketing strategy and personalized offers.

Stay tuned to find out how to use machine learning in customer analysis in my next article, coming very soon.