
rfm's Introduction






rfm

rfm: Python Package for RFM Analysis and Customer Segmentation

Info

rfm is a Python package that computes recency, frequency and monetary (RFM) analysis results for a transactional dataset in a few lines of code. Its flexible structure and automated functionality provide an easy, intuitive approach to RFM analysis. It aims to be a ready-made Python package for high-level analysis and quick prototyping. In practice, real-world data can be adapted to the package easily. Additionally, it can make colorful, intuitive graphs using a matplotlib backend without breaking a sweat.
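As a rough illustration of what the three values mean, here is a minimal pandas sketch. It is not the package's own code. It assumes the column names from the Usage example (CustomerID, InvoiceDate, Amount) and follows what the traceback in the Issues section suggests the package does: recency is the number of days between the latest transaction in the whole dataset and the customer's last transaction, frequency is the number of transactions, and monetary value is the sum of amounts. The function name compute_rfm_values is hypothetical.

```python
import pandas as pd


def compute_rfm_values(df, customer_id="CustomerID", transaction_date="InvoiceDate", amount="Amount"):
    """Return one row per customer with recency (days), frequency and monetary value.

    Hypothetical sketch; column names default to the ones used in the Usage example.
    """
    dates = pd.to_datetime(df[transaction_date])
    latest_date = dates.max()  # reference date: latest transaction in the whole dataset
    grouped = df.assign(**{transaction_date: dates}).groupby(customer_id)
    return pd.DataFrame({
        # days between the dataset's latest date and each customer's last purchase
        "recency": (latest_date - grouped[transaction_date].max()).dt.days,
        # number of transactions per customer
        "frequency": grouped.size(),
        # total amount spent per customer
        "monetary_value": grouped[amount].sum(),
    })


transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceDate": ["2021-01-01", "2021-01-10", "2021-01-05",
                    "2021-01-02", "2021-01-08", "2021-01-10"],
    "Amount": [10.0, 20.0, 5.0, 7.0, 3.0, 10.0],
})
print(compute_rfm_values(transactions))
```

The traceback shows the package taking the last element of each customer's grouped dates; using max() here gives the same result when transactions are in date order.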

Installation

Dependencies

  • Python (>=3.7)
  • Pandas (>=1.2.4)
  • NumPy (>=1.20.1)
  • matplotlib (>=3.3.4)

To install the current release (Ubuntu and Windows):

$ pip install rfm

Usage

# df: a transaction DataFrame with customer ID, transaction date and amount columns

>>> from rfm import RFM

>>> r = RFM(df, customer_id='CustomerID', transaction_date='InvoiceDate', amount='Amount')

>>> r.plot_segment_distribution()

License

MIT

Documentation

(Temporarily hosted here.)

Initialization

Read the transaction data into a DataFrame:

>>> import pandas as pd

>>> df = pd.read_csv('./data.csv')

Import the RFM class; the RFM analysis runs automatically on initialization:

>>> from rfm import RFM

>>> r = RFM(df, customer_id='CustomerID', transaction_date='InvoiceDate', amount='Amount') 

>>> r.rfm_table

To run the RFM analysis manually instead, pass automated=False:

>>> r = RFM(df, customer_id='CustomerID', transaction_date='InvoiceDate', amount='Amount', automated=False)

Attributes

RFM.rfm_table

Returns the resulting RFM table (a DataFrame) with recency, frequency and monetary values, their scores, and the assigned segment for each customer
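The scores appear to be on a 1-5 scale (the issues below mention combined scores such as 555 and 541), but how the package picks its cut-offs is not documented here. The following is therefore only a sketch of one common convention, quintile scoring with pd.qcut, assuming input columns named recency, frequency and monetary_value. The function score_quintiles and the r_score, f_score, m_score and rfm_score column names are hypothetical.

```python
import pandas as pd


def score_quintiles(rfm):
    """Add 1-5 scores by quintile; 5 is best for each dimension (hypothetical sketch)."""
    scored = rfm.copy()
    # rank(method="first") breaks ties so qcut finds 5 distinct bins (needs >= 5 rows)
    r_rank = scored["recency"].rank(method="first")
    f_rank = scored["frequency"].rank(method="first")
    m_rank = scored["monetary_value"].rank(method="first")
    # Fewer days since the last purchase is better, so recency labels run 5..1
    scored["r_score"] = pd.qcut(r_rank, 5, labels=[5, 4, 3, 2, 1]).astype(int)
    scored["f_score"] = pd.qcut(f_rank, 5, labels=[1, 2, 3, 4, 5]).astype(int)
    scored["m_score"] = pd.qcut(m_rank, 5, labels=[1, 2, 3, 4, 5]).astype(int)
    # Concatenate into the familiar three-digit code, e.g. "555"
    scored["rfm_score"] = (scored["r_score"].astype(str)
                           + scored["f_score"].astype(str)
                           + scored["m_score"].astype(str))
    return scored


rfm = pd.DataFrame({
    "recency": list(range(1, 11)),          # days since last purchase
    "frequency": list(range(10, 0, -1)),    # number of purchases
    "monetary_value": [float(v) for v in range(100, 0, -10)],
})
print(score_quintiles(rfm)[["recency", "rfm_score"]])
```

Quintiles adapt to each dataset, so the same raw value can map to different scores on different data; fixed thresholds are the usual alternative.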

>>> r.rfm_table

RFM.segment_table

Returns the segment table (a DataFrame) covering 10 unique segments, e.g. Champions, Loyal Accounts, etc.

>>> r.segment_table

Methods

RFM.plot_rfm_histograms()

Plots recency, frequency and monetary histograms in a single row

>>> r.plot_rfm_histograms()

RFM.plot_rfm_order_distribution()

Plots orders by customer number

>>> r.plot_rfm_order_distribution()

RFM.plot_versace_plot(column1, column2)

Plots a scatterplot of the two input columns

>>> r.plot_versace_plot(column1='recency',column2='monetary_value')

>>> r.plot_versace_plot(column1='recency',column2='frequency')

>>> r.plot_versace_plot(column1='frequency',column2='monetary_value')

RFM.plot_distribution_by_segment(column, take)

Plots the distribution of the input column by segment, summarized with the statistic passed as take (e.g. 'median')

>>> r.plot_distribution_by_segment(column='recency',take='median')

>>> r.plot_distribution_by_segment(column='frequency',take='median')

>>> r.plot_distribution_by_segment(column='monetary_value',take='median')
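Conceptually, this plot needs a per-segment summary of one column. Below is a minimal pandas sketch of that aggregation step only, not the package's plotting code. The segment column name, the sample values and the summarize_by_segment function are hypothetical, and take is assumed to be any aggregation name pandas accepts, such as 'median' or 'mean'.

```python
import pandas as pd

# Hypothetical RFM results; the 'segment' column name is an assumption
rfm_table = pd.DataFrame({
    "segment": ["Champions", "Champions", "Loyal Accounts", "Promising"],
    "recency": [2, 4, 10, 3],
    "frequency": [12, 8, 6, 1],
    "monetary_value": [500.0, 300.0, 200.0, 40.0],
})


def summarize_by_segment(table, column, take="median"):
    """Aggregate one column per segment; `take` is any pandas aggregation name."""
    return table.groupby("segment")[column].agg(take).sort_values(ascending=False)


summary = summarize_by_segment(rfm_table, "monetary_value")
print(summary)
```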

RFM.plot_column_distribution(column)

Plots the distribution of the input column

>>> r.plot_column_distribution(column='recency')

>>> r.plot_column_distribution(column='frequency')

>>> r.plot_column_distribution(column='monetary_value')

RFM.plot_segment_distribution()

Plots the segment distribution, i.e. segments vs. number of customers

>>> r.plot_segment_distribution()
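The numbers behind a segments-vs-customers chart amount to a count of customers per segment. A minimal sketch with hypothetical data, assuming one row per customer and a column named segment (both assumptions, not confirmed by the documentation):

```python
import pandas as pd

# Hypothetical per-customer results; the 'segment' column name is an assumption
rfm_table = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104, 105],
    "segment": ["Champions", "Loyal Accounts", "Champions", "Promising", "Champions"],
})

# One row per customer, so counting rows per segment counts customers
segment_counts = rfm_table["segment"].value_counts()
print(segment_counts)
```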


RFM.find_customers(segment)

Returns the RFM results DataFrame filtered to customers in the input segment

>>> r.find_customers('Champions')
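This lookup can be thought of as a boolean filter on the RFM results. A minimal sketch, assuming the results carry a segment column (a hypothetical name) and using made-up customer IDs; the standalone find_customers function here is illustrative, not the package's method:

```python
import pandas as pd

rfm_table = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "recency": [2, 30, 5, 1],
    "segment": ["Champions", "Loyal Accounts", "Champions", "Promising"],
})


def find_customers(table, segment):
    """Return the rows whose (assumed) 'segment' column equals `segment`."""
    return table[table["segment"] == segment]


champions = find_customers(rfm_table, "Champions")
print(champions)
```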

rfm's People

Contributors

sonwanesuresh95

rfm's Issues

Clarifying what the input data should look like

Hi, LOVE this library. Super helpful.

Only problem I had was with the date format :(. It took me a while to work out what the input date format ought to look like and then to transform the data.

My input dates were strings like 2022-06-22, and my solution was:
df['date']= pd.to_datetime(df['date'])
df['new_formatted_date'] = df.date.dt.strftime('%m/%d/%y %H:%M')

How to solve the issue?
I recommend either:
A) Clarifying exactly what the input data needs to look like
B) Accepting multiple common date formats and converting them on the back end
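For reference, here is the reporter's workaround applied to sample data. Passing an explicit format to pd.to_datetime is an addition here to avoid month/day ambiguity; whether the package requires exactly the %m/%d/%y %H:%M layout is not confirmed by the documentation.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2022-06-22", "2022-07-01"]})
# Parse the ISO-style strings; an explicit format avoids month/day guessing
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
# Re-render as month/day/two-digit-year plus hours and minutes
df["new_formatted_date"] = df["date"].dt.strftime("%m/%d/%y %H:%M")
print(df["new_formatted_date"].tolist())  # ['06/22/22 00:00', '07/01/22 00:00']
```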

TypeError: unsupported operand type(s) for -: 'str' and 'str' when subtracting `datetime64[ns]`

During RFM calculation, datetime64[ns] strangely becomes str. MWE:

from rfm import RFM
import pandas as pd

df = pd.DataFrame({'timestamp': pd.to_datetime(['01-06-2018 23:15:00',  # Creating data
                                                '02-09-2019 01:48:00',
                                                '08-06-2020 13:20:00',
                                                '07-03-2021 14:50:00']),
                    'amount': range(0, 4),
                    'customerid': range(0, 4)})
print(df.dtypes)
print(type(df['timestamp'].max()))
r = RFM(df, customer_id='customerid', transaction_date='timestamp', amount='amount')
print(r.segment_table)

outputs

timestamp     datetime64[ns]
amount                 int64
customerid             int64
dtype: object
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [2], line 11
      4 df = pd.DataFrame({'timestamp': pd.to_datetime(['01-06-2018 23:15:00',  # Creating data
      5                                                 '02-09-2019 01:48:00',
      6                                                 '08-06-2020 13:20:00',
      7                                                 '07-03-2021 14:50:00']),
      8                     'amount': range(0, 4),
      9                     'customerid': range(0, 4)})
     10 print(df.dtypes)
---> 11 r = RFM(df, customer_id='customerid', transaction_date='timestamp', amount='amount')
     12 print(p.segment_table)

File ~/Library/Python/3.10/lib/python/site-packages/rfm/rfm.py:30, in RFM.__init__(self, df, customer_id, transaction_date, amount, automated)
     28 # automated operations
     29 if automated:
---> 30     df_grp = self.produce_rfm_dateset(self.df)
     31     df_grp = self.calculate_rfm_score(df_grp)
     32     self.rfm_table = self.find_segments(df_grp)

File ~/Library/Python/3.10/lib/python/site-packages/rfm/rfm.py:70, in RFM.produce_rfm_dateset(self, df)
     67 # finding r,f,m values
     69 latest_date = df[self.transaction_date].max()
---> 70 df_grp['recency'] = df_grp[self.transaction_date].apply(lambda x: (latest_date - x[-1]).days)
     71 df_grp['frequency'] = df_grp[self.amount].apply(len)
     72 df_grp['monetary_value'] = df_grp[self.amount].apply(sum)

File ~/Library/Python/3.10/lib/python/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4661 def apply(
   4662     self,
   4663     func: AggFuncType,
   (...)
   4666     **kwargs,
   4667 ) -> DataFrame | Series:
   4668     """
   4669     Invoke function on values of Series.
   4670 
   (...)
   4769     dtype: float64
   4770     """
-> 4771     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~/Library/Python/3.10/lib/python/site-packages/pandas/core/apply.py:1105, in SeriesApply.apply(self)
   1102     return self.apply_str()
   1104 # self.f is Callable
-> 1105 return self.apply_standard()

File ~/Library/Python/3.10/lib/python/site-packages/pandas/core/apply.py:1156, in SeriesApply.apply_standard(self)
   1154     else:
   1155         values = obj.astype(object)._values
-> 1156         mapped = lib.map_infer(
   1157             values,
   1158             f,
   1159             convert=self.convert_dtype,
   1160         )
   1162 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1163     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1164     #  See also GH#25959 regarding EA support
   1165     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/Library/Python/3.10/lib/python/site-packages/pandas/_libs/lib.pyx:2918, in pandas._libs.lib.map_infer()

File ~/Library/Python/3.10/lib/python/site-packages/rfm/rfm.py:70, in RFM.produce_rfm_dateset.<locals>.<lambda>(x)
     67 # finding r,f,m values
     69 latest_date = df[self.transaction_date].max()
---> 70 df_grp['recency'] = df_grp[self.transaction_date].apply(lambda x: (latest_date - x[-1]).days)
     71 df_grp['frequency'] = df_grp[self.amount].apply(len)
     72 df_grp['monetary_value'] = df_grp[self.amount].apply(sum)

TypeError: unsupported operand type(s) for -: 'str' and 'str'
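The last frame shows the immediate cause: inside produce_rfm_dateset, both latest_date and x[-1] are strings at that point, and Python strings do not support subtraction. The sketch below reproduces the error in isolation and shows that the same arithmetic works on pandas Timestamps, where .days gives whole days. Why the datetime64 column ends up as strings is not explained by the traceback, so this only illustrates the error, not a fix inside the package.

```python
import pandas as pd

latest_str = "07-03-2021 14:50:00"
last_str = "08-06-2020 13:20:00"

error_message = None
try:
    latest_str - last_str  # str - str is not defined in Python
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The same values as Timestamps (month-first, as pd.to_datetime parses them by default)
latest = pd.Timestamp("2021-07-03 14:50:00")
last = pd.Timestamp("2020-08-06 13:20:00")
recency_days = (latest - last).days  # Timedelta.days keeps only whole days
print(recency_days)  # 331
```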

Documentation for useful functions

I see that the r object has a lot of attributes: if I type r., autocomplete lists many methods, such as dynamic cut-offs, dynamic RFM score, etc. It would be really useful to have documentation for these methods, so we can experiment with and customize thresholds based on our requirements.

rfm object changes `NaN` to `nan`

I have a column in my DataFrame (not used in the RFM analysis) that looks like this:

[Screenshot: the column before running RFM]

However, once I execute the code below

r = RFM(df_new, customer_id='unique_key', transaction_date='Date', amount='Revenue')

the same column, market segment MC, changes as shown below. I verified this multiple times and I'm not sure why it is happening. Is some shallow-copy or deep-copy behavior causing my input DataFrame to change?

[Screenshot: the same column after running RFM]
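One way to get exactly this symptom is converting a column to text, because str(np.nan) is the string 'nan'. Whether rfm does this internally is not confirmed. The sketch below uses a hypothetical stand-in function, hypothetical_analysis, to show that an in-place change to a DataFrame argument is visible to the caller, and that passing df_new.copy() (a deep copy by default) keeps the original intact.

```python
import numpy as np
import pandas as pd


def hypothetical_analysis(df):
    # Hypothetical stand-in: rewrites a column of the DataFrame it receives
    df["market segment MC"] = df["market segment MC"].map(str)  # NaN becomes 'nan'
    return df["Revenue"].sum()


df_new = pd.DataFrame({
    "market segment MC": pd.Series(["Retail", np.nan], dtype=object),
    "Revenue": [10.0, 20.0],
})

hypothetical_analysis(df_new.copy())  # operates on a deep copy
print(df_new["market segment MC"].isna().sum())  # original keeps its missing value

hypothetical_analysis(df_new)  # operates on the caller's DataFrame
print(df_new["market segment MC"].tolist())
```

In practice, calling RFM(df_new.copy(), ...) is a cheap way to protect the input if in-place modification turns out to be the cause.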

segment criteria - RFM

Hi,

I recently came across this package and it is very useful. It helps people who lack extensive coding knowledge. I appreciate it.

However, I have one quick question.

How can I find out the definition and RFM score criteria for each segment?

For example, I get a graph like the one below. How do I know which scores put a customer into the Champions segment?

Can you direct me to documentation where I can find the definition and score criteria for each segment? There is no documentation link on your README page.

For instance, if the score is 555, it is Champions. What if it is 541? Is it still considered Champions? So, I would like to know the cut-off points that you chose for each segment.

How much of a drop in revenue, recency, or frequency value is required for a record to be put into another segment? Is it $100, 2 days of recency, or 2 points of frequency?

[Screenshot of the segment graph]

Basically, I am looking for a table like the one below, which I got from this tutorial: https://www.analyticseducator.com/Blog/Customer_Segmentation_Using_RFM_with_Python.html

[Screenshot: segment definition table from the tutorial]

Additionally, is this a new package? Can it be used for real-world business insights, or is it still under testing?

Nonetheless, great effort and a useful package for people like me.

Customers in multiple segments - expected?

I was going through your code; here is a small section of it:

if (r in (4,5)) & (f in (4,5)) & (m in (4,5)):
    classes_append({rec[cust]:'Champions'})
elif (r in (4,5)) & (f in (1,2)) & (m in (3,4,5)):
    classes_append({rec[cust]:'Promising'})
elif (r in (3,4,5)) & (f in (3,4,5)) & (m in (3,4,5)):
    classes_append({rec[cust]:'Loyal Accounts'})

Based on this definition, a customer who is a Champion would also be put in the Loyal Accounts category. Am I right? Is this how RFM works as well? I thought each customer would be in only one segment.

So, is it possible that the same customer will be present in multiple segments?
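In Python, an if/elif chain runs only the first branch whose condition is true, so in the excerpt above a customer who meets the Champions condition never reaches the Loyal Accounts check. The sketch below restates the three quoted rules as a standalone function to make that concrete. assign_segment and the 'Other' fallback are hypothetical (the package defines 10 segments, only three of which are quoted), and it assumes the chain is evaluated once per customer, as the rec[cust] indexing in the excerpt suggests.

```python
def assign_segment(r, f, m):
    """Return exactly one segment label for a customer's (r, f, m) scores.

    Restates the three quoted rules; 'Other' is a hypothetical catch-all
    standing in for the package's remaining segments.
    """
    if r in (4, 5) and f in (4, 5) and m in (4, 5):
        return "Champions"
    elif r in (4, 5) and f in (1, 2) and m in (3, 4, 5):
        return "Promising"
    elif r in (3, 4, 5) and f in (3, 4, 5) and m in (3, 4, 5):
        return "Loyal Accounts"
    return "Other"


# 5-5-5 satisfies both the Champions and Loyal Accounts conditions,
# but the chain stops at the first match
print(assign_segment(5, 5, 5))  # Champions
print(assign_segment(3, 4, 4))  # Loyal Accounts
print(assign_segment(5, 1, 3))  # Promising
```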
