Cohort Analytics Offshore Outsource Bangalore

Cohort Analytics

Cohort analysis is a broad topic, and it has many variations. Let's first give some basic definitions:

  • Cohort - a group of people or events that share a common characteristic over time
  • Cohort analysis - the study of the activity or behavior of one or more cohorts over time (or over some other sequence of steps).

In pseudocode, a cohort analysis looks like this: 'take a dataset in which each entity is uniquely identified (a customer in this case), define the cohorts by which the entities can be grouped (in this case we will use the year and month of first purchase), and follow the behavior of the cohorts over time (in this case the sum of OrderValue for each cohort per time slice)'.
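
The three steps of that description can be sketched in base R on a tiny table. The data below is invented for illustration, and the time slice is simply the calendar month of the order, which is a simplifying assumption and not the definition used later in this article.

```r
# Hypothetical toy data: three customers and five orders (values invented).
orders <- data.frame(
  CustomerKey = c(1, 1, 2, 2, 3),
  OrderDate   = as.Date(c("2013-01-05", "2013-02-10", "2013-01-20",
                          "2013-03-02", "2013-02-14")),
  OrderValue  = c(100, 40, 250, 60, 80)
)

# Step 1: the entity is the customer; find each customer's first purchase date.
first_purchase <- ave(as.numeric(orders$OrderDate), orders$CustomerKey, FUN = min)
orders$FirstPurchase <- as.Date(first_purchase, origin = "1970-01-01")

# Step 2: the cohort is the year and month of that first purchase.
orders$Cohort <- format(orders$FirstPurchase, "%Y-%m")

# Step 3: sum OrderValue per cohort and time slice
# (assumption: time slice = calendar month of the order).
orders$OrderMonth <- format(orders$OrderDate, "%Y-%m")
summary_table <- aggregate(OrderValue ~ Cohort + OrderMonth, data = orders, FUN = sum)
print(summary_table)
```

Customers 1 and 2 both made their first purchase in January 2013, so they end up in the same cohort and their January orders are summed together.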

It may seem a bit complex at first sight; however, it is straightforward in practice. In this case we are asking the questions:

  • 'How do our users behave as they get further away from their first purchase date?'
  • 'Do they keep buying as much as they did in the first month of their activity, or do they taper off as time passes?'
  • 'Is there a pattern when we compare the first purchase months of different cohorts?'

Let's load the libraries first:

  • library(dplyr)
  • library(reshape2)
  • library(ggplot2)
  • library(RODBC)

Then we will connect to SQL Server and load the data from the view we created into memory:

  • cn <- odbcDriverConnect(connection = "Driver={SQL Server};server=localhost;database=AdventureWorksDW2012;trusted_connection=yes;")
  • orders <- sqlFetch(cn, 'vCustomer_Data', colnames = FALSE, rows_at_time = 1000)
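
If no SQL Server instance is at hand, a stand-in for the view can be built in memory so that the rest of the steps can still be followed. This is only a sketch: the column names CustomerKey, OrderDate and OrderValue are assumed from the code in this article, and all values are randomly generated.

```r
# Hypothetical stand-in for the vCustomer_Data view.
# Column names are assumed from the article's code; values are invented.
set.seed(42)
n <- 200
orders <- data.frame(
  CustomerKey = sample(11000:11049, n, replace = TRUE),
  # Dates are kept as text here so the conversion step that follows still applies.
  OrderDate   = as.character(as.Date("2013-01-01") + sample(0:364, n, replace = TRUE)),
  OrderValue  = round(runif(n, min = 5, max = 500), 2),
  stringsAsFactors = FALSE
)

# Quick sanity checks that are worth running on the real data as well.
str(orders)
stopifnot(!anyNA(orders$CustomerKey), !anyNA(orders$OrderDate))
```

When working against the real database, calling odbcClose(cn) once the data is loaded releases the connection.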

Then we have to convert and format the OrderDate variable:

  • orders$OrderDate <- as.Date(orders$OrderDate, format='%Y-%m-%d')
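
The conversion matters because the later steps subtract dates and format them as year-month labels. A minimal sketch with two made-up date strings, assuming they arrive as text in the same layout as the format string above:

```r
# Hypothetical date strings in the '%Y-%m-%d' layout.
raw_dates <- c("2013-01-05", "2013-02-10")
order_dates <- as.Date(raw_dates, format = "%Y-%m-%d")

class(order_dates)                        # "Date"

# Subtracting two Date values gives a difftime in days.
gap <- order_dates[2] - order_dates[1]
as.numeric(gap)                           # 36

# Formatting a Date gives the year-month label used for cohorts.
format(order_dates[1], format = "%Y-%m")  # "2013-01"
```
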

After this, we will create a new dataset in memory. This dataset will have additional features that we’ll need for our graphical representation later on:

  • cohort <- orders %>% arrange(CustomerKey, OrderDate) %>% group_by(CustomerKey) %>%
      mutate(FPD = min(OrderDate),
             CohortMonthly = paste0('Cohort-', format(FPD, format = '%Y-%m')),
             LifetimeMonth = ifelse(OrderDate == FPD, 1, ceiling(as.numeric(OrderDate - FPD) / 30))) %>%
      ungroup()

Here is what the code does:

  • Takes the orders dataset and arranges it by CustomerKey and OrderDate (this is to make sure the data is in chronological order per customer)
  • Groups it by CustomerKey, since the first purchase date has to be computed per customer before a cohort can be assigned
  • Creates three new variables (columns) in our dataset: FPD (first purchase date), CohortMonthly and LifetimeMonth.
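
To see what the three new columns contain, the same pipeline can be run on a handful of rows. The orders below are invented for illustration; only the pipeline itself comes from the article.

```r
library(dplyr)

# Hypothetical orders for two customers (values invented).
orders <- data.frame(
  CustomerKey = c(1, 1, 1, 2, 2),
  OrderDate   = as.Date(c("2013-01-05", "2013-01-20", "2013-03-10",
                          "2013-02-01", "2013-02-01")),
  OrderValue  = c(100, 40, 25, 250, 60)
)

cohort <- orders %>%
  arrange(CustomerKey, OrderDate) %>%
  group_by(CustomerKey) %>%
  mutate(
    # First purchase date of the customer.
    FPD = min(OrderDate),
    # Cohort label built from the year and month of the first purchase.
    CohortMonthly = paste0('Cohort-', format(FPD, format = '%Y-%m')),
    # 30-day periods since the first purchase; orders on the first
    # purchase date itself are forced into period 1.
    LifetimeMonth = ifelse(OrderDate == FPD, 1,
                           ceiling(as.numeric(OrderDate - FPD) / 30))
  ) %>%
  ungroup()

print(cohort)
```

Customer 1 lands in Cohort-2013-01 and customer 2 in Cohort-2013-02. The order placed 15 days after the first purchase still falls into LifetimeMonth 1, while the one placed 64 days later falls into LifetimeMonth 3. The ifelse() is needed because ceiling(0 / 30) is 0, which would otherwise put first-day orders into a period 0. Note that LifetimeMonth counts 30-day blocks, not calendar months.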

