In a Data Science interview a year ago, I was challenged to use a small data set from our friends at FiveThirtyEight to suggest how best to design a good-selling candy. “Based on ‘market research’ you see here,” the prompt gestured, “advise the product design team on the best features of a candy that we can sell alongside brand-name candies.”
Targeted marketing requires that we identify cohorts — or segments — of customers, and define what the characteristics are that bring them together. When developing a new brand, for example, you want to know if the new offering is likely to acquire customers from other parts of your portfolio who may or may not be highly-engaged with your current offerings. When defining and flagging audiences, you might want to define different cohorts of viewing behaviors. There are just as many applications of segmentation as there are ways to implement it in python using ML.
In this tutorial, my goal is to guide you through the initial experimentation for customer segmentation. I’ll be using the Kaggle Teleco data to help you…
Businesses want to understand the types of experiences their customers are having, and what elements of a shopping experiences are successes or areas for opportunity. In the old days, we might send surveyors out to poll what people are saying, and then read through a few hundred to pull out key elements of their experience.
That has all changed in the age of online reviews, where millions reviews are globally posted per day. We can’t read all of those. We need to analyze them differently, and we need to share the results in a visual way.
If you’re interested in mining online reviews to find customer pain points or successes in a few lines of code, without any models, this is the tutorial for you! …
Histograms are a crucial part of Exploratory Data Analysis. They save us from being tricked by something like “On average, our widgets reach 1000 people.” That’s a great average! But what’s the range? Distribution? Standard deviation? Variance? These can all be answered through one simple visualization: a histogram.
But there’s a problem with histograms as they’re typically made: Binning. Time for a PSA:
Let’s take a leap into being better, more rigorous communicators and think about binning our histograms to realistically reflect the quantitative landscape. Below, I start with the concept of histograms and how they’re important. Then I jump into some code and math. …