Dividing pandas dataframe into bins using qcut and cut
Let's use a sample dataframe:
import pandas as pd
import seaborn as sns
df = sns.load_dataset('iris')
df.head()
Let's say we want to divide the dataframe into 5 bins based on the petal length. We can do that using qcut or cut.
Using qcut
qcut divides the data into quantile-based bins, so that each bin contains roughly the same number of rows.
df['qcut_bin'] = pd.qcut(df['petal_length'],5)
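To see what qcut does without downloading the iris data, here is a minimal sketch on a small, made-up skewed series (the name values and the squared numbers are assumptions for illustration only). It uses the retbins=True option of pd.qcut to also get back the bin edges.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed sample data (not the iris dataset): 100 distinct values.
values = pd.Series(np.arange(100) ** 2)

# retbins=True makes qcut return the bin edges alongside the binned series.
qcut_bins, qcut_edges = pd.qcut(values, 5, retbins=True)

# Rows per bin, listed in bin order instead of sorted by count.
qcut_counts = qcut_bins.value_counts().sort_index()

print(qcut_counts)
print(qcut_edges)
```

On this data each of the five bins holds 20 rows, while the edges are unevenly spaced because they follow the quantiles of the values.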
Using cut
If we use cut to divide petal_length into 5 bins, it splits the range of petal length values into 5 intervals of equal width. The number of rows in each bin can therefore differ a lot.
df['cut_bin'] = pd.cut(df['petal_length'],5, include_lowest=True)
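The equal-width behaviour of cut can be checked in the same way. The sketch below again uses a made-up skewed series (values is an assumption, not the iris data) and asks pd.cut for its bin edges with retbins=True.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed sample data (not the iris dataset).
values = pd.Series(np.arange(100) ** 2)

# retbins=True makes cut return the bin edges alongside the binned series.
cut_bins, cut_edges = pd.cut(values, 5, retbins=True)

# Width of each interval. pandas lowers the first edge by 0.1% of the range
# so that the minimum value is included, so the first width is slightly larger.
widths = np.diff(cut_edges)

# Rows per bin, listed in bin order.
cut_counts = cut_bins.value_counts().sort_index()

print(widths)
print(cut_counts)
```

Apart from the small adjustment to the first edge, all intervals have the same width, and the first bin ends up with far more rows than the last one.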
The following plots show the bins produced by each method and the number of rows in each bin:
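import matplotlib.pyplot as plt
# Two side-by-side count plots: qcut bins on the left, cut bins on the right.
plt.figure(figsize=(15, 7))
plt.subplot(1, 2, 1)
sns.countplot(x=df['qcut_bin'])  # newer seaborn releases expect x as a keyword argument
plt.xticks(rotation=90)
plt.subplot(1, 2, 2)
sns.countplot(x=df['cut_bin'])
plt.xticks(rotation=90)
plt.show()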
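If a plot is not handy, the same comparison can be made as a small table. This is only a sketch on made-up data (values and summary are hypothetical names). It passes labels=False so that both functions return plain integer bin numbers, which makes the two columns line up.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed sample data (not the iris dataset).
values = pd.Series(np.arange(100) ** 2)

# labels=False returns integer bin numbers (0 to 4) instead of intervals.
summary = pd.DataFrame({
    "qcut": pd.qcut(values, 5, labels=False).value_counts().sort_index(),
    "cut": pd.cut(values, 5, labels=False).value_counts().sort_index(),
})
summary.index.name = "bin"

print(summary)
```

The qcut column is flat, while the cut column is heavily weighted towards the first bin, which is the same pattern the two count plots show for skewed data.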