Kaggle in practice: a customer personality analysis case study based on supermarket purchase data

Hello everyone. This article works through a Kaggle dataset: customer personality analysis and customer segmentation based on a supermarket purchase dataset.

For more details, see the original dataset page:

https://www.kaggle.com/code/sonalisingh1411/customer-personality-analysis-segmentation/data?select=marketing_campaign.csv

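Overview

This article covers:

  • Basic information about the data and its fields

  • Missing-value analysis and handling

  • Exploring the fields from four angles

  • Handling the date field

  • Bivariate analysis

  • Correlation analysis

  • Customer segmentation and visualization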

Missing values:
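The missing-value check could look roughly like the sketch below. It is only an illustration: the helper name `missing_summary`, the toy values, and the choice of median imputation are assumptions, not the original notebook's code.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)

# Toy frame with made-up values, standing in for marketing_campaign.csv
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 71613.0, np.nan],
    "Kidhome": [0, 1, 0, 1],
})
summary = missing_summary(toy)

# One possible treatment (an assumption): fill gaps with the column median
filled = toy.fillna({"Income": toy["Income"].median()})
```

Dropping the affected rows with `dropna` is an equally common choice when only a small share of rows is incomplete.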


Field descriptions

The fields fall into four groups: people, products, promotion, and place.

PEOPLE

  1. ID: Customer’s unique identifier.

  2. Year_Birth: Customer’s birth year.

  3. Education: Customer’s education level.

  4. Marital_Status: Customer’s marital status.

  5. Income: Customer’s yearly household income.

  6. Kidhome: Number of children in customer’s household.

  7. Teenhome: Number of teenagers in customer’s household.

  8. Dt_Customer: Date of customer’s enrollment with the company.

  9. Recency: Number of days since customer’s last purchase.

  10. Complain: 1 if customer complained in the last 2 years, 0 otherwise.

PRODUCTS

  1. MntWines: Amount spent on wine in last 2 years.

  2. MntFruits: Amount spent on fruits in last 2 years.

  3. MntMeatProducts: Amount spent on meat in last 2 years.

  4. MntFishProducts: Amount spent on fish in last 2 years.

  5. MntSweetProducts: Amount spent on sweets in last 2 years.

  6. MntGoldProds: Amount spent on gold in last 2 years.

PROMOTION

  1. NumDealsPurchases: Number of purchases made with a discount.

  2. AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise.

  3. AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise.

  4. AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise.

  5. AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise.

  6. AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise.

  7. Response: 1 if customer accepted the offer in the last campaign, 0 otherwise.

PLACE

  1. NumWebPurchases: Number of purchases made through the company’s web site.

  2. NumCatalogPurchases: Number of purchases made using a catalogue.

  3. NumStorePurchases: Number of purchases made directly in stores.

  4. NumWebVisitsMonth: Number of visits to company’s web site in the last month.
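Given the field definitions above, the date field and a total-spend column might be derived as in the following sketch. The `Expenses` name is borrowed from the axis label of the final plot; the helper `add_derived_fields`, the `Enroll_Year` column, the day-first date format, and the toy rows are assumptions for illustration.

```python
import pandas as pd

# The six spending columns from the PRODUCTS group
MNT_COLS = ["MntWines", "MntFruits", "MntMeatProducts",
            "MntFishProducts", "MntSweetProducts", "MntGoldProds"]

def add_derived_fields(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Assumes dates are stored as day-month-year strings; adjust if yours differ
    out["Dt_Customer"] = pd.to_datetime(out["Dt_Customer"], format="%d-%m-%Y")
    out["Enroll_Year"] = out["Dt_Customer"].dt.year
    # Total spend across all product categories
    out["Expenses"] = out[MNT_COLS].sum(axis=1)
    return out

# Made-up rows for demonstration only
toy = pd.DataFrame({
    "Dt_Customer": ["04-09-2012", "08-03-2014"],
    "MntWines": [635, 11], "MntFruits": [88, 1], "MntMeatProducts": [546, 6],
    "MntFishProducts": [172, 2], "MntSweetProducts": [88, 1], "MntGoldProds": [88, 6],
})
derived = add_derived_fields(toy)
```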

Bivariate analysis
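A bivariate look at two fields at a time could be sketched as follows. The column pairs chosen here and the toy values are assumptions; the original analysis may have compared different fields.

```python
import pandas as pd

# Made-up rows for demonstration only
toy = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "PhD", "Master"],
    "Income": [50000, 70000, 60000, 80000, 65000],
    "Response": [0, 1, 0, 1, 0],
})

# Numeric vs. categorical: average income per education level
income_by_edu = toy.groupby("Education")["Income"].mean()

# Categorical vs. categorical: response counts per education level
response_by_edu = pd.crosstab(toy["Education"], toy["Response"])
```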

Correlation
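Pairwise correlations between numeric fields can be computed with pandas, as in this minimal sketch. The toy values are invented so that the result is easy to verify by eye, and `Expenses` is a hypothetical total-spend column matching the axis label of the final plot.

```python
import pandas as pd

# Made-up values: Expenses rises linearly with Income, web visits fall
toy = pd.DataFrame({
    "Income": [20000, 40000, 60000, 80000],
    "Expenses": [100, 300, 500, 700],
    "NumWebVisitsMonth": [8, 6, 4, 2],
})

# Pearson correlation matrix (the pandas default)
corr = toy.corr()

# Rank the other fields by their correlation with Income
income_corr = corr["Income"].drop("Income").sort_values()
```

The resulting matrix is what a heatmap would visualize; on real data the coefficients will of course be far from the perfect values in this toy case.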

Clustering results

import matplotlib.pyplot as plt

# Assumes X is an (n, 2) array of [Income, Expenses] values, y holds the
# cluster label (0-4) of each row, and kmeans is the fitted clustering model.
plt.figure(figsize=(15, 8))
plt.scatter(X[y == 0, 0], X[y == 0, 1], s=25, c='mediumblue', label='one')
plt.scatter(X[y == 1, 0], X[y == 1, 1], s=25, c='turquoise', label='two')
plt.scatter(X[y == 2, 0], X[y == 2, 1], s=25, c='red', label='three')
plt.scatter(X[y == 3, 0], X[y == 3, 1], s=25, c='green', label='four')
plt.scatter(X[y == 4, 0], X[y == 4, 1], s=25, c='yellow', label='five')
# Mark the cluster centres
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=55, c='black', label='Centroids')
plt.title('Clusters of customers', fontsize=20)
plt.xlabel('Income', fontsize=15)
plt.ylabel('Expenses', fontsize=15)
plt.legend(fontsize=15)
plt.show()
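The plotting code above relies on `X`, `y`, and `kmeans` already being defined. One plausible way to produce them is scikit-learn's `KMeans` with five clusters (matching the five labels in the plot), as sketched below. The synthetic points, the invented cluster centres, and the `n_init` and `random_state` settings are assumptions, not values from the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Made-up (Income, Expenses) points scattered around five invented centres
centres = np.array([[20000, 100], [40000, 600], [60000, 1200],
                    [80000, 1800], [100000, 2400]], dtype=float)
X = np.vstack([c + rng.normal(scale=[1500, 40], size=(30, 2))
               for c in centres])

# Fit k-means and get one cluster label (0-4) per row
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
y = kmeans.fit_predict(X)
```

Because Income is on a much larger scale than Expenses, distances here are dominated by Income; standardizing both columns before clustering may be worth considering on the real data.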

