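Academic Talk: Sparse Clustering for Customer Segmentation with High-dimensional Mixed-type Data
Time: Tuesday, May 14, 14:00-15:00
Venue: Room 112, Teaching Building No. 2, Shahe Campus
Speaker: Feifei Wang, Associate Professor, School of Statistics, Renmin University of China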
Abstract: Customer segmentation has wide applications in business, such as personalized marketing and targeted product development, and clustering methods are commonly used to carry it out. However, modern customer segmentation data are often high-dimensional and of mixed type (i.e., a mixture of continuous and categorical variables). This poses great challenges, because most existing clustering methods are designed for data with a single type of variable. Furthermore, the presence of noise variables makes it necessary to perform variable selection and clustering simultaneously. Motivated by these issues, we develop a Davies-Bouldin index based sparse clustering (DBI-SC) method for customer segmentation with high-dimensional mixed-type data. In this method, we define separate dissimilarity measures for continuous and categorical variables. An adjusted DBI criterion is then designed to measure the contribution of each variable to the clustering. For variable selection, we apply the sparse clustering framework and introduce different penalty parameters for the two variable types. We also investigate the screening consistency property of the DBI-SC method. Extensive simulation studies demonstrate the satisfactory performance of DBI-SC in both clustering and variable selection. Finally, the proposed method is applied to customer segmentation on a designated driving service dataset.
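The abstract does not spell out the adjusted DBI criterion, so the sketch below only illustrates the idea of scoring one variable at a time with the standard Davies-Bouldin index (DBI), not the speaker's method. For a fixed partition into clusters k = 1..K along a single variable, let s_k be the mean distance of the cluster's points to its centre c_k; then DBI = (1/K) Σ_k max_{l≠k} (s_k + s_l) / |c_k - c_l|, and lower values mean tighter, better-separated clusters. The function name `dbi_continuous`, the use of absolute difference as the distance, and the toy data are all assumptions for illustration; categorical variables and the penalized sparse weighting step are not covered.

```python
from statistics import mean


def dbi_continuous(values, labels):
    """Standard Davies-Bouldin index of ONE continuous variable under a fixed
    partition (hypothetical helper, not the DBI-SC adjusted criterion).
    Lower is better; returns inf if two cluster centres coincide."""
    clusters = sorted(set(labels))
    groups = {k: [v for v, g in zip(values, labels) if g == k] for k in clusters}
    # Cluster centre along this variable: the within-cluster mean.
    centers = {k: mean(groups[k]) for k in clusters}
    # Within-cluster scatter s_k: mean absolute deviation from the centre.
    scatter = {k: mean(abs(v - centers[k]) for v in groups[k]) for k in clusters}

    worst_ratios = []
    for k in clusters:
        ratios = []
        for l in clusters:
            if l == k:
                continue
            separation = abs(centers[k] - centers[l])
            if separation == 0:
                ratios.append(float("inf"))
            else:
                ratios.append((scatter[k] + scatter[l]) / separation)
        # Keep the most "confusable" neighbouring cluster for cluster k.
        worst_ratios.append(max(ratios))
    return mean(worst_ratios)


labels = [0, 0, 0, 1, 1, 1]
informative = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # separates the two clusters
noise = [1.0, 3.0, 2.0, 2.5, 1.0, 3.0]        # unrelated to the clusters

print(dbi_continuous(informative, labels))  # small value
print(dbi_continuous(noise, labels))        # much larger value
```

In this toy example the informative variable scores far lower than the noise variable, which is the kind of per-variable signal a sparse clustering procedure could use to down-weight or drop noise variables.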
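Speaker bio: Feifei Wang is an Associate Professor and doctoral supervisor at the School of Statistics, Renmin University of China, Head of the Department of Data Science and Big Data Statistics, and a Wu Yuzhang Young Scholar of Renmin University of China. Dr. Wang's research focuses on text mining and its business applications, social network analysis, and big data modeling, with papers published in leading Chinese and international journals such as JOE, JBES, JMLR, and Scientia Sinica Mathematica. Dr. Wang has led projects funded by the National Natural Science Foundation of China (Young Scientists Fund and General Program) and a Major Project of the National Statistical Science Research Program, and has received the Renmin University of China Outstanding Research Achievement Award and the Excellence Award for Extracurricular Teaching.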