Customer Clustering


In this episode, we speak with Ehsan Barkhordar, a 26-year-old Data Scientist at market.com. Ehsan holds a Master's degree in Computer Science from the American University of Technology. His research interests are mainly customer and market analysis, NLP, computer vision, and time series.

Ehsan began by explaining the kinds of tasks that customer analytics can address, especially with clustering techniques. He discussed how clustering gives in-depth insight into customer behavior.

Using the banking sector as a case study, Ehsan explained how features fall into two classes: static and dynamic. Each class has its place in helping data scientists uncover hidden patterns in a customer’s activities.
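To make the static/dynamic split concrete, here is a minimal sketch in Python. It assumes that "static" means attributes that rarely change and "dynamic" means behavior recorded over time; the class name and every field (`age`, `account_type`, `monthly_spend`) are hypothetical and not taken from the episode.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CustomerFeatures:
    # Static features: assumed to change rarely or never (hypothetical fields).
    customer_id: str
    age: int
    account_type: str
    # Dynamic features: a time-ordered series, here one spend value per month.
    monthly_spend: list[float] = field(default_factory=list)

    def average_spend(self) -> float:
        """Collapse the dynamic series into a single summary value."""
        return mean(self.monthly_spend) if self.monthly_spend else 0.0


customer = CustomerFeatures("c1", 34, "savings", [120.0, 80.0, 100.0])
print(customer.average_spend())
```

Summarizing the series with an average discards its shape over time, which is one reason to treat dynamic features as sequences instead.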

Moving on, Ehsan shed light on the algorithms he used in his research paper, which attempts to extract insights from customers’ bank transactions. He discussed the two algorithms he used to build his model: first, a conventional LSTM autoencoder, and second, dynamic time warping. The data used to train the model came from the Berkire banking system.
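Dynamic time warping measures how similar two sequences are even when one is stretched or shifted in time relative to the other. The following is a minimal sketch of the textbook dynamic-programming formulation in plain Python, assuming an absolute-difference local cost and no window constraint. The function name `dtw_distance` and the sample series are illustrative and are not taken from Ehsan's paper.

```python
from math import inf


def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its element
                cost[i][j - 1],      # b advances, a repeats its element
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# The same spending pattern, with the first month repeated in the second series.
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The first call returns zero because the warping path can match the repeated value at no cost, which is what makes DTW suitable for comparing customers whose transaction patterns are similar but not aligned in time.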

The LSTM autoencoder and dynamic time warping each served a distinct function, which he discussed at length. Ehsan then described how he ensembled the two algorithms and used visualization to interpret the results. He also stressed the importance of selecting the right features in a machine learning project; in his case, the models he chose helped extract useful features.

Ehsan then gave the intuition behind modeling the problem as a sequential data problem and using seq2seq-related algorithms: LSTM and Dynamic Time Warping.

But you may be asking: what is the practical application of this work? Ehsan touched on the actionable insights that can be derived from the model’s results. He then closed by discussing the evaluation metrics he used and why he chose K-means rather than K-median as the clustering algorithm for data with a fair number of outliers. You can reach out to Ehsan on Twitter or LinkedIn, or better still, shoot him an email.
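As background for the K-means versus K-median discussion: the two differ in how they update a cluster's center, with K-means using the mean of the assigned points and K-median using the median. The sketch below shows that single update step on one made-up cluster containing an outlier. The values are invented for illustration and do not come from Ehsan's data.

```python
from statistics import mean, median

# Hypothetical monthly spend values assigned to one cluster; 5000 is an outlier.
cluster = [100, 110, 95, 105, 5000]

mean_center = mean(cluster)      # center a K-means update step would use
median_center = median(cluster)  # center a K-median update step would use

print(mean_center, median_center)
```

The two centers land far apart here, which is why the choice between the algorithms matters whenever outliers are present.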