K-Means in Practice





2022-04-04


In today’s episode, Mujtaba Anwer joins us to discuss real-life applications of k-means clustering. Mujtaba is a researcher, data scientist, and policy adviser to the government on digital and data entities.

Mujtaba started by discussing his first encounter with k-means in the workplace: a fraud detection project in which he had to identify fraudulent customers. Because the dataset had no labels, he had to use an unsupervised learning approach. He talked about the hypothesis behind his modeling of the problem and the challenges he faced when interpreting the results.

Mujtaba then went on to explain which clustering technique best suits which use case. For instance, when should you use k-means, k-medians, k-modes, or spectral clustering?

Going forward, he explained how he used k-means in Customer Relationship Management (CRM), specifically for customer segmentation. This problem was a better fit for k-means than fraudulent customer classification, and Mujtaba explained why. He also discussed what the feature engineering process entailed and whether it was critical in optimizing the results.

The data scientist then talked about how you can transform real-life customer data into simple x and y coordinates for a clustering algorithm to consume. But beyond feeding the data into the clustering algorithm, determining the right number of clusters and making sense of the cluster output are equally vital. Mujtaba discussed at length how to make an informed decision on the best k and how to interpret the results.
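As a rough illustration of the workflow described above, here is a minimal sketch in plain Python. It is not taken from the episode: the customer features (annual spend, visits per month), the sample values, and the helper names `standardize` and `kmeans` are all made up for illustration, and the "elbow" heuristic (looking for the k after which inertia stops dropping sharply) is one common way to pick k, not necessarily the one Mujtaba used.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                tuple(mean(dim) for dim in zip(*members)) if members else centroids[j]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia

# Hypothetical customers: (annual_spend, visits_per_month). Values are invented.
customers = [
    (200, 1), (220, 2), (180, 1), (210, 2),
    (1500, 8), (1600, 9), (1450, 7), (1550, 8),
    (5000, 20), (5200, 22), (4900, 19), (5100, 21),
]
scaled = standardize(customers)

# Elbow heuristic: record the best inertia over a few random restarts per k.
inertias = {}
for k in range(1, 7):
    inertias[k] = min(kmeans(scaled, k, seed=s)[2] for s in range(10))
    print(k, round(inertias[k], 3))
```

Inertia always shrinks as k grows, so the number to look for is the point where the improvement flattens out. Whether the resulting clusters mean anything still has to be judged against the data and the business context.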

He also stressed how exploratory data analysis (EDA) and some domain knowledge come in handy when making sense of the results. Mujtaba closed with an important takeaway centered on understanding your model architecture and experimenting with different variants. You can follow him on Twitter @onlymujtaba.

Mujtaba Anwer

Based out of Finland, Mujtaba is the co-founder of "Teal Technology Group", a technology solutions consultancy providing solutions related to fintech, martech, marketing measurement and business intelligence. Mujtaba is also an advisor to the board of a government authority in the GCC, which was recently formed to support digital businesses. He previously worked as a Senior Consultant in the Analytics & Cognitive Consulting service line at Deloitte, where he worked on architecture, machine learning and business intelligence projects for large corporates, with a focus on developing scalable architecture, impact/causal analysis, and predicting the future course of events.


Thanks to our sponsors for their support

ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.
https://clear.ml/