Today, we speak with Jason Xu, an Assistant Professor of Statistical Science at Duke University who also has an affiliation with biostatistics and bioinformatics. In this episode, Jason talks about his power K-means clustering paper.

Jason kicked off by discussing the fascinating evolution of machine learning and then shifted gears to explain why K-means became necessary in his research. While looking for a way to design optimization algorithms, he came across K-harmonic means, an improvement on K-means, and decided to extend that work.

But you may be asking: what was the improvement about? Jason talked about the problems in the K-means algorithm that needed attention.

Speaking of the problems with K-means, Jason explained what it means to have a locally optimal solution. One way to address this is a more intentional cluster initialization, the method used in the K-means++ algorithm, and Jason spoke about how this works. Another way is to use the power K-means algorithm. If you are wondering what power K-means is about, Jason talked extensively about it in the episode.

Going forward, Jason talked about the computational expense of the power K-means approach. Another interesting feature of power K-means is that you can systematically tune its hyperparameter to adjust the result without much additional computational demand.

Jason also spoke about the mathematical framework that makes power K-means a reliable algorithm in terms of its performance. To evaluate that performance, Jason discussed how power K-means measured up against other algorithms. He also talked about the empirical findings from the evaluation.

Jason then dwelt on the specific kinds of tweaks you can make to boost the algorithm’s performance. He later spoke about when power K-means might feature in popular machine learning libraries. He is optimistic that, with the steps being taken, it is only a matter of time.

The Assistant Professor then spoke about how power K-means can be combined with other tasks, particularly feature selection. Here, the model not only determines the clusters but also points out the features most useful for finding them. He has other research in spectral clustering, stochastic problem models with missing data, solving problems that were classically not solvable, and so on.

You can reach Jason through his website, where you will find links to his work.

Jason Xu is an Assistant Professor in the Department of Statistical Science at Duke University, with a secondary appointment in Biostatistics and Bioinformatics. He holds a PhD in Statistics from the University of Washington, and completed a postdoctoral fellowship at UCLA before joining the faculty at Duke.

His research specializes in theory and methods for modern statistical computing and machine learning. He develops cutting-edge methodology for inference tasks ranging from learning stochastic processes and networks describing epidemic data to high-dimensional regression and clustering. Professor Xu has contributed several algorithms that overcome long-standing challenges in problems that were previously considered intractable. Despite tackling messy problems and using sophisticated theoretical tools under the hood, his work emphasizes simple algorithms that are interpretable and easy to implement.

Thanks to our sponsors for their support

ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

https://clear.ml/

Springboard offers end-to-end online data career programs that encompass data science, data analytics, data engineering, and machine learning engineering.

https://springboard.com