Weekly Machine Learning Research Papers — Edition #8
Overview of Recent Research
In this week's installment (from September 21, 2020, to September 27, 2020), we highlight three significant research papers in the field of machine learning.
Density-Ratio Based Clustering for Varying Densities
Authors: Ye Zhu, Kai Ming Ting, Mark J. Carman
Published in: Pattern Recognition
Link to Paper: [URL]
Abstract:
Density-based clustering methods are adept at recognizing clusters of various shapes and sizes within noisy datasets. However, many of these algorithms struggle with datasets that contain clusters with significantly different densities due to their reliance on a global density threshold. This paper analyzes the scenarios where these algorithms falter and introduces a density-ratio based approach to address these limitations. The proposed solution can be executed in two ways: modifying an existing density-based algorithm to utilize density ratios through its density estimator, or by rescaling the dataset before applying a conventional density-based clustering algorithm. The paper demonstrates through empirical evaluation using DBSCAN, OPTICS, and SNN that both methods effectively identify clusters of differing densities that would otherwise remain undetected.
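To make the density-ratio idea a little more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the two radii `eps` and `eta`, and the toy grid data are all my own assumptions, chosen only to illustrate why comparing a point's local neighborhood to a wider one is less sensitive to a cluster's overall density than a single global threshold.

```python
import math


def neighbor_count(points, center, radius):
    """Count the points within `radius` of `center` (the center itself included)."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


def density_ratio(points, center, eps, eta):
    """Local count (radius eps) divided by the wider count (radius eta > eps).

    Raw counts stand in for densities here; dividing each count by the area
    of its ball would only scale every ratio by the same constant.
    `center` is assumed to be one of `points`, so the wider count is >= 1.
    """
    if not 0 < eps < eta:
        raise ValueError("expected 0 < eps < eta")
    return neighbor_count(points, center, eps) / neighbor_count(points, center, eta)


def grid_cluster(center, spacing, half_steps=10):
    """Toy cluster: a square grid of points around `center` (illustrative only)."""
    cx, cy = center
    steps = range(-half_steps, half_steps + 1)
    return [(cx + i * spacing, cy + j * spacing) for i in steps for j in steps]


# Two well-separated toy clusters: one tightly packed, one spread out.
dense_center, sparse_center = (0.0, 0.0), (10.0, 0.0)
points = grid_cluster(dense_center, spacing=0.1) + grid_cluster(sparse_center, spacing=0.4)

eps, eta = 0.5, 1.0  # hypothetical radii, picked by hand for this toy data

# What a global density threshold would look at: the raw local counts.
dense_raw = neighbor_count(points, dense_center, eps)
sparse_raw = neighbor_count(points, sparse_center, eps)

# The density-ratio view: local count relative to the wider neighborhood.
dense_ratio = density_ratio(points, dense_center, eps, eta)
sparse_ratio = density_ratio(points, sparse_center, eps, eta)

print(f"raw counts:     dense={dense_raw}, sparse={sparse_raw}")
print(f"density ratios: dense={dense_ratio:.2f}, sparse={sparse_ratio:.2f}")
```

On this toy data the raw counts at the two cluster centers differ by more than an order of magnitude, while the two ratios end up close to each other, so a single threshold on the ratio can treat both centers alike.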
A Comprehensive Framework for Clustering Uncertain Data
Authors: Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, Arthur Zimek
Published in: Proceedings of the VLDB Endowment
Link to Paper: [URL]
Abstract:
The increasing complexity of uncertain data presents challenges in querying and mining. This paper discusses a general framework designed for clustering uncertain data, which aids in visualizing how various uncertainty models affect data mining outcomes. This framework corresponds to release 0.7 of ELKI (http://elki.dbs.ifi.lmu.de/), featuring a wide range of algorithm implementations, distance metrics, indexing methods, evaluation metrics, and visualization tools.
Isolation Set-Kernel for Multi-Instance Learning
Authors: Bi-Cun Xu, Kai Ming Ting, Zhi-Hua Zhou
Published in: KDD ’19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Link to Paper: [URL]
Abstract:
In machine learning, set-level problems are as important as instance-level problems. The key to solving a set-level problem is measuring the similarity between two sets. This paper introduces the Isolation Set-Kernel, which depends solely on the data distribution and requires neither class information nor an explicit learning process. In contrast, most current set similarity measures do not consider the data distribution. The paper analyzes the Isolation Set-Kernel theoretically and shows that it can significantly accelerate set-kernel computation. Applied to Multi-Instance Learning (MIL) with an SVM classifier, the kernel outperforms existing set-kernels and other MIL solutions.
Previous Editions of the Reading List
- Weekly reading list #1
- Weekly reading list #2
- Weekly reading list #3
- Weekly reading list #4
- Weekly reading list #5
- Weekly reading list #6
- Weekly reading list #7
About the Author
I am Durgesh Samariya, currently pursuing my Ph.D. in Machine Learning at FedUni, Australia, and I am recognized online as TheMLPhDStudent.
Stay updated with my insights by subscribing to my newsletter.
Connect with Me Online
Follow my journey on Instagram, Kaggle, GitHub, and Medium.
Thanks for your interest in my research!