Analytics Log - Adil Khan

View Original

Under The Hood: nstart = x value Within K Means Clustering In R Programming

In the original post on website content segmentation via k means clustering in r, the code running the initial k means cluster had a ‘nstart’ value.

See this content in the original post

If you check the StatQuest [BAM!] video as well, it talks about finding centroids or points where the variation is minimized [variation being Euclidian distance between the centre and the points in that proposed cluster].

Well, with nstart = 25 above, it’s asking R to attempt this 25 times before settling on where to place the centroids.