Let's implement the k-means algorithm for a single variable. You will start with a one-dimensional vector of 20 records, as shown here:
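# Assumed imports: the offline plotly API, matching the go.* and plot() calls below
import plotly.graph_objs as go
from plotly.offline import plot

data = [1, 2, 3, 2, 1, 3, 9, 8, 11, 12, 10, 11, 14, 25, 26, 24, 30, 22, 24, 27]

# Place every record on the line y = 0 so the 1D vector is drawn along the x axis
trace1 = go.Scatter(
    x=data,
    y=[0 for _ in data],
    mode='markers',
    name='Data',
    marker=dict(size=12)
)
layout = go.Layout(title='1D vector')
traces = [trace1]
fig = go.Figure(data=traces, layout=layout)
plot(fig)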
This outputs the following plot:
Our aim is to find the three clusters that are visible in the data. To start implementing the k-means algorithm, initialize the cluster centers by picking random records from the data, as shown here:
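import numpy as np

X = np.array(data)  # the 1D vector defined above, as a NumPy array
n_clusters = 3
c_centers = np.random.choice(X, n_clusters)  # pick n_clusters records at random
print(c_centers)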
# [ 1 22 26...
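Note that np.random.choice samples with replacement by default, and the data contains repeated values, so two centers can end up identical. The following is a minimal sketch of one way to avoid that, not necessarily the approach used in the rest of this walkthrough. It assumes a fixed seed (the value 42 is arbitrary) for reproducibility and draws the centers from the unique values of the vector:

```python
import numpy as np

data = [1, 2, 3, 2, 1, 3, 9, 8, 11, 12, 10, 11, 14, 25, 26, 24, 30, 22, 24, 27]
X = np.array(data)
n_clusters = 3

# Assumption: a fixed seed (42 is arbitrary) so that reruns give the same centers.
rng = np.random.default_rng(42)

# Sample from the unique values, without replacement, so no two centers coincide.
c_centers = rng.choice(np.unique(X), size=n_clusters, replace=False)
print(c_centers)
```

Sampling from the unique values gives every distinct value the same chance of being picked, regardless of how often it occurs in the data. If you would rather weight the choice by frequency, sample indexes of X instead and redraw when two centers are equal.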