- Watch https://www.youtube.com/watch?v=JnnaDNNb380 for a very basic overview of supervised vs. unsupervised learning as well as an intro to k-means clustering. Disregard the last part about autoencoders--that's beyond the scope of this course.
- Open Orange with a blank canvas/project.
- Drop a 'datasets' widget onto the canvas.
- Configure it to load the Iris dataset (find it in the list, then double-click it).
- Connect a data table widget to confirm that the correct data loaded.
- From the data table widget, connect a k-means widget (blue, unsupervised learning).
- Configure it for three (3) fixed clusters.
- Connect a scatter plot widget to visualize the output, and set its "Color" option to Cluster so that each point is colored by its assigned cluster.
- Leaving that window open, go back to the k-means widget config and change the number of clusters (2, 4, etc.) to see how the plot changes.
- Your submission will be:
  - a minimum of 100 words explaining the basic functionality of k-means, including your observations from changing the 'k' parameter (the number of target clusters)
    - For a good (and corny but entertaining) explanation of k-means, watch https://youtu.be/4b5d3muPQmA
  - a screenshot of your scatter plot with 3 clusters (k=3)
  - a single Word document with your written explanation and screenshot
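If it helps to see what the k-means widget is doing conceptually, the following is a minimal sketch of the basic algorithm in plain Python. It is not Orange's implementation, and the names `sq_dist`, `kmeans`, and `inertia`, along with the nine toy points, are made up for illustration (this is not the Iris data). The idea is to pick k starting centroids, assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # nothing moved, so the clustering is stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

def inertia(points, labels, centroids):
    """Total squared distance from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: three loose groups in 2-D (not the Iris dataset).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

for k in (2, 3, 4):
    labels, centroids = kmeans(points, k)
    print(k, labels, round(inertia(points, labels, centroids), 3))
```

Changing `k` here plays the same role as changing the number of clusters in the widget: the algorithm always returns exactly the number of groups it is asked for, whether or not the data naturally has that many. The result can also depend on the random starting centroids, so a different `seed` may give a different grouping.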
Other stuff...
Fascinating blog post from an Orange Data Mining developer on an encounter with a statistician stuck in old ways and how data can lead to interesting stories.