
Supervised vs. Unsupervised Machine Learning

Machine learning models are like children, and if you know children, you know there are many ways of teaching them.

My sister is six years old. What I’ve noticed over the past few years as she’s become more ‘sentient’ is that she learns from observation more than from direct command. You can tell her to put away the milk after pouring a bowl of cereal, but modeling that behavior yourself is what actually cements it as the correct and expected one. On the other hand, teaching her how to read or write cannot rely on observation alone. For that, heavy instruction is needed, with proper reward functions to keep the motivation going. Machine learning models are like children in a lot of ways: there are different ways to teach them based on what you are trying to get them to learn. Often, complex problems combine multiple learning techniques (e.g. Large Language Models). Supervised and unsupervised learning are two different techniques used to train models. Today, I’ll briefly compare the two and discuss use cases for both.

Supervised Learning is the simplest to understand. Its fundamental idea lies in Linear Regression, wherein the model is the Line of Best Fit and the training data relies on the known association of labels (outputs) to features (inputs). The keyword here is known. Supervised learning relies on explicit training data that has the correct associations between inputs and outputs. This data is fed into a model that uses Gradient Descent (a training technique) to adjust the weights and bias of the regression equation so that the model reduces loss (the absolute or squared difference between the expected value and the predicted value, or the vertical distance between the Line of Best Fit and the expected, plotted value). In other words, the training data is used to adjust little knobs that influence how the model makes a prediction. With enough data and training time, these knobs can be tuned precisely enough to produce a powerfully predictive model. Without training data that has all of the feature-label associations spelled out, training would not work. Having trained on data that displays the correct output for a given input, the model can then make predictions on new data it hasn’t seen before. The word “supervised” describes the fact that the model has seen the correct associations between features and labels, not that the ML practitioner is in some way “supervising” the model itself and nudging it across the finish line. Supervised learning is like a child learning through flash cards: the side they see is the input, their guess of what is on the back is the prediction, and what is actually on the back is the correct output. Each time they review the flashcards, they try to remember the association between the front of the card and the back. The flashcards exist with the correct fronts and backs ready for practice. Eventually, they may be able to correctly answer questions from new flashcards given the patterns they memorized in previous sessions.
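
The knob-turning described above can be sketched in a few lines of Python. This is a minimal illustration, not production code: the data points, learning rate, and epoch count are all made up for the example, and the model is the simplest possible one (a line y = w·x + b fit by gradient descent on mean squared loss).

```python
def fit_line(xs, ys, lr=0.01, epochs=2000):
    """Tune the 'knobs' (weight w and bias b) to reduce squared loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predict with the current knob settings.
        preds = [w * x + b for x in xs]
        # Gradients of mean squared loss with respect to w and b.
        dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Nudge each knob a small step downhill.
        w -= lr * dw
        b -= lr * db
    return w, b

# Labelled training data: every feature comes with its known label.
# These points roughly follow y = 2x + 1 with a little noise.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
w, b = fit_line(xs, ys)
```

After training, `w * x_new + b` gives a prediction for an input the model has never seen, exactly like the child guessing the back of a new flashcard.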

Unsupervised Learning, by contrast, does not rely on known correct associations between features and labels. Instead, its primary purpose is to allow models to learn patterns from data without those patterns being spelled out in the training data itself. Just the other day, my family was making s’mores. We used a pack of Hershey Nuggets which contained a random assortment of silver-, gold-, and orange-wrapped chocolate pieces. My sister, without any instruction, organized those nuggets by color. She looked at the nuggets, saw that each had one of three colors of foil, and then grouped them into piles based on what she saw. In a lot of ways, this is how an unsupervised machine learning model works. An unsupervised learning model is given data that doesn’t contain any correct answers. Instead, its goal is to identify meaningful patterns in the data. It isn’t given any hints on how to categorize the data; it makes its own rules. Going back to the example with my sister, it is just as likely that she could have grouped the random nuggets by the label on the wrapper instead of by color. Or she could have grouped them by physical location, putting the nuggets from the top of the bag in one pile, the middle in another, and the bottom in another. In any case, she would have created some sort of organizing principle she could use to group new nuggets she pulled out of the bag. This is one of the most commonly employed unsupervised learning techniques, called clustering: the model sorts data points into the natural groupings it recognizes on its own.
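
Clustering can be sketched with a bare-bones k-means loop. This is an illustrative toy, not the full algorithm: the 2-D points are invented stand-ins for the nuggets, and the initialization is deterministic for simplicity (real k-means implementations use random or k-means++ initialization). Note that no labels appear anywhere; the groupings emerge from the data alone.

```python
def kmeans(points, k, iters=10):
    """Cluster unlabelled 2-D points into k groups by nearest centroid."""
    # Deterministic init for this sketch: pick evenly spaced points.
    step = max(1, len(points) // k)
    centroids = points[::step][:k]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Step 1: assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centroids[i][0]) ** 2
                                      + (y - centroids[i][1]) ** 2)
            groups[nearest].append((x, y))
        # Step 2: move each centroid to the mean of its assigned points.
        centroids = [(sum(x for x, _ in g) / len(g),
                      sum(y for _, y in g) / len(g)) if g else c
                     for g, c in zip(groups, centroids)]
    return groups

# Unlabelled data: three loose blobs, with no correct answers supplied.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (9.8, 0.1), (10.0, 0.2), (10.1, 0.0)]
groups = kmeans(points, k=3)
```

The model recovers three piles, but, like my sister with the nuggets, nothing forces it to choose the organizing principle a human would have chosen.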

If we take the example of building a multi-class classification model, the differences between the two learning styles become more apparent. Supervised learning could be used to create a classifier that groups different things together and predicts which group a new thing belongs to. Those groupings would have to be present in the training data itself; in other words, the ML practitioner has to make the correct associations between such-and-such features and such-and-such group themselves. Unsupervised learning, on the other hand, would create a classifier without any pre-configuration by the ML practitioner. So, in theory, the unsupervised learning model could arrive at the same groupings as the supervised one. Just as likely, though, it could arrive at something completely different, finding a through-line in the dataset unbeknownst to practitioners. This is where unsupervised learning is fascinatingly useful: discovering patterns in piles of data we can’t see.
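
To make the supervised side of that contrast concrete, here is a minimal nearest-centroid classifier. All names, points, and class labels are invented for the example; the key difference from the clustering case is that the correct group for every training point is handed to the model up front by the practitioner.

```python
from collections import defaultdict

def fit_nearest_centroid(features, labels):
    """Supervised training: compute the mean point of each labelled class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), lab in zip(features, labels):
        s = sums[lab]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

def predict(centroids, point):
    """Classify a new point by its nearest class centroid."""
    return min(centroids,
               key=lambda lab: (point[0] - centroids[lab][0]) ** 2
                             + (point[1] - centroids[lab][1]) ** 2)

# Labelled training data: the practitioner supplies the groupings.
features = [(0.1, 0.2), (0.0, 0.1), (5.0, 5.1),
            (5.2, 4.9), (9.8, 0.1), (10.0, 0.2)]
labels = ["silver", "silver", "gold", "gold", "orange", "orange"]
centroids = fit_nearest_centroid(features, labels)
group = predict(centroids, (9.9, 0.0))  # classify a new, unseen point
```

An unsupervised model given the same `features` (and no `labels`) might rediscover these three groups, or it might carve the data along a boundary nobody anticipated.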

In choosing between supervised and unsupervised learning, the main point of differentiation lies in whether or not you can obtain labelled data. Most popular use cases of machine learning involve supervised learning: image classification, sentiment analysis, next-word prediction, recommendation engines, and so on. Some of the most common applications of unsupervised learning include generating word embeddings (used in Large Language Models), Market Basket Analysis (discovering products that are often bought together), and clustering (grouping data together). Most machine learning problems rely on having labelled data, and as such, most take advantage of supervised learning. Even something as complex as ChatGPT or Claude leverages supervised learning, training on human-labelled prompt-response pairs.

Whether you pick supervised or unsupervised learning for your next project, I hope this brief article helped sharpen your intuition about these fundamental concepts. As artificial intelligence becomes a more influential part of our economy, understanding the differences between training techniques can shed light on a model’s capabilities and pitfalls.
