By Lance Eliot, the AI Trends Insider
How do you learn something?
That’s the same question that we need to ask when trying to achieve Machine Learning (ML). In what way can we undertake “learning” for a computer and seek to “teach” the system to do things of an intelligent nature. That’s a holy grail for those in AI that are aiming to avoid having to program their way into intelligent behavior. Instead, the notion is to be able to somehow get a computer to learn what to do and not need to explicitly write out every step or knowledge aspect required.
Allow me a moment to share with you a story about the nature of learning.
Earlier in my career, I started out as a professor and was excited to teach classes for both undergraduate students and graduate level students. Those first few lectures were my chance to aid those students in learning about computer science and AI. Before each lecture I spent a lot of time to prepare my lecture notes and was ready to fill the classroom whiteboard with all the key principles they’d need to know. Sure enough, I’d stride into the classroom and start writing on the board and kept doing so until the bell went off that the class session was finished.
After doing this for about a week or two, a student came to my office hours and asked if there was a textbook they could use to study from. I was taken aback since I had purposely not chosen a textbook in order to save the students money. I figured that my copious notes on the board would be better than some stodgy textbook and averted them from having to spend a fortune on costly books. The student explained that though they welcomed my approach, they were the type of person that found it easier to learn by reading a book. Trying not to offend me, the student gingerly inquired as to whether my lecture notes could be augmented by a textbook.
I considered this suggestion and sure enough found a textbook that I thought would be pretty good to recommend, and at the next session of the class mentioned it to the students, indicating that it was optional and not mandatory for the class.
While walking across the campus after a class session, another student came up to me and asked if there were any videos of my lectures. I was suspicious that the student wanted to skip coming to lecture and figured they could just watch a video instead, but this student sincerely convinced me that she found that watching a video allowed her to start and stop the lecture while trying to study the material after class sessions. She said that my fast pace during class didn’t allow time for her to really soak in the points and that by having a video she would be able to do so at a measured pace on her own time.
I considered this suggestion and provided to the class links to some videos that were pertinent to the lectures that I was giving.
Yet another student came to see me about another facet of my classes. For the undergrad lectures, I spoke the entire time and didn’t allow for any classroom discussion or interaction. This seemed sensible because the classes were large lecture halls that had hundreds of students attending. I figured it would not be feasible to carry on a Socratic dialogue similar to what I was doing in the graduate level courses where I had many 15-20 students per class. I had even been told by some of the senior faculty that trying to engage undergrads in discussion was a waste of time anyway since those newbie students were neophytes and it would be ineffective to allow any kind of Q&A with them.
Well, an undergrad student came to see me and asked if I was ever going to allow Q&A during my lectures. When I started to discuss this with the student, I inquired as to what kinds of questions was he thinking of asking. Turns out that we had a very vigorous back-and-forth on some meaty aspects of AI and it made me realize that there were perhaps students in the lecture hall that could indeed engage in a hearty dialogue during class. At my next lecture, I opted to stop every twenty minutes and gauge the reaction from the students and see if I could get a brief and useful interaction going with them. It worked, and I noticed that many of the students became much more interested in the lectures by this added feature of allowing for Q&A (even for so-called “lowly” undergraduate students, which was how my fellow faculty seemed to think of them).
Why do I tell you this story about my initial days of being a professor?
I found out pretty quickly that using only one method or approach to learning is not necessarily very wise. My initial impetus to do fast paced all-spoken lectures was perhaps sufficient for some students, but not for all. Furthermore, even the students that were OK with that narrow singular approach were likely to tap into other means of learning if I was able to provide it. By augmenting my lectures with videos, with textbooks, and by allowing for in-classroom discussion, I was providing a multitude of means to learn.
You’ll be happy to know that I learned that learning is best done via offering multiple ways to learn. Allow the learner to select which approach best fits to them. When I say this, also keep in mind that the situation might determine which mode is best at that time. In other words, don’t assume that someone that prefers learning via in-person lecture is always going to find that to be the best learning method for them. They might switch to a preference for say video or textbook, depending upon the circumstance.
And, don’t assume that each learner will learn via only one method. Student A might find that using lectures and the textbook is their best fit. Student B might find lectures to be unsuitable for learning and prefer dialogue and videos. Each learner will have their own one-or-more learning approaches that work best for them, and this varies by the nature of the topic being learned.
I kept all of this in mind for the rest of my professorial days and always tried to provide multiple learning methods to the students, so they could choose the best fit for them.
Ensemble Learning Employs Multiple Methods, Approaches
A phrase sometimes used to refer to this notion of multiple learning methods is known as ensemble learning. When you consider the word “ensemble” you tend to think of multiples of something, such as multiple musicians in an orchestra or multiple actors in a play. They each have their own role, and yet they also combine together to create a whole.
Ensemble machine learning is the same kind of concept. Rather than using only one method or approach to “teach” a computer to do something, we might use multiple methods or approaches. These multiple methods or approaches are intended to somehow ultimately work together so as to form a group. In other words, we don’t want the learning methods to be so disparate that they don’t end-up working together. It’s like musicians that are supposed to play the same song together. The hope is that the multiple learning methods are going to lead to a greater chance at having the learner learn, which in this case is the computer system as the learner.
At the Cybernetic AI Self-Driving Car Institute, we are using ensemble machine learning as part of our approach to developing AI for self-driving cars.
Allow me to further elaborate.
Suppose I was trying to get a computer system to learn some aspect of how to drive a car. One approach might be to use artificial neural networks (ANN). This is very popular and a relatively standardized way to “teach” the computer about certain driving task aspects. That’s just one approach though. I might also try to use genetic algorithms (GA). I might also use support vector machines (SVM). And so on. These could be done in an ensemble manner, meaning that I’m trying to “teach” the same thing but using multiple learning techniques to do so.
For the use of genetic algorithms in AI self-driving cars see my article: https://aitrends.com/selfdrivingcars/genetic-algorithms-self-driving-cars-darwinism-optimization/
For my article about support vector machines in AI self-driving cars see: https://aitrends.com/selfdrivingcars/support-vector-machines-svm-ai-self-driving-cars/
For my articles about machine learning for AI self-driving cars see:
Benchmarks and machine learning: https://aitrends.com/ai-insider/machine-learning-benchmarks-and-ai-self-driving-cars/
Federated machine learning: https://aitrends.com/selfdrivingcars/federated-machine-learning-for-ai-self-driving-cars/
Explanation-based machine learning: https://aitrends.com/selfdrivingcars/explanation-ai-machine-learning-for-ai-self-driving-cars/
Deep reinforcement learning: https://aitrends.com/ai-insider/human-aided-training-deep-reinforcement-learning-ai-self-driving-cars/
Deep compression pruning in machine learning: https://aitrends.com/selfdrivingcars/deep-compression-pruning-machine-learning-ai-self-driving-cars-using-convolutional-neural-networks-cnn/
Simulations and machine learning: https://aitrends.com/selfdrivingcars/simulations-self-driving-cars-machine-learning-without-fear/
Training data and machine learning: https://aitrends.com/machine-learning/machine-learning-data-self-driving-cars-shared-proprietary/
Now you don’t normally just toss together an ensemble. When you put together a musical band, you probably would be astute to pick musicians that have particular musical skills and play particular musical instruments. You’d want them to end-up being complimentary with each other. Sure, some might be duplicative, such as you might have more than one guitar player, but that could be because one guitarist will be the lead guitar and the other perhaps the bass guitar player.
The same is said for doing ensemble machine learning. You’ll want to select machine learning approaches or methods that seem to make sense when considered in the totality as a group of such machine learning approaches. What is the strength of each ML chosen for the ensemble? What is the weakness of the ML chosen? By having multiple learning methods, hopefully you’ll be able to either find the “best” one for the given learning circumstance at hand, or you might be able to combine them together in a manner that offers a synergistic outcome beyond each of them performing individually.
So, you could select some N number of machine learning approaches, train them on some data, and then see which of them learned the best, as based on some kind of metrics. You might after training feed the MLs with new data and see which does the best job. For example, suppose I’m trying to train toward being able to discern street signs. So, I feed a bunch of pictures of street signs into these each ML’s of my ensemble. After they’ve each used their own respective learning approach, I then test them. I do so by feeding new pictures of street signs and see which of them most consistently can identify a stop sign versus a speed limit sign.
See my article about street signs and AI self-driving cars: https://aitrends.com/selfdrivingcars/making-ai-sense-of-road-signs/
Out of my N number of machine learning approaches that I selected for this street sign learning task, suppose that the SVM turns out to be the “best” as based on my testing after the learning has occurred. I might then decide that for the street sign interpretation I’m going to exclusively use SVM for my AI self-driving car system. This aspect of selecting a particular model out of a set of models is sometimes referred to as the “bucket of models” approach, wherein you have a bucket of models in the ensemble and you choose one out of them. Your selection is based on a kind of “bake-off” as to which is the better choice.
But, suppose that I discover that of the N machine learning approaches, sometimes the SVM is the “best” and meanwhile there are other times that the GA is better. I don’t necessarily need to confine myself to choosing only one of the learning methods for the system. What I might do is opt to use both SVM and GA, and be aware beforehand of when each is preferred to come to play. This is akin to having the two guitarists in my musical band, and each has their own strengths and weaknesses, so if I’m thoughtful about how to arrange my band when they play a concert I’ll put them each into a part of the music playing that seems best for their capabilities. Maybe one of them starts the song, and the other ends the song. Or however arranging them seems most suitable to their capabilities.
Thus, we might choose N number of machine learning approaches for our ensemble, train them, and then decide that some subset Q are chosen to become part of the actual system we are putting together. Q might be 1, in that maybe there’s only one of the machine learning approaches that seemed appropriate to move forward with, or Q might be 2, or 3, and so on up to the number N. If we do select more than just one, the question then arises as to when and how to use the Q number of chosen machine learning approaches.
In some cases, you might use each separately, such as maybe machine learning approach Q1 is good at detecting stop signs, while Q2 is good at detecting speed limit signs. Therefore, you put Q1 and Q2 into the real system and when it is working you are going to rely upon Q1 for stop sign detection and Q2 for speed limit sign detection.
In other cases, you might decide to combine together the machine learning approaches that have been successful to get into the set Q. I might decide that whenever a street sign is being analyzed, I’ll see what Q1 has to indicate about it, and what Q2 has to indicate about it. If they both agree that it is a stop sign, I’ll be satisfied that it’s likely a stop sign, and especially if Q1 is very sure of it. If they both agree that it is speed limit sign, and especially if Q2 is very sure of it, I’ll then be comfortable assuming that it is a speed limit sign.
Various Ways to Combine the Q Sets
There are various ways you might combine together the Q’s. You could simply consider them all equal in terms of their voting power, which is generally called “bagging” or bootstrap aggregation. Or, you could consider them to be unequal in their voting power. In this case, we’re going with the idea that Q1 is better at stop sign detection, so I’ll add a weighting to its results that if it’s interpretation is a stop sign then I’ll give it a lot of weight, while if Q2 detects a stop sign I’ll give it a lower weighting because I already know beforehand it’s not so good at stop sign detection.
These machine learning approaches that are chosen for the ensemble are often referred to as individual learners. You can have any N number of these individual learners and it all depends on what you are trying to achieve and how many machine learning approaches you want to consider for the matter at-hand. Some also refer to these individual learners as base learners. A base or individual learner can be whatever machine learning approach you know and are comfortable with, and that matches to the learning task at hand, and as mentioned earlier can be ANN, SVM, GA, decision trees, etc.
Some believe that to make the learning task fair, you should provide essentially the same training data to the machine learning approaches that you’ve chosen for the matter at-hand. Thus, I might select one sample of training data that I feed into each of the N machine learning approaches. I then see how each of those machine learning approaches did based on the sample data. For example, I select a thousand street sign images and feed them into my N machine learning approaches which in this case I’ve chosen say three, ANN, SVM, GA.
Or, instead, I might take a series of samples of the training data. Let’s refer to one such sample as S1, consisting of a thousand images randomly chosen from a population of 50,000 images, and feed the sample S1 into machine learning approach Q1. I might then select another sample of training data, let’s call it S2, consisting of another randomly selected set of a thousand images, and feed it into machine learning approach Q2. And so on for each of the N machine learning approaches that I’ve selected.
I could then see how each of the machine learning approaches did on their respective sample data. I might then opt to keep all of the machine learning approaches for my actual system, or I might selectively choose which ones will go into my actual system. And, as mentioned earlier, if I have selected multiple machine learning approaches for the actual system then I’ll want to figure out how to possibly combine together their results.
You can further advance the ensemble learning technique by adding learning upon learning. Suppose I have a base set of individual learners. I might feed their results into a second-level of machine learning approaches that act as meta-learners. In a sense, you can use the first-level to do some initial screening and scanning, and then potentially have a second-level that then aims at getting into further refinement of what the first-level found. For example, suppose my first-level identified that a street sign is a speed limit sign, but the first-level isn’t capable to then determine what the speed limit numbers are. I might feed the results into a second-level that is adept at ascertaining the numbers on the speed limit sign and be able to detect what the actual speed limit is as posted on the sign.
The ensemble approach to machine learning allows for a lot of flexibility in how you undertake it. There’s no particular standardized way in which you are supposed to do ensemble machine learning. It’s an area still evolving as to what works best and how to most effectively and efficiently use it.
Some might be tempted to throw every machine learning approach into an ensemble under the blind hope that it will then showcase which is the best for your matter at-hand. This is not as easy as it seems. You need to know what the machine learning approach does and there’s an effort involved in setting it up and giving it a fair chance. In essence, there are costs to undertaking this and you shouldn’t be using a scattergun style way of doing so.
For any particular matter, there are going to be so-called weak learners and strong learners. Some of the machine learning approaches are very good in some situations and quite poor in others. You also need to be thinking about the generalizability of the machine learning approaches. You could be fooled when feeding sample data into the machine learning approaches that say one of them looks really good, but it turns out maybe it has overfitted to the sample data. This might not then do you much good once you start feeding new data into the mix.
Another aspect is the value of diversity. If you have no-diversity, such as only one machine learning approach that you are using, there are likely to be situations wherein it isn’t as good as some other machine learning approach, and you should consider having diversity. Therefore, by having more than one machine learning approach in your mix, you are gaining diversity which will hopefully pay-off for varying circumstances. As with anything else, if you have too many though of the machine learning approaches it can lead to muddled results and you might not be able to know which one to believe for a given result provided.
Keep in mind that any ensemble that you put together will require computational effort, in essence computing power, in order to not only do the training but more importantly when involved in receiving new data and responding accordingly. Thus, if you opt to have a slew of machine learning approaches that are going to become part of your Q final set, and if you are expecting them to run in real-time on-board an AI self-driving car, this is going to be something you need to carefully assess. The amount of memory consumed and the processing power consumed might be prohibitive. There’s a big difference between using an ensemble for a research-oriented task, wherein you might not have any particular time constraints, and versus when using in an AI self-driving car that has severe time constraints and also limits on computational processing available.
For those of you familiar with Python, you might consider trying using the Python-oriented scikit-learn machine learning library and try out various ensemble machine learning aspects to get an understanding of how to use an ensemble learning approach.
If we’re going to have true AI systems, and especially AI self-driving cars, the odds are that we’ll need to deploy multiple machine learning models. Trying to only program directly our way to full AI is unlikely to be feasible. As Benjamin Franklin is famous for saying: “Tell me and I forget. Teach me and I remember. Involve me and I learn.” Using an ensemble learning approach is to-date a vital technique to get us toward that involve me and learn goal. We might still need even better machine learning models, but the chances are that no matter what we discover for better ML’s, we’ll end-up needing to combine them into an ensemble. That’s how the music will come out sounding robust and fulfilling for achieving ultimate AI.
Copyright 2018 Dr. Lance Eliot
This content is originally posted on AI Trends.