System Load Balancing for AI Systems: The Case Of AI Autonomous Cars


By Lance Eliot, the AI Trends Insider

I recall an occasion when my children had decided to cook a meal in our kitchen and went whole hog into the matter (so to speak). I’m not much of a cook and tend to enjoy eating a meal more so than the labor involved in preparing a meal. In this case, it was exciting to see the joy of the kids as they went about putting together a rather amazing dinner. Perhaps partially due to watching the various chef competitions on TV and cable, and due to their own solo cooking efforts, when they joined together it was a miraculous sight to see them bustling about in the kitchen in a relatively professional manner.

I mainly aided by asking questions and serving as a taste tester. From their perspective, I was more of an interloper than someone actually helping to progress the meal making process.

One aspect that caught my attention was the use of our stove top. The stove top has four burner positions. For everyday cooking, I believe that four heating positions are sufficient. I could see that with the extravagant dinner that was being put together, the fact that there were only four available was a constraint. Indeed, seemingly a quite difficult constraint.

During the cooking process, there were quite a number of pots and pans containing food that needed to be heated up. I’d wager that at one point there were at least a dozen such pots and pans containing food and requiring some amount of heating. Towards the start of the cooking, it was somewhat manageable because they were using only three of the available heating spots. Using just three allowed them to allocate the fourth spot as an “extra” for round robin needs. They used this fourth spot for quick warm-ups, while the other three spots were for truly doing a thorough cooking job that required a substantive amount of dedicated cooking time.

Pots and pans were sliding on and off that fourth spot like a hockey puck on ice. The other three spots had large pots that were gradually each coming to a bubbling and high-heat condition. When one of the three pots had cooked well enough, the enterprising cooks took it off the burner almost immediately and placed it onto a countertop waiting area they had established for super-heated pots and pans that could simmer for a bit.

The moment that one pot came off of any of the three spots, another one was instantly put into its place.

Around and around this went, in a dizzying manner as they contended with only having four available heating spots. They kept one spot in reserve and used it for more of a quick paced warm-up and had opted to use the other three for deep heated cooking. As they neared the end of the cooking process for this meal, they began to use nearly all of the spots for the quick paced warm-up needs, apparently because they had by then done the needed cooking already and no longer needed to devote any of the pots to a prolonged period on a heating spot.

As a computer scientist at heart, I was delighted to see them performing a delicate dance of load balancing.

System Load Balancing Is Unheralded But Crucial

You’ve probably had situations involving multiple processors or maybe multiple web sites wherein you had to do a load balance across them.

In the case of web sites, it’s not uncommon for popular web sites to be replicated at multiple geographic sites around the world, allowing for faster responses to users in each part of the world. It also can help when one part of the world starts to bombard one of your sites and you need to flatten out the load, or else that particular web site might choke due to the volume.

In the cooking situation, the kids realized that having just four burner stove top positions was insufficient for the true amount of cooking that needed to take place for the dinner. If they had opted to place pots of food onto the burners serially, in a one-at-a-time manner, they would have had some parts of the meal cooked much earlier than other parts of the meal. In the end, when trying to serve the meal, it would have been a nightmarish result of some food that had been cooked earlier and was now cold, and perhaps other parts of the meal that were superhot and would need to wait to be eaten.

If the meal had been one involving much less preparation, such as if they had only three items to be cooked, they would have readily been able to use the stove top without any of the shenanigans of having to float around the pots and pans. They could have just put on the three pots and then waited until the food was cooked. But, since they had more cooking needs than available heating spots, they needed to devise a means to make use of the constrained resources in a manner that would still allow for the cooking process to proceed properly.

This is what load balancing is all about.

There are situations wherein there is a limited supply of resources, and the number of requests to utilize those resources might exceed the supply. The load balancer is a means or technique or algorithm or automation that can try to balance out the load.

Another valuable aspect of a load balancer is that it can try to even out the workload, which might help in various other ways. Suppose that one of the burners was known to sometimes get a bit cantankerous when it is on high-heat for a long time. One approach of a load balancer might be to try and keep that resource from peaking and so purposely shift work to some other resource for a while.
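To make the idea a bit more concrete, here is a minimal sketch in Python of one possible balancing rule: hand each request to whichever resource has the most headroom, and make the request wait when nothing has room. The names Resource, ceiling, and assign are purely illustrative assumptions of mine, and the "ceiling" is just one hypothetical way to keep a cantankerous resource from peaking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """A hypothetical resource: a burner, a CPU, a GPU, a server."""
    name: str
    load: float = 0.0      # fraction of capacity currently in use
    ceiling: float = 1.0   # set below 1.0 for a unit we avoid running hot

    def headroom(self) -> float:
        # Capacity we are still willing to hand out on this resource.
        return self.ceiling - self.load


def assign(resources: List[Resource], cost: float) -> Optional[Resource]:
    """Give the request to the resource with the most headroom.

    Returns None when no resource can take the request, meaning that
    demand currently exceeds supply and the request has to wait.
    """
    best = max(resources, key=lambda r: r.headroom())
    if best.headroom() < cost:
        return None
    best.load += cost
    return best
```

With three normal burners and a fourth capped at a ceiling of 0.6, the first three requests of cost 0.5 land on the normal burners, and the capped burner is only drawn upon once the others are partly loaded.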

We can also consider the aspect of resiliency.

You might have a situation wherein one of the resources might unexpectedly go bad or otherwise not be usable. Suppose that one of the burners broke down during the cooking process. A load balancer would try to ascertain that a resource is no longer functioning, and then see if it might be possible to shift the request or consumption over to another resource instead.
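As a rough illustration of that resiliency role, the following Python sketch tracks which resources are healthy and, when one is marked as failed, moves its tasks onto the least-busy survivor. The class FailoverBalancer and its methods are hypothetical names chosen for this example, and a real system would detect failures via some kind of health check rather than being told.

```python
class FailoverBalancer:
    """Toy balancer that re-places work when a resource fails."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}
        self.assignment = {}  # task -> name of the resource running it

    def place(self, task):
        """Put the task on the healthy resource with the fewest tasks."""
        candidates = [n for n, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy resource left")
        counts = {n: 0 for n in candidates}
        for name in self.assignment.values():
            if name in counts:
                counts[name] += 1
        target = min(candidates, key=counts.get)
        self.assignment[task] = target
        return target

    def mark_failed(self, name):
        """Record a failure and shift the affected tasks elsewhere."""
        self.healthy[name] = False
        displaced = [t for t, n in self.assignment.items() if n == name]
        return {task: self.place(task) for task in displaced}
```

If the soup, rice, and sauce are on burners 1, 2, and 3 and burner 2 breaks down, calling mark_failed on burner 2 moves the rice to one of the two surviving burners.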

Load Balancing Difficulties And Challenges

Being a load balancer can be a tricky task.

Suppose the kids had decided that they would keep one of the stove top burners in reserve and not use it unless it was absolutely necessary. In that case, they might have opted to use the three other burners in a manner of allocating two for the deep heating and one for the warming up. All during this time, the fourth burner would remain unused, being held in reserve. Is that a good idea?

It depends. I’d bet that the cooking with just the three burners would have stretched out the time required to cook the dinner. I can imagine that someone waiting to eat the dinner might become disturbed if they saw that there was a fourth burner that could be used for cooking, and yet it was not, and the implication being that the hungry person had to wait longer to eat the dinner. This person might go ballistic that a resource sat unused for that entire time. What a waste of a resource, it would seem to that person.

Imagine further if at the start of the cooking process we were to agree that there should be an idle back-up for each of the stove burners being used. In other words, since we only have four, we might say that two of the burners will be active and the other two are the respective back-up for each of them. Let’s number the burners as 1, 2, 3, and 4. We might decide that burner 1 will be active and its back-up is burner 2, and burner 3 will be active and its back-up is burner 4.

While the cooking is taking place, we won’t place anything onto burners 2 and 4 unless and until primary burner 1 or primary burner 3 goes out. We might decide to keep the back-up burners entirely turned off, in which case as a back-up they would be starting at a cold condition if we needed to suddenly switch over to one of them. We might instead agree that we’ll go ahead and put the two back-ups onto a low-heat position, without actually heating anything per se, and generally be ready then to rapidly go to high-heat if they are needed in their back-up failover mode.

I had just now said that burner 2 would be the back-up for primary burner 1. Suppose I adhered to that aspect and would not budge. If burner 3 went suddenly out and I reverted to using burner 4 as the back-up, but then somehow burner 4 went out, should I go ahead and use burner 2 at that juncture? If I was insistent that burner 2 would only and always be a back-up exclusively for burner 1, presumably I would want the load balancer to refuse to now use burner 2, even though burners 3 and 4 are kaput. Maybe that’s a good idea, maybe not.
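The burner 2 dilemma boils down to a single policy switch, which the following small Python sketch tries to capture. The function pick_backup and its strict flag are illustrative assumptions of mine and not a standard API.

```python
def pick_backup(failed, idle_backups, dedicated_for, strict):
    """Choose a standby for a failed primary, or return None.

    failed        -- identifier of the primary that just went out
    idle_backups  -- set of standbys that are healthy and unused
    dedicated_for -- dict mapping each primary to its own standby
    strict        -- if True, a standby may only cover its own primary
    """
    own = dedicated_for.get(failed)
    if own in idle_backups:
        return own                 # the designated standby is available
    if strict:
        return None                # refuse to borrow another standby
    # Pooled policy: borrow any idle standby (lowest identifier first).
    return min(idle_backups) if idle_backups else None
```

With the pairing of burner 1 to burner 2 and burner 3 to burner 4, suppose burner 3 fails and burner 4 is already dead. The strict policy returns None and the pot sits unheated, while the pooled policy hands over burner 2, leaving burner 1 without its own back-up. Neither answer is universally right, which is exactly the judgment call involved.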

These are the kinds of considerations that go into establishing an appropriate load balancer. You need to try and decide what the rules are for the load balancer. Different circumstances will dictate different aspects of how you want the load balancer to do its thing. Furthermore, you might not just set up the load balancer entirely in advance, such that it is acting in a static manner during the load balancing, but instead might have the load balancer figuring out what action to take dynamically, in real-time.
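To contrast the static and dynamic approaches, here is a small Python sketch that runs the same stream of requests through two policies: a round robin rotation fixed in advance, and a rule that looks at the current loads before each decision. The function names and the simple cost model are assumptions made for illustration only.

```python
from itertools import cycle


def static_round_robin(resources):
    """Static policy: the rotation is fixed before any work arrives."""
    order = cycle(resources)
    return lambda loads: next(order)


def dynamic_least_loaded(resources):
    """Dynamic policy: consult the current loads at every decision."""
    return lambda loads: min(resources, key=lambda r: loads[r])


def run(policy, costs, resources):
    """Feed each request cost to the policy and return the final loads."""
    loads = {r: 0 for r in resources}
    for cost in costs:
        loads[policy(loads)] += cost
    return loads
```

When the request costs alternate between heavy and light, the static rotation can pile all of the heavy requests onto the same resource, whereas the dynamic rule ends up with a more even spread. The price of the dynamic rule is that the load balancer must measure the loads in real-time.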

When using load balancing for resiliency or redundancy purposes, there is a standard nomenclature of referring to the number of resources needed to carry the load as N, and then appending a plus sign along with an integer value that ranges from 0 to some number M. If I say that my system is set up as N+0, I’m saying that there are no redundancy devices. If I say it is N+1, then that implies there is 1 and only 1 such redundancy device. And so on.
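The N+M arithmetic can be written down in a few lines of Python. This sketch assumes that N means the number of devices required to carry the load, and the function names are hypothetical.

```python
def redundancy_label(required: int, installed: int) -> str:
    """Describe a setup in N+M form, where N is the required count."""
    if installed < required:
        raise ValueError("fewer devices installed than the load requires")
    return f"N+{installed - required}"


def survives(required: int, installed: int, failures: int) -> bool:
    """True if the remaining devices can still carry the full load."""
    return installed - failures >= required
```

A meal that needs three burners on a four burner stove is an N+1 setup: it survives the loss of any one burner, but not the loss of two.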

You might be thinking that I should always have a plentiful set of redundancy devices, since that would seem the safest bet. But, there’s a cost associated with the redundancy. Why was my stove top limited to just four burners? Because I wasn’t willing to shell out the bigger bucks to get the model that had eight. I had assumed that for my cooking needs, the four sized stove was sufficient, and actually ample.

For computer systems, the same kind of consideration needs to come into play.

How many devices do I need, and how much redundancy do I need? Both questions have to be considered in light of the costs involved. This can be a significant decision in that later on it can be harder and even costlier to adjust. In the case of my stove top, the kitchen was built in such a manner that the four burner sized stove top fits just right. If I were to now decide that I want the eight burner version, it’s not just a simple plug-and-play; instead, they would need to knock out my kitchen counters, and likely some of the flooring, and so on. The choice I made at the start has somewhat locked me in, though of course if I want to have the kids doing cooking more of the time, it might be worth the dough to expand the kitchen accordingly.

In computing, you can consider load balancing for just about anything. It might be the CPU processors that underlie your system. It could be the GPUs. It could be the servers. You can load balance on an actual hardware basis, and you can also do load balancing on a virtualized system. The target resource is often referred to as an endpoint, or perhaps a replica, or a device, or some other such wording.

Those in computing that don’t explicitly consider the matter of load balancing are either unaware of the significance of it or are unsure of what it can achieve.

For many AI software developers, they figure that it’s really a hardware issue or maybe an operating system issue, and thus they don’t put much of their own attention toward the topic. Instead, they hope or assume that those OS specialists or hardware experts have done whatever is required to figure out any needed load balancing.

Similar to my example about my four burner stove, the problem with this kind of thinking is that if later on the AI application is not running at a suitable performance level and all of a sudden you want to do something about load balancing, the horse is already out of the barn. Just like my notion of possibly replacing the four burner stove with an eight burner, it can take a lot of effort and cost to retrofit for load balancing.

AI Autonomous Cars And Load Balancing The On-Board Systems

What does this have to do with AI self-driving driverless autonomous cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI systems for self-driving cars. One key aspect of an AI system for a self-driving car is its ability to perform responsively in real-time.

On board the self-driving car you have numerous processors that are intended to run the AI software. This can also include various GPUs and other specialized devices.

Per my overall framework of AI self-driving cars, here are some of the key driving tasks involved:

  • Sensor data collection and interpretation
  • Sensor fusion
  • Virtual world model updating
  • AI action planning
  • Car controls command issuance

You’ve got software that needs to run in real-time and direct the activities of a car. The car will at times be in motion. There will be circumstances wherein the AI is relatively at ease and there’s not much happening, and there will be situations whereby the AI is having to work at a rip-roaring pace. Imagine going on a freeway at 75 miles per hour, and there’s lots of other nearby traffic, along with foul weather, the road itself has potholes, there’s debris on the roadway, and so on. A lot of things, all happening at once.

The AI holds in its automation the key to whether the self-driving car safely navigates and avoids getting into a car accident. This is not just a real-time system, it is a real-time system that can spell life and death. Human occupants in the AI self-driving car can get harmed if the AI can’t operate in time to make the proper decision. Pedestrians can get harmed. Other cars can get hit, and thus the human occupants of those cars can get harmed. All in all, this is quite serious business.

To achieve this, the on-board hardware generally has lots of computing power and lots of redundancy.

Is it enough? That’s the zillion dollar question. Similar to my choice of a four burner stove, when the automotive engineers for the auto maker or tech firm decide to outfit the self-driving car with whatever number and type of processors and other such devices, they are making some hard choices about what the performance capability of that self-driving car will be. If the AI cannot run fast enough to make sound choices, it’s a bad situation all around.

Imagine too that you are fielding your self-driving car. It seems to be running fine in the roadway trials underway. You give the green light to ramp up production of the self-driving car. These self-driving cars start to roll off the assembly line and the public at large is buying them.

Suppose after this has taken place for a while, you begin to get reports that there are times that the AI seemed to not perform in time. Maybe it even froze up. Not good.

Some self-driving car pundits say that it’s easy to solve this. Via OTA (Over The Air) updates, you just beam down into the self-driving cars a patch for whatever issue or flaw there was in the AI software. I’ve mentioned many times that the use of OTA is handy, important, and significant, but it is not a cure all.

Let’s suppose that the AI software has no bugs or errors in this case. Instead, it’s that the AI running via the on-board processors is exhausting the computing power at certain times. Maybe this only happens once in a blue moon, but if your life and the lives of others depend upon it, even once in a blue moon is too much of a problem. It could be that the computing power is just insufficient.

What do you do then? Yes, you can try to optimize the AI and get it to somehow not consume so much computing power. This though is harder than it seems. If you opt to toss more hardware at this problem, sure, that’s possible, but now this means that all of those AI self-driving cars that you sold will need to come back into the auto shop and get added hardware. Costly. Logistically arduous. A mess.

Dangers Of Silos Among Autonomous Car Components

Some auto makers and tech firms find themselves confronting the classic silo mentality of the software side and the hardware side of their development groups. The software side developing the AI is not so concerned about the details of the hardware and just expects that their AI will run in proper time. The hardware side puts in place as much computing power as it seems can be suitably provided, depending on cost considerations, physical space considerations, etc.

If there is little or no load balancing that comes into play, in terms of making sure that both the software and hardware teams come together on how to load balance, it’s a recipe for disaster.

Some might say that all they need to know is how much raw speed is needed, whether it is expressed in MIPS (millions of instructions per second), FLOPS (floating point operations per second), or some other such metric, perhaps for specialized devices such as TPUs (tensor processing units). This though doesn’t fully answer the performance question. The AI software side often doesn’t really know what kind of performance resources they’ll need per se.

You can try to simulate the AI software to gauge how much performance it will require. You can create benchmarks. There are all sorts of “lab” kinds of ways to gauge usage. Once you’ve got AI self-driving cars in the field for trials, you should also be pulling stats about performance. Indeed, it’s quite important that there be on-board monitoring to see how the AI and the hardware are performing.

With proper load balancing on-board the self-driving car, the load balancer is trying to keep the AI from getting starved of computing resources and to ensure that the AI runs undisrupted by whatever might be happening at the hardware level. The load balancer monitors the devices involved. When a device approaches saturation, the load balancer needs to come into play, potentially handling the situation via static or dynamic balancing.
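One simple way to picture the monitoring piece is a rolling average of recent utilization samples that raises a flag once a high-water mark is reached. The Python sketch below is only an assumption-laden illustration: the class name, the window size, and the 0.85 threshold are all hypothetical, and a production system would need far more careful tuning.

```python
from collections import deque


class SaturationMonitor:
    """Flags a device whose recent average utilization runs too high."""

    def __init__(self, window: int = 5, high_water: float = 0.85):
        self.samples = deque(maxlen=window)  # keeps only the newest samples
        self.high_water = high_water

    def record(self, utilization: float) -> bool:
        """Add a sample (0.0 to 1.0); return True if near saturation."""
        self.samples.append(utilization)
        average = sum(self.samples) / len(self.samples)
        return average >= self.high_water
```

Averaging over a window rather than reacting to a single sample keeps a momentary spike from triggering a rebalance, at the cost of reacting a little later to a sustained surge.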

If an on-board device goes sour, the load balancer hopefully has a means to deal with the loss. Whether it’s redundancy or whether it is shifting over to have another device now do double-duty, you’ve got to have a load balancer on-board to deal with those moments. And do so in real-time. While the self-driving car is possibly in motion, on a crowded freeway, etc.

Fail-Safe Aspects To Keep In Mind

Believe it or not, I’ve had some AI developers say to me that it is ridiculous to think that any of the on-board hardware devices are going to just up and quit. They cannot fathom any reason for this to occur.

I point out that the on-board devices are all prone to the same kinds of hardware failures as any piece of hardware. There’s nothing magical about being included into a self-driving car. There will be “bad” devices that will go out much sooner than their life expectancy. There will be devices that will go out due to some kind of in-car issue that arises, maybe overheating or maybe somehow a human occupant manages to bust it up. There are bound to be recalls on some of that hardware.

Also, I’ve seen some of them deluded by the aspect that during the initial trials of self-driving cars, the auto maker or tech firm is pampering the AI self-driving car. After each journey or maybe at the end of the day, the tech team involved in the trials are testing to make sure that all of the hardware is still in pristine shape. They swap out equipment as needed. They act like a race car team, continually tuning and making sure that everything on-board is in top shape. There’s nearly an unlimited budget of sorts during these trials in that the view is do whatever it takes to keep the AI self-driving car running.

This is not what’s going to happen once these self-driving cars are out in the real world.

When those self-driving cars are being used by the average Joe or Samantha, they will not have a trained team of self-driving car specialists at the ready to tweak and replace whatever might need to be replaced. The equipment will age. It will suffer normal wear and tear. It will even be taxed beyond normal wear and tear since it is anticipated that AI self-driving cars will be running perhaps 24×7, nearly non-stop.

For those auto makers and tech firms that are giving short shrift right now to the importance of load balancing, I hope that this might be a wake-up call.

It’s not going to do anyone any good, neither the public nor the makers of AI self-driving cars, if it turns out that the AI is unable to get the performance it needs out of the on-board devices.

A load balancer is not a silver bullet, but it at least provides the kind of added layer of protection that you’d expect for any solidly devised real-time system. Presumably, there aren’t any auto makers or tech firms that opted to go with the four burner stove when an eight burner stove was needed.

Copyright 2019 Dr. Lance Eliot

This content is originally posted on AI Trends.