Ada:
Hi, it’s Ada Lovelace here again! You probably know me by now. As the developer of the first ever computer program and the mathematician who lent her name to the ADA Lovelace Center for Analytics, Data and Applications, I’ve been chatting to experts involved in this project about the methods and applications of artificial intelligence. Today, I’m delighted to be talking to Chris again. We’ve already benefited from his insights into the complex world of machine learning in a previous discussion, when he introduced us to the topic of sequence-based learning. Today, we’re here to talk about experience-based learning. Could you tell us more about this approach?
Chris:
Experience-based learning is a set of methods that enable a system to optimize itself by autonomously interacting with the environment and evaluating the feedback it gets, or by dynamically adapting itself to changing environmental conditions. You might say that the system “learns from experience.” A good example is the automated generation of models that can be used to evaluate and optimize business processes, transport sequences and robot control systems in industrial manufacturing.
Ada:
It sounds fascinating! I’d like to know more about how it works. For example, what kind of data can be used in these learning methods?
Chris:
Well, that’s where the distinctions start to get clearer. For example, static, pre-collected data sets don’t tend to be used at all in this context. Unlike other machine learning paradigms, which require labeled or unlabeled data, experience-based learning involves an agent that performs actions in an environment and attempts to learn from the resulting feedback. The agent tries to maximize a reward signal it receives for desirable outcomes, while at the same time exploring the world in which it operates to find as yet unknown and potentially more rewarding action sequences. So it’s more about linking experiences gained through interaction with the environment to existing models, or learning these models along the way. This creates all sorts of challenges, because the AI system interacts directly with the environment: it doesn’t only gather data, it also actively influences that environment. This marks a fundamental difference from other AI methods, since here it’s the job of the AI algorithms themselves to collect the right data during the learning process, so that learning continues and moves in the right direction.
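The interaction loop Chris describes involves an agent that acts and receives a reward. It also has to balance exploiting what it already knows against exploring for better actions. One simple way to picture this is an epsilon-greedy agent on a multi-armed bandit. The following is a minimal Python sketch, not the method used at the ADA Lovelace Center. The three actions, their hidden average rewards, the Gaussian noise and the exploration rate epsilon = 0.1 are all hypothetical choices for illustration. The agent never sees the true averages. It only sees noisy rewards after each action.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from interaction and reward feedback.

    true_means: hypothetical average reward of each action. The agent never
    reads these directly; it only observes noisy rewards after acting.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # agent's running estimate of each action's value
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to discover potentially better options.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action currently believed to be best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Environment feedback (assumed): a noisy reward around the action's true mean.
        reward = rng.gauss(true_means[action], 1.0)
        total_reward += reward

        # Incremental running-average update of the chosen action's value estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 1.0, 0.5])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("times each action was chosen:", counts)
```

With epsilon = 0.1, roughly one step in ten is spent exploring. Even so, the agent ends up choosing the action with the highest hidden average most of the time. Setting epsilon to 0 would remove exploration entirely, and the agent could then lock onto whichever action happened to look good first.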
Ada:
It’s interesting to think that a machine can learn! So what we’re dealing with here is a special AI method that doesn’t require any particular data to begin with. But are there any guidelines on how much data is required, or on what quality the data going into the model should have?
Chris:
Yes, but the relevant questions here are slightly different. Basically, there are two key questions: first, how often do I need to interact with the environment? Or, put another way, how long does it take me to learn how to behave? And, second, what is the right way for me to react to the environment? Let me give you a brief example. Let’s say I want my robot lawn mower not only to mow the lawn, but also to learn to fertilize my garden automatically as it mows. We can tell whether each area has been properly fertilized by seeing whether things are growing well in that spot, because over- or under-fertilization will lead to poor plant growth. To do this job, the robot moves around the lawn, mowing and fertilizing. The whole time, it keeps track of how much fertilizer it applies and roughly how much grass it cuts in each spot. The key now is to come up with a formal definition – that is, effectively a mathematical definition – of what you are actually trying to achieve. That’s what we call the reward function. This function might specify, for example, that you want to use as little fertilizer as possible and that everything should grow at an approximately even rate of around three centimeters a week, and so on. So you dispatch the robot and let it interact with its environment, mowing and fertilizing the lawn. In this way, it gains indirect experience of the soil conditions and gradually works its way toward the right amount of fertilizer. This example nicely illustrates the particular challenge of experience-based learning: it essentially comes down to what we call the credit assignment problem. In other words, my actions do not produce immediate feedback; the feedback only comes much, much later, which makes it difficult to work out cause and effect. For example, fertilizer takes a certain amount of time to really get to the roots and actually affect plant growth. We can see the same kind of problem in chess.
For example, one player might make a good move that will inevitably lead to their opponent losing their queen. But the actual loss of the queen may not happen until a few moves later – and the AI somehow has to work out that connection. There are many possible approaches, depending on the method used: for example, model-based methods, in which I can introduce my knowledge of the world – such as knowledge of soil conditions and infiltration or even different lawn types – and entirely model-free methods, where I essentially start with a blank page and learn solely from my reward function. Both options have their pros and cons.
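The lawn mower's reward function and the credit assignment problem can be made concrete in a short Python sketch. Everything in it is an assumption for illustration. The names lawn_reward and discounted_returns are hypothetical, as are the fertilizer cost weight, the discount factor gamma = 0.9 and the weekly growth figures. Only the target of about three centimeters a week comes from the example above. The discounted return G_t = r_t + gamma * G_{t+1} is one standard way of passing later rewards back to earlier actions.

```python
def lawn_reward(fertilizer_grams, growth_cm_per_week,
                target_growth_cm=3.0, fertilizer_cost=0.01):
    """Hypothetical per-week reward for one patch of lawn.

    Penalizes fertilizer use (weighted by the assumed fertilizer_cost) and any
    deviation from the target growth rate of about 3 cm per week.
    """
    return -fertilizer_cost * fertilizer_grams - abs(growth_cm_per_week - target_growth_cm)


def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards,
    so each step is credited with the discounted rewards that follow it."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical four-week episodes as (fertilizer in grams, observed growth in cm/week).
# The growth figures are invented: fertilizing in week 0 costs something right away
# and only improves growth in later weeks, once it has reached the roots.
fertilized = [(20.0, 1.0), (0.0, 2.5), (0.0, 3.0), (0.0, 3.0)]
unfertilized = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

rewards_a = [lawn_reward(f, g) for f, g in fertilized]
rewards_b = [lawn_reward(f, g) for f, g in unfertilized]
returns_a = discounted_returns(rewards_a)
returns_b = discounted_returns(rewards_b)

if __name__ == "__main__":
    print("fertilized:   rewards", [round(r, 2) for r in rewards_a],
          "returns", [round(g, 2) for g in returns_a])
    print("unfertilized: rewards", [round(r, 2) for r in rewards_b],
          "returns", [round(g, 2) for g in returns_b])
```

Judged by the week-0 reward alone, fertilizing looks worse (-2.2 versus -2.0), because the fertilizer costs something immediately. The discounted return from week 0, however, is clearly better for the fertilized lawn. This is the credit assignment problem in miniature: the value of an early action only becomes visible once later feedback is passed back to it.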
Ada:
I see. So there are different ways to approach experience-based learning. And what areas or industries can these methods be applied to?
Chris:
Essentially, these methods can be used in any situation where you need to continuously monitor a condition and make decisions. Examples include driver assistance systems and autonomous driving, locomotion and movement control in robotics, gait analysis, trading, drug dosing and drone flight. Games such as Go and chess are also very well-known examples. But this situation also crops up in energy and building management and in recommender systems – which are very much based on experience – as well as in the stabilization of quantum computers and the optimization of neural networks.
Ada:
There are clearly a lot of different applications! Based on my experience and the conversations I’ve had with a colleague, I get the impression that this method is open enough to be combined with other AI methods. Which other areas of expertise at the ADA Lovelace Center can be combined with experience-based learning? And is that already happening in some areas?
Chris:
The basic method we use in experience-based learning is called reinforcement learning, and that’s obviously an approach that is often linked to neural networks. In the same way, there are also links to sequence-based learning – for example, when it comes to building temporal models – because here, too, we’re acting in a temporal sequence. At the same time, those of us who specialize in experience-based learning also need the knowledge and expertise generated by other disciplines and domains at the ADA Lovelace Center in order to get our work off the ground in the first place.
Ada:
Thanks so much for those fascinating insights! I’m really starting to appreciate that one of the things that makes the ADA Lovelace Center so special is how it connects a variety of AI methods to real-world use cases. I’m looking forward to learning even more about the project’s other domains and areas of application. We still have a lot to learn about automatic learning, semantics and various other aspects, so make sure to tune in again! Bye for now, and take care!