Should We Aim For Human-AI Coordination Instead Of Human-AI Confrontation?
Originally published on the Rebellion Research website
There is no longer any need to demonstrate that AI will shape the future and touch every aspect of our lives and industries. Yet for many people, AI still evokes programs destined to oppose us because of their eventual intellectual superiority. In reality, AI is (and will mostly be) used as a decision-support tool, not as something to fight against.
Human-AI coordination, and more generally multi-agent coordination, will therefore be at the center of our interactions with machines in the future.
The main challenge is to interpret the actions and intentions of the other agents in the environment and to find a policy that makes the best decisions in response.
The key difference from earlier settings is that the environment can no longer be treated as nearly static: it is continually being changed by other intelligent agents.
Recent research in this area has centered on generalizing agents’ policies as much as possible, so that arbitrary and meaningless conventions never appear. An arbitrary convention can be as simple as consistently selecting a specific action that has no real bearing on reaching the target: during training, the agent builds a correlation between that action and its reward even though the action never actually influenced the reward.
The problem with arbitrary conventions is that, when paired with other agents, the first agent keeps relying on those conventions to reach its target, resulting in poor performance.
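As a toy illustration of how such a convention forms (a hypothetical sketch, not an experiment from the article), the snippet below trains a purely greedy learner on three actions that all pay the same on average. The learner locks onto whichever action wins the initial tie-break, and independently trained learners lock onto different ones.

```python
import numpy as np

# Toy sketch: every action has the same expected reward, yet a greedy learner
# ends up "preferring" whichever one won the first tie-break. That preference
# is an arbitrary convention: it correlates with reward during training
# without ever having caused it.
def train_greedy(seed: int, n_actions: int = 3, n_steps: int = 500) -> int:
    rng = np.random.default_rng(seed)
    estimates = np.zeros(n_actions)   # running value estimate per action
    counts = np.zeros(n_actions)      # how often each action was picked
    for _ in range(n_steps):
        best = np.flatnonzero(estimates == estimates.max())
        action = int(rng.choice(best))            # break ties at random
        reward = 1.0 + rng.normal(scale=0.1)      # identical payoff for every action
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return int(np.argmax(counts))                 # the action the agent settled on

# Independently trained agents settle on different favourite actions, so their
# private conventions have no reason to line up when they are paired together.
print([train_greedy(seed) for seed in range(5)])
```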
To give intelligent agents this kind of ability and perception, one of the works that paved the way for current research is the machine theory of mind. The original theory of mind applies to humans and refers to our ability to represent and reason about the mental states of other humans, including their intentions, desires, and beliefs.
For example, imagine yourself riding a bike. If you pass a bus stop and see someone waving, you might assume that a bus is coming up behind you. You never saw the bus directly; you simply interpreted the other person’s actions. This is what we want to reproduce in our artificial agents, and reinforcement learning happens to provide the best framework for it.
The work done on the machine theory of mind was able to improve decision-making in complex multi-agent tasks.
It was also shown that meta-learning could be used to furnish an agent with the ability to build flexible and sample-efficient models of others.
At some point in the future, as agents grow in complexity, we can imagine AIs coming much closer to coordinating efficiently with humans on the basis of this theory.
Another interesting aspect of this work is the motivation to use the theory both to probe our own human abilities and to make agents more explainable to humans, thereby bringing transparency to our models and their decision processes.
In this respect, the pursuit of a machine theory of mind is about building the missing interface between machines and human expectations.
The long-term goal of artificial intelligence is often defined as the ability to solve advanced real-world challenges. A number of companies, DeepMind among them, have focused on “solving intelligence”, mostly through games and unresolved scientific challenges. Among these, we can cite AlphaFold for protein folding or the recent work on controlling nuclear fusion plasmas.
Games, however, have long served as a reference benchmark for AI ability, going back to the first human-versus-machine backgammon confrontations in 1992. More recently, in 2016, AlphaGo defeated the world champion at Go using deep reinforcement learning and Monte Carlo tree search, and related methods have since been applied to video games such as Minecraft and StarCraft. In contrast with classic board games, modern video games simulate far more complex environments and leave room for far richer player behavior, creating a true challenge for AI.
One of the most impressive recent efforts was the construction of a Dota 2 AI, OpenAI Five.
Dota 2’s rules are complex and demand a high level of coordination and strategy to perform at the top level; the game has been actively developed for over a decade, with its game logic implemented in hundreds of thousands of lines of code.
The agent was not only able to beat the reigning world champions but also to create new strategies that professional players had either underestimated or never considered. It was a striking case of AI discovering new knowledge and helping humans improve their own play.
One way to train coordinating agents is the self-play (SP) method. The idea is to train an agent with duplicates of itself and make all the copies work in coordination.
This method has a major drawback: fragility. When paired with unfamiliar agents, a self-play policy performs poorly because it relies on the arbitrary conventions it built with copies of itself.
To avoid this, the current literature focuses on generalizing agents as much as possible and preventing them from creating any arbitrary conventions at all.
One of the first attempts to achieve zero-shot coordination (ZSC) is Other-Play (OP), introduced by Jakob Foerster and his collaborators. The idea is to train an agent with itself, just as in SP, except that the copy of itself has gone through symmetry transformations of the problem, adding diversity to the encountered scenarios and preventing arbitrary conventions.
The lever game illustrates this well (in the original figure, each circle represents a lever). It is a two-player game: each player faces the same set of ten levers, nine worth 1.0 points and one worth 0.9, and the goal is to pick the same lever as the other player without communicating. If both players select the same lever, they win the number of points written on it. Since the nine 1.0 levers are indistinguishable from one another, trying to land on the same 1.0 lever is risky. The better strategy is to pick the 0.9 lever: it is the only distinguishable one, so it minimizes the chance that the two players end up on different levers and lose, and in expectation it maximizes the points earned.
Self-play training means an agent plays this game with a copy of itself. Sooner or later the pair settles on one arbitrary 1.0 lever and sticks with it, since consistently matching on that lever beats its earlier random picks. To an independently trained partner, however, the 1.0 levers carry no labels and are perfectly symmetric. Since the agents cannot coordinate on how to break this symmetry, committing to one of the 1.0 levers yields an expected return of only 1/9 ≈ 0.11 when paired with a new partner.
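To make that 0.11 figure concrete, here is a minimal back-of-the-envelope check in Python (a sketch that assumes the ten-lever layout described above: nine identical 1.0 levers and one distinguishable 0.9 lever).

```python
import numpy as np

# Assumed payoff layout from the description above:
# nine interchangeable 1.0 levers plus one distinguishable 0.9 lever.
payoffs = np.array([1.0] * 9 + [0.9])
n_symmetric = 9

# Two independently trained self-play agents each lock onto one of the nine
# symmetric 1.0 levers, effectively at random. They only score when their
# arbitrary choices happen to coincide.
p_match = 1.0 / n_symmetric
print(f"Cross-play value of committing to a 1.0 lever: {p_match * 1.0:.2f}")   # ~0.11

# Committing to the unique 0.9 lever always coordinates successfully.
print(f"Cross-play value of committing to the 0.9 lever: {payoffs[-1]:.2f}")   # 0.90
```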
The Other-Play method tackles this arbitrary-convention issue by integrating symmetry transformations into training.
In this problem, that means the agent trains against transformed versions of itself and therefore cannot profit from arbitrary conventions. If, for example, the agent plays with a transformed version of itself and decides that selecting the third 1.0 lever is an efficient strategy, it will perform poorly when paired with a freshly transformed partner and be pushed to find a better policy. By contrast, OP converges on the choice of the 0.9 lever.
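A rough simulation of this idea, under the same assumed nine-plus-one lever layout (a sketch of the principle, not the paper’s actual training algorithm), shows why committing to the 0.9 lever is the only choice that survives random relabelling of the symmetric levers.

```python
import numpy as np

rng = np.random.default_rng(0)
payoffs = np.array([1.0] * 9 + [0.9])   # assumed layout: nine 1.0 levers, one 0.9 lever
symmetric = np.arange(9)                # indices of the interchangeable 1.0 levers

def other_play_value(my_lever: int, n_samples: int = 20_000) -> float:
    """Average return of always picking `my_lever` when the partner is a copy
    of ourselves whose symmetric levers have been randomly relabelled, i.e. the
    kind of symmetry transformation Other-Play trains against."""
    total = 0.0
    for _ in range(n_samples):
        perm = rng.permutation(symmetric)                     # random relabelling
        partner_lever = int(perm[my_lever]) if my_lever < 9 else my_lever
        if partner_lever == my_lever:                         # both picked the same lever
            total += payoffs[my_lever]
    return total / n_samples

print(f"Committing to a 1.0 lever:   {other_play_value(0):.2f}")   # ~0.11 on average
print(f"Committing to the 0.9 lever: {other_play_value(9):.2f}")   # 0.90 every time
```

Because the value of any particular 1.0 lever collapses to roughly 0.11 once the labelling is randomized, while the 0.9 lever keeps its full value, training against transformed partners steers the policy toward the robust 0.9 convention.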
The OP method has proven very effective, notably on the Hanabi challenge. Its main limitation is that the symmetries must be known before training the agent, and in high-dimensional problems, finding all of them is challenging and not always feasible.
Agent communication is also a key part of this field, with existing work covering both free communication and costly communication between agents. These elements matter for future work because agents will need to communicate, and to search for common ground, before they can begin to coordinate.
More recently, research has provided a more general framework for achieving label-free coordination (LFC), a generalized setting for ZSC. The authors express OP within this framework and even explore a new candidate solution based on tie-breaking among optimal policies. The principal motivation behind this work remains to find ways to prevent arbitrary conventions.
Understanding how multi-agent coordination works could be decisive in the future, since many of our systems will be built around this idea. We can easily imagine AIs analyzing our performance in a sport or a video game and assisting us in progressing as efficiently as possible. Even though adversarial settings have occupied an important place in the AI research landscape, coordination must not be forgotten: it may well be the key to fully explainable AI and algorithmic transparency.