Request for Example of AEC API Usage with Agent Masking in Petting Zoo #76

wmn7 · 2024-04-16T21:59:47Z

I've been exploring the BenchMARL library and am impressed with its capabilities and design—great work!

I am currently interested in implementing a multi-agent reinforcement learning scenario using the AEC (Agent-Environment Cycle) API in petting zoo, particularly for environments that require sequential turn-based actions like in a Chess game. In this context, I need to apply masking at the agent level rather than action masking.

Could you provide an example or guidance on how to adapt the AEC API for such a use case? Any examples of AEC API usage with agent masking in a Chess-like environment would be incredibly helpful.

Thank you for your assistance and for the excellent work on BenchMARL.

matteobettini · 2024-04-20T14:28:21Z

Hello! Thanks for the nice feedback!

BenchMARL does not currently support the AEC turn-based enviornments, but it is something we have on our TODO list! (this is also because they are already available in torchrl) (and I will also make a tutorial in the future on how to train them in torchrl)

If you could convert your AEC env to a Parallel one using the PettingZoo conversion wrapper https://pettingzoo.farama.org/api/wrappers/pz_wrappers/#module-pettingzoo.utils.conversions that would be a naive workaround, but I understand that this is not always possible.

I'll pin this issue and update it when they will become directly compatible.

wmn7 · 2024-04-24T14:16:47Z

Hello,

Thank you for the information. I've reviewed the implementation of agent masking in the torchrl's PettingZoo wrapper. It seems that setting the state and reward to zero is a straightforward approach to agent masking.

https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/torchrl/envs/libs/pettingzoo.py#L619-L638

However, when I add the agent mask in the env, then all agents are outputting identical actions. I think an example demonstrating proper agent masking with multiagents methods (such as mappo, qmix) would indeed be beneficial.

Best regards.

matteobettini · 2024-04-24T15:45:08Z

Hello,

What do you mean by all the agents are outputting identical actions? Are you trying to train the environment in BenchMARL or TorchRL?

Yes! A tutorial on training those envs in TorchRL is needed and we are looking into it as it is not straightforward.

A further example of using the turn based env can be like this one from the tests https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/test/test_libs.py#L3211 where we play tic tac toe.

But we have no training example for now.

wmn7 · 2024-04-25T03:14:19Z

Hello,

I used torchrl for training, and I found that the reward curve converged during training, as shown in the figure below:

But during testing, I found that the agents chose the same action.

For example, now the agent is a discrete action (Discrete(2)), and all actions 0 are selected during testing. I found that this is related to ExplorationType. During training, it is ExplorationType.RANDOM, but during testing, it is ExplorationType.MODE.

I'm not sure whether this is a problem with my environment or an algorithm problem, so I hope there is an example of agent mask.

wmn7 · 2024-04-25T15:25:53Z

Hello,

I've identified the issue: the state representation was not accurately reflecting the environment. After modifying the environment configuration, the problem was resolved.

However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

Thank you!

matteobettini · 2024-04-25T16:53:46Z

However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

A simple solution would be to just respawn the out of bounds veichles in a reset state

Alternatively, if you use a mask, you need to return it as part of your step output.
Then, after collection, apply it to the data and feed it to the loss after having filtered out the invalid transitions.

This is currently not done in benchmarl and I know we need a tutorial for this in torchrl. As soon as I will get some time I will work on it.

Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

This is the best compliment you could make :) Thanks

matteobettini pinned this issue Apr 20, 2024

matteobettini mentioned this issue Jun 12, 2024

[DO NOT CLOSE] Library TODOs and call for contributions #94

Open

13 tasks

matteobettini unpinned this issue Jun 12, 2024

matteobettini mentioned this issue Oct 10, 2024

[Feature Request] Support for Integrating Custom PettingZoo Environments with TorchRL's PettingZooEnv pytorch/rl#2461

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Request for Example of AEC API Usage with Agent Masking in Petting Zoo #76

Request for Example of AEC API Usage with Agent Masking in Petting Zoo #76

wmn7 commented Apr 16, 2024

matteobettini commented Apr 20, 2024

wmn7 commented Apr 24, 2024 •

edited

Loading

matteobettini commented Apr 24, 2024

wmn7 commented Apr 25, 2024

wmn7 commented Apr 25, 2024

matteobettini commented Apr 25, 2024

Request for Example of AEC API Usage with Agent Masking in Petting Zoo #76

Request for Example of AEC API Usage with Agent Masking in Petting Zoo #76

Comments

wmn7 commented Apr 16, 2024

matteobettini commented Apr 20, 2024

wmn7 commented Apr 24, 2024 • edited Loading

matteobettini commented Apr 24, 2024

wmn7 commented Apr 25, 2024

wmn7 commented Apr 25, 2024

matteobettini commented Apr 25, 2024

wmn7 commented Apr 24, 2024 •

edited

Loading