
Request for Example of AEC API Usage with Agent Masking in PettingZoo #76

Open
wmn7 opened this issue Apr 16, 2024 · 6 comments

@wmn7

wmn7 commented Apr 16, 2024

I've been exploring the BenchMARL library and am impressed with its capabilities and design—great work!

I am currently interested in implementing a multi-agent reinforcement learning scenario using the AEC (Agent-Environment Cycle) API in PettingZoo, particularly for environments that require sequential, turn-based actions, as in Chess. In this context, I need to apply masking at the agent level rather than action masking.

Could you provide an example or guidance on how to adapt the AEC API for such a use case? Any examples of AEC API usage with agent masking in a Chess-like environment would be incredibly helpful.

Thank you for your assistance and for the excellent work on BenchMARL.

@matteobettini
Collaborator

Hello! Thanks for the nice feedback!

BenchMARL does not currently support AEC turn-based environments, but it is on our TODO list! (This is partly because they are already available in TorchRL, and I will also write a tutorial in the future on how to train them in TorchRL.)

If you could convert your AEC env to a Parallel one using the PettingZoo conversion wrapper (https://pettingzoo.farama.org/api/wrappers/pz_wrappers/#module-pettingzoo.utils.conversions), that would be a naive workaround, but I understand that this is not always possible.
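
As a rough sketch (untested here; `make_my_aec_env` is a placeholder for your own environment constructor, and the conversion is only valid when your env's dynamics allow all agents to act within a single cycle), the workaround could look like this:

```python
from pettingzoo.utils.conversions import aec_to_parallel
from torchrl.envs.libs.pettingzoo import PettingZooWrapper

# Placeholder: construct your own AEC environment here.
aec_env = make_my_aec_env()

# Convert the AEC env to the Parallel API (only valid when the environment
# can accept actions from all agents within a single cycle).
parallel_env = aec_to_parallel(aec_env)

# Wrap the Parallel env so it can be used with TorchRL.
env = PettingZooWrapper(env=parallel_env)
print(env.rollout(10))  # quick sanity check with random actions
```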

I'll pin this issue and update it when they become directly compatible.

@matteobettini matteobettini pinned this issue Apr 20, 2024
@wmn7
Author

wmn7 commented Apr 24, 2024

Hello,

Thank you for the information. I've reviewed the implementation of agent masking in TorchRL's PettingZoo wrapper. It seems that setting the state and reward to zero is a straightforward approach to agent masking.

https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/torchrl/envs/libs/pettingzoo.py#L619-L638
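
To make sure I understand the idea, here is a conceptual sketch of that zeroing logic in plain torch (not the actual wrapper code; the shapes are made up for illustration):

```python
import torch

# Conceptual illustration only (not the TorchRL implementation): agents that
# are not acting this turn get zeroed observations/rewards and a False mask.
n_agents, obs_dim = 2, 9
obs = torch.randn(n_agents, obs_dim)
reward = torch.randn(n_agents, 1)
acting = torch.tensor([True, False])   # only agent 0 acts on this turn

obs = obs * acting.view(-1, 1)         # zero out the non-acting agent's observation
reward = reward * acting.view(-1, 1)   # and its reward
```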

However, when I add the agent mask to the env, all agents output identical actions. An example demonstrating proper agent masking with multi-agent methods (such as MAPPO or QMIX) would indeed be beneficial.

Best regards.

@matteobettini
Collaborator

Hello,

What do you mean by "all the agents are outputting identical actions"? Are you trying to train the environment in BenchMARL or in TorchRL?

Yes! A tutorial on training those envs in TorchRL is needed and we are looking into it as it is not straightforward.

A further example of using a turn-based env is this one from the tests, where we play tic-tac-toe: https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/test/test_libs.py#L3211

But we have no training example for now.
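
For reference, constructing that turn-based env in TorchRL might look roughly like this (a sketch along the lines of the test; the `use_mask` flag is assumed to expose which player is acting at each step):

```python
from torchrl.envs.libs.pettingzoo import PettingZooEnv

# Turn-based (AEC) tic-tac-toe from pettingzoo[classic]; a sketch, not a
# training example. use_mask=True should expose a per-agent boolean mask.
env = PettingZooEnv(
    task="tictactoe_v3",
    parallel=False,
    use_mask=True,
)
td = env.reset()
rollout = env.rollout(5)  # random rollout as a sanity check
```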

@wmn7
Author

wmn7 commented Apr 25, 2024

Hello,

I used TorchRL for training, and I found that the reward curve converged during training, as shown in the figure below:

[Figure: training reward curve (screenshot, 2024-04-25)]

But during testing, I found that the agents always chose the same action.

For example, the agent has a discrete action space (Discrete(2)), and action 0 is always selected during testing. I found that this is related to ExplorationType: during training it is ExplorationType.RANDOM, but during testing it is ExplorationType.MODE.
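
For reference, this is roughly how I switch the exploration type at evaluation time (a sketch; `env` and `policy` stand in for my own objects):

```python
from torchrl.envs.utils import ExplorationType, set_exploration_type

# Evaluate with stochastic sampling instead of the deterministic mode, to check
# whether the "always action 0" behaviour comes from greedy action selection.
with set_exploration_type(ExplorationType.RANDOM):
    eval_rollout = env.rollout(max_steps=100, policy=policy)
```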

I'm not sure whether this is a problem with my environment or with the algorithm, so an example of agent masking would really help.

@wmn7
Author

wmn7 commented Apr 25, 2024

Hello,

I've identified the issue: the state representation was not accurately reflecting the environment. After modifying the environment configuration, the problem was resolved.

However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

Thank you!

@matteobettini
Collaborator

> However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

A simple solution would be to just respawn the out-of-bounds vehicles in a reset state.

Alternatively, if you use a mask, you need to return it as part of your step output.
Then, after collection, apply it to the data and feed it to the loss after having filtered out the invalid transitions.
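
A minimal sketch of that filtering step (the mask key, shapes, and `loss_module` are assumptions for illustration, not fixed BenchMARL/TorchRL conventions):

```python
# data: a TensorDict of collected transitions with batch shape [B], carrying a
# per-agent boolean mask at ("agents", "mask") (assumed key) of shape [B, n_agents].
mask = data.get(("agents", "mask"))   # which agents were valid at each step
keep = mask.any(dim=-1)               # drop steps where no agent was active
filtered = data[keep]                 # boolean indexing over the batch dimension
loss_vals = loss_module(filtered)     # loss_module: your MAPPO/QMIX loss module
```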

This is currently not done in BenchMARL, and I know we need a tutorial for this in TorchRL. As soon as I get some time, I will work on it.

> Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

This is the best compliment you could make :) Thanks
