MAgent2 integration #135

Closed
JoseLuisC99 opened this issue Oct 2, 2024 · 8 comments · Fixed by #136

Comments

@JoseLuisC99
Contributor

Hi! I'm trying to integrate BenchMARL with the MAgent2 environment, but I've encountered some problems. The main issue is that when I execute experiment.run(), I get the following error:

File ~/miniconda3/envs/marl/lib/python3.11/site-packages/benchmarl/experiment/logger.py:134, in Logger.log_collection(self, batch, task, total_frames, step)
    132 to_log.update(task.log_info(batch))
    133 # print(json_metrics.items())
--> 134 mean_group_return = torch.stack(
    135     [value for key, value in json_metrics.items()], dim=0
    136 ).mean(0)
    137 if mean_group_return.numel() > 0:
    138     to_log.update(
    139         {
    140             "collection/reward/episode_reward_min": mean_group_return.min().item(),
   (...)
    143         }
    144     )

RuntimeError: stack expects each tensor to be equal size, but got [1] at entry 0 and [16] at entry 1
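
For reference, the failure is just `torch.stack` refusing tensors of different lengths; a minimal reproduction with made-up per-group values (the real `json_metrics` comes from the logger):

```python
import torch

# Hypothetical per-group mean episode returns of different lengths,
# mirroring the shapes in the error above.
json_metrics = {
    "adversary": torch.zeros(1),   # 1 logged episode return for this group
    "agent": torch.zeros(16),      # 16 logged episode returns for this group
}

# Raises: RuntimeError: stack expects each tensor to be equal size,
# but got [1] at entry 0 and [16] at entry 1
mean_group_return = torch.stack(
    [value for key, value in json_metrics.items()], dim=0
).mean(0)
```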

While I was trying to fix it, I found that the error is likely due to BenchMARL expecting equal "done" signals for both adversary groups. This means BenchMARL doesn't currently support scenarios where the number of agents that die in an episode differs between groups.

I'd like to know if there is a solution or wrapper that can cover this case, or if there is a way to add it.

@JoseLuisC99
Contributor Author

This issue may be related to feature two in #94 (support for variable number of agents).

@matteobettini
Collaborator

matteobettini commented Oct 3, 2024

Hey! Thanks for opening this, I think we can definitely fix this issue. The code in the loggers can definitely be more flexible, and it is about time I seriously improved it.

To understand the issue better I need more context. In particular:

  • Do you have a variable number of agents (dying agents)?

BenchMARL does not currently support this, but it does support some agents being done before others. In that case, the rollout continues until the global done is set (in PettingZoo this can be computed with any or all over the agent dones). If some agents are done and the rollout continues, these agents will be required to keep acting, but you can either ignore their actions in the env and give them a reward of 0, or mask their actions such that only the no-op action remains available (like in SMAC), still giving a reward of 0.
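
For concreteness, a minimal sketch of the two global-done conventions (not BenchMARL code; `terminations` and `truncations` are assumed to be the per-agent dicts a parallel PettingZoo env returns from step()):

```python
# Sketch: deriving a global done from per-agent PettingZoo dones.
# `terminations` and `truncations` map agent names to booleans.
def compute_global_done(terminations: dict, truncations: dict, mode: str = "any") -> bool:
    agent_dones = [
        terminations[agent] or truncations[agent] for agent in terminations
    ]
    # "any": the episode ends as soon as one agent is done;
    # "all": the episode ends only when every agent is done.
    return any(agent_dones) if mode == "any" else all(agent_dones)
```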

If you could provide more details on your env, I'll be able to help.

In general, I will already start making a PR that makes the code snippet you got stuck on more flexible, as I think there is a lot I can already improve there.

@JoseLuisC99
Contributor Author

In this environment, some agents finish before others (when they lose all their health points). Once an agent is done, a truncation or termination flag is set and returned every iteration until the episode ends. This effectively ignores actions from finished agents, as there is no no-op action available.

This causes a problem when logging. The logger fetches the done flag with experiment.logger._get_done(group, batch), and then calculates the mean episode reward per group with episode_reward.mean(-2)[done.any(-2)]. Since some groups have more agents that finished earlier, the per-group tensors end up with different lengths, causing an error because torch.stack requires tensors of the same shape.
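
A sketch of how that indexing produces the mismatch, with made-up group and batch sizes (the expression mirrors the one above):

```python
import torch

# Hypothetical shapes: per group, episode_reward and done are
# [n_envs, n_agents, 1]. Masking the per-env mean with done.any(-2)
# keeps only the envs where at least one agent is done, so groups whose
# agents finish at different rates yield tensors of different lengths.
n_envs = 32
reward_a = torch.randn(n_envs, 3, 1)                  # group A: 3 agents
done_a = torch.zeros(n_envs, 3, 1, dtype=torch.bool)
done_a[:1] = True                                     # 1 env has a done agent

reward_b = torch.randn(n_envs, 5, 1)                  # group B: 5 agents
done_b = torch.zeros(n_envs, 5, 1, dtype=torch.bool)
done_b[:16] = True                                    # 16 envs have done agents

mean_a = reward_a.mean(-2)[done_a.any(-2)]            # shape [1]
mean_b = reward_b.mean(-2)[done_b.any(-2)]            # shape [16]
torch.stack([mean_a, mean_b], dim=0)                  # RuntimeError: [1] vs [16]
```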

@matteobettini
Collaborator

Very clear, got it. I will fix this in #136.

@matteobettini
Collaborator

If you want to try the PR as of now, collection logging should already work well (you can try it without evaluation).

I still have to fix evaluation logging.

@matteobettini
Collaborator

matteobettini commented Oct 4, 2024

Thanks for bearing with me.

PR is ready and its description gives an overview of how things are computed now.

Let me know if this fixed your issue.

@matteobettini
Collaborator

matteobettini commented Oct 4, 2024

Btw, if you would like to contribute MAgent2 to BenchMARL once you finish, we would love that!

Otherwise, could you share your implementation? I might consider adding a wrapper for it in the future.

@JoseLuisC99
Contributor Author

Sure, my intention is definitely to contribute. However, I still need to investigate some performance issues within MAgent2. Once I've finished training some models, I'd be happy to share my contributions.
