[Bug Report] Observation does not get updated in Atari multi-player games after the first player's action. #1204
Comments
Hi, I haven't used the Atari envs, so I'm not 100% sure this explanation is correct, but I think this is working as intended. Note that when you change this to agent_1, you get the assertion you expect, because you are comparing what agent_1 saw (before all agents acted) to what agent_0 saw (after all agents acted, which triggered an observation update). Those should be different. Alternatively, you can put …
Hi, I'm using a reinforcement learning library (Tianshou) that only supports AEC environments in multi-agent settings. In the case of Atari games, this would result in collecting (state, action, reward, next_state) tuples where the state is the same as next_state for the first agent. Is there a way to have an AEC Atari environment that updates the observation and reward after each agent's action? Or is Atari originally parallel, so that I need to find another library or modify the library/algorithm I'm using to get the desired result?
I'm sorry, I don't have a good answer to your questions. The little exposure I have had to Tianshou in multi-agent settings didn't work well, and I haven't looked at it since. As far as I know, all the Atari envs are originally parallel and wrapped to match the AEC interface. I do not know what would be involved in having them update after every agent's move. You might get better replies asking on the Discord channel: https://discord.gg/nhvKkYa6qX
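The wrapping described above can be illustrated with a pure-Python sketch (class and method names here are illustrative stand-ins, not PettingZoo internals): a parallel core only advances once *all* agents have submitted actions, so the first agent's next observation equals its previous one.

```python
# Illustrative sketch of a parallel env wrapped behind an AEC-style
# interface. The joint transition fires only after the last agent acts,
# so the first agent's observation is stale right after its own step.

class MockParallelCore:
    """Advances the game state only when every agent's action is in."""
    def __init__(self, agents):
        self.agents = agents
        self.state = 0

    def step_all(self, actions):
        self.state += 1  # one joint transition per complete action set


class AECWrapper:
    """AEC-style turn-taking interface over the parallel core."""
    def __init__(self, core):
        self.core = core
        self.pending = {}

    def observe(self, agent):
        return self.core.state  # every agent sees the same cached state

    def step(self, agent, action):
        self.pending[agent] = action
        if len(self.pending) == len(self.core.agents):
            self.core.step_all(self.pending)  # update happens only here
            self.pending = {}


core = MockParallelCore(["first_0", "second_0"])
env = AECWrapper(core)

before = env.observe("first_0")
env.step("first_0", action=0)    # core has NOT advanced yet
after_first = env.observe("first_0")
env.step("second_0", action=0)   # now the joint step fires
after_all = env.observe("first_0")

assert before == after_first       # stale: unchanged after first_0's action
assert after_all == before + 1     # updated only after both agents acted
```

This matches the behaviour reported below: first_0's observation is identical before and after its own action, and only changes once the second agent has also acted.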
Describe the bug
It seems that after the first player takes an action in an Atari game, the observation is the same as it was before the action. This could be problematic for reinforcement learning algorithms, since the next state constructed from this observation may not reflect the consequence of the action taken. I have provided a code snippet that checks this by running an Atari environment, taking some actions, and asserting that the observation is the same for the first player. If you run the code and get no assertion error, that means the first player's observation did not change before and after taking an action. Note that sometimes, due to no-ops or the early stages of the game, the observation may legitimately not change; the problem is that the first player's observation is "always" the same before and after taking an action.
Code example
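The original snippet was not preserved in this page; the following is a hedged reconstruction of the check described above, assuming the PettingZoo 1.24 AEC Atari API (`space_invaders_v2.env()`, `agent_iter`, `last`). The helper and function names are my own.

```python
# Reconstruction (not the reporter's exact code): roll out a random
# policy and record whether first_0's observation ever changes.
import numpy as np


def observations_all_equal(observations):
    """True if every observation in the sequence is pixel-identical."""
    first = observations[0]
    return all(np.array_equal(first, o) for o in observations[1:])


def first_player_obs_unchanged(env, max_iter=500, agent="first_0"):
    """Collect one agent's observations over a random rollout and report
    whether they never change (the behaviour described in this issue)."""
    env.reset(seed=42)
    seen = []
    for current in env.agent_iter(max_iter):
        obs, reward, termination, truncation, info = env.last()
        if current == agent:
            seen.append(obs)
        action = None if termination or truncation else env.action_space(current).sample()
        env.step(action)
    env.close()
    return observations_all_equal(seen)
```

Usage sketch, assuming Atari ROMs are installed: `from pettingzoo.atari import space_invaders_v2; first_player_obs_unchanged(space_invaders_v2.env())` reportedly returns True for `"first_0"` but False for `"second_0"`.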
System info
Describe how PettingZoo was installed: Using pip
Version of pettingzoo: 1.24.3
What OS/version you're using: Ubuntu 22.04.1 LTS, Linux 6.5.0-15-generic, x86-64
Python Version: 3.11.0
Additional context
The provided code only checks Space Invaders, but I have tried other environments and got the same results. I also tried the same thing for the second player (called "second_0") and got an assertion error at some point, which shows that the observation does change after the second player's action, as expected.