Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueEnum conflict between emmet and jobflow #1002

Open
gpetretto opened this issue Sep 30, 2024 · 6 comments
Open

ValueEnum conflict between emmet and jobflow #1002

gpetretto opened this issue Sep 30, 2024 · 6 comments

Comments

@gpetretto
Copy link
Contributor

I have noticed that the behaviour of ValueEnum, whose use is widespread in atomate2 for all the codes, is incosistent between emme-core and jobflow. The former now does not have an as_dict that returns a string, while the latter still has that.
The TaskState is a ValueEnum for all the codes, but for VASP, it is the emmet version, for most of the other is the jobflow version.

The code below demonstrates the potential consequences of the problem:

from jobflow.utils.enum import ValueEnum as JobflowValueEnum

from emmet.core.utils import ValueEnum as EmmetValueEnum

from monty.json import MSONable, jsanitize
from pydantic import BaseModel


class Em(EmmetValueEnum):
    A = "A"
    B = "B"


class Jo(JobflowValueEnum):
   A = "A"
   B = "B"


class Monty(MSONable):
    def __init__(self, e, j):
        self.e = e
        self.j = j


class Pyd(BaseModel):
    e: Em
    j: Jo


m = Monty(e=Em.A, j=Jo.A)
p = Pyd(e=Em.A, j=Jo.A)

print(jsanitize(m, strict=True))
print(jsanitize(p, strict=True))

When using the latest version of emmet-core and jobflow the output is:

{'@module': '__main__', '@class': 'Monty', '@version': None, 'e': {'value': 'A', '@module': '__main__', '@class': 'Em', '@version': None}, 'j': 'A'}
{'e': {'value': 'A', '@module': '__main__', '@class': 'Em', '@version': None}, 'j': 'A', '@module': '__main__', '@class': 'Pyd', '@version': None}

Where the two lead to different results.
Trying to use dumpfn (that relies on the MontyEncoder) will be even worse, since the dumpfn(p, "p.json")` leads to the error:

TypeError: 'str' object does not support item assignment

Which is the same reported in materialsproject/emmet#1051

In general, I would probably support the option of not having the as_dict method return a string, as this breaks the MSONable API. Maybe just switching all to the emmet version of the object, if it should stay like this for jobflow? However I am not sure about all the implications of dropping that method in jobflow as well. In any case, this will change the structure of the data in the DB.

@utf @janosh

@gpetretto
Copy link
Contributor Author

Sorry, I forgot to mention that the behaviour can be made uniform if selecting enum_values=True in jsanitize, but

  1. this still leaves the inconsistency between the objects
  2. does not solve the error when the MontyEncoder is used.

@utf
Copy link
Member

utf commented Sep 30, 2024

There has already been some discussion on related issues:

I advised on not removing as_dict at the time since it will break downstream packages. I think this summarises my feelings: materialsproject/emmet#944 (comment)

Personally, I think the fix is to add back as_dict. It has been working fine for many years.

@utf
Copy link
Member

utf commented Sep 30, 2024

That said, I feel like I'm fighting a losing battle. If moving over to the emmet ValueEnum and using enum_values=True in jsanitize doesn't cause any regressions, I'm happy to go with it.

@janosh would you have any time to look into this?

@gpetretto
Copy link
Contributor Author

I see, I apologize for not searching further for previous discussions. I had found the last PR and assumed it was just a recent update.

Just to give a bit more context, I got to this point, since a user reported having the TypeError: 'str' object does not support item assignment error message in jobflow-remote. Not knowing of this issue, it was annoying to track down and is probably hard to fix in jobflow-remote itself. The reason is that I need to serialize to a json file the replace/addition/detour in the Response on the remote worker and reconstruct it properly in the Runner to apply it. The user that reported the issue is passing part of the output as an argument to the addition, and this contains subclasses of ValueEnum. The solution might be to make sure that no such object is present there, but this should be an ad-hoc solution for jobflow-remote and I would have preferred to find a general fix.

I should add that I have also realized that enum_values is always set to True when inserting the output into the store (https://github.com/materialsproject/jobflow/blob/053451721600751aaeda7eca0586635163f7a6f5/src/jobflow/core/job.py#L653), so is there even a point in having the objects in the output document being ValueEnum? Whether it is a ValueEnum or a simple Enum it will always be converted to an Enum when stored by jobflow. Am I wrong? Are there other points where being a ValueEnum makes a difference?

In general, I understand the idea behind the ValueEnum, but I am worried about breaking the MSONable API, as it can lead to unpredictable behaviours, as in this case. Maybe if this a relevant feature to have, the ValueEnum could be moved to monty and be explicitly supported in the serialization process. This would remove the ambiguity on different ValueEnum present in different packages and prevent unexpected errors. But then I suppose the enum_values=True should be removed from the jobflow serialization to actually take advantage of this feature.

@utf
Copy link
Member

utf commented Sep 30, 2024

No need to appologise. I didn't expect you to be familiar with the previous discussions, I was just putting them for the complete history of this problem.

Thanks for digging into this @gpetretto. I think one can imagine a use case for having both ValueEnum (e.g., something which serialises to a string, can be compared to a string, but which is an enum object) vs a normal Enum (something which is MSONable in that you can recover the original object) be present in jobflow. I thought this is what we had but I'm now lost with the changes to monty/emmet/jobflow as to what is still possible. There doesn't seem to be consistent use of enum_values in jobflow (e.g., https://github.com/materialsproject/jobflow/blob/053451721600751aaeda7eca0586635163f7a6f5/src/jobflow/core/store.py#L300) but I suppose even one call with it enabled means regular Enums haven't been possible for a while.

In that case, I'm happy to switch over emmet ValueEnum and enable enum_values=True everywhere.

@janosh
Copy link
Member

janosh commented Oct 2, 2024

@janosh would you have any time to look into this?

@utf sorry, not immediately. would have to be inserted fairly low on the todo list.

I think one can imagine a use case for having both ValueEnum (e.g., something which serialises to a string, can be compared to a string, but which is an enum object)

@utf that already exists actually. big fan of standard lib's StrEnum!
@gpetretto been meaning to recommend to switch jf-remote to that actually. not just because it auto-serializes to a normal string but also because you don't have to litter the code with Enum.<key>.value. Enum.<key> already yields the string value.

import json
from enum import StrEnum


class State(StrEnum):
    completed = "completed"
    failed = "failed"
    waiting = "waiting"
    submitted = "submitted"
    running = "running"


assert State.completed == "completed" # no need for .value
assert json.dumps(State.completed) == '"completed"' # no need for special serialization logic

only downside is it's 3.11+. but there's a backport package on pypi or you could just copy-paste the StrEnum class into your code as I did in pymatviz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants