Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use explicit item type map to determine field types and encodings #33

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

piehld
Copy link
Collaborator

@piehld piehld commented Sep 6, 2024

This PR is intended to address a current limitation in the conversion of mmCIF to BCIF for some data items.

Specifically, there's a bug when handling some uncommon data types such as point_group and float_range.

  • point_group fails because it has "int" in the name (i.e., "point_group")
  • float_range fails because it has "float" in the name, so the code tries to encode it as a float, but really it's a string since the value contains a hyphen

Rather than having the code try to figure out what the type is and how to encode it based on substring matching, I've added a [current] complete list of possible types and what they should be encoded as. This is similar to the method being used by our Java encoder (e.g., see here).

Additionally, I made a minor adjustment to the __molstar_forced_ints object, in which I changed it to be a dictionary rather than a list to allow for O(1) efficiency.

@piehld piehld requested a review from epeisach September 6, 2024 20:16
@epeisach
Copy link
Collaborator

epeisach commented Sep 6, 2024

Okay - your code rolled back som submodules -- hence tests failing.

Is there a reason you are doing this?

Otherwise - fundamentally looks good...

@epeisach
Copy link
Collaborator

epeisach commented Sep 6, 2024

I notice that you are using the term "field" in inMostarTypeHints -- maybe use the term "name" -- this would be consistent with PdbxContainers.py CifName class.

@epeisach
Copy link
Collaborator

epeisach commented Sep 7, 2024

I reverted back the modules/cpp-cif-file to latest.
(git submodule update after you pull in changes to your branch)

I leave it to you to decide "field" vs "name" discussion.

@epeisach
Copy link
Collaborator

epeisach commented Sep 7, 2024

You mentioned that this was detected by trying to convert public PDB files. Perhaps a test case to show success? I am not sure what the failure mode was with existing code and if it was detectable. Or create a simple test script that outputs a minimal file - with point_group in it - and show it works - which would have failed before?

@epeisach
Copy link
Collaborator

epeisach commented Sep 9, 2024

I have added a test - that would ail with current code and is fixed with your changes. I needed to capture the output of logging and look for the cast error - which is no longer present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants