-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate character encoding #18
base: master
Are you sure you want to change the base?
Conversation
"windows-1258", | ||
"IBM437", | ||
"IBM866" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to keep this list in sync with https://github.com/kaitai-io/kaitai_struct_compiler/blob/master/shared/src/main/scala/io/kaitai/struct/EncodingList.scala
I wonder if there is any easy way out here? I can imagine:
- some tool which will copy ksc's EncodingList.scala list to here automatically
- some tool which will copy this file to EncodingList.scala
- getting ksc to read ksy_schema.json as source of truth (but then we'll need to keep a list of aliases somewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- some tool which will copy ksc's EncodingList.scala list to here automatically
I think this is the way to go (but I don't think it has to be "automatically"). Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala
with the latest ksy_schema.json
. If it finds any inconsistencies, it can do one of two things:
-
just fail the CI job and thus prompt a human operator to look into the CI log, which would contain the details about the failed comparison. The human would then go here to https://github.com/kaitai-io/ksy_schema and update the list manually.
This should be simple to implement and would probably do the job good enough. Yes, the updating would still need to be done manually, but I don't think the encoding list in KSC will be changing so rapidly that this becomes annoying.
-
have @kaitai-bot open a PR with the update in https://github.com/kaitai-io/ksy_schema/pulls and wait for the human operator to merge it. This would perhaps be more convenient if we were changing the encoding list in KSC often, but as I already mentioned, I don't think we will, so this is probably an overkill. It adds a bunch of implementation challenges on top of the variant 1 - for example, editing the JSON file so that the overall formatting stays intact, opening a PR via some API, etc.
To be honest, I don't think we need this, it seems overengineered. I suppose that there will be like less than 10 updates of the encoding list in the next 5 years, and we can certainly handle that much manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to keep this list in sync
The same could be said for the list of attributes (e.g. the valid
attribute, implemented by the compiler but missing from the spec kaitai-io/kaitai_struct#944).
Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala with the latest ksy_schema.json
I think it would be ideal to add a CI job in https://github.com/kaitai-io/kaitai_struct_formats which checks that *.ksy
files validate against the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me, if anything, that's a great first step towards user-friendly canonicalization.
@rotu On the one hand, naming Git branches after animals is kind of funny (I wonder what tool does that?), but on the other hand, sensible branch names that at least slightly indicate the content are better. It's not that different from the default naming scheme that GitHub uses for edits created via the Web GUI ( Speaking of which, please try to write at least a brief explanation in the pull request description. In general, every change deserves it. |
This is the default behavior of Visual Studio Code. The branch names should not be conveying meaning - please see the PR for canonical description.
I thought the title of the PR was sufficient but I'll elaborate. |
Why shouldn't they?
It's not just about what you did (commit / PR titles and code are probably indeed sufficient for that), but why and how you did it are often important questions. For example, here we had to guess that you took the list from And I'm not talking just about this PR, but in general - most of the time, leaving PR descriptions empty is a bad habit. |
Here's why I don't like naming my branches: (1) branch names are immutable, even if the scope of a PR changes or you named the branch poorly to start (2) it creates a conflicting source of truth for the intent of a branch.
Good points! |
Where the schema expects a declared character encoding, validate that it is one of the character encodings recognized by the compiler. When using a smart editor, this makes it easier to author a
.ksy
file.I took this list from the recognized encoding list in Kaitai Struct compiler.