Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate character encoding #18

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 40 additions & 2 deletions ksy_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,9 @@
"pattern": "^(.*/)?[a-z][a-z0-9_]*$"
}
},
"encoding": { "type": "string" },
"encoding": {
"$ref": "#/definitions/CharacterEncoding"
},
"endian": {
"anyOf": [
{ "enum": [ "le", "be" ] },
Expand Down Expand Up @@ -422,7 +424,7 @@
"pattern": "^([a-z][a-z0-9_]*::)*[a-z][a-z0-9_]*$"
},
"encoding": {
"type": "string"
"$ref": "#/definitions/CharacterEncoding"
},
"pad-right": {
"type": "integer",
Expand Down Expand Up @@ -635,6 +637,42 @@
{ "type": "boolean" },
{ "type": "null" }
]
},
"CharacterEncoding": {
"enum": [
"ASCII",
"UTF-8",
"UTF-16LE",
"UTF-16BE",
"UTF-32LE",
"UTF-32BE",
"ISO-8859-1",
"ISO-8859-2",
"ISO-8859-3",
"ISO-8859-4",
"ISO-8859-5",
"ISO-8859-6",
"ISO-8859-7",
"ISO-8859-8",
"ISO-8859-9",
"ISO-8859-10",
"ISO-8859-11",
"ISO-8859-13",
"ISO-8859-14",
"ISO-8859-15",
"ISO-8859-16",
"windows-1250",
"windows-1251",
"windows-1252",
"windows-1253",
"windows-1254",
"windows-1255",
"windows-1256",
"windows-1257",
"windows-1258",
"IBM437",
"IBM866"
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to keep this list in sync with https://github.com/kaitai-io/kaitai_struct_compiler/blob/master/shared/src/main/scala/io/kaitai/struct/EncodingList.scala

I wonder if there is any easy way out here? I can imagine:

  • some tool which will copy ksc's EncodingList.scala list to here automatically
  • some tool which will copy this file to EncodingList.scala
  • getting ksc to read ksy_schema.json as source of truth (but then we'll need to keep a list of aliases somewhere?)

Copy link
Member

@generalmimon generalmimon Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some tool which will copy ksc's EncodingList.scala list to here automatically

I think this is the way to go (but I don't think it has to be "automatically"). Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala with the latest ksy_schema.json. If it finds any inconsistencies, it can do one of two things:

  1. just fail the CI job and thus prompt a human operator to look into the CI log, which would contain the details about the failed comparison. The human would then go here to https://github.com/kaitai-io/ksy_schema and update the list manually.

    This should be simple to implement and would probably do the job good enough. Yes, the updating would still need to be done manually, but I don't think the encoding list in KSC will be changing so rapidly that this becomes annoying.

  2. have @kaitai-bot open a PR with the update in https://github.com/kaitai-io/ksy_schema/pulls and wait for the human operator to merge it. This would perhaps be more convenient if we were changing the encoding list in KSC often, but as I already mentioned, I don't think we will, so this is probably an overkill. It adds a bunch of implementation challenges on top of the variant 1 - for example, editing the JSON file so that the overall formatting stays intact, opening a PR via some API, etc.

    To be honest, I don't think we need this, it seems overengineered. I suppose that there will be like less than 10 updates of the encoding list in the next 5 years, and we can certainly handle that much manually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to keep this list in sync

The same could be said for the list of attributes (e.g. the valid attribute, implemented by the compiler but missing from the spec kaitai-io/kaitai_struct#944).

Practically speaking, I'm thinking of adding a CI job to kaitai_struct_compiler that compares EncodingList.scala with the latest ksy_schema.json

I think it would be ideal to add a CI job in https://github.com/kaitai-io/kaitai_struct_formats which checks that *.ksy files validate against the schema.

}
}
}