Allow defining grammar in native language constructs #78
Comments
Another advantage of defining the grammar as a Python construct is the possibility of inlining actions into the grammar.
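A minimal sketch of what inlined actions could look like when the grammar is a plain Python structure. The struct layout (rule name mapped to `(symbols, action)` pairs) and the `reduce_with` helper are assumptions made for illustration; they are not parglare's API.

```python
# Hypothetical struct: rule name -> list of (symbols, action) pairs.
# The action sits right next to the production it belongs to.
grammar = {
    "sum": [
        (["sum", "plus", "number"], lambda children: children[0] + children[2]),
        (["number"], lambda children: children[0]),
    ],
    "number": [
        (["digits"], lambda children: int(children[0])),
    ],
}


def reduce_with(rule, alternative_index, children):
    """Apply the inlined action of one production to already-built children."""
    symbols, action = grammar[rule][alternative_index]
    if len(symbols) != len(children):
        raise ValueError(f"{rule!r} expects {len(symbols)} children")
    return action(children)


# Simulate by hand the reductions a parser would perform for "2 + 3".
two = reduce_with("number", 0, ["2"])
three = reduce_with("number", 0, ["3"])
total = reduce_with("sum", 0, [reduce_with("sum", 1, [two]), "+", three])
```

A real parser would call the actions itself during reductions; the manual calls here only show that the callables travel with the grammar.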
I understand where you are coming from, and your points are valid. I did this in Arpeggio, where you can define your grammar either in Python (a.k.a. internal DSL) or using PEG notation (external DSL), or you can even create another notation (there are two variants of PEG notation as a showcase). The problems with this approach are the maintenance overhead and consistency issues between the different specification approaches. Another con is that your grammar is tied to a particular parsing framework, and it is very hard to migrate to something else. With EBNF variants you can switch easily; even when there are differences, they are usually not big. For example, see here for an implementation of a C parser in parglare where the grammar was taken from elsewhere and minimally modified. I don't intend to support an internal grammar spec in parglare. Actually, I plan to port parglare to other languages and technologies and enable interoperability, e.g. compile the parse table in Go and use it in Python. Thus the grammar should be defined in a portable external DSL spec. Yes, you could use
Maintenance overhead
I think maintenance overhead can be reduced by use of This way duplication of code is reduced (and with it bugs and some issues) and consistency is guaranteed. Some bugs in some scenarios can even be avoided: if a user prefers to use the Python representation and skip the parsing step, then she is safe from bugs in the grammar parser code. The documentation overhead remains, but there is no such thing as too much documentation ;)
Lack of portability
If users want portability (at the cost of the advantages I listed above), they are free to specify the grammar in BNF. But let's assume that the Python representation is "treeish" enough to be easily dumped to/loaded from json/yaml/protobuf/dhall/etc. Then enabling users to use it is absolutely great for portability, because they can load dumped grammars into any language they want without needing to write a relatively complex BNF parser (compared to And a depiction
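The dump/load idea can be illustrated with a short sketch. The grammar layout below (keys `start`, `productions`, `terminals`) is a hypothetical plain-data format chosen for the example, not the structure parglare actually uses; the only point is that a structure built from dicts, lists and strings survives a JSON round trip unchanged.

```python
import json

# Hypothetical "treeish" grammar: only dicts, lists and strings,
# so any language with a JSON library can load it.
grammar = {
    "start": "expr",
    "productions": {
        "expr": [["expr", "plus", "term"], ["term"]],
        "term": [["number"]],
    },
    # Terminal definitions are kept as opaque strings in this sketch.
    "terminals": {"plus": "+", "number": "[0-9]+"},
}

# Serialize to text, e.g. for writing to a file or sending to another tool.
dumped = json.dumps(grammar, indent=2, sort_keys=True)

# Load it back; another language would do the same with its own JSON parser.
loaded = json.loads(dumped)
```

Anything that is not plain data, such as inlined Python callables, would need a separate naming scheme to survive such a dump.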
And a note on EBNF portability and others
EBNF is not a standardized notation and in general it's not easily switchable. Usually yes, but not in general. For example, I consider rewriting the examples on the Wikipedia page to EBNF comparable (regarding time) to rewriting them into Python:
{
    'digit excluding zero': [["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"], ["8"], ["9"]],
    'digit': [["0"], [ident('digit excluding zero')]],
    'integer': [["0"], [optional("-"), ident('natural number')]],
}
In fact, I opened this discussion because of problems with portability. I started adapting an existing grammar. I prefer generic solutions, so doing this by hand wasn't acceptable. I wrote a parser for ABNF using parglare, then the "desugaring transformer" which converts nested brackets/repetitions/optionals to structures acceptable to parglare, and now I wonder how to load the converted grammar into parglare. Do I need to serialize it from Python into EBNF and load using
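A rough sketch of the kind of desugaring step described in this comment, limited to optionals. The `ident` and `optional` helpers and the `desugar` function are hypothetical stand-ins written for this example; the commenter's actual transformer and the structures parglare accepts may differ.

```python
from itertools import product


def ident(name):
    """Reference to another rule (hypothetical helper, not a parglare API)."""
    return ("ident", name)


def optional(item):
    """Item that may be omitted (hypothetical helper, not a parglare API)."""
    return ("optional", item)


def is_optional(item):
    return isinstance(item, tuple) and item[0] == "optional"


def desugar(grammar):
    """Expand each optional item into two flat alternatives: with and without it."""
    flat = {}
    for rule, alternatives in grammar.items():
        expanded = []
        for alternative in alternatives:
            # Every item offers a list of choices; an optional item offers
            # either its wrapped item or nothing at all.
            choices = [
                [[item[1]], []] if is_optional(item) else [[item]]
                for item in alternative
            ]
            for combo in product(*choices):
                expanded.append([symbol for part in combo for symbol in part])
        flat[rule] = expanded
    return flat


grammar = {
    "digit": [["0"], [ident("digit excluding zero")]],
    "integer": [["0"], [optional("-"), ident("natural number")]],
}

flat = desugar(grammar)
# flat["integer"] now holds three alternatives and no optional markers.
```

Repetitions and nested groups would need extra generated rules rather than a simple expansion, which this sketch leaves out.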
Thanks for the detailed explanation of the idea. Yes, this approach, if implemented, would be cool. Dumping the grammar to JSON together with the LALR parser table would make it possible for the parser to be used without the grammar's textual representation and the grammar parsing overhead. Still, there would be a need to maintain I haven't given it thorough thought, but I think quite a big chunk of work would be needed to make this happen while retaining the current set of features. Anyway, I agree with the general idea and would like to see something along these lines implemented in the future.
I wrote a crude function that converts a grammar description (a.k.a. grammar struct) into Let me know if the format I came up with is suitable for integration into parglare (replacing the current format). Of course, more sophisticated functionality like specifying associativity and priority still has to be incorporated into it, but I think it'll be straightforward (oh, naive me). If you are curious how I imagine this format being incorporated into loading a grammar from a string, take a look at the ABNF parser and grammar desugaring code - parglare-EBNF can be supported similarly, and I think even more easily, because unlike ABNF it forbids nested structures.
@SupraSummus Just to let you know that I have been working on a redesign lately for the upcoming 0.10 version. There are a lot of changes, and one of them is the ability to specify the grammar as a Python struct, as you suggested. Other interesting stuff includes action/recognizer definition as Python classes/objects, utilizing inheritance for overriding, and various smaller improvements. Generally, the design is, in my view, now much simpler and more maintainable. The work is currently on Interesting module to look at is Import semantics have been changed to a simple file import with merging into the same namespace. I'm planning to add proper language registration/discovery/reference that will inherit a part of the previous import functionality.
@SupraSummus, you may want to take a look at https://gitlab.com/UniGrammar/UniGrammar.py
@igordejanovic, I see you mentioned working on this for version 0.10. Were these changes merged into the master branch? I wrote a natural language parser (Pyramids) a while back. It's terribly slow, due to its naive approach, but it's indispensable because it handles a broad array of natural language constructs, converting them to small semantic graphs that I use in a larger NLU project I'm working on. I'm trying to port the parser to use parglare instead, as it seems you have done an excellent job on this package and mine is shamefully slow in comparison. However, I have a very large natural language grammar written in a DSL with very specific design requirements that prevent it from being ported easily to parglare's grammar DSL. This leaves me in the position of needing to programmatically construct a natural language grammar as the grammar file written in my own DSL is read, but I'm iffy on how to approach the task. I had expected to see some sort of "grammar builder" object used in the I'm considering implementing a grammar builder class that feeds into the struct-based mechanism you already mentioned. This would not only meet my own needs, but it would likely meet @SupraSummus's needs as well. If I do eventually implement something along these lines, would you be interested in merging it into the code base (assuming it meets your code quality standards, etc.) or linking to a separate repo from your own?
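One possible shape for such a grammar builder, sketched under assumptions: the `GrammarBuilder` class, its method names and the output layout are all hypothetical and are not taken from parglare or Pyramids. The builder collects rules incrementally, as a DSL reader might emit them, and produces a plain struct at the end.

```python
class GrammarBuilder:
    """Hypothetical builder that accumulates rules and emits a plain struct."""

    def __init__(self, start):
        self.start = start
        self.productions = {}
        self.terminals = {}

    def terminal(self, name, recognizer):
        """Register a terminal; the recognizer is kept as an opaque string."""
        self.terminals[name] = recognizer
        return self

    def rule(self, name, *symbols):
        """Add one alternative (a sequence of symbol names) to a rule."""
        self.productions.setdefault(name, []).append(list(symbols))
        return self

    def build(self):
        """Validate symbol references and return the grammar as plain data."""
        if self.start not in self.productions:
            raise ValueError(f"start rule {self.start!r} is not defined")
        defined = set(self.productions) | set(self.terminals)
        for name, alternatives in self.productions.items():
            for alternative in alternatives:
                for symbol in alternative:
                    if symbol not in defined:
                        raise ValueError(
                            f"undefined symbol {symbol!r} in rule {name!r}"
                        )
        return {
            "start": self.start,
            "productions": self.productions,
            "terminals": self.terminals,
        }


builder = GrammarBuilder("sentence")
builder.terminal("noun", "dog").terminal("verb", "barks")
builder.rule("sentence", "noun", "verb")
struct = builder.build()
```

Validating references in `build` rather than in `rule` lets rules be added in any order, which suits a grammar that is constructed while another file is being read.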
@hosford42 Yes, I did a rework on a If I recall correctly, this branch was complete and all tests were passing. You can start by looking here: https://github.com/igordejanovic/parglare/blob/language-def-redesign/tests/func/rework/test_grammar.py And this module is interesting, as it represents an implementation of the parglare grammar DSL using the internal struct-based definition. This means that the parglare grammar language is just one of many possible grammar languages you can create. You can provide your own if you like.
I just need to warn you that there is an important rework of the GLR algorithm on
If I am understanding you correctly, the
Yes, you understood correctly.
Thank you!
It is very common practice for a parser library to have its own grammar syntax (sometimes standardized, like ABNF), but I wonder why that is. Why not just allow the user to encode the grammar in native language constructs?
Encoding the grammar in the native language has advantages. Here are some I can think of:
When I came across parglare, started digging into the code and found Grammar.from_struct, I realized this is the thing. Is there any reason why one shouldn't use this method outside parglare? Maybe all that is needed to make this method usable for users is documentation?
I don't mean parglare should get rid of its grammar parser - just allow users not to use it.