Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminals cannot be constructed from sequences with skip nodes #20

Open
Theodus opened this issue Jan 28, 2021 · 1 comment
Open

Terminals cannot be constructed from sequences with skip nodes #20

Theodus opened this issue Jan 28, 2021 · 1 comment
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Theodus
Copy link
Contributor

Theodus commented Jan 28, 2021

The example below shows a program that expects to construct the Header and Body terminals from sequences containing skipped input. However, the program instead given an error like.

1: >ID_8867
   ^
   expected at least one element

I suspect that the implementation is unable to construct terminals from multiple non-contiguous substrings. This should either be fixed, or an error message should be presented that the grammar is malformed.

use "peg"

actor Main
  new create(env: Env) =>
    let input = ">ID_8867\nGATTACA\nACATTAG\n"
    let src = Source.from_string(input, "text")
    match recover val FastaParser().parse(src) end
    | (_, let ast: AST) =>
      env.out.print(recover Printer(ast) end)
    | (let offset: USize, let r: Parser val) =>
      let e = recover val SyntaxError(src, offset, r) end
      env.out.writev(PegFormatError.console(e))
    else
      env.out.print("wat")
    end

primitive FastaParser
  fun apply(): Parser val =>
    recover
      let id = (R('A', 'z') / L("_") / R('0', '9').many1()).many1()
      let header = (L(">") * id * L("\n")).term(Header)
      let body = ((L("A") / L("T") / L("G") / L("C")).many1() * -L("\n")).many().term(Body)
      let records = (header * body).many1()
      records
    end

primitive Header is Label fun text(): String => "Header"
primitive Body is Label fun text(): String => "Body"
@Theodus Theodus added the bug Something isn't working label Jan 28, 2021
@rhagenson
Copy link
Member

I raised my confusion on this point on Zulip so want to add a few details for anyone reading afterwards.

I ended up change the implementation to the following to get around terminals not being able to be constructed with skip nodes.

use "peg"

actor Main
  new create(env: Env) =>
    let input = ">ID_8867\nGATTACA\nACATTAG\n"
    let src = Source.from_string(input, "text")
    match recover val FastaParser().parse(src) end
    | (_, let ast: AST) =>
      env.out.print(recover Printer(ast) end)
    | (let offset: USize, let r: Parser val) =>
      let e = recover val SyntaxError(src, offset, r) end
      env.out.writev(PegFormatError.console(e))
    else
      env.out.print("wat")
    end

primitive FastaParser
  fun apply(): Parser val =>
    recover
      let id = (R('A', 'z') / L("_") / R('0', '9').many1()).many1()
      let header = -L(">") * (id.term(Header)) * -L("\n")
      let body = ((L("A") / L("T") / L("G") / L("C")).many1().term(Body) * -L("\n")).many1()
      let records = (header * body).many1()
      records
    end

primitive Header is Label fun text(): String => "Header"
primitive Body is Label fun text(): String => "Body"

This has the effect of parsing as:

(
  (
    (Header ID_8867)
    (Body GATTACA)
    (Body ACATTAG)
  )
)

When what is desired is:

(
  (
    (Header ID_8867)
    (Body GATTACAACATTAG)
  )
)

As the "Body" of a FASTA record is all sequence lines (until the next Header) with all internal newline characters removed.

@SeanTAllen SeanTAllen added the help wanted Extra attention is needed label Feb 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants