Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate captured RegExp groups to the output value #1204

Open
Abion47 opened this issue Nov 14, 2024 · 7 comments
Open

Propagate captured RegExp groups to the output value #1204

Abion47 opened this issue Nov 14, 2024 · 7 comments
Labels
confirmed The maintainers of the repo would like to address this

Comments

@Abion47
Copy link
Contributor

Abion47 commented Nov 14, 2024

Related to #695 and possibly related to #491 (specifically similar to the second half of this comment), it would be reasonable to assume that RegExp fields should be able to use capture groups and be able to propagate those groups to the result.

In practice, when a RegExp field is parsed, if the pattern contains capturing groups, those groups should be accessible from the resulting value as a tuple property groups. In order to maintain parity with vanilla JS RegExp and standard regular expression behavior, the first value of the tuple should be the entire matched string with the matched groups beginning at index 1.

Examples:

// should simply result in a string
const Foo = type("/^abcd$/");
const value = Foo.assert("abcd"); // 'abcd'

// should result in a union of the matched string and a group tuple
const Foo = type("/^(abc)d$/");
const value = Foo.assert("abcd"); // 'abcd' | { groups: ['abcd', 'abc'] }
const [_, group] = value.groups;

// should result in a union of the matched string and a group tuple
const Foo = type("^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/");
const value = Foo.assert("221B Baker Street, London, NW1 6XE"); // '221B Baker Street, London, NW1 6XE' | { groups: ['221B Baker Street, London, NW1 6XE', '221B', 'Baker Street', 'London', 'NW1 6XE'] }'
const [address, houseNumber, road, city, postCode] = value.groups;
@github-project-automation github-project-automation bot moved this to To do in arktypeio Nov 14, 2024
@ssalbdivad ssalbdivad added the confirmed The maintainers of the repo would like to address this label Nov 14, 2024
@ssalbdivad
Copy link
Member

ssalbdivad commented Nov 14, 2024

Definitely would love to add this! Will have to think a bit about the syntax to differentiate from a pattern-only regex, but would be a very valuable API to have, especially if we can make it type safe.

As with #695, would welcome external contributions here and would be happy to answer questions about the type system and help get it integrated smoothly!

@Abion47
Copy link
Contributor Author

Abion47 commented Nov 14, 2024

A low hanging fruit implementation would be to simply make all returned RegExp values be typed as string & { groups?: string[] }, and have pattern-only regex values leave groups as undefined. This should enable it to remain backward compatible with all current uses of it with only the minor inconvenience of having to deal with the nullability of groups in the new case. (e.g. const [...] = value.groups ?? [];)

At my level of understanding regarding ArkType's typing system (read: it might as well be straight up witchcraft), I'm not sure how to get beyond that point.

@ssalbdivad
Copy link
Member

@Abion47 I assume you're referring to adding groups as a prop at runtime on the result type?

Unfortunately, although encoding extra information on a primitive via branding works at a type-level, it is impossible to attach additional properties to a string.

@Abion47
Copy link
Contributor Author

Abion47 commented Nov 15, 2024

That's an interesting factoid I did not know. Mark that as one more entry under the "Javascript has rules except when it chooses to break them" category.

Still, it seems to work if you target String rather than string.

type StringWithProp = string & { groups?: string[] };

function foo(): StringWithProp {
    const s: any = new String('abc');
    s.groups = ['a', 'b', 'c'];
    return s as StringWithProp;
}

function test(s: string) {
    console.log(s, `${s}`);
}

const bar = foo();

console.log(bar);                     // String: "abc"
console.log(bar.groups);              // ["a", "b", "c"]
console.log(typeof bar === 'string'); // false
test(bar);                            // String: "abc", "abc"

That's not quite a solution as typeof value === 'string' will no longer be true, but value as string seems to work just fine. I can see why a lot of linters default to not wanting you to use String.


In that case, there doesn't seem to be a backward compatible way to implement this simply - so much for the low hanging fruit option. I don't see another real solution beyond returning a compound type in the event that a RegExp contains capture groups. Or perhaps it just returns the tuple itself instead of attaching it to a property:

// Returns a string if there are no capture groups
const SimpleRegExp = type("/^abcd$/");
const value = SimpleRegExp.assert("abcd"); // 'abcd'

// Returns a tuple containing the captured groups
const GroupedRegExp = type("^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/");
const [address, houseNumber, road, city, postCode] = GroupedRegExp.assert("221B Baker Street, London, NW1 6XE");

@ssalbdivad
Copy link
Member

@Abion47 Yeah this is a whole mess, trust me you don't want to start turning strings into Strings 😅

It would need its own dedicated syntax, but TBH I'd probably go for that regardless because the use cases for transforming a string into groups and checking if it matches a pattern are pretty distinct and the API will likely be a lot clearer if it reflects that rather than trying to cram multiple results together.

@Abion47
Copy link
Contributor Author

Abion47 commented Nov 15, 2024

How about this. What if in addition to type and scope, there was a pattern type factory that was responsible for handling stuff like this? It can be initially backed by regular expressions, but in the future it could also be expanded to other types of complex pattern matching.

const LondonAddress = pattern({
  // The presence of the regex property indicates that this is a RegExp powered pattern
  regex: ^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/,
  // Optionally, the groups property assigns a type to each matched group
  // If the type system's regex parsing can manage it, this could possibly be inferred
  groups: [
    // Should group 0 be required to be specified or should it be assumed to just be "string"?
    "/^\d{1,3}[A-Z]$/",
    "/^[A-Za-z ']+$/",
    "/^[A-Za-z \-]+$/",
    "/^[A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2}$/",
  ],
});

const [address, houseNumber, road, city, postCode] = LondonAddress.assert("221B Baker Street, London, NW1 6XE");

@ssalbdivad
Copy link
Member

@Abion47 Hmm, in the end it would just be a special syntax for a type with a morph attached (i.e. something like type("string").pipe(s => regex.exec(s))).

In general the philosophy has been to offer syntax + chainable methods for APIs like this rather than a new API entry point.

Honestly, I'm very confident we can come up with a reasonable API for it, but I just don't want to think about it until someone is interested in implementing it since I need to focus on docs for now 😛

If you or someone else wants to take a stab at it though, let me know and I will commit to something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed The maintainers of the repo would like to address this
Projects
Status: Backlog
Development

No branches or pull requests

2 participants