Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues/Questions about Postgres array support #1077

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

MaxGabriel
Copy link
Member

@MaxGabriel MaxGabriel commented Apr 19, 2020

Hey all, we started trying to use the Postgres array support at work and ran into some issues. I'm not sure which of these are considered bugs, vs expected limitations.

  1. Documentation on using PersistArray is pretty limited and we could probably improve that. Happy to PR this. Some notable things:

    • The docs currently say "Intended especially for PostgreSQL backend for text arrays". I don't think this is true? They seem to work fine for int[] for example.

    • Generally lacking a description of how to write PersistField or PersistFieldSql instances for them.

  2. Serializing to PersistArray deserializes to PersistList. This isn't a big deal, and I haven't looked into if there's a solution for it, but it would be great to document it if not! See instance PersistField RoundtripTextArray in the draft PR for an example of this failing.

Edit: from the source, this is probably intended for backwards compatibility reasons and the best we can do is document it?

  1. Trying to store data in a jsonb[] column fails with the following error:
       SqlError {sqlState = "42804", sqlExecStatus = FatalError, sqlErrorMsg = "column \"test\" is of type jsonb[] but expression is of type text[]", sqlErrorDetail = "", sqlErrorHint = "You will need to rewrite or cast the expression."}

Edit: This probably comes from https://www.postgresql.org/docs/9.1/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS

By default, the array element type is the common type of the member expressions, determined using the same rules as for UNION or CASE constructs (see Section 10.5). You can override this by explicitly casting the array constructor to the desired type, for example:

  1. Can/should persistent-postgresql provide any datatypes that allow people to use PersistArray without writing their own instances?

@MaxGabriel
Copy link
Member Author

cc @Vlix who initially added the Postgres array support, and @parsonsmatt who reviewed the PR

also my co-workers @mitchellvitez and @danielbarter who were looking into this

@MaxGabriel
Copy link
Member Author

Ok, one more thing, it looks like Persistent wants to remigrate the columns each time? e.g. if you run this:

(1) ~/D/C/H/y/persistent> stack test persistent-postgresql --test-arguments="--match PersistArray"        14:41:53
persistent-postgresql> test (suite: test, args: --match PersistArray)

ALTER TABLE "test_roundtrip" ALTER COLUMN "test" TYPE text[];
ALTER TABLE "test_list_hack" ALTER COLUMN "test" TYPE text[];
ALTER TABLE "test_int_array" ALTER COLUMN "test" TYPE int[];
ALTER TABLE "test_j_s_o_n_array" ALTER COLUMN "test" TYPE jsonb[];

You will see those ALTER TABLEs on each run, whereas I believe they should only be on the first run. You can see in the database the columns already have that type as well:

                                                Table "public.test_roundtrip"
 Column |  Type  | Collation | Nullable |                  Default                   | Storage  | Stats target | Description
--------+--------+-----------+----------+--------------------------------------------+----------+--------------+-------------
 id     | bigint |           | not null | nextval('test_roundtrip_id_seq'::regclass) | plain    |              |
 test   | text[] |           | not null |                                            | extended |              |

This isn't explicitly tested for in these tests atm, but that would be a good one to add

@parsonsmatt
Copy link
Collaborator

Thanks for reporting and exploring on this!

@Vlix
Copy link
Contributor

Vlix commented Apr 21, 2020

Hey @MaxGabriel, I added in the PersistArray constructor to be able to correctly toField a text array ("{text1,text2,etc}") for the JSON module. Specifically the (?|.) and (?&.) operators.
I didn't intend it to be used in any other way, so if anyone wants to correctly implement it, they can take as many liberties as they want, as long as the JSON tests pass. :)

@MaxGabriel
Copy link
Member Author

MaxGabriel commented Apr 21, 2020

Ahhh, ok that is helpful info @Vlix. I saw @parsonsmatt closed the existing Postgres array issues (https://github.com/yesodweb/persistent/issues?q=is%3Aissue+postgres+array) —Matt may have thought it was considered solved? Or I just misunderstood

@Vlix
Copy link
Contributor

Vlix commented Apr 21, 2020

@MaxGabriel I have not seen those issues. And yes, it seems like @parsonsmatt assumed I implemented arrays completely. I probably should have put big disclaimers around that constructor :/
I couldn't use PersistList (which semantically would make most sense) because it uses to/from JSON marshalling in text columns and it would probably break way too much, so that was just not usable.
But yes, I emphatically state that PersistArray was not meant to be used in anything else than the JSON operators.

I also see my JSON tests could have been written better. Apologies @ @parsonsmatt

EDIT: This is all under the assumption that no one else added functionality for PersistArray after me.

@parsonsmatt
Copy link
Collaborator

No worries @Vlix - your contributions are super welcome and I'd rather have something incomplete than wait for a perfection that never comes 😄

@parsonsmatt
Copy link
Collaborator

Fortunately we have a breaking change 2.11 coming out soon and then I want to get 3.0.0 out before too long, so we have two opportunities to implement breaking changes to behavior to provide extra functionality.

@MaxGabriel
Copy link
Member Author

Yes I want to second @parsonsmatt's thanks to you @Vlix. Postgres array support has been a requested feature for a long, long time so it was great you got it working for the JSON part of it. I just didn't realize that was what was intended

@MaxGabriel
Copy link
Member Author

MaxGabriel commented Apr 22, 2020

Ok looked into this a little, small update on #1077 (comment)

Currently we run this code to get data about columns:

    let sqlv=T.concat ["SELECT "
                          ,"column_name "
                          ,",is_nullable "
                          ,",COALESCE(domain_name, udt_name)" -- See DOMAINS below
                          ,",column_default "
                          ,",numeric_precision "
                          ,",numeric_scale "
                          ,",character_maximum_length "
                          ,"FROM information_schema.columns "
                          ,"WHERE table_catalog=current_database() "
                          ,"AND table_schema=current_schema() "
                          ,"AND table_name=? "
                          ,"AND column_name <> ?"]

The important part of the query is that it returns _int4 for the udt_name. This causes Persistent to think the column needs to be migrated from an _int4 (based on the query) to int[] (Based on the PersistFieldSql instance), in the modType function inside findAlters.

It looks like that table also has a data_type parameter, which is ARRAY for the arrays. When the value is array, we can join with the element_types table to get additional info about the type, per this: https://www.postgresql.org/docs/11/infoschema-element-types.html

Example:

test=# SELECT *
FROM information_schema.columns c LEFT JOIN information_schema.element_types e
     ON ((c.table_catalog, c.table_schema, c.table_name, 'TABLE', c.dtd_identifier)
       = (e.object_catalog, e.object_schema, e.object_name, e.object_type, e.collection_type_identifier))
WHERE c.table_name = 'test_int_array'
ORDER BY c.ordinal_position;
-[ RECORD 1 ]--------------+-------------------------------------------
table_catalog              | test
table_schema               | public
table_name                 | test_int_array
column_name                | id
ordinal_position           | 1
column_default             | nextval('test_int_array_id_seq'::regclass)
is_nullable                | NO
data_type                  | bigint
character_maximum_length   |
character_octet_length     |
numeric_precision          | 64
numeric_precision_radix    | 2
numeric_scale              | 0
datetime_precision         |
interval_type              |
interval_precision         |
character_set_catalog      |
character_set_schema       |
character_set_name         |
collation_catalog          |
collation_schema           |
collation_name             |
domain_catalog             |
domain_schema              |
domain_name                |
udt_catalog                | test
udt_schema                 | pg_catalog
udt_name                   | int8
scope_catalog              |
scope_schema               |
scope_name                 |
maximum_cardinality        |
dtd_identifier             | 1
is_self_referencing        | NO
is_identity                | NO
identity_generation        |
identity_start             |
identity_increment         |
identity_maximum           |
identity_minimum           |
identity_cycle             | NO
is_generated               | NEVER
generation_expression      |
is_updatable               | YES
object_catalog             |
object_schema              |
object_name                |
object_type                |
collection_type_identifier |
data_type                  |
character_maximum_length   |
character_octet_length     |
character_set_catalog      |
character_set_schema       |
character_set_name         |
collation_catalog          |
collation_schema           |
collation_name             |
numeric_precision          |
numeric_precision_radix    |
numeric_scale              |
datetime_precision         |
interval_type              |
interval_precision         |
domain_default             |
udt_catalog                |
udt_schema                 |
udt_name                   |
scope_catalog              |
scope_schema               |
scope_name                 |
maximum_cardinality        |
dtd_identifier             |
-[ RECORD 2 ]--------------+-------------------------------------------
table_catalog              | test
table_schema               | public
table_name                 | test_int_array
column_name                | test
ordinal_position           | 2
column_default             |
is_nullable                | NO
data_type                  | ARRAY
character_maximum_length   |
character_octet_length     |
numeric_precision          |
numeric_precision_radix    |
numeric_scale              |
datetime_precision         |
interval_type              |
interval_precision         |
character_set_catalog      |
character_set_schema       |
character_set_name         |
collation_catalog          |
collation_schema           |
collation_name             |
domain_catalog             |
domain_schema              |
domain_name                |
udt_catalog                | test
udt_schema                 | pg_catalog
udt_name                   | _int4
scope_catalog              |
scope_schema               |
scope_name                 |
maximum_cardinality        |
dtd_identifier             | 2
is_self_referencing        | NO
is_identity                | NO
identity_generation        |
identity_start             |
identity_increment         |
identity_maximum           |
identity_minimum           |
identity_cycle             | NO
is_generated               | NEVER
generation_expression      |
is_updatable               | YES
object_catalog             | test
object_schema              | public
object_name                | test_int_array
object_type                | TABLE
collection_type_identifier | 2
data_type                  | integer
character_maximum_length   |
character_octet_length     |
character_set_catalog      |
character_set_schema       |
character_set_name         |
collation_catalog          |
collation_schema           |
collation_name             |
numeric_precision          |
numeric_precision_radix    |
numeric_scale              |
datetime_precision         |
interval_type              |
interval_precision         |
domain_default             |
udt_catalog                | test
udt_schema                 | pg_catalog
udt_name                   | int4
scope_catalog              |
scope_schema               |
scope_name                 |
maximum_cardinality        |
dtd_identifier             | a2

test=#

@MaxGabriel
Copy link
Member Author

MaxGabriel commented Nov 25, 2020

I did a little investigation into postgresql-simple and came up with a way to support arrays of enums with it haskellari/postgresql-simple#57

This doesn't seem great though. persistent works through the PersistValue wrapper, which is trying to be database generic. So, to get this cast in it could maybe go for unescaped data, adding in ::foo at the end? Ideally it would have some way of just generating a postgresql-simple Action, but persistent doesn't want to depend on postgresql-simple to get that into PersistValue

edit: good solution on that issue would be to generate the cast with the enum values themselves, which seems pretty doable for persistent enums for example

@MaxGabriel
Copy link
Member Author

Hm, ok but even if inserts are solved, selects are still a problem. My understanding is: when custom types are used, they get assigned an OID by Postgres identifying their type. Persistent gets access to the OID like so:

oids <- forM [0..cols-1] $ \col -> fmap ((,) col) (LibPQ.ftype ret col)

Then it checks the OID against a handful of hard-coded options (builtinGetters), and if it doesn't find a suitable match it defaults to defaultGetter, which uses PersistLiteralEscaped. This works fine for string-based custom types like enums or e.g. an email type, but not others (e.g. an array).

To get this additional data, Persistent can use getTypeInfo (or query Postgres itself). The results of that query are cached on a per-connection basis, which is helpful.

(postgresql-simple notes that sharing this cache among a pool of connections would be more efficient. postgresql-simple could maybe also make getTypeInfo work on a list of OIDs, since the current setup requires e.g. 5 queries to Postgres if a result row contains 5 new types).

Overall I think this could be a good move to use this, even though the caching behavior could be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants