Using --schema flag for having dump schema level granularity and make COPY less error prone #14
base: master
Conversation
Like in pg_dump, and introduce --sample_schema, which takes over the old functionality.

Using the --schema flag for schema-level dump granularity; improved per-schema filtering to make COPY less error-prone:

- changed DELIMITER, QUOTE, and ESCAPE for better handling of complicated data-type fields (JSON in particular); see the example below
- added the table columns explicitly to the COPY ... FROM statement, to handle situations where the column order differs between the source and destination databases
- added the --password flag to explicitly ask for a password when pg_dump is needed (i.e. each run without --data-only)
- commented out standard_conforming_strings = off, so the setting stays at its default value (on), which is needed for import with the \t delimiter
- introduced the requirement that --sample_schema, if specified, must include _pg_sample in its name, to avoid the risk of accidental deletion when used with --force
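For illustration, the data sections the patched script emits take roughly this shape (schema, table, and column names are hypothetical; E'\b' and E'\\' are spelled out here so the statement is valid standalone SQL, whereas the script emits the backspace character directly):

```sql
-- Hypothetical example of an emitted data section: the column list is
-- explicit, so a different column order in the destination doesn't matter.
COPY public.users (id, email, profile) FROM stdin WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE E'\\';
-- ...tab-delimited rows...
\.
```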
@mla I am not much into Perl. Any kind of feedback will be welcome ✊
These look like great changes, @andilabs. Why was the --password option added to pg_dump?
I saw your comment on the change. Just wondering if it's actually necessary since pg_dump says:
I found this flag necessary, at least for my production db. I was also wondering about that when I read in the docs that it isn't needed 🤷🏼♂️
On Wed, 5 Feb 2020 at 17:31, mla wrote:

I saw your comment on the change. Just wondering if it's actually necessary since pg_dump says:

--password

Force pg_dump to prompt for a password before connecting to a database.

This option is never essential, since pg_dump will automatically prompt for a password if the server demands password authentication. However, pg_dump will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.
--
Regards,
Andrzej
Hi @mla, how are you doing in coronavirus times? I hope you are doing well! Cheers!
Hey @andilabs. Doing okay, thanks. Hope you are too. Yes, sorry for the delay on this. Will target this weekend.
@andilabs can you explain what issue you hit with the COPY being error-prone? Or even better, an example test case? From the project dir, you should be able to run: prove -v

On the standard_conforming_strings setting, I see you commented it out with a comment that we'll use the Pg default. But isn't it more robust to know what the setting is? I don't particularly care whether setting it on or off is better, but it seems like we should try to minimize any configuration differences between environments.

I've been working in the dev branch, working through your changes, if you want to see the current state. The splitting of schema and table name shouldn't be necessary. The values in @table are Table instances, so we should be able to inspect the schema() and table() accessors directly.

If you could just explain the string_agg stuff, and why the CSV DELIMITER specifications and such were added to the COPY, I can better understand what problem was being triggered. Thanks!
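A minimal sketch of the accessor-based approach described above, assuming @table holds Table instances with schema() and table() accessors as stated; the bind-parameter query is an illustrative variant of the PR's string_agg query:

```perl
# Sketch only: assumes $dbh is the script's DBI handle and @table holds
# Table objects exposing schema() and table(), per the comment above.
for my $t (@table) {
    my $schema_name = $t->schema;   # no string splitting / quote stripping
    my $table_name  = $t->table;

    # Same idea as the PR's query, but with bind parameters; ORDER BY
    # ordinal_position is added because the aggregation order is otherwise
    # not guaranteed by PostgreSQL.
    my $q = "SELECT string_agg(quote_ident(column_name), ',' ORDER BY ordinal_position)"
          . " FROM information_schema.columns"
          . " WHERE table_schema = ? AND table_name = ?";
    my ($columns) = $dbh->selectrow_array($q, undef, $schema_name, $table_name);
}
```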
Also, I would rather we not add the --password option to the pg commands. The pg_dump docs state, for example:
So what is happening for you? It is not prompting you for a password? We already support the connection params as part of the main script, so probably those should just be propagated to the pg_dump call if supplied.
I guess there's no great way to pass the password securely. There is a PGPASSWORD env var, but its use is not recommended. There's a PGPASSFILE var that we could use, but it seems like a bit of a pain: we could create a correctly formatted pgpass file and point to that. https://www.postgresql.org/docs/9.6/libpq-pgpass.html Or we could pass the --password option to pg_dump if it was supplied to pg_sample. But I thought that would be triggered automatically.
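A minimal sketch of the pgpass-file idea, assuming illustrative connection variables ($host, $port, $dbname, $user, $password); the file format is the documented hostname:port:database:username:password, and libpq requires mode 0600, which File::Temp's tempfile() already uses:

```perl
use File::Temp qw(tempfile);

# Sketch only: $host, $port, $dbname, $user, $password are illustrative.
# tempfile() creates the file with mode 0600, which libpq requires.
my ($fh, $pgpass) = tempfile(UNLINK => 1);

# pgpass format: hostname:port:database:username:password
# (a ':' or '\' inside a field would need backslash-escaping)
print {$fh} join(':', $host, $port, $dbname, $user, $password), "\n";
close $fh or die "close: $!";

local $ENV{PGPASSFILE} = $pgpass;   # point libpq at this file instead of ~/.pgpass
system('pg_dump', '-h', $host, '-p', $port, '-U', $user, $dbname) == 0
    or die "pg_dump failed: $?";
```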
my ($schema_name, $table_name) = split('\.', $table, 2);   # split "schema"."table" at the first dot
my $cleaned_table_name = substr $table_name, 1, -1;        # strip the surrounding double quotes
my $cleaned_schema_name = substr $schema_name, 1, -1;
# build a comma-separated, quote_ident()-quoted column list for this table
my ($q) = "SELECT string_agg(quote_ident(column_name), ',') FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '$cleaned_table_name' AND TABLE_SCHEMA = '$cleaned_schema_name'";
@mla this was needed to explicitly preserve the order of columns, which may differ between the prod and dev DBs.
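For illustration (schema, table, and column names are hypothetical), the query yields a ready-made, quoted column list:

```sql
-- Hypothetical run of the column-list query used above:
SELECT string_agg(quote_ident(column_name), ',')
  FROM information_schema.columns
 WHERE table_schema = 'public' AND table_name = 'users';
-- => id,email,profile
```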
my $cleaned_schema_name = substr $schema_name, 1, -1;      # strip the surrounding double quotes
my ($q) = "SELECT string_agg(quote_ident(column_name), ',') FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '$cleaned_table_name' AND TABLE_SCHEMA = '$cleaned_schema_name'";
my ($column_names_to_keep_order) = $dbh->selectrow_array($q);   # e.g. "id,email,profile"
# "\b" interpolates to a literal backspace, used as a CSV quote char that never occurs in the data
print "COPY $table ($column_names_to_keep_order) FROM stdin WITH CSV DELIMITER E'\\t' QUOTE '\b' ESCAPE '\\';\n";
@mla this was for the proper escaping of the JSONB fields mentioned above.
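To illustrate why the default text format is fragile here (the table name below is hypothetical): in COPY's text format the backslash is an escape character, and JSON string values routinely contain backslashes, so the dump and the restore must agree on settings such as standard_conforming_strings. The CSV form emitted above pins the delimiter, quote, and escape down explicitly:

```sql
-- Hypothetical illustration: a JSONB value holding one literal backslash
-- is stored as  {"dir": "C:\\temp"}  (JSON itself escapes the backslash),
-- and text-format COPY doubles each backslash again on output:
--   {"dir": "C:\\\\temp"}
-- The PR's variant makes the escaping rules explicit instead (E'\b' and
-- E'\\' are spelled out here so this is valid standalone SQL; the script
-- emits the backspace character directly):
COPY public.events (id, payload) FROM stdin WITH CSV DELIMITER E'\t' QUOTE E'\b' ESCAPE E'\\';
```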
@mla Will this one get merged?
What features from this are you looking for? Are you hitting a problem?