You can use WAL-G as a tool for making encrypted, compressed physical Greenplum backups and push/fetch them to/from the remote storage without saving it on your filesystem.
WAL-G for Greenplum understands the basic configuration options that are supported by the WAL-G for Postgres, except the advanced features such as delta backups, remote backups, catchup backup, etc.
To configure the backups, the user needs to do two things on each segment host:
- Create the configuration file
- Configure the WAL archiving
Unlike the WAL-G for Postgres, a config file is a must. It must be placed in the same location on each cluster host, for example, /etc/wal-g/wal-g.yaml
. Sample configuration file:
~ # cat /etc/wal-g/wal-g.yaml
PGDATABASE: "postgres"
WALE_S3_PREFIX: "s3://some/s3/prefix/"
WALG_NETWORK_RATE_LIMIT: 8388608
PGPASSFILE: "/home/gpadmin/.pgpass"
WALG_GP_LOGS_DIR: "/var/log/greenplum"
AWS_ACCESS_KEY_ID: "aws_access_key_id"
WALG_UPLOAD_CONCURRENCY: 5
WALG_PGP_KEY_PATH: "/path/to/PGP_KEY"
WALG_DOWNLOAD_CONCURRENCY: 5
WALE_GPG_KEY_ID: "gpg_key_id"
WALG_DISK_RATE_LIMIT: 167772160
PGUSER: "gpadmin"
GOMAXPROCS: 6
PGHOST: "localhost"
AWS_ENDPOINT: "https://s3-endpoint.host.name"
AWS_SECRET_ACCESS_KEY: "aws_secret_access_key"
WALG_COMPRESSION_METHOD: "brotli"
Also, WAL archiving must be configured on each segment primary. Archive command must contain the --content-id
flag with the content ID of the segment.
$ vim postgresql.conf
…
wal_level = archive
archive_mode = on
archive_command = '/usr/bin/wal-g seg wal-push %p --content-id=-1 --config /etc/wal-g/wal-g.yaml'
…
After the successful configuration, use the backup-push
command from the coordinator host to create a new backup.
wal-g backup-push --config=/path/to/config.yaml
WALG_DELTA_MAX_STEPS
Delta-backup is the difference between previously taken backup and present state. WALG_DELTA_MAX_STEPS
determines how many delta backups can be between full backups. Defaults to 0.
Restoration process will automatically fetch all necessary deltas and base backup and compose valid restored backup (you still need WALs after start of last backup to restore consistent cluster).
Delta computation is based on ModTime of file system and LSN number of pages in datafiles for heap relations and on ModCount + EOF combination for AO/AOCS relations.
When creating delta backup (WALG_DELTA_MAX_STEPS
> 0), WAL-G uses the latest backup as the base by default. This behaviour can be changed via following flags:
-
--delta-from-name
flag orWALG_DELTA_FROM_NAME
environment variable to choose the backup with specified name as the base for the delta backup -
--delta-from-user-data
flag orWALG_DELTA_FROM_USER_DATA
environment variable to choose the backup with specified user data as the base for the delta backup
Examples:
wal-g backup-push --delta-from-name backup_name --config=/path/to/config.yaml
wal-g backup-push --delta-from-user-data "{ \"x\": [3], \"y\": 4 }" --config=/path/to/config.yaml
When fetching base backups, the user should pass in the cluster restore configuration and the name of the backup.
wal-g backup-fetch backup_20211202T011501Z --restore-config=/path/to/restore_cfg.json --config=/path/to/config.yaml
WAL-G can also fetch the latest backup:
wal-g backup-fetch LATEST --restore-config=/path/to/restore_cfg.json --config=/path/to/config.yaml
Cluster restore configuration declares destination host, directory, and port for each segment. Sample restore configuration:
{
"segments": {
"-1": {
"hostname": "gp6master",
"port": 5432,
"data_dir": "/gpdata/master/gpseg-1"
},
"0": {
"hostname": "gp6segment1",
"port": 6000,
"data_dir": "/gpdata/primary/gpseg0"
},
"1": {
"hostname": "gp6segment1",
"port": 6001,
"data_dir": "/gpdata/primary/gpseg1"
}
}
}
WAL-G can fetch the backup with specific UserData (stored in backup metadata) using the --target-user-data
flag or WALG_FETCH_TARGET_USER_DATA
variable:
wal-g backup-fetch --target-user-data "{ \"x\": [3], \"y\": 4 }" --restore-config=/path/to/restore_config.json --config=/path/to/config.yaml
--content-ids
flag allows to perform the fetch operations only on some specific segments. This might be useful when the backup-fetch operation is completed successfully on all segments except the few ones so the DBA or script can semi-automatically complete the failed backup fetch. For example:
wal-g backup-fetch LATEST --content-ids=3,5,7 --restore-config=restore-config.json --config=/etc/wal-g/wal-g.yaml
--mode
allows to specify the desired mode of the backup-fetching.
default
will do the backup unpacking and prepare the configs [unpack+prepare]unpack
will do backup unpacking onlyprepare
will perform config preparation only.
wal-g backup-fetch LATEST --mode=unpack --restore-config=restore-config.json --config=/etc/wal-g/wal-g.yaml
WAL-G can also do in-place backup restoration without the restore config. It might be useful when restoring to the same hosts that were used to make a backup:
wal-g backup-fetch LATEST --in-place --config=/path/to/config.yaml
During the delete execution, WAL-G can process segments in parallel mode. To control, how many segments will be processed simultaneously, use the WALG_GP_DELETE_CONCURRENCY
setting. The default value is 1
.
To control the minimal size of the AO/AOCS segment file to be uploaded into the shared storage, use the WALG_GP_AOSEG_SIZE_THRESHOLD
. The higher this value, the bigger the size of a single backup and the smaller the size of the shared AO/AOCS storage folder. Default value is 1048576 (1MB)
.