This dataset extension provides the #in_batches method. The method splits a dataset into parts and yields each part in turn.
Note: currently only PostgreSQL is supported.
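To make the idea of splitting a dataset into parts more concrete, here is a minimal sketch in plain Ruby over an in-memory array. It is an illustration only and not the gem's implementation: the Row struct and the each_batch helper are hypothetical names, and the "start after the last seen id" strategy is an assumption about how primary-key batching is commonly done.

```ruby
# Illustration only: plain Ruby over an in-memory array, not sequel-batches code.
# Row and each_batch are hypothetical names introduced for this sketch.
Row = Struct.new(:id, :name)

# Yields successive slices of rows, ordered by id, with at most `of` rows each.
# Every slice starts strictly after the last id of the previous slice,
# so no row is yielded twice.
def each_batch(rows, of: 1000)
  sorted = rows.sort_by(&:id)
  last_id = nil

  loop do
    batch = sorted.select { |row| last_id.nil? || row.id > last_id }.first(of)
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end

rows = [5, 1, 9, 3, 7].map { |id| Row.new(id, "user#{id}") }

batch_ids = []
each_batch(rows, of: 2) { |batch| batch_ids << batch.map(&:id) }
# batch_ids is now [[1, 3], [5, 7], [9]]
```

Paging by the last seen key rather than by offset keeps each step independent of how many rows were already processed, which is why this pattern is popular for large tables.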
Add this line to your application's Gemfile:
gem 'sequel-batches'
In order to use the feature you should enable the extension:
DB.extension(:batches)
After that, the #in_batches method becomes available on datasets:
User.where(role: "admin").in_batches(of: 4) do |ds|
  ds.delete
end
Finally, here's an example including all the available options:
options = {
  of: 4,
  pk: [:project_id, :external_user_id],
  start: { project_id: 2, external_user_id: 3 },
  finish: { project_id: 5, external_user_id: 70 },
  order: :desc,
}
Event.where(type: "login").in_batches(**options) do |dataset|
  dataset.delete
end
You can set the following options:
pk: Overrides the primary key of your dataset. This option is required if your table doesn't have a real primary key; otherwise you will get Sequel::Extensions::Batches::MissingPKError.
Note that the columns you provide must not contain NULL values, otherwise batching may not work as intended. You will receive Sequel::Extensions::Batches::NullPKError if batch processing encounters a NULL value along the way, but this is not guaranteed, because for performance reasons it doesn't check every row.
of: Sets the chunk size (1000 by default).
start: A hash { [column]: <start_value> } that represents the frame start for batch processing. Note that you will get Sequel::Extensions::Batches::InvalidPKError if you provide a hash with the wrong keys (the order of the keys matters as well).
finish: Same as start, but represents the frame end.
order: Specifies the primary key order (can be :asc or :desc). Defaults to :asc.
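Because the order of keys in the start and finish hashes matters, it can help to check a frame before passing it in. The sketch below shows one plausible reading of that rule in plain Ruby. The frame_keys_match? helper is hypothetical and not part of sequel-batches, and the exact check the gem performs is an assumption here.

```ruby
# Hypothetical helper, not part of sequel-batches.
# Assumption: a frame is valid when its keys equal the primary key columns
# in the same order. Ruby hashes keep insertion order, so comparing the key
# arrays (rather than the hashes themselves) is order-sensitive.
def frame_keys_match?(pk, frame)
  frame.keys == Array(pk)
end

pk = [:project_id, :external_user_id]

ok      = frame_keys_match?(pk, { project_id: 2, external_user_id: 3 })
swapped = frame_keys_match?(pk, { external_user_id: 3, project_id: 2 })
missing = frame_keys_match?(pk, { project_id: 2 })
# ok is true; swapped and missing are both false
```

A check like this fails fast in application code, before any batch has been processed.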
Bug reports and pull requests are welcome on GitHub at https://github.com/umbrellio/sequel-batches.
The gem is available as open source under the terms of the MIT License.