Pipes is a PHP Extract, Transform, Load (ETL) package for Laravel or Laravel Zero.
You can install the package via composer:
composer require jwhulette/pipes
- Create a new EtlPipe object.
- Add an extractor to the object to read the input file.
- Add one or more transformers to transform the data.
  - You can add as many transformers as you want.
  - Data is passed to the transformers in the order they are defined.
- Add a loader to save the data.
- Data is passed through the pipeline line by line using generators.
$etl = new EtlPipe();
$etl->extract(new CsvExtractor('my-file.csv'));
$etl->transforms([
    // Before PHP 8.4, `new` must be wrapped in parentheses to chain a call.
    (new CaseTransformer())
        ->transformColumn('first_name', 'lower'),
    new TrimTransformer(),
]);
$etl->load(new CsvLoader('saved-file.csv'));
$etl->run();
Or, as a single chained expression:
(new EtlPipe())
    ->extract(new CsvExtractor('my-file.csv'))
    ->transforms([
        (new CaseTransformer())
            ->transformColumn('first_name', 'lower'),
        new TrimTransformer(),
    ])
    ->load(new CsvLoader('saved-file.csv'))
    ->run();
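The line-by-line, generator-based flow described above can be sketched in plain PHP. This is only an illustration of the idea, not the package's implementation: `extractRows`, `transformRows`, and the sample data are hypothetical names invented for this example. Each stage yields one row at a time, and the transformers are applied in the order they appear in the array.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers for illustration only; these are not part of the Pipes API.

/**
 * Extract: yield one associative row at a time from CSV lines.
 *
 * @param iterable<string> $lines
 */
function extractRows(iterable $lines): Generator
{
    $header = null;
    foreach ($lines as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        if ($header === null) {
            $header = $fields; // the first line holds the column names
            continue;
        }
        yield array_combine($header, $fields);
    }
}

/**
 * Transform: apply each callable to every row, in the order given.
 *
 * @param iterable<array<string, string>> $rows
 * @param array<callable>                 $transformers
 */
function transformRows(iterable $rows, array $transformers): Generator
{
    foreach ($rows as $row) {
        foreach ($transformers as $transformer) {
            $row = $transformer($row);
        }
        yield $row;
    }
}

// Sample input (made up for this sketch).
$lines = [
    'first_name,last_name',
    ' ADA ,Lovelace ',
    'GRACE, Hopper',
];

$transformers = [
    // Lowercase a single column.
    fn (array $row): array => array_merge($row, ['first_name' => strtolower($row['first_name'])]),
    // Trim every column.
    fn (array $row): array => array_map('trim', $row),
];

// Load: rows are collected here; a real loader would write each row as it arrives.
$output = [];
foreach (transformRows(extractRows($lines), $transformers) as $row) {
    $output[] = $row;
}
```

Because each stage pulls only one row from the stage before it, the whole input never has to be held in memory at once.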
I used the datasets from the link below to test the library's performance:
http://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/
Sample runs on my notebook:
- MacBook Pro (Retina, 15-inch, Late 2013)
- 2.3 GHz Quad-Core Intel Core i7
- 16 GB 1600 MHz DDR3
Using the following pipeline:
- Transform the Sales Channel column value to lowercase
- Trim the values in all columns
- Format the dates in the Order Date and Ship Date columns
(new EtlPipe())
    ->extract(new CsvExtractor($filename))
    ->transformers([
        (new CaseTransformer())->transformColumn('Sales Channel', 'lower'),
        (new TrimTransformer())->transformAllColumns(),
        (new DateTimeTransformer())->transformColumn('Order Date')
            ->transformColumn('Ship Date'),
    ])
    ->load(new CsvLoader($filepath.'/output/output.csv'))
    ->run();
Reading a CSV file, transforming it, and writing to another CSV file:
---- Processing file: 100000 Sales Records.csv ----
Peak usage: 10.599MB of memory used.
Total execution time in seconds: 4.331
---- Processing file: 1000000 Sales Records.csv ----
Peak usage: 10.599MB of memory used.
Total execution time in seconds: 44.176
Reading an XLSX file, transforming it, and inserting into an SQLite database:
---- Processing file: 100000 Sales Records.xlsx ----
Peak usage: 14.996MB of memory used.
Total execution time in seconds: 33.372
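The text does not say how these figures were collected. One plausible way to gather comparable numbers is with PHP's built-in `memory_get_peak_usage()` and `microtime()`. The sketch below is an assumption, not the script used for the runs above: `measure` is a hypothetical helper, the workload is a stand-in, and the megabyte conversion (dividing by 1024 twice) may differ from the original.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Pipes): run a callable and return
 * peak memory and wall-clock time for reporting.
 */
function measure(string $label, callable $job): array
{
    $start = microtime(true);
    $job();
    $seconds = microtime(true) - $start;

    // Peak memory allocated by PHP for the script so far, converted to megabytes.
    $peakMb = memory_get_peak_usage(true) / 1024 / 1024;

    return [
        'label'   => $label,
        'peak_mb' => round($peakMb, 3),
        'seconds' => round($seconds, 3),
    ];
}

$stats = measure('example.csv', function (): void {
    // Stand-in workload; replace with a pipeline's run() call.
    foreach (range(1, 1000) as $n) {
        strtolower(trim(" ROW {$n} "));
    }
});

printf("---- Processing file: %s ----\n", $stats['label']);
printf("Peak usage: %.3fMB of memory used.\n", $stats['peak_mb']);
printf("Total execution time in seconds: %.3f\n", $stats['seconds']);
```

Note that `memory_get_peak_usage()` reports the peak for the whole script, so measuring several files in one process would show the highest value reached by any of them.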
composer test
Please see CHANGELOG for more information on what has changed recently.
Please see CONTRIBUTING for details.
Please review our security policy on how to report security vulnerabilities.
The MIT License (MIT). Please see License File for more information.