bentools-etl's Introduction

Okay, so you've heard about the Extract / Transform / Load pattern, and you're looking for a PHP library to do the job. Alright, let's go!

bentools/etl is a versatile PHP library for implementing the Extract, Transform, Load (ETL) pattern, designed to streamline data processing tasks.

Concepts

Let's cover the basic concepts:

  • Extract: you have a source of data (a database, a CSV file, whatever). An extractor is able to read that data and provide an iterator of items.
  • Transform: apply a transformation to each item. A transformer may generate 0, 1 or several items to load (for example, 1 item may generate multiple SQL queries).
  • Load: load the transformed items into the destination. For example, extracted items have been transformed into SQL queries, and your loader will run those queries against your database.
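
To make the three stages concrete, here is a minimal sketch in plain PHP. It does not use bentools/etl at all: the function names (extractUsers, toSqlQueries, runPipeline) and the sample rows are hypothetical, chosen only to illustrate how one extracted item can turn into 0, 1 or several items to load.

```php
<?php
// Plain-PHP sketch of the three stages; none of these functions belong to bentools/etl.

/** Extract: read the source and provide an iterator of items. */
function extractUsers(array $rows): Generator
{
    foreach ($rows as $row) {
        yield $row;
    }
}

/** Transform: one item may produce 0, 1 or several items to load. */
function toSqlQueries(array $user): Generator
{
    if ('' === $user['name']) {
        return; // 0 items: nothing to load for this row
    }

    // Each yielded item is a [sql, parameters] pair.
    yield ['INSERT INTO users (name) VALUES (?)', [$user['name']]];

    foreach ($user['tags'] as $tag) {
        yield ['INSERT INTO user_tags (user_name, tag) VALUES (?, ?)', [$user['name'], $tag]];
    }
}

/** Load: the "destination" here is an in-memory array; a real loader would run the queries. */
function runPipeline(iterable $items, callable $transform): array
{
    $loaded = [];
    foreach ($items as $item) {
        foreach ($transform($item) as $query) {
            $loaded[] = $query;
        }
    }

    return $loaded;
}

$rows = [
    ['name' => 'Bob Marley', 'tags' => ['reggae']],
    ['name' => '', 'tags' => []],
    ['name' => 'Amy Winehouse', 'tags' => ['soul', 'jazz']],
];

$queries = runPipeline(extractUsers($rows), 'toSqlQueries');
echo count($queries), " queries to run\n"; // 5 queries to run
```

The row with an empty name yields nothing, while the other two rows yield one query per user plus one per tag.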

Installation

composer require bentools/etl

Warning

Current version (4.0) is a complete redesign and introduces significant BC (backward compatibility) breaks. Avoid upgrading from ^2.0 or ^3.0 unless you're fully aware of the changes.

Usage

Now let's have a look at how simple it is:

use BenTools\ETL\EtlExecutor;

// Given
$singers = ['Bob Marley', 'Amy Winehouse'];

// Transform each singer's name to uppercase and process the array
$etl = (new EtlExecutor())
    ->transformWith(fn (string $name) => strtoupper($name));

// When
$report = $etl->process($singers);

// Then
var_dump($report->output); // ["BOB MARLEY", "AMY WINEHOUSE"]

OK, that wasn't really hard. Here we basically don't have to extract anything (we can already iterate on $singers), and we're not loading anywhere except into PHP's memory.

You may ask, "why don't you just array_map('strtoupper', $singers)?" and you're totally right.

But sometimes extracting, transforming and/or loading gets a little more complex. You may want to extract from a file or from content crawled from the web, perform one-to-many transformations, maybe skip some items, or reuse some extraction, transformation or loading logic.

Here's another example of what you can do:

use BenTools\ETL\EventDispatcher\Event\TransformEvent;
use BenTools\ETL\Loader\JSONLoader;

use function BenTools\ETL\extractFrom;

$executor = extractFrom(function () {
    yield ['firstName' => 'Barack', 'lastName' => 'Obama'];
    yield ['firstName' => 'Donald', 'lastName' => 'Trump'];
    yield ['firstName' => 'Joe', 'lastName' => 'Biden'];
})
    ->transformWith(fn (array $item) => implode(' ', array_values($item)))
    ->loadInto(new JSONLoader())
    ->onTransform(function (TransformEvent $event) {
        if ('Donald Trump' === $event->transformResult->value) {
            $event->state->skip();
        }
    });

$report = $executor->process();

var_dump($report->output); // string '["Barack Obama", "Joe Biden"]'

Or:

$report = $executor->process(destination: 'file:///tmp/presidents.json');
var_dump($report->output); // string 'file:///tmp/presidents.json' - content has been written here

You get the point. Now it's up to you to write your own workflows!

Continue reading the Getting Started Guide.

Contribute

Contributions are welcome! Don't hesitate to suggest recipes.

This library is 100% covered with Pest tests.

Before submitting PRs, please run the tests with the command below and make sure code coverage is maintained.

composer ci:check

License

MIT.

bentools-etl's Issues

Warnings when using PHP 8

Hi 👋

Thank you very much for this lib. I managed to implement a full, custom ETL workflow and all went smoothly.

Using PHP 8, I got these two warnings, though:

PHP Warning:  "resource" is not a supported builtin type and will be interpreted as a class name. Write "\Safe\resource" or import the class with "use" to suppress this warning in vendor/thecodingmachine/safe/generated/sockets.php on line 797
PHP Warning:  "integer" will be interpreted as a class name. Did you mean "int"? Write "\Safe\integer" or import the class with "use" to suppress this warning in vendor/thecodingmachine/safe/generated/swoole.php on line 17

I guess this is because the referenced version of the thecodingmachine/safe lib is quite old.

Illegal offset type error could occur during transformations

The contract of a Transformer is to return a \Generator.

When a transformer returns a Generator with either:

  • Duplicate keys
  • Objects as keys

the loader doesn't receive the right items to load.
In the 2nd case, an "Illegal offset type" warning is raised.

This is because the ETL process has to traverse generators and rebuild them before sending them to loaders, so that it can hook into transformation errors and safely dismiss an item that failed to be transformed.
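
The underlying PHP behaviour can be reproduced without the library. The sketch below is plain PHP and does not reproduce bentools/etl internals; the generator functions duplicateKeys and objectKeys are hypothetical. It shows that rebuilding an array keyed by a generator's keys loses items when keys repeat, and that collecting [key, value] pairs is one way to traverse a generator without ever using its keys as array offsets.

```php
<?php
// Plain PHP only; this does not use or reproduce bentools/etl internals.

function duplicateKeys(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2; // same key as the previous item
}

// Rebuilding an array keyed by the generator's keys silently drops an item.
$keyed = iterator_to_array(duplicateKeys(), true);   // ['a' => 2]

// Ignoring the keys keeps every item.
$values = iterator_to_array(duplicateKeys(), false); // [1, 2]

function objectKeys(): Generator
{
    // Generators may yield any value as a key, including objects.
    yield new stdClass() => 'first';
    yield new stdClass() => 'second';
}

// Objects cannot be used as array offsets, so store [key, value] pairs instead.
$pairs = [];
foreach (objectKeys() as $key => $value) {
    $pairs[] = [$key, $value];
}

echo count($keyed), ' ', count($values), ' ', count($pairs), "\n"; // 1 2 2
```

Storing pairs costs a little more memory than a keyed array, but it preserves every item regardless of the key type.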
