flowpy's Introduction

Hola, I'm Patty 👋

He/Him

I'm a developer from sunny little Darwin in Australia. I currently work for a healthcare company building integrations for our internal EMR system across many countries around the globe. Python is my favourite programming language, and it's where I feel the most at home. Type hinting is fantastic, and I am a big fan of Mypy and Pydantic. When I'm not flexing my fingers on the keyboard I'm usually brewing or judging beer, playing volleyball or riding my bike.

Current Projects

I'm currently working on a couple of different personal projects. You can check them out below. Please get involved if you like; I'm always looking for help 😃. I love open-source software, and I'm keen to get more people involved with some of the things I'm working on at the moment.

Pypes

Pypes is a job/pipeline scheduler for High-Performance Computing. It's a bit like Snakemake, but the plan is to make it simpler. If you are into that type of thing, don't hesitate to hit me up about it. I'm not using the HPC I have access to as much as I used to, but I would be keen to get the ball rolling again if there was interest 🙂.

Julienne

Julienne is an integration engine/data pipeline. The basic idea is that a source (API endpoint, DB, CSV file, etc.) ingests data and then pipes that data into a transform pipeline. The transform pipeline consists of Python code run on Celery workers, which filter, transform and enrich the data before sending it on to a "sink". Sinks are the places that data can be pushed to: files, databases, HTTP APIs, MLLP servers, etc.
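
The source, transform and sink flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Julienne's actual code: it runs in-process rather than on Celery workers, and every name in it (`list_source`, `only_adults`, `add_greeting`, `run_pipeline`, the record fields) is made up for the example.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Record = dict
# A transform returns a (possibly new) record, or None to drop it.
Transform = Callable[[Record], Optional[Record]]


def list_source(rows: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical source: yields records from an in-memory iterable."""
    for row in rows:
        yield dict(row)


def only_adults(record: Record) -> Optional[Record]:
    """Filtering step: drop records where age is under 18."""
    return record if record["age"] >= 18 else None


def add_greeting(record: Record) -> Optional[Record]:
    """Enrichment step: add a derived field."""
    return {**record, "greeting": f"Hello, {record['name']}"}


def run_pipeline(
    source: Iterable[Record],
    transforms: List[Transform],
    sink: Callable[[Record], None],
) -> None:
    """Push each record through the transforms in order, then into the sink."""
    for record in source:
        current: Optional[Record] = record
        for transform in transforms:
            current = transform(current)
            if current is None:
                break  # record was filtered out
        if current is not None:
            sink(current)


collected: List[Record] = []  # a list stands in for a file/DB/HTTP sink
rows = [{"name": "Ana", "age": 30}, {"name": "Ben", "age": 12}]
run_pipeline(list_source(rows), [only_adults, add_greeting], collected.append)
```

Because the sink is just a callable, the same loop could write to a file or call an HTTP client instead of appending to a list.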

PELT Studio

PELT Studio stands for Python Extract Load Transform Studio. It's what it sounds like 🤷. It has a Python backend with a Svelte front end. You can design SQL ELT flows with a graph/network/DAG style interface, and it will (one day) orchestrate them. Currently, it only produces queries you run manually, but there's some good potential if I can find others to build it with me 😀.

Languages & Frameworks

Python 🧑‍🏫

  • FastAPI ❤️
  • Django
  • Numpy
  • Pandas
  • Keras / TensorFlow / PyTorch

JS/TypeScript

  • Svelte ❤️
  • D3
  • React

Other Languages

  • Rust 🧑‍🎓
  • Julia
  • R 🤷
  • Lua 🧑‍🎓

flowpy's Issues

New feature: Conditionals

This feature would allow adding conditions on the DataFrame and then assigning a value to a new column based on the result.

Example

A > 5 & B < 10; Create Flag as 1, else Flag as 0

As a future enhancement, the resulting flag could be used as a mask on the data by the existing Filter block.
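
For reference, the example condition maps onto pandas roughly as follows. This is a sketch of the underlying operation, not a proposed block implementation; the sample values are invented.

```python
import numpy as np
import pandas as pd

# Made-up sample data with the two columns from the example.
df = pd.DataFrame({"A": [3, 6, 8, 9], "B": [4, 12, 7, 2]})

# Parentheses are required: & binds more tightly than > and <.
mask = (df["A"] > 5) & (df["B"] < 10)

# Flag is 1 where the condition holds, else 0.
df["Flag"] = np.where(mask, 1, 0)

# The same boolean mask can be reused to filter rows.
filtered = df[mask]
```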

New feature: Aggregate and calculate

This feature would emulate the typical pandas groupby operations and calculate metrics such as mean, count, min and max.

Example

Groupby key - Multiselect for columns
Metric - mean, count, std, min, max etc.

Support for a custom groupby metric (passed as a function) could be considered as a future enhancement.
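
A minimal sketch of what such a block would do in pandas, assuming the user has picked the group-by keys and metrics from the UI. The column names, sample data and the `value_range` function are illustrative only.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "city": ["Darwin", "Darwin", "Perth", "Perth"],
    "value": [10.0, 20.0, 5.0, 15.0],
})


def value_range(series: pd.Series) -> float:
    """Example of a custom metric: spread between max and min."""
    return series.max() - series.min()


group_keys = ["city"]                       # would come from the multiselect
metrics = ["mean", "count", "min", "max"]   # built-in metric names

# One output column per metric; a custom function is named after itself.
summary = df.groupby(group_keys)["value"].agg(metrics + [value_range])
```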

Design Discussion to provide field inputs/parameters as dropdowns

Hi @schlerp, I have been thinking it would be useful to provide input fields as dropdowns to the user. As a reference, I use bamboolib for personal projects and they do this very well. Link: https://docs.bamboolib.8080labs.com/documentation/getting-started

I started this issue to discuss a design for the implementation, since the same design would be usable for the majority of the app, from column name selection through to parameters of aggregation functions, such as datetime format or resampling frequency.

Boolean parameters will still be kept as checkboxes, as has been implemented for index in the CSV output block.

Add instructions on how to install yarn

Hi,
I tried to install and run flowpy by following the instructions in the README.md. But on both machines where I tried to run flowpy (Windows 7 and macOS), the yarn program was missing.
Can you add instructions on how to install the missing program? It seems that it does not get installed by the Python modules in the requirements.txt.
For someone who does not already have it installed, this could deter them from trying out flowpy.

New feature: Compute

This feature would involve creating a new column from existing ones. The expected functionality is support for formulas, similar to the Excel formula bar.

Example

A**2
A + B / C
A == C
A = 0

A target column name would also need to be provided here, which can be a new column or an existing one.

I would like to start contributing with this feature and will open a pull request once it is complete. Any thoughts?
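
One possible way to back this with pandas is `DataFrame.eval`, which evaluates a formula string against column names. The sketch below assumes that approach; `compute` is a hypothetical helper and the sample data is invented. Note that `DataFrame.eval` accepts only a restricted expression syntax, so it covers arithmetic and comparisons rather than the full range of Excel functions.

```python
import pandas as pd


def compute(df: pd.DataFrame, target: str, formula: str) -> pd.DataFrame:
    """Hypothetical helper: evaluate `formula` over df's columns and
    store the result in `target` (a new or an existing column)."""
    out = df.copy()
    out[target] = out.eval(formula)
    return out


# Made-up sample data with the columns used in the examples.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 6.0, 9.0],
    "C": [2.0, 2.0, 3.0],
})

df = compute(df, "A_squared", "A**2")
df = compute(df, "A_plus_ratio", "A + B / C")   # division binds before addition
df = compute(df, "A_equals_C", "A == C")        # boolean column
```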

Additional features

Hi @schlerp, I happened to come across your Medium article and had a similar idea parked for a while now: to profile existing data processing pipelines/scripts/notebooks to build a representation of the data flow. I believe this could be a future feature of this module, so this is very exciting to me. I work in Data Science and would find this extremely useful.

I'd like to contribute by adding new features on the pandas front, since I do not know much about the UI side of things. A few operations I use regularly are listed below:

  1. Basic feature/column computations
  2. Aggregation using groupby
  3. Resampling time indices
  4. Replacing and imputing values
  5. Pivot, melt and stacking
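
For readers less familiar with items 3 to 5, here is a rough sketch of the corresponding pandas calls. The data and column names are invented for the example.

```python
import pandas as pd

# Imputing and resampling: four half-day readings with one gap.
idx = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 12:00",
    "2024-01-02 00:00", "2024-01-02 12:00",
])
df = pd.DataFrame({"temp": [20.0, None, 24.0, 28.0]}, index=idx)

# Impute the missing reading with the column mean (24.0 here).
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Resample the time index from half-day to daily averages.
daily = df.resample("D").mean()

# Melt: wide to long, one row per (id, metric) pair.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})
long = wide.melt(id_vars="id", var_name="metric", value_name="value")

# Pivot: long back to wide.
wide_again = long.pivot(index="id", columns="metric", values="value")
```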

Before I try to modify anything, I'd like to know if there's a tracker of sorts for current development activity and upcoming features already being worked on.