
orange3-hxl's Introduction

HXL visual ETL (Orange3 add-on)

PyPI: Orange3-HXLvisualETL

This is an early draft of an Orange3 add-on with minimal awareness of data labeled with HXL.

To install this package, use

pip install Orange3-HXLvisualETL

Features

Data Vault Conf

[WORKING DRAFT] Configure the active local data vault. This allows overriding the defaults.

Download Raw File

Download a remote resource into a local FileRAW.
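As a rough illustration of this step, here is a minimal standard-library sketch. It is not the add-on's actual code: the function name `download_raw`, the `vault_dir` parameter and the hash-based file naming are all assumptions made for the example.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def download_raw(url: str, vault_dir: str) -> Path:
    """Download `url` into `vault_dir` and return the local path.

    Hypothetical naming scheme: the file name is the SHA-256 of the URL,
    so the same URL always maps to the same local file.
    """
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    target = vault / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        # Stream the response to disk instead of loading it all in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

Calling `download_raw` twice with the same URL reuses the existing local file, which is one simple way to avoid repeated downloads while experimenting with a workflow.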

Unzip Raw File

[WORKING DRAFT] Unzip (zip, gzip, bzip, ...) a FileRAW into a FileRAWCollection
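The idea can be sketched with the standard library alone. The function `unpack_raw` below is hypothetical, and a plain list of paths stands in for a FileRAWCollection; the format is detected from the file's leading bytes rather than from its extension.

```python
import bz2
import gzip
import shutil
import zipfile
from pathlib import Path


def unpack_raw(archive: str, out_dir: str) -> list:
    """Unpack `archive` into `out_dir` and return the extracted paths."""
    src, out = Path(archive), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(src):
        with zipfile.ZipFile(src) as zf:
            zf.extractall(out)
            # Skip directory entries; keep only files.
            return sorted(out / n for n in zf.namelist() if not n.endswith("/"))
    with open(src, "rb") as fh:
        magic = fh.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        opener = gzip.open
    elif magic == b"BZh":  # bzip2 signature
        opener = bz2.open
    else:
        raise ValueError(f"unsupported archive format: {src}")
    # gzip and bzip2 hold a single stream: "table.csv.gz" becomes "table.csv".
    target = out / src.stem
    with opener(src, "rb") as packed, open(target, "wb") as unpacked:
        shutil.copyfileobj(packed, unpacked)
    return [target]
```

A zip archive can yield several files, while gzip and bzip2 always yield one, which is why a collection type is a natural output here.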

Select Raw File

[DRAFT] From a local FileRAWCollection, select a FileRAW

Load Raw File

Convert a local FileRAW into Orange3 Data / DataFrame. This step is required before the data can be used with other widgets.

Supported features (*):

  • pandas.read_table
  • pandas.read_csv
  • pandas.read_excel
  • pandas.read_feather
  • pandas.read_fwf
  • pandas.read_html
  • pandas.read_json
  • pandas.json_normalize
  • pandas.read_orc
  • pandas.read_parquet
  • pandas.read_sas
  • pandas.read_spss
  • pandas.read_stata
  • pandas.read_xml

(*) Some features require additional Python packages that are not installed by default with this add-on. The user will be warned about this.
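One way to picture how a file could be routed to one of the readers above is a lookup by file extension. This is only a sketch: the `READERS` table and the `pick_reader` / `load_raw` helpers are assumptions for illustration, and the widget itself may choose the reader differently (for example, through a user setting). The names on the right-hand side of the table are real pandas functions.

```python
from pathlib import Path

# Hypothetical extension-to-reader table (not taken from the add-on).
READERS = {
    ".csv": "read_csv",
    ".tsv": "read_table",
    ".tab": "read_table",
    ".xlsx": "read_excel",
    ".feather": "read_feather",
    ".json": "read_json",
    ".orc": "read_orc",
    ".parquet": "read_parquet",
    ".sas7bdat": "read_sas",
    ".sav": "read_spss",
    ".dta": "read_stata",
    ".xml": "read_xml",
}


def pick_reader(path: str) -> str:
    """Return the name of the pandas reader for `path`, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r}") from None


def load_raw(path: str, **options):
    """Load `path` into a DataFrame with the reader chosen by `pick_reader`."""
    import pandas as pd  # imported lazily, so pick_reader works without pandas

    # Readers such as read_excel or read_parquet raise ImportError when
    # their optional dependency is missing; a widget could catch that
    # and warn the user.
    return getattr(pd, pick_reader(path))(path, **options)
```

The resulting DataFrame could then be converted for Orange, for instance with `Orange.data.pandas_compat.table_from_frame`.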

Statistical Role

Change the statistical role ("feature", "target", "meta", "ignore") of data variables using HXL patterns instead of strict exact names.
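To make the pattern idea concrete, here is a simplified standard-library sketch. It assumes column names are HXL hashtags such as `#adm1+code`, and that a pattern lists attributes that must be present (`+attr`) or absent (`-attr`). The helper names (`parse_hashtag`, `matches`, `assign_roles`) are hypothetical and do not come from the add-on or from libhxl.

```python
import re

_TAG = re.compile(r"#[a-z][a-z0-9_]*")
_ATTR = re.compile(r"([+-])([a-z][a-z0-9_]*)")


def parse_hashtag(text):
    """Split '#adm1 +code' into ('#adm1', {'code'}); None if not a hashtag."""
    text = text.strip().lower()
    tag = _TAG.match(text)
    if tag is None:
        return None
    rest = text[tag.end():]
    return tag.group(0), {name for sign, name in _ATTR.findall(rest) if sign == "+"}


def matches(pattern, column):
    """True if `column` has the pattern's tag, all +attrs and none of the -attrs."""
    parsed = parse_hashtag(column)
    pattern = pattern.strip().lower()
    ptag = _TAG.match(pattern)
    if parsed is None or ptag is None or ptag.group(0) != parsed[0]:
        return False
    for sign, name in _ATTR.findall(pattern[ptag.end():]):
        # '+name' requires the attribute, '-name' forbids it.
        if (sign == "+") != (name in parsed[1]):
            return False
    return True


def assign_roles(columns, rules, default="feature"):
    """Map each column to the role of the first matching (pattern, role) rule."""
    return {
        column: next((role for pat, role in rules if matches(pat, column)), default)
        for column in columns
    }
```

With rules `[("#adm1+code", "meta"), ("#affected", "target")]`, a column named `#adm1+code+v_pcode` would become a meta variable without the user typing its exact name, while `#adm1+name` would keep the default role.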

Data Type

[DRAFT] Change the computational data type ("numeric", "categorical", "text", "datetime") of data variables using HXL patterns instead of strict exact names.

HXL short names

[EARLY DRAFT] Give HXLated input data shorter variable names.

RAW Info

[DRAFT] Inspect a FileRAW or FileRAWCollection

Installation

From PyPI (recommended)

pip install Orange3-HXLvisualETL

From source

To install the add-on from source run

pip install .

To register this add-on with Orange, but keep the code in the development directory (do not copy it to Python's site-packages directory), run

pip install -e .

Documentation / widget help can be built by running

make html htmlhelp

from the doc directory.

Usage

After the installation, the widget from this add-on is registered with Orange. To run Orange from the terminal, use

orange-canvas

or

python -m Orange.canvas

The new widget appears in the toolbox bar under the section Example.


orange3-hxl's People

Contributors

fititnt


orange3-hxl's Issues

(to be tested) HXL widget to pre-process training reference data (already tabular, not a compiled model) so that it adapts to the field names and drops excessive detail, increasing reusability with fewer user steps

While running tests on https://github.com/fititnt/lsf-orange-data-mining (mostly to create manually crafted training data), I noticed that the way the Orange interface works seems to require that the column names of the training reference match the column names of the non-meta values in the working dataset.

It works as expected, but it means, for example, that we would need to explain to the user how to rename the columns in either case.

The idea

This needs some testing to check whether it is necessary, but the goal would be:

  • Make two widgets; both accept the reference dataset and the main dataset.
    • The first outputs a variant of the training reference data; for example, if the training data has much more information than what users will compare it against, the widget could simplify the training data. This one is likely to be the most important.
    • The second would do the same, but maybe just change the columns on the working data that Orange3 would otherwise ask the user to change. I think this mostly happens if the working data already has the column to replace.

Implement Orange Development recommendations for Responsive GUI


Currently, all the work (including steps that can take at least a few seconds) is done synchronously. This is already perceptible in Download Raw File, to the point that other parts of the interface stop responding, and it is likely to happen with any other internal file processing (such as conversions from one raw file to another) before the data is passed to Orange.
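The general shape of the fix can be sketched with the standard library: submit the blocking step to a worker thread and get the result back through a callback. The names `start_task` and `slow_download` are hypothetical. This is not Orange's own mechanism; Orange ships helpers for this in `Orange.widgets.utils.concurrent`, which would be the place to look for the recommended approach.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps heavy steps from piling up in parallel.
_executor = ThreadPoolExecutor(max_workers=1)


def slow_download(url):
    """Stand-in for a blocking step such as a large download."""
    time.sleep(0.2)
    return f"downloaded {url}"


def start_task(func, *args, on_done, on_error):
    """Run `func(*args)` in a worker thread and return immediately."""

    def _finish(future):
        error = future.exception()
        if error is None:
            on_done(future.result())
        else:
            on_error(error)

    future = _executor.submit(func, *args)
    # Note: the callback runs in the worker thread. In a Qt application
    # the result must be handed back to the GUI thread (for example via
    # a signal) before touching any widget.
    future.add_done_callback(_finish)
    return future
```

The caller regains control right after `start_task` returns, so the interface can keep handling events while the download runs.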

Another thing we need to try in this add-on: if some step is likely to become unstable, we kill that step, or at least try not to let it kill the entire Orange interface. However, I'm not sure whether it is viable to enforce a maximum amount of memory a thread may request before being aborted; if it is viable, we would really like to have it.

To investigate: this change is likely to improve how memory is released

Another point this change might mitigate: in some quick tests, even after manually setting a variable that holds a huge in-memory object to None, Python does release memory, but not 100% of it. This is not related to Orange (or maybe directly to Qt); rather, Python assumes the program might need part of that memory again soon. So I think that if we offload heavy operations to a different thread by default (which already makes the Orange GUI responsive), it also becomes clear to the runtime, once the work is done, that whatever was using that memory is no longer needed.

Minimal Viable Product of documentation of `orange3-hxl`


Okay. I think I got the general idea of how to create an Orange3 add-on. But I still need to learn more about how to use the interface itself, because some features that at first seem necessary are already implemented in other extensions.

While the minimal viable product of the extension still needs some time (again, mostly because I need to understand the features and think like a user), this very first issue is already about documenting orange3-hxl.

Anyway, to avoid creating a lot of other issues, I may do that in other repositories from @EticaAI / @HXL-CPLP and keep this one mostly for the extension itself.

Things that other Orange3 extensions likely would not have

For the sake of the MVP, this issue will likely not implement all these features.

Converters for HXL / HXLM / HXL+RDF with BCP47 syntax

On this point, it could make sense to simply port the Python functionality we have in https://github.com/EticaAI/hxltm and https://github.com/EticaAI/lexicographi-sine-finibus

Reference tables for vocabularies (internal use)

For the sake of simplifying the converters, I think it might be relevant to start breaking the converters that are not already purely rule-based into simpler rules stored in machine-parseable data files.

However, this would also need changes upstream.

Pre-build reference tables related to places (e.g. COD-ABs, P-Codes)

geometries

Orange3 already has https://github.com/biolab/orange3-geo, which, for example, allows data visualization with maps. There are likely other features, but this extension is quite relevant here.

However, at a later point we might need to pre-compile and share GeoJSON files online, like the ones at https://github.com/biolab/orange3-geo/tree/master/orangecontrib/geo/geojson. Maybe also allow the user to change the data provider.

In any case, storing all the GeoJSON files in a single Python package would likely take too much space. So, while I have not checked whether orange3-geo allows changing the geometries, anything additional would need to consider how to package the files.

P-Codes to Latitude/Longitude

Somewhat related: EticaAI/lexicographi-sine-finibus#45

Using orange3-geo as output, since it works with latitude/longitude pairs, means most datasets would need to have these pre-compiled. This would really require fetching all geometries from the ~150 COD-ABs and creating the pairs.

However, as somewhat expected, https://github.com/biolab/orange3-geo/tree/master/orangecontrib/geo/geojson has only the world and level 1, so it would not be possible to map lower levels without also taking care of distributing the geometries. Another issue is that, already at admin1, I think some latitude/longitude values might fall outside the geometries in orange3-geo, but this is something to test later.
