weld's People

Contributors

bathtor, cgmossa, cirla, deepakn94, dobachi, harumichi, hustnn, hvanhovell, jialinding, jjthomas, kaz7, kumagi, mateiz, max-meldrum, mihai-varga, nikhilsimha, paddyhoran, parimarjan, pattern, radujica, rahulpalamuttam, renato2099, rgankema, sarutak, smacke, snakescott, sppalkia, viirya, willcrichton, winding-lines

weld's Issues

Grizzly is Python 2.7 only

It is important for Grizzly to run on Python 3, preferably supporting both 2.7 and 3.5/3.6 from a single codebase (the six module helps with this).
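As a rough illustration of what a single 2.7/3.x codebase involves, here is a minimal sketch using only the standard library. The names `PY2`, `text_type`, `string_types`, and `to_text` are chosen for this example and are not taken from Grizzly; six exposes equivalents of the first three as `six.PY2`, `six.text_type`, and `six.string_types`.

```python
import sys

# True on Python 2.x, False on Python 3.x.
PY2 = sys.version_info[0] == 2

if PY2:
    # These names exist only on Python 2; this branch never runs on Python 3.
    text_type = unicode  # noqa: F821
    string_types = (basestring,)  # noqa: F821
else:
    text_type = str
    string_types = (str,)


def to_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string on both Python 2 and 3."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # On Python 2, `bytes` is an alias for `str`, so this also covers
        # plain byte strings there.
        return value.decode(encoding)
    return text_type(value)
```

Funnelling string handling through one helper like this keeps version checks out of the rest of the code.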

Composite builder example is broken

When I execute the composite builder example in the repl:

let b1 = appender[i32];
let b2 = appender[i32];
let data = [1, 2, 3];
let bs = for(
  data,
  {b1, b2},
  |bs: {appender[i32], appender[i32]}, i: i64, n: i32| {merge(bs.$0, n), merge(bs.$1, 2 * n)}
);
result(bs)

It fails with the following error:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Abort trap: 6

Also note that the example in the documentation is not correct.

I am on Mac OS X 10.11.6.

Load libweld.so from $WELD_HOME/target/debug dir in bindings.py

Hi,
When I tried the tutorial, the following error was raised:

>>> import numpy
>>> from hello_weld import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "hello_weld.py", line 2, in <module>
    from weld.weldobject import *
  File "build/bdist.linux-x86_64/egg/weld/weldobject.py", line 12, in <module>
  File "build/bdist.linux-x86_64/egg/weld/bindings.py", line 30, in <module>
  File "/root/SkyDiscovery/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /data/weld/weld/target/debug/libweld.so: cannot open shared object file: No such file or directory
>>> 

I built Weld with the cargo build --release command, so there is no debug directory under the target directory.
I checked python/weld/bindings.py, which contains the following:

home = os.environ["WELD_HOME"]
if home[-1] != "/":
    home += "/"

path = home + "target/debug/" + path

# Load the Weld Dynamic Library.
weld = CDLL(path)

Is bindings.py meant to be used only in debug mode? Finding libweld.so automatically might be better.
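One way the automatic lookup suggested here could work is sketched below. It is an illustration rather than the project's actual fix: the function name `find_weld_library` and the search order (release before debug) are assumptions made for this example.

```python
import os


def find_weld_library(home, lib_name="libweld.so",
                      profiles=("release", "debug")):
    """Return the first existing <home>/target/<profile>/<lib_name> path.

    `find_weld_library` and the profile search order are hypothetical;
    they are not part of the Weld bindings.
    """
    candidates = [os.path.join(home, "target", profile, lib_name)
                  for profile in profiles]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    raise OSError("could not find %s; tried: %s"
                  % (lib_name, ", ".join(candidates)))


# Possible use inside a bindings module (not executed here):
#   from ctypes import CDLL
#   weld = CDLL(find_weld_library(os.environ["WELD_HOME"]))
```

Using os.path.join also removes the need for the manual trailing-slash check on WELD_HOME, and the error message lists every path that was tried.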

Perform the lazy encoding conversion

I found that Grizzly uses much more memory than pandas. Looking into it, I think this may be caused by the change of encoding type when calling raw_column = np.array(self.df[key], dtype=str).

Could this be optimized by keeping the original encoding type in dataframe[key].values and performing the conversion at runtime (lazy encoding conversion)?

If the optimization I proposed is correct, I can take this issue.
Thanks.

Memory usage increases continuously

Hi @deepakn94 ,
When I used Grizzly in my Python program, the process was killed automatically. Through debugging, I found that the cause is continuously increasing memory usage.
I extracted the core logic, which just loads data from a CSV file and then runs query operations. Details below:

import pandas as pd
import grizzly.grizzly as gr
import grizzly.numpy_weld as gn

df = pd.read_csv("total_price_completed.csv")
weld_df = gr.DataFrameWeld(df)
price_df = weld_df[weld_df['name'] == '000001.SZ']
price_list = price_df['open']
result_list = price_list.evaluate(verbose=False)

The code above was executed many times in a for loop, so memory usage reached the limit and the process was killed by the system.

Weld APIs for Java

The API should hide the complexity of setting up JNI, etc. from most users.

parallel optimizations

  1. parallelize result call for vecbuilder and dictbuilder (need to create tasks for "continuation" of result calls)
  2. use a register merger for inner loops (instead of writing to global thread-local pointer)

Memory layouts for string processing

hi folks,

I'm excited about the Weld project. I have been looking at the Weld data structures and the way that the runtime interacts with memory, and I have some questions, particularly about non-numeric data.

I see here https://github.com/weld-project/weld/blob/master/python/grizzly/numpy_weld_convertor.cpp#L155 that a Weld string vector is semantically a vector of pointers. While this is one possible way to deal with arrays of variable-length types, I am wondering what it would take to expand to other kinds of non-pointer-based memory layouts, which can yield better processing efficiency for the CPU.

In pandas for example, our likely long term plan is to move toward a "packed" columnar memory model (as specified in Apache Arrow) for strings that is like:

length: 4
validity_bits [0 0 0 0 1 1 1 1] + padding for alignment
offsets: [0, 3, 6, 9, 12] + padding
data: 'foobarbazqux' + padding

Beyond "packing" the strings in a contiguous buffer, you can also dictionary encode for better efficiency. I am curious what your plans are generally along these lines, and whether there are any opportunities for standardizing different string memory layouts (Weld may need to support more than one memory layout) to make it easier for other systems to integrate with Weld.
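To make the packed layout above concrete, here is a minimal pure-Python sketch that builds the validity bitmap, offsets, and data buffer for a list of strings. The function names `pack_strings` and `get_string` are invented for this illustration, alignment padding is omitted, and the bitmap uses least-significant-bit-first ordering as in the Arrow format.

```python
def pack_strings(values, encoding="utf-8"):
    """Pack a list of str-or-None into (length, validity, offsets, data).

    Hypothetical helper for illustration; padding for alignment is omitted.
    """
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    chunks = []
    total = 0
    for i, value in enumerate(values):
        if value is not None:
            raw = value.encode(encoding)
            chunks.append(raw)
            total += len(raw)
            # Set bit i (least-significant bit first) to mark a valid slot.
            validity[i // 8] |= 1 << (i % 8)
        # A null slot repeats the previous offset, i.e. has zero length.
        offsets.append(total)
    return len(values), bytes(validity), offsets, b"".join(chunks)


def get_string(packed, i, encoding="utf-8"):
    """Read element i back out of the packed representation."""
    _length, validity, offsets, data = packed
    if not (bytearray(validity)[i // 8] >> (i % 8)) & 1:
        return None
    return data[offsets[i]:offsets[i + 1]].decode(encoding)


packed = pack_strings(["foo", "bar", "baz", "qux"])
# packed == (4, b"\x0f", [0, 3, 6, 9, 12], b"foobarbazqux")
```

Element access needs only two offset reads and one slice of a single contiguous buffer, in contrast to following one pointer per string.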

cc @julienledem

Code to replicate performance metrics

Hi,
This is a compelling library. Do you have any of the code used to generate the reported performance increases over the various frameworks mentioned here? I'm particularly curious about the TensorFlow benchmark.

Use flake8 to enforce Python style conventions

$ flake8 python/
python/grizzly/encoders.py:6:1: F401 'subprocess' imported but unused
python/grizzly/encoders.py:8:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/encoders.py:21:1: E302 expected 2 blank lines, found 1
python/grizzly/encoders.py:49:13: E128 continuation line under-indented for visual indent
python/grizzly/encoders.py:138:13: E128 continuation line under-indented for visual indent
python/grizzly/grizzly.py:6:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/grizzly.py:65:80: E501 line too long (84 > 79 characters)
python/grizzly/grizzly.py:67:80: E501 line too long (84 > 79 characters)
python/grizzly/grizzlyImpl.py:7:1: F403 'from encoders import *' used; unable to detect undefined names
python/grizzly/grizzlyImpl.py:8:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/grizzlyImpl.py:81:80: E501 line too long (85 > 79 characters)
python/grizzly/grizzlyImpl.py:201:80: E501 line too long (81 > 79 characters)
python/grizzly/grizzlyImpl.py:202:35: E128 continuation line under-indented for visual indent
python/grizzly/grizzlyImpl.py:208:80: E501 line too long (92 > 79 characters)
python/grizzly/grizzlyImpl.py:241:80: E501 line too long (81 > 79 characters)
python/grizzly/grizzlyImpl.py:242:35: E128 continuation line under-indented for visual indent
python/grizzly/grizzlyImpl.py:274:35: E128 continuation line under-indented for visual indent
python/grizzly/grizzlyImpl.py:357:80: E501 line too long (81 > 79 characters)
python/grizzly/grizzlyImpl.py:358:35: E128 continuation line under-indented for visual indent
python/grizzly/grizzlyImpl.py:359:35: E128 continuation line under-indented for visual indent
python/grizzly/grizzlyImpl.py:388:80: E501 line too long (87 > 79 characters)
python/grizzly/lazyOp.py:3:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/numpyImplWeld.py:8:1: F403 'from encoders import *' used; unable to detect undefined names
python/grizzly/numpyImplWeld.py:9:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/numpyImplWeld.py:52:80: E501 line too long (81 > 79 characters)
python/grizzly/numpyImplWeld.py:53:35: E128 continuation line under-indented for visual indent
python/grizzly/numpyImplWeld.py:86:80: E501 line too long (85 > 79 characters)
python/grizzly/numpyImplWeld.py:120:80: E501 line too long (96 > 79 characters)
python/grizzly/numpyImplWeld.py:128:80: E501 line too long (85 > 79 characters)
python/grizzly/numpyImplWeld.py:129:35: E128 continuation line under-indented for visual indent
python/grizzly/numpyImplWeld.py:129:80: E501 line too long (89 > 79 characters)
python/grizzly/numpyWeld.py:5:1: F403 'from weld.weldobject import *' used; unable to detect undefined names
python/grizzly/numpyWeld.py:83:80: E501 line too long (86 > 79 characters)
python/grizzly/numpyWeld.py:89:80: E501 line too long (86 > 79 characters)
python/weld/__init__.py:2:1: F401 'bindings' imported but unused
python/weld/__init__.py:3:1: F401 'encoders' imported but unused
python/weld/__init__.py:4:1: F401 'types' imported but unused
python/weld/__init__.py:5:1: F401 'weldobject' imported but unused
python/weld/bindings.py:5:1: F403 'from ctypes import *' used; unable to detect undefined names
python/weld/bindings.py:8:1: F401 'os' imported but unused
python/weld/bindings.py:25:1: E302 expected 2 blank lines, found 1
python/weld/bindings.py:25:30: E701 multiple statements on one line (colon)
python/weld/bindings.py:26:1: E302 expected 2 blank lines, found 0
python/weld/bindings.py:26:28: E701 multiple statements on one line (colon)
python/weld/bindings.py:27:1: E302 expected 2 blank lines, found 0
python/weld/bindings.py:27:29: E701 multiple statements on one line (colon)
python/weld/bindings.py:29:1: E302 expected 2 blank lines, found 1
python/weld/bindings.py:32:80: E501 line too long (82 > 79 characters)
python/weld/bindings.py:41:80: E501 line too long (97 > 79 characters)
python/weld/encoders.py:6:1: F403 'from types import *' used; unable to detect undefined names
python/weld/encoders.py:13:1: E302 expected 2 blank lines, found 1
python/weld/encoders.py:25:1: E302 expected 2 blank lines, found 1
python/weld/encoders.py:57:1: E302 expected 2 blank lines, found 1
python/weld/encoders.py:58:6: E111 indentation is not a multiple of four
python/weld/encoders.py:59:10: E111 indentation is not a multiple of four
python/weld/encoders.py:60:10: E111 indentation is not a multiple of four
python/weld/encoders.py:61:10: E111 indentation is not a multiple of four
python/weld/types.py:8:1: F403 'from ctypes import *' used; unable to detect undefined names
python/weld/types.py:17:1: W293 blank line contains whitespace
python/weld/types.py:34:1: W293 blank line contains whitespace
python/weld/types.py:60:1: W293 blank line contains whitespace
python/weld/types.py:79:1: W293 blank line contains whitespace
python/weld/types.py:123:1: W293 blank line contains whitespace
python/weld/types.py:145:1: W293 blank line contains whitespace
python/weld/types.py:162:1: W293 blank line contains whitespace
python/weld/weldobject.py:8:1: F401 'sys' imported but unused
python/weld/weldobject.py:9:1: F401 'os' imported but unused
python/weld/weldobject.py:10:1: F401 'np' imported but unused
python/weld/weldobject.py:15:1: F403 'from types import *' used; unable to detect undefined names
python/weld/weldobject.py:17:1: E302 expected 2 blank lines, found 1
python/weld/weldobject.py:35:1: E302 expected 2 blank lines, found 1
python/weld/weldobject.py:47:1: E302 expected 2 blank lines, found 1
python/weld/weldobject.py:61:80: E501 line too long (82 > 79 characters)
python/weld/weldobject.py:66:80: E501 line too long (84 > 79 characters)
python/weld/weldobject.py:69:80: E501 line too long (84 > 79 characters)
python/weld/weldobject.py:124:80: E501 line too long (100 > 79 characters)
python/weld/weldobject.py:125:17: E128 continuation line under-indented for visual indent
python/weld/weldobject.py:127:80: E501 line too long (90 > 79 characters)
python/weld/weldobject.py:139:80: E501 line too long (88 > 79 characters)
python/weld/weldobject.py:170:80: E501 line too long (97 > 79 characters)
python/weld/weldobject.py:178:80: E501 line too long (93 > 79 characters)
python/weld/weldobject.py:198:1: W391 blank line at end of file

WeldValue destructor

@sppalkia @deepakn94
Should we have this destructor for WeldValue in bindings.py?
It's in the pandas code as well, but commented out.

Once the WeldValue object goes out of scope in the Python runtime, Python decides to clean it up.
As a result, it starts messing with our return value, since the destructor calls weld_value_free.
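One alternative to a destructor is to release the native memory only on explicit request. The sketch below illustrates that idea with a hypothetical `ManagedValue` class; it is not the WeldValue implementation, and `free_fn` stands in for a native free function such as weld_value_free.

```python
class ManagedValue(object):
    """Hypothetical wrapper that frees a native handle only when asked.

    There is deliberately no __del__, so garbage collection of the wrapper
    never frees memory that a returned result may still point into.
    """

    def __init__(self, handle, free_fn):
        self.handle = handle
        self._free_fn = free_fn  # stand-in for a native free function
        self._freed = False

    def free(self):
        """Release the handle; calling this more than once is harmless."""
        if not self._freed:
            self._free_fn(self.handle)
            self._freed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False  # do not swallow exceptions
```

With this shape the caller decides when the result is no longer needed, for example by copying it out inside a `with` block. The cost is that forgetting to call `free()` leaks the native memory.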

Parlib breaks if run from a different directory

The parlib library is not found by the runtime if a file using Weld is run from anywhere but the root cargo directory. For example, the C API example programs fail unless they are run from the topmost directory.

Separate source files for weld op templates

In the likely case that there are performance-sensitive implementations of common operations (a matrix decomposition, for example), it would be great to keep the Weld templates in their own files. Bindings to other languages or libraries could then link to the preferred implementation without needing Weld-specific knowledge of how performance-sensitive implementation choices are handled internally.
