Giter VIP home page Giter VIP logo

python_sparse_list's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

python_sparse_list's Issues

SparseVector

Your yummy test-suite has spurred me to try my hand at making a quick'n dirty implementation of a sparse list using not a dictionary of keys, but linked lists : two numpy.ndarrays internally, one of integers for the keys, and one for the values, whose dtype is left to the choice of the user.

If this works, I'll probably make a sparse_vector lib, unless you think this lib should hold multiple implementations. (my inner sloth hopes that you do)

I called it vector because it's specialized for numerical values.

The idea is to make the init with initial values and the setting via list slicing (see PR #2) both faster.

I tried scipy.sparse and pandas.Series but they don't fit my immediate needs.

My constraints are :
1. Python only
2. Performance
3. Available now :)

I just want a simple sparse 1D numerical vector that works well with numpy.

I guess scipy.sparse should make separate classes for 1D, 2D, 3D, and ND.
They only support 2D right now (and therefore 1D), with classes such as scipy.sparse.lil_matrix, which is also a linked list implementation.

But the API does not feel right for vectors, look at the instantiation :

# data = [6.28, 1.62, 2.72, 2.675]
# indices = [42, 1e9, 6e6, 1]  # not sorted (constraint)
vector = lil_matrix(
    (values, (np.zeros(len(indices)), indices)),  # data, (rowinds, colsinds)
    shape=(1, 1e16)
)

Besides having to np.zeros(len(indices)) to fulfill the 2D signature, there's no settable default value,
which is zero. (and I need another)

And there's a bunch of 2D-only "features" such as :

vector[0] # same object as vector, shape (1, 1e16)
# it somewhat makes sense in 2D, but not in 1D

I have not tried sympy.matrices.sparse.SparseMatrix yet.


Anyhow, I'm running benchmarks with sparse_vector :

import benchmark
import numpy as np

from sparse_list import SparseList
from sparse_vector import SparseVector


class BenchmarkSparse(benchmark.Benchmark):

    each = 100          # number of runs
    full_size = 100000  # total size of sparse lists and vectors
    data_size = 1000    # actual data size of sparse lists and vectors

    def setUp(self):
        # Can also specify tearDown, eachSetUp, and eachTearDown
        self.a = np.random.rand(self.data_size)
        self.k = np.arange(self.data_size)

    def test_sparse_list_init(self):
        sl = SparseList(self.full_size, default_value=0)

    def test_sparse_vector_init(self):
        sv = SparseVector(self.full_size, default_value=0)

    def test_sparse_list_set_with_iterables_in_slice(self):
        sl = SparseList(self.full_size, default_value=0)
        sl[self.k] = self.a

    def test_sparse_list_set_with_iterables_on_init(self):
        sl = SparseList(dict(zip(self.k, self.a)), default_value=0)
        sl.size = self.full_size

    def test_sparse_vector_set_with_iterables_in_slice(self):
        sv = SparseVector(self.full_size, default_value=0)
        sv[self.k] = self.a

    def test_sparse_vector_set_with_iterables_on_init(self):
        sv = SparseVector((self.k, self.a), default_value=0)
        sv.size = self.full_size


if __name__ == '__main__':
    benchmark.main(format="markdown", numberFormat="%.4g")

which right now yields :

Benchmark Report
================

BenchmarkSparse
---------------

                                     name | rank | runs |      mean |        sd | timesBaseline
------------------------------------------|------|------|-----------|-----------|--------------
                         sparse list init |    1 |  100 | 1.771e-06 | 1.344e-06 |           1.0
                       sparse vector init |    2 |  100 | 4.199e-06 | 1.877e-06 | 2.37012113055
 sparse vector set with iterables on init |    3 |  100 | 1.531e-05 |  7.01e-06 | 8.64468371467
   sparse list set with iterables on init |    4 |  100 | 0.0004169 | 8.095e-05 | 235.347240915
  sparse list set with iterables in slice |    5 |  100 | 0.0004247 | 7.585e-05 | 239.769851952
sparse vector set with iterables in slice |    6 |  100 |  0.008322 | 0.0005945 | 4698.04441454

Not optimal yet, but it's still quick'n dirty, I can clean that up some and maybe start comparing with lil_matrix too.

sparse_list[a:b] should return a sparse list itself

I think it would be nice to have the option to return a sparse list when taking the slice of a sparse_list. Currently it returns a standard (dense) list.
If the current behaviour of slicing is not to be changed, an alternate method could be provided for that.

Invalid implementation of `del my_list[a:b]`

Deleting slices does not behave like Python's lists:

>>> standard_list = [1,2,3]
>>> del standard_list[:2]
>>> standard_list
[3]
>>> sparse_list = SparseList([1,2,3])
>>> del sparse_list[:2]
>>> sparse_list
[None, None, 3]

Bursty Sparse List (List of Lists of Blocks)

Great work on Sparse List! I want to use a Sparse List, but I'm not sure the dictionary implementation is the most efficient for me.

Bursty Data

My particular data is sparse, but bursty. While 95% of the indices are empty, the remaining 5% are broken up into relatively bursty chunks. Rather than store everything in a dictionary, it would be nice to store a dictionary of lists, where each list contains a range of data. This could make certain operations much more efficient.

Use Case: Caching Result Sets

The main use case for this is caching result sets. Say I have a query that has 1M results, and I want to cache the ones I pull into a list-type structure. If I pull results 1000 at a time, there will be many contiguous blocks, and it will be much more efficient to store the data in some sort of list of lists structure.

Implementation

Is there any chance that this will be implemented here? Would you accept a pull request that supported this type of storage?

The data structure is not conceptually difficult, but there will be a bit of hairy code for merging contiguous blocks and going from one block to the next.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.