out_of_core_fft's Introduction

out_of_core_fft (MIT license)

Fourier transforms are highly nonlocal, which can cause problems when dealing with very large data sets. In particular, standard algorithms cannot work with data sets too large to fit into memory. On the other hand, the classic Cooley-Tukey FFT algorithm shows that discrete Fourier transforms can be split up into smaller sub-problems. This module provides functions for FFTs that can work with the data directly on disk, extracting small subsets that fit into memory, working on each individually, and then combining back onto disk to get the final result. This implementation is based on the algorithm presented by Thomas H. Cormen in "Algorithms for parallel processing" (1999). A nontrivial part of the implementation involves transposing the data on disk, for which I created a relatively simple, but fairly fast, function included here simply as transpose.
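The decomposition described above can be sketched in memory with NumPy. This is only an illustration of the general Cooley-Tukey splitting of a length N = R*C transform into R-point and C-point transforms; the function name `four_step_fft` and the choice of R and C are hypothetical, and the module's actual on-disk procedure may differ in its details.

```python
import numpy as np

def four_step_fft(x, R, C):
    """Compute the DFT of a length R*C array using only R-point and C-point FFTs."""
    x = np.asarray(x, dtype=complex)
    N = R * C
    if x.shape != (N,):
        raise ValueError("x must be one-dimensional with length R*C")
    # View the data as an R x C matrix: a[n1, n2] = x[n1*C + n2]
    a = x.reshape(R, C)
    # Step 1: R-point FFTs down each column (index n1 -> k1)
    b = np.fft.fft(a, axis=0)
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*k1*n2/N)
    k1 = np.arange(R)[:, None]
    n2 = np.arange(C)[None, :]
    b = b * np.exp(-2j * np.pi * k1 * n2 / N)
    # Step 3: C-point FFTs along each row (index n2 -> k2)
    c = np.fft.fft(b, axis=1)
    # Step 4: transpose so that the output index is k = k1 + R*k2
    return c.T.reshape(N)

rng = np.random.default_rng(0)
signal = rng.standard_normal(128) + 1j * rng.standard_normal(128)
matches = np.allclose(four_step_fft(signal, 8, 16), np.fft.fft(signal))
print(matches)
```

Each sub-transform touches only one row or column at a time, which is what makes it possible to keep the full array on disk. The final transpose is the step that corresponds to the on-disk transpose mentioned above.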

Usage

These functions assume that the data to be manipulated are stored in HDF5 files. The FFT and inverse FFT are called with something like:

import out_of_core_fft
out_of_core_fft.fft('input.h5', 'x', 'output.h5', 'X')
out_of_core_fft.ifft('input2.h5', 'X', 'output2.h5', 'x')

Here, x and X are names for the datasets within the HDF5 files. Note that nothing is returned, because the result is stored on disk, as requested.

See the docstrings for more details.
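A minimal sketch of preparing an input file for the calls above, assuming h5py and NumPy are installed. The helper name `write_complex_dataset` and the file names are hypothetical. Storing the data as complex128 is an assumption rather than a documented requirement: it is suggested by the `overwrite` check quoted in one of the issues, which rejects non-complex input.

```python
import os
import tempfile

import h5py
import numpy as np

def write_complex_dataset(path, name, values):
    """Store `values` as a complex128 dataset called `name` in the HDF5 file `path`."""
    data = np.asarray(values, dtype=np.complex128)
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=data)
    return data.shape

tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "input.h5")
signal = np.sin(2 * np.pi * np.arange(1024) / 64.0)
write_complex_dataset(in_path, "x", signal)

# With the input on disk, the transform would be requested as in the usage
# above (commented out so that this sketch runs without the package):
# import out_of_core_fft
# out_of_core_fft.fft(in_path, 'x', os.path.join(tmpdir, 'output.h5'), 'X')

with h5py.File(in_path, "r") as f:
    print(f["x"].shape, f["x"].dtype)
```

The dataset is written at the top level of the file, so the name passed to `fft` refers to the dataset itself rather than to a group that contains it.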

Acknowledgments

The work of creating this code was supported in part by the Sherman Fairchild Foundation and by NSF Grants No. PHY-1306125 and AST-1333129.

out_of_core_fft's People

Contributors

moble

out_of_core_fft's Issues

H5 data type issue

Hi, I am trying to compute the FFT of WAV files on disk, so I tried to create HDF5 files from the WAV data and then run the FFT on those. I am not sure whether the problem is with how I create the HDF5 file or whether this is a bug.

By the way, do you know of another way to compute the FFT of WAV files on disk without loading them into RAM?

Thanks

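import h5py
import numpy as np
import out_of_core_fft
from scipy.io.wavfile import read

def wavread(file_name):
    # Read a WAV file and scale the samples by the 16-bit maximum
    fs, y = read(file_name)
    return fs, np.array(y, dtype=np.float64) / (2**15 - 1)

def power_two(n):
    # Exponent of the smallest power of two that is >= n
    return int(np.ceil(np.log2(n)))

fs, s = wavread('out.wav')
# Zero-pad the first channel to a power-of-two length
x = np.zeros(2**power_two(s.shape[0]))
x[:s.shape[0]] = s[:, 0]

# Store the padded (real, float64) signal as dataset 'x' inside group 'x'
fp = h5py.File('input.h5', 'w')
gr = fp.create_group('x')
gr.create_dataset('x', data=x, shape=x.shape)
fp.close()

out_of_core_fft.fft('input.h5', 'x', 'output.h5', 'X')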

Exception ignored in: 'h5py._proxy.make_reduced_type'
ValueError: Cannot return member number (Operation not supported for type class)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-54c55743fa99> in <module>()
----> 1 out_of_core_fft.fft('input.h5', 'x', 'output.h5', 'X')

/usr/local/lib/python3.5/dist-packages/out_of_core_fft/__init__.py in fft(*args, **kwargs)
    403 def fft(*args, **kwargs):
    404     kwargs['inverse_fft'] = False
--> 405     _general_fft(*args, **kwargs)
    406 fft.__doc__ = "Perform FFT for very large dataset stored in HDF5 file" + _general_fft.__doc__

/usr/local/lib/python3.5/dist-packages/out_of_core_fft/__init__.py in _general_fft(infile, ingroup, outfile, outgroup, overwrite, mem_limit, inverse_fft, show_progress)
    345                 if show_progress and k % max(C//20, 20) == 1:
    346                     print("\t\t\t{0} of {1}".format(k, C))
--> 347                 y[k] = np.exp(np.arange(R)*(-k*2j*np.pi/N)) * np.fft.fft(y[k])
    348 
    349         # Step 4:

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-huypgcah-build/h5py/_objects.c:2840)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-huypgcah-build/h5py/_objects.c:2798)()

/usr/local/lib/python3.5/dist-packages/h5py/_hl/dataset.py in __setitem__(self, args, val)
    628         mspace = h5s.create_simple(mshape_pad, (h5s.UNLIMITED,)*len(mshape_pad))
    629         for fspace in selection.broadcast(mshape):
--> 630             self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
    631 
    632     def read_direct(self, dest, source_sel=None, dest_sel=None):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-huypgcah-build/h5py/_objects.c:2840)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-huypgcah-build/h5py/_objects.c:2798)()

h5py/h5d.pyx in h5py.h5d.DatasetID.write (/tmp/pip-huypgcah-build/h5py/h5d.c:3694)()

h5py/_proxy.pyx in h5py._proxy.dset_rw (/tmp/pip-huypgcah-build/h5py/_proxy.c:1971)()

h5py/_proxy.pyx in h5py._proxy.needs_proxy (/tmp/pip-huypgcah-build/h5py/_proxy.c:3964)()

ValueError: Not a datatype (Not a datatype)

Overwriting array not working.

When I try to run the out_of_core_fft fft function with the overwrite=True argument, I get an error that I believe is telling me that the HDF5 group dtype is not a class.

My package details:
h5py = 2.7.1
numpy = 1.14.1
python = 3.5.2

My Code:

import h5py
import numpy as np
import out_of_core_fft as ooc_fft

test_data = np.random.randn(2048).astype('complex128')

TestPath = "test1.hdf5"
TestFile = h5py.File(TestPath, 'w')
tester1 = TestFile.create_group('PHI')
tester1.create_dataset('phi',data=test_data,shape=test_data.shape,dtype=test_data.dtype)
TestFile.close()

ooc_fft.fft(infile=TestPath,ingroup='/PHI/phi',overwrite=True)

The Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-2ffe60dac043> in <module>()
----> 1 ooc_fft.fft(infile=TestPath,ingroup='/PHI/phi',overwrite=True)#,outfile=TestPath,outgroup='/phi_out/phi_out')#,

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/out_of_core_fft/__init__.py in fft(*args, **kwargs)
    403 def fft(*args, **kwargs):
    404     kwargs['inverse_fft'] = False
--> 405     _general_fft(*args, **kwargs)
    406 fft.__doc__ = "Perform FFT for very large dataset stored in HDF5 file" + _general_fft.__doc__

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/out_of_core_fft/__init__.py in _general_fft(infile, ingroup, outfile, outgroup, overwrite, mem_limit, inverse_fft, show_progress)
    291         if overwrite:
    292             x = f_in[ingroup]
--> 293             if not issubclass(x.dtype, numbers.Complex):
    294                 raise ValueError("Flag `overwrite` is True, but input dtype={0} is not complex".format(x.dtype))
    295             X = x

/Users/jeffreyhazboun/anaconda3/lib/python3.5/abc.py in __subclasscheck__(cls, subclass)
    219         # Check if it's a subclass of a registered class (recursive)
    220         for rcls in cls._abc_registry:
--> 221             if issubclass(subclass, rcls):
    222                 cls._abc_cache.add(subclass)
    223                 return True

TypeError: issubclass() arg 1 must be a class
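The traceback points at `issubclass(x.dtype, numbers.Complex)`: a NumPy dtype is an instance, not a class, so `issubclass` cannot accept it. A minimal sketch of a check that does work on dtype instances is shown here, using `np.issubdtype`. The helper name `is_complex_dtype` is hypothetical, and this is offered as one possible replacement, not as the fix adopted by the package.

```python
import numbers

import numpy as np

def is_complex_dtype(dtype):
    """Return True if `dtype` describes a complex floating-point type."""
    return bool(np.issubdtype(dtype, np.complexfloating))

dt = np.dtype('complex128')

# A dtype instance is not a class, so issubclass() raises TypeError,
# reproducing the failure in the traceback above.
try:
    issubclass(dt, numbers.Complex)
    raised = False
except TypeError:
    raised = True

print(raised)
print(is_complex_dtype(dt))
print(is_complex_dtype(np.dtype('float64')))
```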

Chunk size setting not working

After attempting to overwrite an array (see #2), I tried to write the FFT to a new group in the same HDF5 file. This gave the error message below, in which the chunk size appears to be larger than the dataset. The only difference in the code is that I have made a new group in the HDF5 file and tried to write the FFT to that group. If I remove the lines beginning with tester2, I get the same error message.

My package details:
h5py = 2.7.1
numpy = 1.14.1
python = 3.5.2

import h5py
import numpy as np
import out_of_core_fft as ooc_fft

test_data = np.random.randn(2048).astype('complex128')

TestPath = "test1.hdf5"
TestFile = h5py.File(TestPath, 'w')
tester1 = TestFile.create_group('PHI')
tester1.create_dataset('phi',data=test_data,shape=test_data.shape,dtype=test_data.dtype)
TestFile.close()

TestFile2 = h5py.File(TestPath, 'r+')
tester2 = TestFile2.create_group('PHI_out')
tester2.create_dataset('phi_out',shape=test_data.shape,dtype=test_data.dtype)
TestFile2.close()

ooc_fft.fft(infile=TestPath,ingroup='/PHI/phi',outfile=TestPath,outgroup='/PHI_out/phi_out')

ValueError                                Traceback (most recent call last)
<ipython-input-52-df05e664715a> in <module>()
----> 1 ooc_fft.fft(infile=TestPath,ingroup='/PHI/phi',outfile=TestPath,outgroup='/PHI_out/phi_out')#,

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/out_of_core_fft/__init__.py in fft(*args, **kwargs)
    403 def fft(*args, **kwargs):
    404     kwargs['inverse_fft'] = False
--> 405     _general_fft(*args, **kwargs)
    406 fft.__doc__ = "Perform FFT for very large dataset stored in HDF5 file" + _general_fft.__doc__

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/out_of_core_fft/__init__.py in _general_fft(infile, ingroup, outfile, outgroup, overwrite, mem_limit, inverse_fft, show_progress)
    300             if outgroup in f_out:
    301                 del f_out[outgroup]
--> 302             X = f_out.create_dataset(outgroup, shape=x.shape, dtype=x.dtype, chunks=(sqrt_n_c_e,))
    303 
    304         # Determine appropriate size and shape

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    104         """
    105         with phil:
--> 106             dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
    107             dset = dataset.Dataset(dsid)
    108             if name is not None:

/Users/jeffreyhazboun/anaconda3/lib/python3.5/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times)
     84         errmsg = "Chunk shape must not be greater than data shape in any dimension. "\
     85                  "{} is not compatible with {}".format(chunks, shape)
---> 86         raise ValueError(errmsg)
     87 
     88     if isinstance(dtype, Datatype):

ValueError: Chunk shape must not be greater than data shape in any dimension. (8191,) is not compatible with (2048,)
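The error says that the requested chunk length (8191) exceeds the dataset length (2048), which h5py does not allow. One way to avoid it is to clamp the chunk length to the data length before creating the dataset. The sketch below shows that clamping in plain Python; `clamp_chunks` is a hypothetical helper, and how the package derives its chunk size (`sqrt_n_c_e` in the traceback) is not shown in this report.

```python
def clamp_chunks(chunk_len, data_len):
    """Return a 1-D chunk shape that never exceeds the dataset length."""
    if chunk_len < 1 or data_len < 1:
        raise ValueError("chunk_len and data_len must be positive")
    return (min(chunk_len, data_len),)

# Values taken from the error message above
chunks = clamp_chunks(8191, 2048)
print(chunks)
```

The resulting tuple could then be passed as the `chunks` argument of h5py's `create_dataset`.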
