Giter VIP home page Giter VIP logo

Comments (19)

bastibe avatar bastibe commented on July 19, 2024

You are correct, that insofar as reading the file is concerned, two dimensional indexing is no advantage to just reading all the channels.

However, I think it is much more convenient and much easier to understand to write

wave = SoundFile('file.wav')[22050:66150,1]

instead of

wave = SoundFile('file.wav')[22050:66150][:,1]

In ac983e5 , I implemented two dimensional indexing simply as syntactic sugar that converts the first expression into the second. The whole numpy array is still read, and the second index is only applied after the whole data was read. Indeed, libsndfile does not provide a way to access single channels anyway, so this is probably the only possible solution.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

As for wavread-like functionality, I don't see much difference between

s = wavread('file.wav')

and

s = SoundFile('file.wav')[:]

or

s = SoundFile('file.wav').read()

Do you think wavread-like functionality is really necessary?

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

You are correct, that insofar as reading the file is concerned, two dimensional indexing is no advantage to just reading all the channels.

However, I think it is much more convenient and much easier to understand to write

wave = SoundFile('file.wav')[22050:66150,1]

instead of

wave = SoundFile('file.wav')[22050:66150][:,1]

True. But what about this:

import PySoundFile as sf
wave = sf.read('file.wav')[22050:66150,1]

In this hypothetical case, read() is a function that returns the
whole file as NumPy array.
It would internally open the file, read from it and close the file.
Or what about this:

wave = sf.read('file.wav', start=22050, stop=66150)[:, 1]

This wouldn't be the most common use, though, the most common would be:

wave = sf.read('file.wav')

... which in my opinion easier to grasp then

wave = SoundFile('file.wav')[:]

or

wave = SoundFile('file.wav').read()

And, just to be clear, I don't suggest to abandon the class
SoundFile entirely (just the indexing feature).
In more special cases of course the SoundFile object can be used
directly, ideally within a context manager.

In ac983e5 , I implemented two dimensional indexing simply as syntactic sugar that converts the first expression into the second. The whole numpy array is still read, and the second index is only applied after the whole data was read. Indeed, libsndfile does not provide a way to access single channels anyway, so this is probably the only possible solution.

I guess this is better than before, because only allowing
one-dimensional indexing seemed odd.

But not caching the data also seems strange, especially if someone
wants to read 2 (or more) channels separately.
Each time a full np.ndarray with all channels will be created and a
view for one channel will be returned.
If I'm not mistaken, the whole array will be kept in memory (via the
wave.base reference) until the returned view is destroyed.
That would mean, for example, if I load 2 separate channels from a
7-channel file, 14 channels would be kept im memory, right?

So probably if a subset of channels is selected they should be copied
to a new (and smaller) array?
Which again would incur a performance overhead and additional memory
allocation which also would be very bad.

All in all, regardless of the chosen syntax, I think it is impossible
to get that right, so selecting individual channels should be left to
the user.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

I think I finally get your point. Just to make sure: You are saying that indexing a SoundFile object just like a Numpy array is unintuitive, and doing the same on a Numpy array would be better.

I can agree with that. However, I think indexing the SoundFile object is extremely convenient and one of the core features of PySoundFile. Also, as I said, there is no way to read single channels in portaudio, so we'll have to live with the inefficiencies this affords us.

That said, we could easily allocate a new array instead of returning the view. Did you really run into that problem though? Because I have quite a few friends who work with audio stuff and this is the least of their worries.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

Also, I think there is some confusion regarding the lifetime of SoundFile objects:

wave = SoundFile('file.wav')[:]
wave = SoundFile('file.wav').read()

are equivalent, and both SoundFile objects will deallocate right on that very line. They will not stay in memory beyond that and the file will not be kept open. I see no reason at all to change this to

wave = wavread()

If you really want wavread, it's literally just lambda x: SoundFile(x).read().

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

In order to be able to give examples for my answer here I tried the two-dimensional feature and promptly found a bug. In line 445 instead of

if second_frame:

you should write

if second_frame is not None:

Otherwise, whenever you request the first channel (second_frame == 0), you'll get the whole array.

This is not a biggie, bugs happen all the time, but it illustrates my point: You are implementing stuff that's already implemented in NumPy!

I deliberately didn't open a new issue for that because I think the best fix is to remove the code completely.

About indexing being intuitive:
I do think indexing would be intuitive, but only if it is two-dimensional and only if it's done right.
And, as I mentioned above, I believe two-dimensional indexing cannot be done right (because of caching and copying issues which cause CPU and/or memory overhead).
Indexing on a NumPy array wouldn't be more intuitive, but it would be much easier for people to get what they expect. If they use indexing on a NumPy array, they know what they get (a view, a copy, ...), correctly indexing a SoundFile object would be something they have to learn.

About indexing being convenient:
It doesn't matter if it is convenient or if it is a core feature.
As long as it cannot be implemented right (which it cannot, as I stated above), it shouldn't be implemented.

About returning new memory:
This would only in a few cases save memory, in others it would waste memory, in all cases it would waste CPU.
I didn't run into problems regarding memory usage, but I'm convinced users can handle their memory requirements better if they are able to use the ndarrays they are used to instead of indexing into a SoundFile object which they might not be very familiar with.

It is impossible with libsndfile to read single channels therefore PySoundFile shouldn't claim to do it either.

Regarding lifetime:

Of course the SoundFile objects in your example don't persist after the line they're on, but the returned array does.
And if it is a view, it still needs the whole original array (which was called np_data in read()) and therefore the original CFFI buffer stays in memory (and can be accessed with wave.base in your example).
If you read two channels from the same SoundFile object (but indexing two times separately), each time a new CFFI buffer (holding all channels) will be created.

Example (with my strange test file, 15 frames, 7 channels):

from pysoundfile import SoundFile
with SoundFile('data/test_wav_pcm16.wav') as f:
    sig1 = f[:, 2]
    sig2 = f[:, 5]
sig1.size, sig2.size, sig1.base.size, sig2.base.size

>>> (15, 15, 105, 105)

The two buffers hold the same data (the whole file contents) but at separate memory locations:

all(sig1.base == sig2.base)
>>> True
sig1.base is sig2.base
>>> False

If PySoundFile would never return single channels (which I strongly suggest), this can still happen but then It's the user's fault.

Regarding the necessity of a wavread-like functionality:

Yes! I really want (even need) such a thing!

I'm trying to convert people from using Matlab to using Python+NumPy+... and every obstacle will make it harder for them and therefore for me.
It might seem like a small thing but it is hard to explain why it is necessary to append [:] or .read() to the constructor and why there isn't a simpler way to do that (using a function).
Apart from that, reading and writing audio files is also very common in interactive sessions, and saving a few keystrokes (especially brackets and stuff that cannot be auto-completed) can make a big difference.

Of course it would be easy for me to implement it locally, but regardless how trivial it may be, I'll have to do this for each project, for each example I show someone, each time I quickly want to try something, ...
It would make my life (and that of my colleagues) easier if such convenience functions were included in PySoundFile.

Don't get me wrong, I definitely don't want a Matlab-clone, I just want a clear and easy high-level interface (in addition to a fully-fledged low-level interface).

In closing:
I will implement said convenience functions anyway and you can decide after I show them to you if you like them or not.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

Thank you very much for pointing out that bug! Quite a nasty one, that. I fondly remember working with Lua, where only false and nil evaluated to False. That makes so much more sense...

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

Regarding wavread, I concede your point. It should not be called wavread, though, since it can read so much more than wavs. How about soundread, audioread, or just simply read?

from pysoundfile import read as wavread  
wavread('file.wav')

import pysoundfile as sf
sf.read('file.wav')

Looks good to me.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

Regarding indexing:

I understand everything you are saying, but I really want that feature. As you rightly point out, there are many issues with indexing in general, and two-dimensional indexing in particular, but that feature is just incredibly powerful for my particular use cases. It is implemented as a simple wrapper over read, seek and indexing the Numpy result, and those are the very steps I would have to take manually every time, otherwise.

In the end, I think this comes down to the same point you were making about wavread: The functionality is trivial and could be easily implemented locally. But it saves a few key strokes and it makes things easier to explain. And just like wavread is crucial for you, so is indexing for me. I think PySoundFile can accomodate both use cases, and make both of us happy.

from python-soundfile.

marcecj avatar marcecj commented on July 19, 2024

Responding as per your Email, Basti, I think I agree on all points here: a trivial read function is quite frankly practical, and if it's so simple why not include it? But the indexing is also quite practical, if, as it turns out, somewhat problematic. However, if I wanted to index separate non-neighbouring channels I would instinctively just get the entire numpy array and index from that, as opposed to the example given by mgeier. Put differently: I don't think I would use two-dimensional indexing on a SoundFile object to do more than [a:b, c:d] style indexing. Anything fancier requires a proper numpy array anyway, e.g., the appropriately termed "fancy indexing" (although, since you pass second_frame to the numpy array, I believe you could do something like SoundFile(...)[(a, b, c), :] to select separate channels).

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

I guess both features (SoundFile indexing and a high-level read() function) don't interfere with each other, so it shouldn't be a problem for both to coexist.

@bastibe: just out of curiosity ... can you give an example where indexing into a SoundFile is more powerful/easy/efficient/beautiful/... than, say, indexing into a NumPy array after it has been returned from a (still hypothetical) read() function?

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

@mgeier Basically, any time I want to access only part of a file:

with SoundFile as s:
    data = s[44100:88200]

This will only read the second second of the file, without reading the second before or anything after.

You could also write it like this:

with SoundFile as s:
    s.seek(44100)
    data = s.read(44100)

But I think the first example is much cleaner and easier to understand. I had one application that was churning through some 3000 audio files on 8 simultaneus threads. This application was wicked fast and used next to no memory, mainly because PySoundFile allowed us to only load a tiny part of a file at a time and never having to do big allocations. Also, the algorithm could only ever process 1024 samples at a time anyway, so loading more than that would have been a waste.

On a different note, read, seek and write work in relation to an invisible current read pointer. Consecutiive calls to read seek and write will have different effects depending on that invisible pointer. I think that is very confusing. You can't just pass the SoundFile object around and expect random functions to read and write with any kind of confidence.

Indexing on the other hand is a deterministic operation that will always do the same thing. You can pass the SoundFile object into some random function and have it play back the file, while simultaneusly doing some signal processing with it on the main thread. This works even for very large files that you wouldn't want to load in their entirity.

On the other hand, there is a whole class of algorithms that just wants to step through a file one block after another. For that, read, write and seek works perfectly: I think both APIs have their merit and use cases. I have used both techniques in different contexts to great effect.

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

@bastibe: thanks for the examples

To make my counter-examples a bit less hypothetical, I implemented the suggested features in #18.

If you only want to read a certain part of a file, I would suggest the high-level function read():

import pysoundfile as sf
myarray = sf.read('myfile.wav', start=44100, stop=88200)

or

myarray = sf.read('myfile.wav', frames=44100, start=44100)

I think that's clearer (more explicit) and more flexible (either frames or stop can be specified) than your indexing example.

Admittedly, this (not explicitly closing the file on read shouldn't be a problem):

myarray = sf.open('myfile.wav')[44100:88200]

... is a little less typing than

myarray = sf.read('myfile.wav', start=44100, stop=88200)

... but if you want to avoid some typing, you can use positional parameters:

myarray = sf.read('myfile.wav',44100,44100)

... which is even one character less than the indexing solution, although I wouldn't recommend that outside of an interactive session because it will be confusing what the different numbers actually mean.

As for your multi-threading example: I suspect that it is totally not safe to access the same SoundFile object from different threads (even for reading) exactly for the reason of the read/write pointer.
It's true that you internally re-set the pointer after the indexing operation, but the whole operation is totally not atomic. Correct me if I'm wrong, I don't know much about Python's threading mechanism.
If I'm not mistaken, your example is unsafe and therefore your point is moot.

If you are only reading, it should be perfectly safe to open the same file twice (into two different SoundFile objects), which I would suggest in your multi-threading scenario.
If one of the threads is writing, much more care has to be taken anyway.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

Oh my, I finally get what you have been saying all along. All this time, you were talking about the read function, and I was thinking you were talking about the read method. See, I didn't get why you would want to introduce the keyword arguments for the method. For the function, they make a lot of sense! You are right, I am wrong. Sorry about that.

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

Yes, and my examples open and close the file each time. If you want to read the file in 1024-frame-chunks, you can just use the read() method repeatedly (s.read(1024)) as you mentioned.

You still have to convince me that there is a situation where you want to jump around in an open file where read() and seek_*() is not enough and you really need indexing.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

When doing spectral processing, I frequently want to read consecutive, overlapping blocks of data. So the first block might be [0:256], the next [128:384], then [256:512] and so on. You can implement this using read and seek like this:

while seek(0) < len(f)-256:
    data = f.read(256)
    f.seek(-128)

I think it is much cleaner to do something like

for idx in range(0, len(f)-256, 128):
    data = f[idx:idx+256]

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

Reading overlapping sections is indeed an interesting example!

I agree that you second example is better than the first, but what about this:

for idx in range(0, len(f) - 256, 128):
    f.seek_absolute(idx)
    data = f.read(256)

Sure, this is one more line, but I think it would be more explicit, less obscure and therefore easier to read and maintain.

from python-soundfile.

bastibe avatar bastibe commented on July 19, 2024

I disagree. I think indexing is much easier to understand than seek and read. Sorry.

from python-soundfile.

mgeier avatar mgeier commented on July 19, 2024

Fair enough.

I'm closing this issue, the actual changes were done by @bastibe in ac983e5.

from python-soundfile.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.