I guess this is a controversial topic for Matlab converts because there are no one-dim

Return one-dimensional arrays for mono files?,about bastibe/python-soundfile

Comments (13)

bastibe commented on July 19, 2024

Interesting idea. On the one hand, (frames,) is certainly easier to work with for mono signals. On the other hand, this would lead to different indexing styles depending on the number of channels in the signal. For this reason alone, I think it is more consistent to always use (frames, channels), even if channels is 1 in some cases.

if signal.ndim == 1:
    first_sample = signal[0]
else:
    first_sample = signal[0, 1]
# vs
first_sample = signal[0, 1]

It also enables code like mono_signal = signal.sum(axis=1) to work regardless of the number of channels.

Also, this makes determining the number of channels easier, in particular when reading the whole file on one big matrix:

signal = sf.read()
channels = signal.shape[1]
# vs
signal = sf.read()
channels = 1 if signal.ndim == 1 else signal.shape[1]

from python-soundfile.

bastibe commented on July 19, 2024

Is this still up for discussion or should we close this?

from python-soundfile.

mgeier commented on July 19, 2024

Please keep it open ...

I still think that mono files should be returned as one-dimensional arrays.
And as input, one-dimensional arrays should be supported anyway.

I was trying to find some examples within NumPy, but I still don't have really convincing arguments ...

np.arange(42) returns a one-dimensional array.
Incidentally, your example signal.sum(axis=1) also returns a one-dimensional array!

Are there any counter-examples?

My original argument was indexing, but in the meantime I found out there are two ways to do it: a[:, 0] returns the first column as one-dimensional array, a[:, 0:1] returns the same, but as two-dimensional array with one column.

Many functions in NumPy have a axis argument, probably this should be also added to the read() and write() methods.

Regarding your examples:
I would check for one-dimensionality and reshape if needed. After that, there are no special cases anymore:

channels = np.asarray(channels)
# check if ndim < 1 or > 2?
if channels.ndim == 1:
    channels.reshape(-1, 1)
# now continue as usual
channels = signal.shape[1]

Of course, an axis argument would make this a bit more complicated.

from python-soundfile.

bastibe commented on July 19, 2024

The point of signal.sum(axis=1) is to return a one-dimensional array. For example signal.mean(axis=1) can be used to downmix any signal to a mono array regardless of the number of channels. If we returned an array, the call would lead to an error.

I think it is very valuable that we return a predictable, consistent results. Returning different data types in different situations is confusing and might lead to errors. Thus, I prefer always returning a matrix to sometimes-a-matrix, sometimes-an-array.

I am not sure about the axis argument. In the context of pysoundfile, channel might make more sense. Of course, that argument would suffer from the very same issues that two dimensional indexing suffers from. But you are right, if we are to embrace those issues anyway, we might as well provide the same functionality for both indexing and read.

from python-soundfile.

mgeier commented on July 19, 2024

The point of signal.sum(axis=1) is to return a one-dimensional array. For example signal.mean(axis=1) can be used to downmix any signal to a mono array regardless of the number of channels. If we returned an array, the call would lead to an error.

I'm not getting your point here ... the "mono array" is a one-dimensional array ... so should SoundFile.read() also return a one-dimensional array for mono files (as I'm suggesting) or not?

I think it is very valuable that we return a predictable, consistent results.

I agree.

Returning different data types in different situations is confusing and might lead to errors.

But if NumPy itself returns arrays with different dimensionality in different situations, this is what a user would expect, isn't it?

Thus, I prefer always returning a matrix to sometimes-a-matrix, sometimes-an-array.

With "matrix" I guess you mean a 2-dimensional np.ndarray and not a np.matrix, right?

I still think there is nothing wrong with returning a 2-dimensional array for multi-channel files and a 1-dimensional array for mono files.

I am not sure about the axis argument. In the context of pysoundfile, channel might make more sense.

What about channelaxis?

Of course, that argument would suffer from the very same issues that two dimensional indexing suffers from. But you are right, if we are to embrace those issues anyway, we might as well provide the same functionality for both indexing and read.

I don't see the link to 2-dimensional indexing, can you please elaborate?

from python-soundfile.

bastibe commented on July 19, 2024

All I'm saying is that if I have a piece of code that expects a matrix (meaning an np.ndarray with two dimensions), very often that code will not work with a vector (meaning an np.ndarray with one dimension). Thus, I think that always returning a matrix is preferable to returning sometimes a vector, sometimes a matrix.

from python-soundfile.

mgeier commented on July 19, 2024

I still couldn't find any really convincing argument for either standpoint ...

But we could at least make the behavior explicit in the function signatures.

I suggest an argument channelaxis with default value 0 which would mean the channels are stored in the columns of the arrays. This would also make sense for one-dimensional arrays where the only axis is 0 anyway.

An additional keyword argument force_2d should be used to explicitly state if mono files are returned as one-dimensional arrays.
If we use force_2d=True as default value, the default behavior wouldn't change at all, but I think it would be much clearer what happens with mono files and if someone needs it, they can be returned as one-dimensional arrays with force_2d=False.

I think this would be a reasonable solution for this issue, but probably the names of my proposed arguments could be improved ...

There is still an annoying corner case when returning only one frame of a multi-channel file ... in this case even I would return a two-dimensional array, although I don't really know what the consistent behavior would be ... anyway, this should be extremely rare ...

from python-soundfile.

bastibe commented on July 19, 2024

I am not sure about channelaxis. I think the canonical term for matrices would be row-major and column-major, which would probably translate to channel-major and frame-major. In audio terms, frame-major is more often called interleaved and it is pretty ubiquitous. At the very least, we should change the name to one of these, though I don't know which one.

Given that interleaved data is so ubiquitous and that setting channelaxis is more complicated than appending .T to the call, I don't really see the value of adding it as a separate argument at all though.

Your idea about force_2d is a very good one (although I would rather call it always_2d). I agree that this is a very reasonable solution.

The single-frame-of-multi-channel is a great example! It neatly shows why returning a frames x channels matrix is better than returning an np.ndarray with the minimal amount of dimensions. That frame would simply be returned as [[left, right]].

from python-soundfile.

mgeier commented on July 19, 2024

I like the name always_2d, I was thinking about names with "mono" in it, but I couldn't come up with anything useful.

Of course, channelaxis could be replaced by appending .T, but this wouldn't help making PySoundFile's behavior explicit. A user would have to know the behavior or look it up somewhere deep in the documentation.
It is an arbitrary choice that channels are represented as columns, I would like to make this choice explicit by the argument channelaxis=0. And if someone prefers, they might as well represent channels as rows with channelaxis=1 (or channelaxis=-1).

The alternative names mentioned above don't work because they all talk about memory order, but channelaxis doesn't say anything about that.
Of course channelaxis=0 is more efficient with C-contiguous arrays and channelaxis=1 with Fortran-contiguous data, because in those cases the data is already interleaved in memory and can be handed to libsndfile without additional copy.

I was thinking about the argument axis to fft() and sum(), but they have slightly different behavior and default values. And they are meant to also work with more than 2 dimensions, which we will never support.
So maybe an integer argument isn't the right thing here, a boolean argument should provide all options we need (exactly two).

But how to call it?

An argument channels_are_stored_in_columns=True wouldn't work, because (except from being long and ugly) it doesn't make sense for one-dimensional arrays (which I still would like to support).

from python-soundfile.

bastibe commented on July 19, 2024

You are right. We should advertise matrix orientation and not memory order. How about channel_major=True then? Or even major_axis=channel? Sometimes I wish Python had something like rubies :constants or lisp 'symbols.

from python-soundfile.

mgeier commented on July 19, 2024

Well, since version 3.4, Python has Enums, which doesn't help at all if we want to support 2.6+ ... but in any case, we would still need a name for the argument.

I think everything with "major" in it suggests memory order. In channel_major I cannot read any information about columns/rows, while in major_axis I don't see the connection between axes and channels.

I think channels_in_second_axis=False (or channels_in_first_axis=True) would mean the exactly right thing, but probably it sounds too cryptic.
This would have the advantage that channels_in_second_axis=True implies that we have to be dealing with a 2-dimensional array.

This could be shortened to channels_first=True, but I guess this would be even harder to understand.

More suggestions?

from python-soundfile.

mgeier commented on July 19, 2024

I'm closing this.
We decided to drop the axis thing because we couldn't find a good name and it would have needed a lot of additional code with very questionable overall benefit.
The original question about mono files was resolved in #36 with always_2d.

from python-soundfile.

mgeier commented on July 19, 2024

Just for the record:
I stumbled upon an alternative name for always_2d: numpy.loadtxt() uses ndmin.
Our default would be ndmin=2.

I still like always_2d better, I'm mentioning this here only for completeness' sake.

They also have an alternative for the suggested option channelaxis: they use unpack, which is supposed to transpose the array and allow tuple unpacking to get the individual "channels".

Anyway, I don't think this would be helpful in our case.
Again, just for the record ...

from python-soundfile.

Return one-dimensional arrays for mono files? about python-soundfile HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent