Python version 3.7.7 Platform information

export_to_csv(): Is it possible to remove the 100hz resample? about mdfreader HOT 15 CLOSED

ecalpy commented on July 17, 2024

export_to_csv(): Is it possible to remove the 100hz resample?

from mdfreader.

Comments (15)

ratal commented on July 17, 2024 1

Only my personal opinion: asammdf is very strong for data science, especially thanks to its GUI. It is easy to manipulate mdf files.
To me, mdfreader is more for advanced Python users, especially in the domain of big data. Thanks to its Cython module, you can get better performance when reading files (this depends on the use case; lately asammdf has progressed a lot on this). On the downside, it is sometimes not easy for all potential users to install properly. Because of its design, data are directly at reach in interactive interpreter, while rather via API for asammdf.
Nowadays, asammdf source is also becoming very complex with files of 10k lines for instance which could make it difficult to customise (but in the end, contribution is also possible)
At work I see a much bigger user base for asammdf, but still for some cases mdfreader is used.
Maybe you have a different opinion, @danielhrisca?

ratal commented on July 17, 2024

Hi,
Because of the nature of the .csv file format, I do not see an efficient way to export data with several sampling times at once. Some columns would be longer than others, leaving a lot of empty cells.
It would probably be more efficient to split it into files, one per sampling (or data group)?
I guess you would expect something a bit like what is already done for the xlsx export?
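The "one file per sampling" idea can be sketched without any MDF library. This is only a minimal illustration: `export_groups` is a hypothetical helper, and the `groups` dictionary (master channel name mapped to its columns) is an assumed stand-in for however the data groups are obtained from the file.

```python
import csv
import tempfile
from pathlib import Path


def export_groups(groups, out_dir):
    """Write one CSV per master channel, without any resampling.

    groups: {master_name: {column_name: samples}} (hypothetical layout);
    all columns that share a master are assumed to have the same length.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for master, columns in groups.items():
        # Put the master (time) channel in the first column.
        names = [master] + [n for n in columns if n != master]
        path = out_dir / f"{master}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(names)
            writer.writerows(zip(*(columns[n] for n in names)))
        written.append(path)
    return written


# Made-up example data: two data groups with different sampling times.
groups = {
    "time_10ms": {"time_10ms": [0.0, 0.01, 0.02], "speed": [10.0, 10.5, 11.0]},
    "time_100ms": {"time_100ms": [0.0, 0.1], "temperature": [80.0, 80.2]},
}
paths = export_groups(groups, tempfile.mkdtemp())
```

Each file keeps its own time axis, so no column is padded with empty cells. Master channel names that are not valid file names would need sanitising first.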

ecalpy commented on July 17, 2024

Hi- thanks for the quick response!

I haven't tried xlsx export, but our requirement is to have an uncompressed .csv file. Maybe it works to go xlsx and convert to .csv.

Looking into ASAMMDF, does it do the same? It seems to generate much larger files, but there still seems to be some sort of resampling. @danielhrisca - any input on that?

Edit: I just tried export_to_xlsx(), and yes- precisely what I would want in .csv format.

danielhrisca commented on July 17, 2024

> Looking into ASAMMDF, does it do the same? It seems to generate much larger files, but there still seems to be some sort of resampling. @danielhrisca - any input on that?

the file is big because all channels are interpolated using the union of all time stamps

ecalpy commented on July 17, 2024

> > Looking into ASAMMDF, does it do the same? It seems to generate much larger files, but there still seems to be some sort of resampling. @danielhrisca - any input on that?
>
> the file is big because all channels are interpolated using the union of all time stamps

Does that mean it's just resampled to the longest time stamp array?

danielhrisca commented on July 17, 2024

No, it means that all the time channels are merged into a single one that contains all the unique time stamps. After that, all the channels are interpolated using the new merged time channel.
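As a rough illustration of the merge-then-interpolate idea (not asammdf's actual code), the sketch below builds the union of all time stamps and linearly interpolates every channel onto it. The helper names, the example data, and the choice to hold the first/last value outside a channel's own time range are all assumptions made for the example.

```python
from bisect import bisect_right


def interp(t, ts, ys):
    """Linearly interpolate (ts, ys) at time t; hold edge values outside the range."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    i = bisect_right(ts, t)  # ts[i - 1] <= t < ts[i]
    t0, t1 = ts[i - 1], ts[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def merge_on_union(channels):
    """channels: {name: (timestamps, samples)} -> (merged_time, {name: samples})."""
    # One time axis containing every unique time stamp of every channel.
    merged = sorted({t for ts, _ in channels.values() for t in ts})
    resampled = {
        name: [interp(t, ts, ys) for t in merged]
        for name, (ts, ys) in channels.items()
    }
    return merged, resampled


# Made-up channels with different time bases.
channels = {
    "a": ([0.0, 1.0, 2.0], [0.0, 10.0, 20.0]),
    "b": ([0.0, 1.5], [100.0, 130.0]),
}
merged, resampled = merge_on_union(channels)
```

Here the merged axis has four points although neither channel has more than three, which is why the exported file grows: every channel gets one row per unique time stamp across the whole file.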

ecalpy commented on July 17, 2024

Thanks for the explanation. I will see if this is acceptable for my project.

Thanks again for the great packages!
Did you guys collaborate on your two different ones? (Always wanted to ask)

ratal commented on July 17, 2024

Making a csv with all your data not resampled is possible, but honestly I doubt this will be practical for your end user.
We know each other and met at a conference, but we work for different companies and are located in different countries. Our packages have different objectives and approaches, but both benefit the community, I think. Daniel has been far more active than me in the past years and has grown a good contributing community, which does not really exist for mdfreader. I have less time to spend on this package, so it is rather in maintenance for the moment; I may be more active for the next mdf standard release.

ecalpy commented on July 17, 2024

I agree with you- It's definitely not practical for anyone! But it was the original request. I think the project will eventually agree to a resample very soon.

We can close this issue now. Thanks for both of your support!

You piqued my interest now, is it easy to explain the different objectives between the mdfreader and asammdf?

Regardless of the differences, these two packages are very helpful for our industry.

ecalpy commented on July 17, 2024

Final questions regarding export_to_csv()...

is there a way to specify which time axis is utilized in the export in column A?
Would there be an easy rename option (e.g. "Time") if that's not possible?

Thanks!

danielhrisca commented on July 17, 2024

Ever since asammdf 5.0.0 there is no option to load the channel samples in the RAM, so only the file metadata is loaded. The samples are extracted on demand and in chunks, so memory usage is low most of the time. This was done with the "big data" in mind for the cases where you can't fit everything in the RAM. Using the filtering option on file load and the select method deliver good performance I would say. If you have any example where the speed is an issue, I would be very interested to investigate.

> Because of its design, data are directly at reach in interactive interpreter, while rather via API for asammdf.

The internal representation of mdfreader is simpler indeed. There is a bigger learning curve to using the asammdf API and its internal data representation.

> Nowadays, asammdf source is also becoming very complex with files of 10k lines for instance which could make it difficult to customise (but in the end, contribution is also possible)

As you know, the MDF spec is really complex, especially since version 4 (not to mention the new additions in 4.20). Having an almost 1-to-1 internal representation results in a complex code base as well.

ratal commented on July 17, 2024

> This was done with the "big data" in mind for the cases where you can't fit everything in the RAM. Using the filtering option on file load and the select method deliver good performance I would say.

Using the mdfreader channel_list or no_data_loading parameters, you can get the same feature. Data is also read by chunks; there is an internal parameter for chunk size that you can tweak to reach the best performance if needed.
However, the data structure for the file metadata (blocks) is not optimal in mdfreader and could lead to more memory consumption. I must admit that mdfreader was originally designed to convert complete files into another format in batch for processing with Matlab or Excel -> you can feel it in its design. After all, I did this project to learn the Python language.
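A minimal sketch of the two loading options mentioned above, assuming `channel_list` and `no_data_loading` are keyword arguments of the `mdfreader.Mdf` constructor. The wrapper functions, file name, and channel names are hypothetical, and none of this has been checked against a specific mdfreader release.

```python
def load_subset(mdf_path, wanted_channels):
    """Read only the listed channels (assumes Mdf accepts channel_list=...)."""
    import mdfreader  # third-party; imported lazily so the sketch loads without it

    return mdfreader.Mdf(mdf_path, channel_list=list(wanted_channels))


def load_metadata_only(mdf_path):
    """Parse the file structure but defer reading samples (assumes no_data_loading=...)."""
    import mdfreader

    return mdfreader.Mdf(mdf_path, no_data_loading=True)


# Hypothetical usage; the file and channel names are placeholders:
# data = load_subset("measurement.mf4", ["EngineSpeed", "VehicleSpeed"])
# lazy = load_metadata_only("measurement.mf4")
```

Both variants keep memory use down by not holding every channel's samples at once; which one fits depends on whether the channels of interest are known up front.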

ecalpy commented on July 17, 2024

I too did heavy Excel and then MATLAB analysis on MDF files, but now I try to stay completely in Python and only export results or reports to Excel/HTML. Thanks to both of your tools, this is possible and relatively easy! I learned most of my Python on mdfreader :)

I would close this issue but there was one remaining open question that I somehow formatted weird:
- Is there a way to specify which time axis is utilized in the export in column A?
- Would there be an easy rename option (e.g. "Time") if that's not possible?

Thanks!

ratal commented on July 17, 2024

If you resample before exporting, you can choose the master channel using the master_channel parameter of the resample() method.
There is also a rename_channel() method. But be careful: the new name has to be unique, otherwise it will not rename.
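Put together, the suggestion might look roughly like the sketch below. It is untested: the wrapper function, its default values, and the argument order passed to `resample()`, `rename_channel()` and `export_to_csv()` are assumptions, so check them against the installed mdfreader version. In particular, whether `export_to_csv()` resamples again on its own is not verified here.

```python
def export_with_named_time_axis(mdf_path, csv_path, master_name,
                                sampling=0.01, time_label="Time"):
    """Resample onto one chosen master channel, rename it, then export to CSV.

    Hypothetical helper; sampling is assumed to be a period in seconds.
    """
    import mdfreader  # third-party; imported lazily so the sketch loads without it

    data = mdfreader.Mdf(mdf_path)
    # Choose which time axis every channel is resampled onto.
    data.resample(sampling, master_channel=master_name)
    # The new name must not already exist, otherwise nothing is renamed.
    data.rename_channel(master_name, time_label)
    data.export_to_csv(csv_path)
    return data


# Hypothetical usage; the file and channel names are placeholders:
# export_with_named_time_axis("measurement.mf4", "measurement.csv", "time_10ms")
```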

ecalpy commented on July 17, 2024

Thanks, I will try that!
