
openNCEM's People

Contributors

alex-rakowski, ercius, fniekiel, gonzalorodrigo, koschie, sk1p, tcpekin, tschoonj, zgainsforth


openNCEM's Issues

Change genfire import line

import genfire as gf

@ZGainsforth Can you change the line referenced above to be something like
import genfire.reconstruct
You will probably have to change gf.reconstruct to genfire.reconstruct later.

I would try it but I figure you would be able to test it and confirm it works.

This should avoid importing the GUI portion of the genfire package. Here is what the genfire.__init__ imports:

from . import utility
from . import reconstruct
from . import fileio
from . import gui

I don't think you need utility, fileio, or gui, and then ncempy will build even on headless machines. I think this will fix #22.

DM3 class

I just uploaded a working DM3 class to io on the development branch. I have tested this with a single image and an image series file. Some more extensive testing is needed. To do:

  1. Further data types (RGB, complex, spectra, etc.) will be added soon in the getDataset function to bring DM3.py to full capability, matching my dm3Reader.m Matlab code.
  2. Clean up the code to be comparable to ser.py.
  3. Add a DM4 class, which is a fairly easy conversion from the DM3 class. I might put these in the same DM.py to share code. We can then use validDM3 and validDM4 to read dm3 and dm4 files with the same dm.py file in io.

Let me know if you find any issues or have any questions.

`BufferError` caused by eager closing of DM file handles

In py4DSTEM, we use fileDM as a context manager and then use it to get a memory map to the 4D-STEM data.

Recently, we have been seeing this error (full trace pasted at the bottom): BufferError: cannot close exported pointers exist. It seems to be caused by fileDM.__exit__ forcing the file to close. Python keeps track of how many references to the data exist, and raises this error when we try to close a file while the py4DSTEM.io.DataCube object still retains a reference to the file via its memory map.

I'm not sure why this has become a problem just now, since this part of ncempy hasn't changed in a while, so I suspect that it is restricted to certain versions of Python or numpy. But it is definitely occurring for some people running fully up-to-date installs of py4DSTEM and ncempy.

Perhaps the solution is just to put the self.fid.close() in a try-except, so that if there are no references remaining and it's safe to close the file, it still gets closed:

openNCEM/ncempy/io/dm.py

Lines 227 to 234 in 0dcbead

def __del__(self):
    """Destructor which also closes the file
    """
    if not self.fid.closed:
        if self._v:
            print('Closing input file: {}'.format(self.file_path))
        self.fid.close()

Though I'm not 100% sure what state the file handle gets left in, if it hits this error.
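The guarded close suggested above could look like the following minimal sketch. The `safe_close` helper name is illustrative, not ncempy API; it reports whether the handle actually ended up closed.

```python
def safe_close(fid, verbose=False, path=None):
    """Close a file handle, tolerating live buffer exports.

    Returns True if the handle ended up closed (or already was),
    False if closing had to be skipped because exported pointers
    (e.g. a live memory map) still reference the buffer.
    """
    try:
        if not fid.closed:
            if verbose:
                print('Closing input file: {}'.format(path))
            fid.close()
        return True
    except BufferError:
        # a numpy view / memmap still references the buffer; leave it open
        return False
```

The same `BufferError` can be reproduced with `io.BytesIO`: closing it while a `getbuffer()` view is still alive raises the error, which `safe_close` swallows, and the close succeeds once the view is released.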


Full error:

---------------------------------------------------------------------------
BufferError                               Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 dc = scan.get_data()

Input In [6], in Scan_Info_Storage.get_data(self, shift_correct)
     27 def get_data(self, shift_correct = False): 
---> 28     dc = py4DSTEM.io.read(self.data_filepath)
     29     dc.set_scan_shape(self.R_Nx,self.R_Ny)
     30     if shift_correct:

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/py4DSTEM/io/read.py:99, in read(fp, mem, binfactor, ft, metadata, **kwargs)
     95     data = read_py4DSTEM(
     96         fp, mem=mem, binfactor=binfactor, metadata=metadata, **kwargs
     97     )
     98 elif ft == "dm":
---> 99     data = read_dm(fp, mem, binfactor, metadata=metadata, **kwargs)
    100 elif ft == "empad":
    101     data = read_empad(fp, mem, binfactor, metadata=metadata, **kwargs)

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/py4DSTEM/io/nonnative/read_dm.py:53, in read_dm(fp, mem, binfactor, metadata, **kwargs)
     51                 dataSet = dmFile.getDataset(i)
     52             i += 1
---> 53         dc = DataCube(data=dataSet["data"])
     54 elif (mem, binfactor) == ("MEMMAP", 1):
     55     with dm.fileDM(fp, on_memory=False) as dmFile:
     56         # loop through the datasets until a >2D one is found:

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/ncempy/io/dm.py:246, in fileDM.__exit__(self, exception_type, exception_value, traceback)
    242 def __exit__(self, exception_type, exception_value, traceback):
    243     """Implement python's with statement
    244     and close the file via __del__()
    245     """
--> 246     self.__del__()
    247     return None

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/ncempy/io/dm.py:234, in fileDM.__del__(self)
    232 if self._v:
    233     print('Closing input file: {}'.format(self.file_path))
--> 234 self.fid.close()

BufferError: cannot close exported pointers exist

Matlab Code Location

I forget what we decided during the in-person discussion when creating this repo, but are we going to split the top-level folder structure into Python and Matlab folders, such that there are only two top-level folders plus the few necessary text files (.gitignore, README, LICENSE)? This would require moving docs, tools, and ncempy into a python folder and updating any necessary cross-references. On the other hand, we could leave things the way they are and only add a specific Matlab folder.

Tom

Need a place to put images and binary files

We need some space to upload images for the documentation, and a way to provide the example datasets that I have been using for unit testing. Without them, the unit tests fail.

For the images I will attach them to this issue for now, to have them immediately available.

4D-STEM data reads incorrectly

Newer versions of DM with the STEMX capability for 4D-STEM now write 4-dimensional DM4 files with a different shape than previously was done. This is actually an improvement in the layout of the data on disk, but ncempy.io.dm does not currently read this data correctly.

There is a workaround. You can use the low-level API in ncempy.io.dm.fileDM to create a memmap on disk with the correct ordering:

d1 = nio.dm.fileDM('filename.dm4')
data = np.memmap(d1.fid, dtype=d1._DM2NPDataType(d1.dataType[1]), mode='r',
                 offset=d1.dataOffset[1],
                 shape=(d1.xSize[1], d1.ySize[1], d1.zSize[1], d1.zSize2[1])[::-1],
                 order='C')
image0 = data[:, :, 0, 0]  # the first diffraction pattern

Note how the shape keyword has [::-1] to reverse the order of the data shape. The shape is now [ky, kx, scanY, scanX] making the diffraction patterns the fastest data to access according to C-ordering.

I will attempt to add automatic identification of this change in the file format, but it will take some time to figure out.

Unit tests

We need to overhaul the unit tests to allow convenient collaboration among multiple coders.

Let's put up a list of requirements and current status, which we can use to discuss and work on these things.

  1. Storage of binary data:
  • needs to be outside GitHub
  • currently on Peter's Google Drive?
  2. Structure of binary data:
  • currently badly organized in ncempy/test/resources
  3. Test data:
  • should be small and handy
  • for file formats, should ideally cover all possible combinations (data types, software versions, etc.)
  • Peter started with some nice SER/EMI files
  4. Implementation:
  • all code must be tested
  • how to test properly?

License

We need to figure out which license to put this under!

DM import can't read newest-version files

It looks like Gatan changed the encoded type of tag "Acquisition Time (OS)" from type 7 (float64) to the very rarely used type 11 (int64) in a recent version change. This means fileDM cannot open these files since it doesn't include type 11 in all its lists.

I've tried a very quick fix and it seems to work: Add the entry 11: 8 to self._encodedTypeSizes

I don't have a large enough set of test cases to verify that this works generally. I would be happy to provide an example file where the current version fails.

ser.py and import ncempy.io.emd

@fniekiel
I have a suggestion to improve the usefulness of ser.py. Right now, you have to install ncempy (using pip) in order to use ser.py. This is due to the import ncempy.io.emd line in ser.py. Can we add a try/except statement to handle cases where you download the source code only or are working on a different branch?

Change:

import ncempy.io.emd

to:

try:
    import ncempy.io.emd
except ImportError:
    import emd

where .../ncempy/io/ is added to my python path.

OR is there a way to import ncempy by adding it to the python path without using pip?
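On that last question: yes, inserting a source checkout into sys.path at runtime makes a package importable without pip. The sketch below demonstrates the mechanism with a throwaway module written to a temp directory (standing in for the directory that contains the ncempy/ package folder):

```python
import pathlib
import sys
import tempfile

# stand-in for a source checkout: a directory containing a module that
# was never pip-installed (a tiny fake module here; for ncempy it would
# be the repository root that holds the ncempy/ package folder)
checkout = tempfile.mkdtemp()
pathlib.Path(checkout, 'fake_emd.py').write_text('VERSION = "0.1"')

sys.path.insert(0, checkout)    # make the checkout importable
import fake_emd                 # resolves without any pip install

print(fake_emd.VERSION)
```

For ncempy itself the equivalent would be sys.path.insert(0, '<repo root>') followed by import ncempy.io.emd; a development install (pip install -e .) achieves the same thing more permanently.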

Could not build ncempy. Cannot load backend 'Qt5Agg' which requires the 'qt5'

Hello,

I could not build ncempy on openSUSE Tumbleweed.
I tried installing python3-qt5 and python3-matplotlib-qt5, but the errors still occur.

The errors are shown below.
Thanks.

[   54s] ======================================================================
[   54s] ERROR: DoGenfire (unittest.loader._FailedTest)
[   54s] ----------------------------------------------------------------------
[   54s] ImportError: Failed to import test module: DoGenfire
[   54s] Traceback (most recent call last):
[   54s]   File "/usr/lib64/python3.8/unittest/loader.py", line 154, in loadTestsFromName
[   54s]     module = __import__(module_name)
[   54s]   File "/home/abuild/rpmbuild/BUILD/openNCEM-1.5.0.1583714936.999f7a3/ncempy/edstomo/DoGenfire.py", line 1, in <module>
[   54s]     import genfire as gf
[   54s]   File "/usr/lib/python3.8/site-packages/genfire/__init__.py", line 2, in <module>
[   54s]     from . import reconstruct
[   54s]   File "/usr/lib/python3.8/site-packages/genfire/reconstruct.py", line 16, in <module>
[   54s]     matplotlib.use("Qt5Agg")
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/cbook/deprecation.py", line 307, in wrapper
[   54s]     return func(*args, **kwargs)
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/__init__.py", line 1307, in use
[   54s]     switch_backend(name)
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/pyplot.py", line 233, in switch_backend
[   54s]     raise ImportError(
[   54s] ImportError: Cannot load backend 'Qt5Agg' which requires the 'qt5' interactive framework, as 'headless' is currently running
[   54s] 
[   54s] 
[   54s] ======================================================================
[   54s] ERROR: postprocess (unittest.loader._FailedTest)
[   54s] ----------------------------------------------------------------------
[   54s] ImportError: Failed to import test module: postprocess
[   54s] Traceback (most recent call last):
[   54s]   File "/usr/lib64/python3.8/unittest/loader.py", line 154, in loadTestsFromName
[   54s]     module = __import__(module_name)
[   54s]   File "/home/abuild/rpmbuild/BUILD/openNCEM-1.5.0.1583714936.999f7a3/ncempy/edstomo/postprocess.py", line 4, in <module>
[   54s]     import genfire
[   54s]   File "/usr/lib/python3.8/site-packages/genfire/__init__.py", line 2, in <module>
[   54s]     from . import reconstruct
[   54s]   File "/usr/lib/python3.8/site-packages/genfire/reconstruct.py", line 16, in <module>
[   54s]     matplotlib.use("Qt5Agg")
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/cbook/deprecation.py", line 307, in wrapper
[   54s]     return func(*args, **kwargs)
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/__init__.py", line 1307, in use
[   54s]     switch_backend(name)
[   54s]   File "/usr/lib64/python3.8/site-packages/matplotlib/pyplot.py", line 233, in switch_backend
[   54s]     raise ImportError(
[   54s] ImportError: Cannot load backend 'Qt5Agg' which requires the 'qt5' interactive framework, as 'headless' is currently running

The full error log is here: https://build.opensuse.org/package/live_build_log/home:andythe_great/python3-ncempy/home_andythe_great_openSUSE_TW/x86_64

emd import does not support default dim attributes

From https://emdatasets.com/format/, it's specified that the "name" and "units" attributes on dimensions are not required and will be set to defaults if not present:

"Without these attributes, the viewer will still function but data processing routines may fail or produce incorrect results."
"Without the “name” and “units” attributes, the EMD viewer program can still parse the file, and these fields will default to numerical dimensions and pixels respectively. The “version” attribute is important for programs to validate which version of the EMD specification they are using before processing the data."

fileEMD.get_emddims() does not implement this, however. If, for example, the "units" attribute is missing, it throws an exception, which is caught and turned into a text warning that the data set "does not seem to be in emd specified shape".
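A hedged sketch of the defaulting behavior the spec describes (the function name and the exact default strings are illustrative; get_emddims would do something equivalent internally):

```python
def dim_attrs_with_defaults(attrs, index):
    """Return (name, units) for a dim dataset, applying spec-style defaults
    when the attributes are absent: a numerical dimension name and 'pixels'
    for units. The exact default strings here are an assumption.
    """
    name = attrs.get('name', 'dim{}'.format(index))
    units = attrs.get('units', 'pixels')
    # HDF5 attributes are often bytes; normalize to str for convenience
    if isinstance(name, bytes):
        name = name.decode('utf-8')
    if isinstance(units, bytes):
        units = units.decode('utf-8')
    return name, units
```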

need to handle bad TagOffSetArray

In ser.writeEMD() the time tags are read and written into the EMD file. However, the time tags are not very reliable. For example, I have a perfectly good SER file containing five 1024x1024 images. The DataTypeID and TagTypeID indicate that this is a 'simple series' in writeEMD() (see line 678 in io/ser.py).

The SER file size is 10,486,918 bytes (exactly). The TagOffsetArray of this file is:
>>> fser.head['TagOffsetArray']
[10486918, 0, 0, 0, 0]

So in writeEMD, getTag(0) fails because there is nothing to read at byte offset 10486918. It's the end of the file!

I suggest removing the time tags for a simple series. They are not really necessary. Or we can use a try block to skip reading/writing them in case something weird happens like in this case.
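The guard could be as simple as discarding offsets that do not point inside the file before attempting getTag. A sketch (the helper name is illustrative):

```python
def valid_tag_offsets(tag_offsets, file_size):
    """Keep only tag offsets that actually point inside the file.

    Zero offsets and offsets at or beyond end-of-file (like the 10486918
    entry above, which equals the file size exactly) are discarded, so
    the caller can simply skip reading/writing those time tags.
    """
    return [off for off in tag_offsets if 0 < off < file_size]
```

For the file above, valid_tag_offsets([10486918, 0, 0, 0, 0], 10486918) comes back empty, so writeEMD would skip the time tags entirely instead of failing.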

Unsupported tag breaks DM reading

A user brought a DM file to me which could not be read.

We found out a tag in the file called Saturation fraction has values that look like this:
[ inf 0.00218585 0.00237532 0.00249739 0.0026716 0.00234353
...

The inf can't be processed properly by the fileDM._bin2str function.

In the DM tag tree, this tag is printed with a value that indicates it "can not be displayed" and might not be that useful.

The solution is to wrap a try statement around the return of _bin2str() and return some other string (like "can not be read") instead. Then the file data can be loaded properly.
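That wrapper might look like the generic sketch below; the real fix would live inside fileDM where _bin2str is called, and the exception list is an assumption about which conversions can fail.

```python
def safe_convert(convert, raw, fallback='can not be read'):
    """Apply a conversion function to a raw tag value, returning a
    fallback string when the value (e.g. one containing inf) cannot
    be converted."""
    try:
        return convert(raw)
    except (ValueError, UnicodeDecodeError, OverflowError):
        return fallback
```

With this in place, a tag whose bytes fail to decode, or whose values overflow a formatting step, turns into the placeholder string, and the rest of the file loads normally.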

DM reader fails in `__del__` on current Python 3.9

Code like the following fails on current Python 3.9 versions (tested on 3.9.12):

with fileDM("/path/to/file.dm4", on_memory=True) as f:
    pass

Gives the following exception:

---------------------------------------------------------------------------
BufferError                               Traceback (most recent call last)
Input In [9], in <cell line: 1>()
      1 with fileDM("/path/to/file.dm4", on_memory=True) as f:
----> 2     pass

File ~/miniconda3/envs/lttest/lib/python3.9/site-packages/ncempy/io/dm.py:246, in fileDM.__exit__(self, exception_type, exception_value, traceback)
    242 def __exit__(self, exception_type, exception_value, traceback):
    243     """Implement python's with statement
    244     and close the file via __del__()
    245     """
--> 246     self.__del__()
    247     return None

File ~/miniconda3/envs/lttest/lib/python3.9/site-packages/ncempy/io/dm.py:234, in fileDM.__del__(self)
    232 if self._v:
    233     print('Closing input file: {}'.format(self.file_path))
--> 234 self.fid.close()

BufferError: cannot close exported pointers exist

This can be reproduced using mmap and np.frombuffer like this:

import mmap
import numpy as np

raw_fh = open("/path/to/file.dm4", "rb")
mm = mmap.mmap(raw_fh.fileno(), 0, prot=mmap.PROT_READ)
arr = np.frombuffer(mm, dtype=np.float32, count=1024)
mm.close()  # raises BufferError: arr still references the mapped memory

The cause is that arr references the memory mapped by mm, and closing the memory map invalidates any pointers to it, meaning accesses to arr would cause a segmentation fault. In the current Python version, this is prevented by raising an exception if there are any pointers still active. This can be prevented by actively removing the references before closing the memory map:

import mmap
import numpy as np

raw_fh = open("/home/alex/source/LiberTEM/data/dm/2018-7-17 15_29_0000.dm4", "rb")
mm = mmap.mmap(raw_fh.fileno(), 0, prot=mmap.PROT_READ)
arr = np.frombuffer(mm, dtype=np.float32, count=1024)
del arr   # drop the last reference before closing the map
mm.close()

EMD `memmap`s can lose their file handles prematurely

An issue we've encountered in py4DSTEM is that if you load a memory map to an EMD dataset using ncempy.io.emd.fileEMD.get_memmap(N), the handle to the HDF5 file is forcibly closed when the fileEMD object gets garbage collected.

If a fileEMD object gets created inside a function (and is not retained anywhere else), then when that function exits the fileEMD will be deleted, calling its __del__ method and closing the file handle:

openNCEM/ncempy/io/emd.py

Lines 162 to 168 in ab0a40a

def __del__(self):
    """Destructor for EMD file object.
    """
    # close the file
    # if(not self.file_hdl.closed):
    self.file_hdl.close()

A minimal reproduction of the problem is like so:

from ncempy.io.emd import fileEMD
fpath = "Sample_LFP_v0,12.h5"

def load_memmap(fpath,N):
    f = fileEMD(fpath)
    return f.get_memmap(N)

data, dims = load_memmap(fpath,0)

print(data)
# <Closed HDF5 dataset>

In py4DSTEM we use a context manager for most reading, so my approach when a memory map is asked for is to create a new h5py.File outside of the context manager. While this File will get garbage collected, this does not actually close the file handle because there is still a reference to the Dataset. A good question with my approach is whether the file handle would ever get closed; it's possible that when the Dataset is finally garbage collected then the file gets closed as well, but I haven't tested this.

I am using ncempy version 1.8.1
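The underlying lifetime issue can be shown without HDF5 at all. The sketch below mimics fileEMD with a tiny class whose __del__ closes its handle (all names are illustrative); in CPython, the leaky loader hands back an already-closed handle, exactly like the <Closed HDF5 dataset> above, while returning the owner alongside the handle keeps it usable.

```python
import io

class HandleOwner:
    """Mimics fileEMD: owns a file handle and closes it in __del__."""
    def __init__(self, fh):
        self.fh = fh

    def __del__(self):
        if not self.fh.closed:
            self.fh.close()

def load_leaky(fh):
    owner = HandleOwner(fh)
    return owner.fh          # owner is collected on return -> handle closed

def load_keepalive(fh):
    owner = HandleOwner(fh)
    return owner, owner.fh   # returning the owner keeps the handle open
```

Retaining the fileEMD object (or, as described above, a separately created h5py.File) alongside the memmap is the analogous workaround.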

RingDiff tool and PyQT

It is known that the RingDiff tool is out of date, using PyQt4 instead of the modern PyQt5. There should be a simple way to update this, and I'm working on it. If you are having problems with this tool and want to use it, please let me know here so that I can prioritize supporting it.

ser2EMD conversion issue and RingDiff dims issue

I suggested to Karen that she use EMDviewer to convert DM3 files to EMD files. There is then an issue with the dims vectors in Ring Diffraction. @fniekiel probably already knows about this.

The issue is twofold.

  1. ser2emd needs to be changed slightly to avoid loading single-image data as ndarray.shape = (1, numX, numY). This should just be ndarray.shape = (numX, numY). At the very end of the ser image import you should add an np.squeeze() to remove singleton dimensions. The EMD file will then only have dim1 and dim2, just as with an EMD file created by EMDviewer. (Note: I changed the behavior of my serReader to take care of this same issue yesterday! Check out bitbucket/openNCEM to see the change.)

  2. With 1) fixed, single SER images converted by ser2emd will only have 2 dimensions, not 3. The dims tuple will then have len(dims) = 2 rather than 3, so in Ring Diffraction all dims[x][y] need to be changed to dims[x-1][y] to compensate.

This will make EMDviewer export and ser2emd export produce the same /data folders with the same number of dim HDF5 datasets. Both programs can be used to create EMD files and then used in Ring Diffraction.

ser tab complete infinite loop

If you use

from ncempy.io import ser

you can get an infinite loop in tab completion for ser.ncempy.io.ser....

I think the issue is the import ncempy.io.emd at the start of ser.py. This can be moved to the function which uses the emd class.
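Moving the import inside the function also keeps it out of the module namespace, which is what feeds the recursive completion. A runnable sketch of the principle, using os as a stand-in for ncempy.io.emd:

```python
import types

# build a throwaway module whose only import happens inside a function,
# the way ser.py could defer its ncempy.io.emd import into writeEMD()
src = """
def write_emd():
    import os          # deferred: only runs when the function is called
    return os.sep
"""
mod = types.ModuleType('ser_sketch')
exec(src, mod.__dict__)

# the deferred import leaves no 'os' attribute on the module, so tab
# completion on the module cannot recurse through it
assert not hasattr(mod, 'os')
assert mod.write_emd() in ('/', '\\')
```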

SER array size backwards

The ser reader code does not work for non-square images. I think the values in reshape() on line 380 in getDataset() need to be swapped.
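The symptom is easy to reproduce with a plain reshape: for a non-square image, passing the sizes in (x, y) order scrambles the rows (the sizes below are arbitrary):

```python
import numpy as np

x_size, y_size = 3, 2                  # image 3 pixels wide, 2 tall
raw = np.arange(x_size * y_size)       # row-major pixel stream from the file

wrong = raw.reshape((x_size, y_size))  # (3, 2): rows no longer match the image
right = raw.reshape((y_size, x_size))  # (2, 3): rows of length x_size, as stored

# the second image row should start at pixel index x_size
assert right[1, 0] == x_size
```

For square images both orders give the same shape, which is why the bug goes unnoticed until a non-square file comes along.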

Example code

I suggest we update the docs in ncempy/io to include quick-start examples: "how to read a dataset using ser.py, emd.py, etc."

Submodule name io clobbers built in python3 module io

The standard library in python 3 for dealing with file streams is called io. Having our own submodule named io can clobber that namespace and cause incompatibilities with other code using io.

It also can cause a "bizarre" error in my zsh shell, where it reports "Fatal Python error: Py_Initialize: can't initialize sys standard streams", because the Python standard streams depend on the builtin io.

I don't know but I suspect this could also be related to #12

A simple solution is to choose a name for io that is different from the builtin io module. I suggest ncemio.

dm data not accessible after file closed using on_memory option

Using on_memory no longer loads the image data into memory. Python crashes when trying to access the data after the file is closed. This is undesirable, since it's suggested to use with context management where possible and close the file after reading the data.

This seems to be related to the way dm.fileDM.fromfile uses np.frombuffer. I am using numpy 1.18.1 on Windows 10, and this is now crashing. Maybe the behavior changed in recent versions of numpy?

I'm going to explicitly load the data using np.array() to overwrite the ['data'] entry in the output dict.

@gonzalorodrigo You improved dm reading using this method. Do you have any suggestions as to why this is happening or a better way to fix it?
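The fix described above, forcing a real copy with np.array() so the data survives the file being closed, in miniature (a bytearray stands in for the file buffer here):

```python
import numpy as np

buf = bytearray(np.arange(4, dtype=np.float32).tobytes())  # stands in for the file buffer
view = np.frombuffer(buf, dtype=np.float32)  # zero-copy view tied to buf
data = np.array(view)                        # independent copy, like the proposed fix
del view                                     # release the view; buf may now go away
```

After the copy, `data` owns its memory, so closing the underlying file (or releasing the buffer) no longer invalidates it.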

module 'ncempy' has no attribute 'io'

While using py4dstem for the first time, I encountered many problems with its dependencies (llvmlite, numpy, matplotlib, etc.). All of them were solved by downgrading Python to 3.6.12 and reinstalling the dependencies.

However, I finally encountered a problem originating in ncempy (I could not identify its origin, and there are not many resources about it on the internet):
module 'ncempy' has no attribute 'io'

I tried pip uninstall ncempy and installed it again, but that did not help. So I'm posting here; I don't know if it's the right place to solve my problem.

(py4dstem) C:\WINDOWS\system32>py4dstem
Traceback (most recent call last):
  File "c:\programdata\miniconda3\envs\py4dstem\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\miniconda3\envs\py4dstem\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Miniconda3\envs\py4dstem\Scripts\py4DSTEM.exe\__main__.py", line 4, in <module>
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\__init__.py", line 1, in <module>
    from . import process
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\process\__init__.py", line 3, in <module>
    from . import preprocess
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\process\preprocess\__init__.py", line 3, in <module>
    from .electroncount import *
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\process\preprocess\electroncount.py", line 11, in <module>
    from ...io.datastructure import PointListArray
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\io\__init__.py", line 3, in <module>
    from . import nonnative
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\io\nonnative\__init__.py", line 1, in <module>
    from .read_dm import *
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\py4DSTEM\io\nonnative\read_dm.py", line 5, in <module>
    from ncempy.io import dm
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\ncempy\io\__init__.py", line 6, in <module>
    from .read import read
  File "c:\programdata\miniconda3\envs\py4dstem\lib\site-packages\ncempy\io\read.py", line 1, in <module>
    import ncempy.io as nio
AttributeError: module 'ncempy' has no attribute 'io'
