Comments (18)
I assume you're just running out of memory, and there's some kind of reference-counting-related bug in laspy that's causing objects to hang around and not be deleted.
from laspy.
Points are provided using numpy frombuffer, so closing the file (and therefore the underlying map) and then trying to access the points is going to cause issues. If you're certain that you want to keep point data around after closing a file, you could do something like:
import glob, numpy as np, laspy
#assumes you are in a directory with a collection of las files
files = glob.glob("*.las")
f0 = files[0]
las = laspy.file.File(f0, mode="r")
points = las.points.copy()
las.close()
#list objects in main namespace
dir()
#points is still present in the namespace
#calling points now should not result in a crash
points
An even better solution would be to use a function to explicitly define what data you want out of each file before closing them.
import glob, numpy as np, laspy
#assumes you are in a directory with a collection of las files
files = glob.glob("*.las")
def process(fname):
    las = las.file.File(fname, mode="r")
    points = las.points.copy()
    las.close()
    return(points)

for i, f in enumerate(files):
    print i, f
    pts = process(f)
    print points
I'd be interested to know if you continue to have problems with an approach like that.
Thanks (@hobu/@grantbrown) for the quick reply.
I tried the last code block (made a few minor changes: the module name in process changed to 'laspy', and 'points' in the for loop changed to 'pts')...
import glob, numpy as np, laspy
#assumes you are in a directory with a collection of las files
files = glob.glob("*.las")
def process(fname):
    las = laspy.file.File(fname, mode="r")
    points = las.points.copy()
    las.close()
    return(points)

for i, f in enumerate(files):
    print i, f
    pts = process(f)
    print pts
The problem persists. Memory is still not deallocated. Here is a screenshot after passing through 141 files (a subset of all the files I am working with). I ran this on a 64GB RAM server. There are >250 las files in the directory.
I'm looking into the issue.
Thank you! Glad to help in any way, but I'm not familiar with the code-base.
Following the instructions here, I've managed to reduce the size of the memory leak in this commit on a feature branch. There are still quite a few remaining objects, but I wasn't immediately able to figure out why or where they came from.
I'll have to return to this later, but if either of you run across the culprit in the meantime definitely let me know.
Leak check script:
from pympler import tracker
import laspy
def g():
    f = laspy.file.File("simple.las")
    pts = f.points.copy()
    del(pts)
    f.close()
    del(f)

memory_tracker = tracker.SummaryTracker()
memory_tracker.print_diff()
print("########## START ##########")
for i in xrange(1000):
    g()
memory_tracker.print_diff()
Hi,
I tested the memory_tracker code and got an OOM while testing on a 259,653,143-byte las file. I had to set xrange to 2, because 10 caused a very long wait and huge memory usage, and 100 to 1000 caused an OOM.
The output of the script:
python2.7 ./mem.py
types | # objects | total size
======================= | =========== | ============
list | 2623 | 266.38 KB
str | 2625 | 150.42 KB
int | 272 | 6.38 KB
dict | 2 | 1.30 KB
wrapper_descriptor | 7 | 560 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
code | 1 | 128 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
getset_descriptor | 1 | 72 B
method_descriptor | 1 | 72 B
tuple | 0 | 8 B
instancemethod | -1 | -80 B
########## START ##########
types | # objects | total size
==================================== | =========== | ============
int | 18546627 | 424.50 MB
list | 23 | 155.51 MB
dict | 38 | 47.89 KB
instance | 188 | 13.22 KB
str | 121 | 6.42 KB
Struct | 10 | 5.75 KB
lxml.etree._Document | 10 | 880 B
lxml.etree._Element | 10 | 720 B
float | 24 | 576 B
set | 2 | 464 B
file | 2 | 288 B
numpy.dtype | 2 | 176 B
<class 'laspy.header.HeaderManager'> | 2 | 128 B
<class 'laspy.header.Header'> | 2 | 128 B
I am using multiprocessing module to handle my transforming task here:
https://github.com/KAMI911/lactransformer/blob/master/lactransformer.py
Using the https://github.com/KAMI911/lactransformer/blob/master/lib/TransformerWorkflow.py and https://github.com/KAMI911/lactransformer/blob/master/lib/LasPyConverter.py modules. Running each laspy task in a separate process via multiprocessing can probably avoid the leak, so it may serve as a workaround until the developers find the problem. I will also try memory_tracker with my own code. 👍
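The per-process isolation idea above can be sketched with a multiprocessing.Pool whose workers are recycled after every file, so anything the library leaks dies with the worker process. This is only a sketch: extract_points is a made-up stand-in for the real laspy call (shown in the comments), and the tile file names are invented.

```python
import multiprocessing as mp

def extract_points(fname):
    # Stand-in for the real per-file work, which would be roughly:
    #   las = laspy.file.File(fname, mode="r")
    #   pts = las.points.copy()
    #   las.close()
    #   return pts
    return len(fname)

# The "fork" start method keeps the sketch guard-free; on Windows use
# "spawn" plus an `if __name__ == "__main__":` guard around the pool code.
ctx = mp.get_context("fork")
files = ["tile_%d.las" % i for i in range(4)]
# maxtasksperchild=1 recycles each worker after a single file, so any
# memory the library leaks is reclaimed when that worker exits.
with ctx.Pool(processes=2, maxtasksperchild=1) as pool:
    results = pool.map(extract_points, files)
print(results)
```

Because the leak lives inside the worker's address space, the parent's memory stays flat no matter how many tiles are processed.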
Thanks for the help - I'll try to find some time this weekend to dig back into this issue.
Just to keep folks posted, I haven't forgotten about this issue, but am pretty strapped for time right now. I'll try to get back to this by next weekend. In the meantime, I'd be curious to know if anyone learns anything about the cause of the problem.
To keep folks posted from my end (as a laspy user): my workaround was to write a driver program that uses the multiprocessing and subprocess libraries to run each las tile in parallel through command-line calls to an external program. The external program opens (using laspy) a single 1 km x 1 km las tile and rasterizes that point cloud (using numpy and GDAL). Memory is freed when each subprocess finishes, so I was able to process a collection of over 250 1 km x 1 km las tiles this way. This may be helpful for others who want to use laspy on many las files while the issue is being resolved.
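The subprocess pattern described above can be sketched generically. Here the "external program" is just an inline Python snippet echoing the tile name, so the example runs without laspy or GDAL; in the real workflow the child would open one tile, rasterize it, and exit, returning all of its memory to the OS.

```python
import subprocess, sys

def rasterize_tile(fname):
    # Stand-in for an external program that opens one tile with laspy
    # and rasterizes it; the child here just echoes the tile name.
    # Every byte the child allocates is freed when it exits.
    child_code = "import sys; print('processed ' + sys.argv[1])"
    out = subprocess.check_output([sys.executable, "-c", child_code, fname])
    return out.decode().strip()

results = [rasterize_tile(f) for f in ["tile_a.las", "tile_b.las"]]
print(results)
```

A real driver would replace the list comprehension with a multiprocessing pool to run several children concurrently, as described in the comment above.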
Yes, great library. I too have recently encountered the following error after opening and closing several LAS files consecutively: Error mapping file: [Error 8] Not enough storage is available to process this command
While monitoring my system resources, it appears that memory is allocated when a file is opened (e.g. inFile = laspy.file.File(las_path, mode='r')) but is not released by inFile.close(). If I restart my Python shell, the memory is released. Any update on this would be greatly appreciated.
This issue definitely needs to be fixed, however I haven't yet had a chance to revisit it. In the meantime, using the multiprocessing trick mentioned by KAMI911 and jeffrywolf should hopefully help (as sub-optimal as this is).
From my initial research, the leak may arise from circular references between several of the worker classes in laspy (Header, HeaderManager, Reader etc.)
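A circular reference of the kind suspected here can be demonstrated with two minimal stand-in classes (named after laspy's Header and HeaderManager purely for illustration): once the cycle exists, dropping the last external reference does not free the objects; only a run of the cyclic garbage collector does.

```python
import gc
import weakref

gc.disable()  # make collection timing deterministic for the demo

class Header:
    def __init__(self):
        self.manager = None

class HeaderManager:
    def __init__(self, header):
        self.header = header
        header.manager = self  # circular reference, as suspected in laspy

h = Header()
m = HeaderManager(h)
ref = weakref.ref(h)

del h, m
# The cycle keeps both objects alive past the last `del`;
# reference counting alone cannot reclaim them.
alive_before = ref() is not None

gc.collect()  # the cycle detector finds and frees the pair
alive_after = ref() is not None
gc.enable()
print(alive_before, alive_after)  # True False
```

If many such pairs are created per file and the collector doesn't run often enough, memory usage grows exactly as reported in this thread.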
I think I've contained the worst of the problem with bb41e5. It looks like there may still be a few bytes around after file close, but it doesn't seem to scale with the number of files processed. I'd be grateful if those of you who've had issues could test out the latest changes to see if you still run into problems (there could certainly be other problematic code paths).
Sorry it's taken me so long to return to the issue.
Does that actually work? You know __del__ is only ever called when the garbage collector is reclaiming that object, right? As far as I can tell, deleting references in __del__ should be no different from the normal behavior of a Python object.
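The point about __del__ can be checked directly: an object that explicitly drops its attributes in __del__ releases them no sooner than one that does nothing. This is a CPython refcounting sketch with made-up class names, not laspy code.

```python
import weakref

class Payload:
    pass

class WithCleanup:
    def __init__(self):
        self.data = Payload()
    def __del__(self):
        # Explicitly dropping the attribute here is redundant: the
        # object is already being reclaimed, so `data` was about to
        # be released anyway.
        del self.data

class WithoutCleanup:
    def __init__(self):
        self.data = Payload()

a, b = WithCleanup(), WithoutCleanup()
ra, rb = weakref.ref(a.data), weakref.ref(b.data)

del a, b
# Both payloads are gone the moment their owner is reclaimed,
# with or without the manual cleanup in __del__.
print(ra() is None, rb() is None)  # True True
```

(The case where __del__ does matter is a reference cycle, which is exactly what the earlier comment about Header/HeaderManager/Reader suspected.)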
I've been working on Python 3 compatibility for this library for a few days and my branch doesn't seem to have this issue. I only learned about this issue when I noticed that there's been a commit since I started my work.
I'm getting close to being confident that I haven't broken everything so maybe I'll send you a pull request soon? You probably won't like it, I changed a bunch of stuff.
You're right, I didn't continue testing to find the minimal subset of changes sufficient to deal with the leak. Looking again, it turns out the problem was that I'd already defined a __del__ method, which simply closed the file without doing the rest of the work to match the default object behavior. Refactored here, thanks.
Ah, I deleted those in my branch, as context managers are a better way to make sure files get closed at the right time.
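The context-manager approach can be sketched with contextlib.closing, which turns any object with a close() method into a context manager so the file is closed even if an error is raised mid-block. FakeLasFile is a stand-in so the example runs without laspy installed.

```python
from contextlib import closing

class FakeLasFile:
    # Minimal stand-in for laspy.file.File, just enough
    # to show the closing() pattern.
    def __init__(self, name):
        self.name = name
        self.closed = False
    def close(self):
        self.closed = True

# closing() guarantees f.close() runs on exit, error or not.
with closing(FakeLasFile("simple.las")) as f:
    was_open = not f.closed
print(was_open, f.closed)  # True True
```

Relying on deterministic close at block exit, rather than on __del__ firing at some garbage-collection time, sidesteps the whole class of problems discussed in this thread.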
Marking issue resolved, version bumped to 1.4.1, pypi updated