Giter VIP home page Giter VIP logo

Comments (8)

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
I've made some changes to reduce memory fragmentation which helps a little. The 
main user of memory is the lookup table to convert node ids to lat/lon when 
reading the ways. 

Currently all the nodes in the osm file are read to memory (a std::vector of 
{int id; long x,y;}).  This takes 12 bytes per node. France has 122 million 
nodes and this works.out at c.1.5 GB). It is reallocating this array where 
preprocessing first runs out of memory. I can improve things a little by 
reserve() ing the space for these nodes,  and France then compiles on my 2GB 
machine.

Other space optimisations could be:
    Read the file in two passes, work out which nodes are needed in the first pass, and then load these nodes only in the 2nd. The current drawing code uses about 80% of the nodes though so savings would be modest.

  As above, but allow processing in batches. eg. Pass 1 read first 1M ways. Pass 2 load nodes for these ways and write ways to temp file. Repeat until all ways read. Then load all ways from temp file, sort, index, and write final db. 

I think the second of these is the only way to read an arbitrarily large osm 
file (eg europe or planet), but it would be slow traversing all nodes multiple 
times. 

Any thoughts?

Original comment by [email protected] on 14 Feb 2011 at 3:49

  • Changed state: Started

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
This seems to help, the preprocessing gets further than before, but does not 
finish , yet.

[11:05:43][0] 15800000  ways with  133634762 nodes 

is as far as it goes.

Besides implementing a multi-pass approach you could try:

 - Store only IDs of nodes for each way instead of coordinates and do the coordinate lookup on-thy-fly. When splitting ways you have to insert new nodes.
 - Use a vector that uses realloc for PODS instead of allocate + copy ( e.g. QVector if I am not mistaken ). STL vectors have some bad design choices ( resize not using realloc, resize doubling the vector size, no shrink, clear not freeing memory etc ). Manual resizing might also squeeze out some more memory ( my dynamic graph in the Contraction Hierarchies plugin does this; it checks whether enough space is reserved and if not only enlarges by a small factor, e.g. 1.2, instead of 2 )

The multi-pass approach might be nice have have in the future, however, since 
you have to load all ways into memory at some point your memory savings are 
limited,


Original comment by [email protected] on 15 Feb 2011 at 11:05

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
What do you mean by doing "coordinate lookup on-the-fly". The only efficient 
way I could see of going from node-id --> coordinate is to look it up in an 
in-memory std::map or equivalent. It could be looked up from disk, but that 
would be really slow (worst case scenario polling the osm.bz2 file, or a bit 
faster from a custom generated file). Am I missing something obvious?

As far as doing a multi-pass approach goes it wouldn't ever be necessary to 
load all ways into memory. They can be processed one by one, and then 
bucket-sorted to one of a number of temporary files each of which will fit into 
memory for sorting and indexing. It would be slow though.

Original comment by [email protected] on 15 Feb 2011 at 2:24

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
With on-the-fly lookup I mean the following:

Right now you store for each way its list of coordinates in the global node 
array for all ways. Instead you could just store the nodes' ID and later when 
you want to access a node's coordinates you perform the node-lookup with the 
binary search.
Of course, this slows down the whole process if you have to access the way's 
coordinates more than once.

By the way, the binary search is way more efficient then a std::map, as the 
std::map is internally a fully fledged self-balancing binary tree. This usually 
means an overhead of at least 8 bytes per entry.

Original comment by [email protected] on 15 Feb 2011 at 2:31

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
This issue was closed by revision f4c62ed6fc75.

Original comment by [email protected] on 21 May 2011 at 2:12

  • Changed state: Fixed

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
It turns out none of the more minor changes you suggested allowed a large 
enough increase in file size, so I've implemented a multi-pass preprocessor. It 
scales poorly with increasing file size, but can handle France easeily, and 
even the whole of Europe in under 2h with 2GB of RAM which should be OK.

N.B. Processing the whole of Europe isn't actually useful at the moment as the 
resulting database file size (>4GB) breaks some assumptions in the client code, 
and would need a new DB format to fix.

Original comment by [email protected] on 21 May 2011 at 2:17

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
Ah, I just hit this issue while processing Europe. The last comment shows that 
it's a known issue. Maybe make a new bug out of it?

Original comment by [email protected] on 18 Jul 2011 at 12:51

from monav.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 18, 2024
Yeah, just go ahead and file a new bug report for that, I'll assign it to James.

Original comment by [email protected] on 18 Jul 2011 at 1:01

from monav.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.