astro / rust-osm-pbf-iter
Parse OpenStreetMap .pbf dumps while trying to avoid copying
Would you mind tagging a new release, perhaps 0.2.1? Then people could more easily use the commits you made in August 2022.
It seems that #2 and #3 are related. I'll try to explain some debugging I did here.
It seems that `let role_sid = self.roles_sid.next()? as usize;` is not decoding the varint value correctly.
If I print `self.roles_sid.data` (that's a private field, but I changed the code to expose it here), the values look fine.
I tried a lot of different things. The only one that worked was changing the i32 to u32 in this line: https://github.com/astro/rust-osm-pbf-iter/blob/master/src/parse/relation.rs#L44. That workaround makes the other two issues go away.
However, after some more testing, the memids are still wrong, and changing them from i64 to u64 doesn't help. I think that's because the numbers need more than 7 bits, so the sign bit gets involved.
I think the real issue may be in decoding signed packed varints. I tried to solve it by changing the parse_varint code to something like this https://github.com/dominictarr/signed-varint/blob/master/index.js#L8 but I have almost no idea how to do that.
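For readers following the varint discussion, here is a minimal, self-contained sketch of how protobuf varints and ZigZag-encoded signed values are usually decoded. It is not the crate's `parse_varint`; the function names `decode_varint`, `zigzag_decode`, and `decode_packed_sint64` are made up for illustration. As far as I can tell from the OSM PBF schema, `memids` is declared as `sint64` (ZigZag plus delta coding), while `roles_sid` is a plain `int32`, so only the former would need the ZigZag step.

```rust
/// Decode one base-128 varint from the front of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or longer than 10 bytes.
/// (Hypothetical helper, not part of rust-osm-pbf-iter.)
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The low 7 bits carry payload, least significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the last byte of this varint.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Undo protobuf ZigZag encoding (used by `sint32`/`sint64` fields):
/// 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

/// Decode a packed `sint64` field into its signed values.
/// Delta decoding (a running sum) would be a separate step.
fn decode_packed_sint64(mut buf: &[u8]) -> Option<Vec<i64>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        let (raw, used) = decode_varint(buf)?;
        out.push(zigzag_decode(raw));
        buf = &buf[used..];
    }
    Some(out)
}
```

Reinterpreting the raw varint as `u64` instead of `i64` cannot recover negative deltas, which would be consistent with the observation that switching the memids to `u64` did not help.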
Hi @astro, I can't get all the members of a relation using the following code, which is based on one of the examples:
...
let routetypes_all = ["train", "subway", "monorail", "tram", "light_rail", "bus", "trolleybus"];
loop {
    let blob = match req_rx.recv() {
        Ok(blob) => blob,
        Err(_) => break,
    };
    let data = blob.into_data();
    let primitive_block = PrimitiveBlock::parse(&data);
    for primitive in primitive_block.primitives() {
        match primitive {
            Primitive::Relation(relation) => {
                let routetag = relation.tags().find(|&kv| kv.0 == "route");
                let nametag = relation.tags().find(|&kv| kv.0 == "name");
                if routetag != None && routetypes_all.contains(&routetag.unwrap().1) && nametag != None {
                    if relation.members().count() == 0 {
                        print!("{:?}\n", relation.members().memid);
                    }
...
That prints these IDs (for this dump: wget http://download.geofabrik.de/south-america/ecuador-latest.osm.pbf):
2024373
2030180
9340444
If I look up, for example, the first one returned, https://www.openstreetmap.org/relation/2024373, I can see that it has members, but they are not returned by the relation.members() iterator.
Here is the same output, but printing the relation directly:
Relation { id: 2024373, info: Some(Info { version: Some(24), timestamp: Some(1554993625), changeset: Some(0), uid: Some(0), user: Some(""), visible: None }), tags_iter: { from="Terminal Bastión Popular", name="B13 Mucho Lote - Guamote Retorno", network="Metrovía", operator="Fundación Municipal Transporte Masivo Urbano de Guayaquil", ref="B13", route="bus", state="connection", to="Mucho Lote", type="route" }, rels_iter: { } }
Relation { id: 2030180, info: Some(Info { version: Some(14), timestamp: Some(1554994171), changeset: Some(0), uid: Some(0), user: Some(""), visible: None }), tags_iter: { from="Terminal Bastión Popular", name="B1 Pascuales Ida", network="Metrovía", operator="Fundación Municipal Transporte Masivo Urbano de Guayaquil", ref="B1", route="bus", state="connection", to="Pascuales", type="route" }, rels_iter: { } }
Relation { id: 9340444, info: Some(Info { version: Some(3), timestamp: Some(1550910481), changeset: Some(0), uid: Some(0), user: Some(""), visible: None }), tags_iter: { from="Durán", name="Durán - Quito", operator="Tren Ecuador", public_transport:version="2", ref="Tren a las Nubes", route="train", to="Quito", type="route" }, rels_iter: { } }
How can we fix this? I don't know whether it's an error in my code, but I think it's in this library, possibly around the next() method of the members() iterator: https://github.com/astro/rust-osm-pbf-iter/blob/master/src/parse/relation.rs#L51
Or maybe it happens when the different parts of the iterator are parsed from the pbf data: https://github.com/astro/rust-osm-pbf-iter/blob/master/src/parse/relation.rs#L110
You can also try this repo to see the example running: https://github.com/cualbondi/osmptparser
Let me know if I can help with this in any other way :)
Hello 🦀,
we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.
rust-osm-pbf-iter/src/blob_reader.rs, lines 35 to 41 in cb7a8bd
rust-osm-pbf-iter/src/blob_reader.rs, lines 47 to 53 in cb7a8bd
The `BlobReader::<R>::read_blob()` method creates an uninitialized buffer and passes it to a user-provided `Read` implementation (twice within the same function). This is unsound, because it allows safe Rust code to exhibit undefined behavior (reading from uninitialized memory).
This part of the `Read` trait documentation explains the issue:
> It is your responsibility to make sure that `buf` is initialized before calling `read`. Calling `read` with an uninitialized `buf` (of the kind one obtains via `MaybeUninit<T>`) is not safe, and can lead to undefined behavior.
The naive and safe way to fix the issue is to always zero-initialize a buffer before lending it to a user-provided `Read` implementation. Note that this approach adds the runtime overhead of zero-initializing the buffer.
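The zero-initializing fix described above could look roughly like the sketch below. It is not the crate's actual `read_blob()` code; `read_exact_zeroed` and `read_exact_take` are hypothetical helper names. The second function shows an alternative that also stays in safe Rust by letting the standard library manage the buffer through `Read::take` and `read_to_end`.

```rust
use std::io::{self, Read};

/// Read exactly `len` bytes into a freshly zeroed buffer.
/// (Hypothetical helper, not part of rust-osm-pbf-iter.)
fn read_exact_zeroed<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    // `vec![0u8; len]` is fully initialized, so lending it to an arbitrary
    // `Read` implementation is sound.
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Alternative: cap the reader at `len` bytes and let `read_to_end`
/// grow the vector, so no uninitialized memory is ever exposed.
fn read_exact_take<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len);
    let n = reader.by_ref().take(len as u64).read_to_end(&mut buf)?;
    if n != len {
        // The source ended before `len` bytes were available.
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short read"));
    }
    Ok(buf)
}
```

Either helper can be exercised with an in-memory reader such as `std::io::Cursor::new(vec![1u8, 2, 3])`. Which one is faster for large blobs would need measuring.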
As of Jan 2021, there is not yet an ideal fix that works in stable Rust with no performance overhead. Below are links to relevant discussions & suggestions for the fix.
When iterating over the OSM planet dump with count.rs, the counts seem too low. Does the library skip some blobs?
$ aria2c https://planet.openstreetmap.org/pbf/planet-231225.osm.pbf.torrent
$ target/release/count planet-231225.osm.pbf
Open planet-231225.osm.pbf
Processed 74023 MB in 93.18 seconds (794.38 MB/s)
planet-231225.osm.pbf - 445152000 nodes, 117960000 ways, 73800 relations
I expected the same counts as reported by taginfo or OSM stats: 8848241030 nodes, 988662259 ways, 11658204 relations
Hi, could you publish the latest code to crates.io?
I'd like to publish a package there that depends on yours, and crates.io won't let me use GitHub as the source for the osm-pbf-iter dependency.
Thanks, and again, awesome work!
Hi, great project!! I tried many pbf parsers, and I found yours was the best for parallelizing the job.
I think I found a bug when trying to get the role of the member of a relation. This is the code, based on one of the examples:
...
let primitive_block = PrimitiveBlock::parse(&data);
for primitive in primitive_block.primitives() {
    match primitive {
        Primitive::Relation(relation) => {
            let routetag = relation.tags().find(|&kv| kv.0 == "route");
            let nametag = relation.tags().find(|&kv| kv.0 == "name");
            if routetag != None && routetypes_all.contains(&routetag.unwrap().1) && nametag != None {
                // condition to check whether this relation is a public transport route
                let mut rd = RelationData {
                    name: nametag.unwrap().1.to_string(),
                    ways: HashMap::new(),
                    stops: HashMap::new(),
                    fixed_way: Vec::new(),
                };
                for member in relation.members() {
                    // member = (role: &str, id: u64, type: RelationMemberType)
                    print!("'{:?}' - '{:?}'\n", wayroles, member.0);
...
That prints the following:
''["", "forward", "backward", "alternate"]' - '"Parc National Cerros de Amotape"
''["", "forward", "backward", "alternate"]' - '"Parc National Cerros de Amotape"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"Parc National Cerros de Amotape"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"wikipedia"
''["", "forward", "backward", "alternate"]' - '"en realidad, “stop_123” no se usa más"
''["", "forward", "backward", "alternate"]' - '"network"
''["", "forward", "backward", "alternate"]' - '"network"
''["", "forward", "backward", "alternate"]' - '"network"
The roles should be among ["", "forward", "backward", "alternate"], not "wikipedia", "network", etc.
This is sample data taken directly from the OSM API: https://www.openstreetmap.org/api/0.6/relation/6311444
<osm version="0.6" generator="CGImap 0.7.5 (21106 thorn-02.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<relation id="6311444" visible="true" version="44" changeset="56610589" timestamp="2018-02-23T13:49:43Z" user="lorandr_telenav" uid="4973384">
<member type="node" ref="4240686706" role=""/>
<member type="way" ref="424676728" role=""/>
<member type="way" ref="424682360" role=""/>
<member type="way" ref="563644959" role=""/>
<member type="way" ref="317307744" role=""/>
<member type="way" ref="563644958" role=""/>
<member type="way" ref="445348482" role=""/>
<member type="way" ref="538015362" role=""/>
<member type="way" ref="299379192" role=""/>
<member type="way" ref="231880904" role=""/>
<member type="way" ref="538015386" role=""/>
<member type="way" ref="538015383" role=""/>
<member type="way" ref="231880905" role=""/>
<member type="way" ref="538015379" role=""/>
<member type="way" ref="538015365" role=""/>
<member type="way" ref="231880903" role=""/>
Do you know what could be causing this? Or am I using the wrong code to get the role of a member?
The code that gets the role seems to be this: https://github.com/astro/rust-osm-pbf-iter/blob/master/src/parse/relation.rs#L57. I couldn't really understand it. Let me know if I can help in some way.
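To make the expected behavior concrete, here is a rough sketch of how relation members are resolved once the packed fields have been decoded. It assumes the layout described in the OSM PBF format documentation: `roles_sid` holds indexes into the block's string table, `memids` are delta coded, and `types` gives one member type per entry. The `MemberType` enum and `resolve_members` function are hypothetical stand-ins, not the crate's types, and the string table contents are invented for the example.

```rust
/// Stand-in for the crate's member type enum (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum MemberType {
    Node,
    Way,
    Relation,
}

/// Combine already-decoded parallel arrays into (role, id, type) tuples.
/// Returns `None` if a role index is negative or out of range.
fn resolve_members<'a>(
    string_table: &[&'a str],
    roles_sid: &[i32],
    memid_deltas: &[i64],
    types: &[MemberType],
) -> Option<Vec<(&'a str, i64, MemberType)>> {
    let mut id = 0i64;
    let mut members = Vec::new();
    for ((&sid, &delta), &ty) in roles_sid.iter().zip(memid_deltas).zip(types) {
        // memids are delta coded: each value is an offset from the previous id.
        id += delta;
        if sid < 0 {
            return None;
        }
        // The role is looked up by index in the block's string table.
        let role = *string_table.get(sid as usize)?;
        members.push((role, id, ty));
    }
    Some(members)
}
```

With a string table of `["", "wikipedia", "network"]`, role indexes `[0, 0, 0]`, and deltas `[4240686706, -3816009978, 5632]`, this yields the first three members of relation 6311444 from the XML above, each with an empty role. A mis-decoded role index would instead pick an unrelated string such as "wikipedia", which matches the symptom reported here.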