Comments (11)
@purew I believe the perf changes you made have this side effect. Can you take a look?
from polyline.
The checked_shl()
method is available, but it may have perf implications.
from polyline.
The
checked_shl()
method is available, but it may have perf implications.
Hm, I don't quite understand how that could help here, can you elaborate?
IMHO a check comparing index
against the length of chars
is necessary before doing chars[index]
to prevent this bug from happening - this should be possible using a simple condition and reacting with some kind of error when it is triggered. Or maybe just constrain the loop with an upper bound/switch to a while loop (as done in the function wrapping trans
) so it may not run indefinitely. This should hopefully only have a negligible performance impact.
from polyline.
Hm, I don't quite understand how that could help here, can you elaborate?
Sure.
fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
let mut at_index;
let mut shift = 0;
let mut result = 0;
let mut byte;
loop {
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
index += 1;
let to_add = (byte & 0x1f)
.checked_shl(shift)
.ok_or(format!("Couldn't decode character at index {}", index - 1))?;
result |= to_add;
shift += 5;
if byte < 0x20 {
break;
}
}
let coordinate_change = if (result & 1) > 0 {
!(result >> 1)
} else {
result >> 1
} as i64;
Ok((coordinate_change, index))
}
However, the performance impact of this change is huge, so it's not feasible:
Finished bench [optimized] target(s) in 0.12s
Running benches/benchmarks.rs (target/release/deps/benchmarks-797a7b4ddb4d36ce)
bench encode: 1000 coordinates
time: [77.693 µs 78.226 µs 78.783 µs]
change: [-0.0171% +0.2637% +0.6474%] (p = 0.13 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe
bench decode: 21502 coordinates
time: [11.087 ms 11.100 ms 11.114 ms]
change: [+5942.8% +5967.2% +5988.8%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
bench polyline6 decoding
time: [1.4322 µs 1.4341 µs 1.4361 µs]
change: [+2978.0% +2987.4% +2996.4%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
bench HUGE polyline6 decoding
time: [1.2993 ms 1.3056 ms 1.3128 ms]
change: [+5460.0% +5504.3% +5549.9%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
IMHO a check comparing index against the length of chars is necessary before doing chars[index] to prevent this bug from happening
Please post some code demonstrating your idea so we can test and benchmark it!
from polyline.
Oh wow, that's quite the impact. Okay, I think I understand a bit better now. I'm not totally up to speed has to how the polyline encoding/decoding works, I will probably do some research on it.
I was thinking something like this - it's basically the current code but the loop {}
has been changed to a while {}
. What do you think?
I don't have rust environment set up currently, so I wrote this out of thin air:
fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
let mut at_index;
let mut shift = 0;
let mut result = 0;
let mut byte;
let mut valid = False;
while index < chars.len() {
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
index += 1;
result |= (byte & 0x1f) << shift;
shift += 5;
if byte < 0x20 {
valid = True;
break;
}
}
if not valid {
// this would mean that we have terminated above loop
// but not because of the original condition on "byte < 0x20",
// so this should be treated as an error
Err( ? )
}
let coordinate_change = if (result & 1) > 0 {
!(result >> 1)
} else {
result >> 1
} as i64;
Ok((coordinate_change, index))
}
Edit: slightly refactored.
from polyline.
That doesn't catch the error, unfortunately:
fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
let mut at_index;
let mut shift = 0;
let mut result = 0;
let mut byte;
let mut valid = false;
while index < chars.len() {
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
index += 1;
result |= (byte & 0x1f) << shift;
shift += 5;
if byte < 0x20 {
valid = true;
break;
}
}
if !valid {
// this would mean that we have terminated above loop
// but not because of the original condition on "byte < 0x20",
// so this should be treated as an error
return Err("Couldn't shift".to_string());
}
let coordinate_change = if (result & 1) > 0 {
!(result >> 1)
} else {
result >> 1
} as i64;
Ok((coordinate_change, index))
}
The problem is this line: result |= (byte & 0x1f) << shift;
from polyline.
While I'm aware that this does not fix the actual error per-se, it should at least handle the binary crashing violently, right? I understand that it would be desirable to also know the exact character that is invalid though.
Currently, the main thing thats bugging me is that certain input to the pypolyline.cutil.decode_polyline
function causes the rust binary to crash which then also kills the python interpreter, so there is no way of handling this problem in some way.
This is not much different than the "solution" I proposed earlier, but might further clarify the situation:
fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
let mut at_index;
let mut shift = 0;
let mut result = 0;
let mut byte;
loop {
if index >= chars.len() {
Err("Cannot decode polyline".to_string())
}
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
index += 1;
result |= (byte & 0x1f) << shift;
shift += 5;
if byte < 0x20 {
break;
}
}
let coordinate_change = if (result & 1) > 0 {
!(result >> 1)
} else {
result >> 1
} as i64;
Ok((coordinate_change, index))
}
from polyline.
Both of your solutions work with release optimisations, and have negligible perf impact. But they don't work with test optimisations (running cargo test
) presumably due to overflow checks, so we need to figure out a way around that.
There's no need to worry about the pypolyline
crash – once we're able to correctly return a Result
from trans
the error will bubble up to the wrapper and pypolyline
will do the right thing.
from polyline.
Ah, okay! So if I read you correctly, cargo detects a possible overflow on byte
due to the repeated OR
ing? Then I also understand better why you'd rather fix the root of the problem. I'll try to think of a solution as well.
from polyline.
What about something like this?
fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
let mut at_index;
let mut shift = 0;
let mut result = 0;
let mut byte;
loop {
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
index += 1;
// Protect against overflow by checking if any bits would be moved out of the u64
if ((byte & 0x1f) >> (64 - shift)) != 0 {
Err(format!("Couldn't decode character at index {}", index - 1))
}
result |= (byte & 0x1f) << shift;
shift += 5;
if byte < 0x20 {
break;
}
}
let coordinate_change = if (result & 1) > 0 {
!(result >> 1)
} else {
result >> 1
} as i64;
Ok((coordinate_change, index))
}
Edit:
I haven't fully read up on the logic of polyline decoding, but probably we just need to check if (64 - shift) > 5
, e.g. if another shift by 5 is still possible without overflow?
loop {
at_index = chars[index];
byte = (at_index as u64).saturating_sub(63);
result |= (byte & 0x1f) << shift;
if byte < 0x20 {
break;
}
index += 1;
shift += 5;
if shift > (64 - 5) {
Err(format!("Couldn't decode character at index {}", index - 1))
}
}
I have prepared a PR: #37
from polyline.
This has been fairly comprehensively addressed by #42, #43, #45, #47.
from polyline.
Related Issues (14)
- Write docs HOT 1
- Publish to crates HOT 1
- Set up tests on travis.ci
- Set up documentation deployment HOT 4
- Reconsider encode_coordinates interface HOT 2
- Re-export the geo_types
- crates publishing team HOT 2
- Adopt a similar strategy to proj for geo-types compatibility
- panicked at 'attempt to subtract with overflow', src/lib.rs:129:20 HOT 2
- Add "Geospatial" category to `Cargo.toml`
- perf gap with flexpolyline HOT 2
- Error handling of bad polyline still broken HOT 3
- noisy benchmarks HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from polyline.