Giter VIP home page Giter VIP logo

Comments (11)

urschrei avatar urschrei commented on June 11, 2024

@purew I believe the perf changes you made have this side effect. Can you take a look?

from polyline.

urschrei avatar urschrei commented on June 11, 2024

The checked_shl() method is available, but it may have perf implications.

from polyline.

mattiZed avatar mattiZed commented on June 11, 2024

The checked_shl() method is available, but it may have perf implications.

Hm, I don't quite understand how that could help here, can you elaborate?

IMHO a check comparing index against the length of chars is necessary before doing chars[index] to prevent this bug from happening - this should be possible using a simple condition and reacting with some kind of error when it is triggered. Or maybe just constrain the loop with an upper bound/switch to a while loop (as done in the function wrapping trans) so it may not run indefinitely. This should hopefully only have a negligible performance impact.

from polyline.

urschrei avatar urschrei commented on June 11, 2024

Hm, I don't quite understand how that could help here, can you elaborate?

Sure.

fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
    let mut at_index;
    let mut shift = 0;
    let mut result = 0;
    let mut byte;
    loop {
        at_index = chars[index];
        byte = (at_index as u64).saturating_sub(63);
        index += 1;
        let to_add = (byte & 0x1f)
            .checked_shl(shift)
            .ok_or(format!("Couldn't decode character at index {}", index - 1))?;
        result |= to_add;
        shift += 5;
        if byte < 0x20 {
            break;
        }
    }
    let coordinate_change = if (result & 1) > 0 {
        !(result >> 1)
    } else {
        result >> 1
    } as i64;
    Ok((coordinate_change, index))
}

However, the performance impact of this change is huge, so it's not feasible:

    Finished bench [optimized] target(s) in 0.12s
     Running benches/benchmarks.rs (target/release/deps/benchmarks-797a7b4ddb4d36ce)
bench encode: 1000 coordinates
                        time:   [77.693 µs 78.226 µs 78.783 µs]
                        change: [-0.0171% +0.2637% +0.6474%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

bench decode: 21502 coordinates
                        time:   [11.087 ms 11.100 ms 11.114 ms]
                        change: [+5942.8% +5967.2% +5988.8%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

bench polyline6 decoding
                        time:   [1.4322 µs 1.4341 µs 1.4361 µs]
                        change: [+2978.0% +2987.4% +2996.4%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

bench HUGE polyline6 decoding
                        time:   [1.2993 ms 1.3056 ms 1.3128 ms]
                        change: [+5460.0% +5504.3% +5549.9%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

IMHO a check comparing index against the length of chars is necessary before doing chars[index] to prevent this bug from happening

Please post some code demonstrating your idea so we can test and benchmark it!

from polyline.

mattiZed avatar mattiZed commented on June 11, 2024

Oh wow, that's quite the impact. Okay, I think I understand a bit better now. I'm not totally up to speed has to how the polyline encoding/decoding works, I will probably do some research on it.

I was thinking something like this - it's basically the current code but the loop {} has been changed to a while {}. What do you think?

I don't have rust environment set up currently, so I wrote this out of thin air:

fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
    let mut at_index;
    let mut shift = 0;
    let mut result = 0;
    let mut byte;
    let mut valid = False;
    while index < chars.len() {
        at_index = chars[index];
        byte = (at_index as u64).saturating_sub(63);
        index += 1;
        result |= (byte & 0x1f) << shift;
        shift += 5;
        if byte < 0x20 {
            valid = True;
            break;
        }
    }

    if not valid {
      // this would mean that we have terminated above loop
      // but not because of the original condition on "byte < 0x20",
      // so this should be treated as an error
      Err( ? )
    }

    let coordinate_change = if (result & 1) > 0 {
        !(result >> 1)
    } else {
        result >> 1
    } as i64;
    Ok((coordinate_change, index))
}

Edit: slightly refactored.

from polyline.

urschrei avatar urschrei commented on June 11, 2024

That doesn't catch the error, unfortunately:

fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
    let mut at_index;
    let mut shift = 0;
    let mut result = 0;
    let mut byte;
    let mut valid = false;
    while index < chars.len() {
        at_index = chars[index];
        byte = (at_index as u64).saturating_sub(63);
        index += 1;
        result |= (byte & 0x1f) << shift;
        shift += 5;
        if byte < 0x20 {
            valid = true;
            break;
        }
    }

    if !valid {
        // this would mean that we have terminated above loop
        // but not because of the original condition on "byte < 0x20",
        // so this should be treated as an error
        return Err("Couldn't shift".to_string());
    }

    let coordinate_change = if (result & 1) > 0 {
        !(result >> 1)
    } else {
        result >> 1
    } as i64;
    Ok((coordinate_change, index))
}

The problem is this line: result |= (byte & 0x1f) << shift;

from polyline.

mattiZed avatar mattiZed commented on June 11, 2024

While I'm aware that this does not fix the actual error per-se, it should at least handle the binary crashing violently, right? I understand that it would be desirable to also know the exact character that is invalid though.

Currently, the main thing thats bugging me is that certain input to the pypolyline.cutil.decode_polyline function causes the rust binary to crash which then also kills the python interpreter, so there is no way of handling this problem in some way.

This is not much different than the "solution" I proposed earlier, but might further clarify the situation:

fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
    let mut at_index;
    let mut shift = 0;
    let mut result = 0;
    let mut byte;
    loop {
        if index >= chars.len() {
           Err("Cannot decode polyline".to_string())
        }
        at_index = chars[index];
        byte = (at_index as u64).saturating_sub(63);
        index += 1;
        result |= (byte & 0x1f) << shift;
        shift += 5;
        if byte < 0x20 {
            break;
        }
    }

    let coordinate_change = if (result & 1) > 0 {
        !(result >> 1)
    } else {
        result >> 1
    } as i64;
    Ok((coordinate_change, index))
}

from polyline.

urschrei avatar urschrei commented on June 11, 2024

Both of your solutions work with release optimisations, and have negligible perf impact. But they don't work with test optimisations (running cargo test) presumably due to overflow checks, so we need to figure out a way around that.

There's no need to worry about the pypolyline crash – once we're able to correctly return a Result from trans the error will bubble up to the wrapper and pypolyline will do the right thing.

from polyline.

mattiZed avatar mattiZed commented on June 11, 2024

Ah, okay! So if I read you correctly, cargo detects a possible overflow on byte due to the repeated ORing? Then I also understand better why you'd rather fix the root of the problem. I'll try to think of a solution as well.

from polyline.

mattiZed avatar mattiZed commented on June 11, 2024

What about something like this?

fn trans(chars: &[u8], mut index: usize) -> Result<(i64, usize), String> {
    let mut at_index;
    let mut shift = 0;
    let mut result = 0;
    let mut byte;
    loop {
        at_index = chars[index];
        byte = (at_index as u64).saturating_sub(63);
        index += 1;
        // Protect against overflow by checking if any bits would be moved out of the u64
        if ((byte & 0x1f) >> (64 - shift)) != 0 {
            Err(format!("Couldn't decode character at index {}", index - 1))
        }
        result |= (byte & 0x1f) << shift;
        shift += 5;
        if byte < 0x20 {
            break;
        }
    }

    let coordinate_change = if (result & 1) > 0 {
        !(result >> 1)
    } else {
        result >> 1
    } as i64;
    Ok((coordinate_change, index))
}

Edit:

I haven't fully read up on the logic of polyline decoding, but probably we just need to check if (64 - shift) > 5, e.g. if another shift by 5 is still possible without overflow?

loop {
    at_index = chars[index];
    byte = (at_index as u64).saturating_sub(63);
    result |= (byte & 0x1f) << shift;

    if byte < 0x20 {
        break;
    }

    index += 1;
    shift += 5;

    if shift > (64 - 5) {
        Err(format!("Couldn't decode character at index {}", index - 1))
    }
}

I have prepared a PR: #37

from polyline.

urschrei avatar urschrei commented on June 11, 2024

This has been fairly comprehensively addressed by #42, #43, #45, #47.

from polyline.

Related Issues (14)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.