base64-bytestring's People

Contributors

23skidoo, andersk, andreasabel, bos, emilypi, fisx, hvr, juhp, justinwatt, meiersi, ocheron, qrilka, sgraf812

base64-bytestring's Issues

Add support for base64url with no padding

RFC4648 Section 3.2:

Padding of Encoded Data

In some circumstances, the use of padding ("=") in base-encoded data
is not required or used. In the general case, when assumptions about
the size of transported data cannot be made, padding is required to
yield correct decoded data.

Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.

An example of such a specification is RFC 7049 (Section 2.4.4.2), which contains this snippet:

These three tag types suggest conversions to three of the base data
encodings defined in [RFC4648]. For base64url encoding, padding is
not used (see Section 3.2 of RFC 4648); that is, all trailing equals
signs ("=") are removed from the base64url-encoded string.

To support such specifications, it would be convenient for the Base64.URL modules to provide variants of encode and decode that produce and expect no padding. The decoder variant is especially useful: working around the lack of direct support is easy on the encode side, but on the decode side, adding back the correct amount of padding is more involved and more expensive.
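
As a rough illustration of the workaround described above, here is a minimal sketch that restores padding from the input length alone. The names restorePadding and stripPadding are hypothetical helpers, not part of base64-bytestring; the padded result would then be passed to the existing Base64.URL decoder.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: re-append the "=" characters that an unpadded
-- encoder dropped. Returns Nothing when the length cannot come from a
-- valid encoding (length `mod` 4 == 1).
restorePadding :: BC.ByteString -> Maybe BC.ByteString
restorePadding bs = case BC.length bs `mod` 4 of
  0 -> Just bs
  2 -> Just (BC.append bs (BC.pack "=="))
  3 -> Just (BC.append bs (BC.pack "="))
  _ -> Nothing

-- | Hypothetical helper for the encode side: drop trailing padding.
-- "=" never occurs inside encoded data, so takeWhile is sufficient.
stripPadding :: BC.ByteString -> BC.ByteString
stripPadding = BC.takeWhile (/= '=')
```

Note that restorePadding copies the whole input just to append one or two bytes, which is the extra cost a built-in unpadded decoder could avoid.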

Massive performance slowdown in 0.1.1.0

Hi Bryan,

I got a report from Kirill of a massive performance problem in Yesod.
If sessions were turned on, on my system, req/sec went from 6000 to
200. I checked clientsession, and found it to be the culprit: encoding
a minimal payload requires a few milliseconds. Felipe checked the
recent changes, and localized it to the most recent release of
base64-bytestring. I put together a simple benchmark:

import Data.ByteString.Base64
import Data.ByteString.Char8 (pack)
import Criterion.Main

main :: IO ()
main = defaultMain
   [ bench "encode" $ whnf encode $ pack "qwerty"
   ]

On version 0.1.0.3, this takes 229.4312 ns. On 0.1.1.0, it takes
3.556598 ms. It looks like the problem is coming from the recent
addition of URL encoding
(f1916d8).

As a temporary workaround, I'm planning on adding an upper bound on
the base64-bytestring dependency in clientsession, so we shouldn't
have any immediate issues, but obviously it would be best if we didn't
have to put restrictive upper bounds in.

Thanks,
Michael

Rename `decode` to `decodeStrict`, and deprecate `decode`?

I only looked into the RFC after running into a production issue, and learned that base64 does not allow line breaks unless some other standard that builds on it explicitly permits them. To help people like me in the future, I would like to do any of these (please check the ones you'd accept as PRs):

  • add a line of haddocks explaining that decode does not allow non-alphabet characters, not even line breaks, and users should consider using decodeLenient.
  • implement decodeStrict as an alias for decode.
  • deprecate decode.

I'm in favor of making decodeLenient the default in a distant future because I don't see any security problems. I'm not sure about which of the two available options is faster, but lenient decoding has the benefit of not allocating the input as a strict bytestring.

Thanks!
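
One middle ground between decode and decodeLenient is to strip only line breaks before calling the strict decoder, so that any other stray character is still rejected. A minimal sketch, where stripLineBreaks is a hypothetical name and not a function exported by the library:

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: remove CR and LF bytes so that line-wrapped
-- input (for example MIME-style output) can be handed to the strict
-- decoder. All other characters are left untouched.
stripLineBreaks :: BC.ByteString -> BC.ByteString
stripLineBreaks = BC.filter (\c -> c /= '\r' && c /= '\n')
```

The filtered result can then be passed to decode, which will still report an error for any remaining non-alphabet character.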

Support for rejecting non-canonical encodings

Consider the following Base64-encoded string: "ZE==". What is the correct result of decoding the string "ZE=="?

Answer: It is not valid Base64, but it still satisfies the decoder's understanding of Base64 encoded data. Unfortunately, there is no way to construct such a result from binary, which leads to confusion - the decoder in base64-bytestring is not smart enough to differentiate such data. In fact, this value never round trips:

П> decode "ZE=="
Right "d"
П> encode "d"
"ZA=="
П> fmap encode (decode "ZE==")
Right "ZA=="

A more correct implementation should fail with an "invalid input" error. Or we can leave it as is and leave a note about the support status for "impossible by construction" inputs to the decoder.
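
To make the "ZE==" case concrete: with two pad characters, the last data character contributes only its top 2 bits, so its low 4 bits must be zero; with one pad character, the low 2 bits must be zero. The sketch below checks only that tail condition. It is not the library's validation logic, hasCanonicalTail is a hypothetical name, and it assumes the standard alphabet and does not validate the rest of the input.

```haskell
import Data.Bits ((.&.))
import Data.List (elemIndex)
import qualified Data.ByteString.Char8 as BC

-- Standard base64 alphabet; a character's index is its 6-bit value.
alphabet :: String
alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- | Hypothetical check: True when the unused low bits of the final
-- data character are zero. Input without padding is accepted as-is.
hasCanonicalTail :: BC.ByteString -> Bool
hasCanonicalTail bs
  | padLen == 0 = True
  | padLen > 2 || BC.null body = False
  | otherwise = case elemIndex (BC.last body) alphabet of
      Just v  -> v .&. unusedMask == 0
      Nothing -> False
  where
    (body, pad) = BC.span (/= '=') bs
    padLen      = BC.length pad
    -- Two pads leave 4 unused bits, one pad leaves 2.
    unusedMask  = if padLen == 2 then 0x0F else 0x03
```

Here 'E' has the value 4 (binary 000100), so "ZE==" fails the check, while 'A' has the value 0 and "ZA==" passes.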

Add Head validations for correct padding

The code that validates the correctness of padding in the last two chars of a Base64Url-encoded bytestring needs a refactor, and we must make sure all bases are covered so that the following invariant holds:

\x -> ((e2m $ B64.decodePadded x) <|> (e2m $ B64.decodeUnpadded x)) == (e2m $ B64.decode x)

where

e2m = either (const Nothing) Just

1.1.0.0 release planning

I propose to cut the next minor release once the fixes for #18 (for which there's a PR) and #24 (which @hvr is working on - his patch actually has a bit bigger scope than described in that ticket) are merged.

Since there will be API additions, version number should be 1.0.1.0.

Refactor and Expand Test Coverage

The tests are looking a little grody after the recent coverage hackathon. I'd like to refactor these and modernize both the property-checking code and the unit tests.

`joinWith` does not always terminate the input

Hello,

I just noticed that joinWith only terminates the input when its length is a multiple of the chunk size (the interval at which the separator is inserted). Here is an example:

ghci> unpack $ joinWith (pack [0]) 64 $ pack [1]
[1]

Notice that there is no 0 at the end. I am not sure if this was intentional but, if so, then we should clarify the documentation.

As a data point, in my use case I was hoping that the input would always be terminated, even if the last chunk is shorter than the rest.

-Iavor
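
For comparison, the behaviour requested above can be sketched as a small reference function that appends the separator after every chunk, including a short final one. This is not the library's joinWith; joinWithAlways is a hypothetical name, and returning the input unchanged for a non-positive chunk size is an arbitrary choice made for this sketch.

```haskell
import qualified Data.ByteString as B

-- | Hypothetical reference implementation: split the input into chunks
-- of n bytes and follow every chunk, including a short last one, with
-- the separator. Empty input yields empty output.
joinWithAlways :: B.ByteString -> Int -> B.ByteString -> B.ByteString
joinWithAlways sep n bs
  | n <= 0    = bs
  | otherwise = B.concat (go bs)
  where
    go rest
      | B.null rest = []
      | otherwise   =
          let (chunk, rest') = B.splitAt n rest
          in chunk : sep : go rest'
```

With this definition, the example from the report would produce [1,0] instead of [1].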

Poor performance

decodeLenient might be performing badly. Here's a profile:

  checkCerts Network.HTTP.Conduit.Manager                        3903          19   0.0    0.0    29.5   59.4
   defaultCheckCerts Network.HTTP.Conduit.Manager                3909          19   0.0    0.0    29.5   59.4
    certificateVerifyChain Network.TLS.Extra.Certificate         3912          19   0.0    0.0    29.5   59.4
     certificateVerifyChain_ Network.TLS.Extra.Certificate       3914          38  10.9   24.7    29.5   59.4
      certificateVerifyAgainst Network.TLS.Extra.Certificate     4583          38   0.0    0.0     0.1    0.1
       verifyF Network.TLS.Extra.Certificate                     4584          38   0.0    0.0     0.1    0.1
        rsaVerify Network.TLS.Extra.Certificate                  4586          38   0.1    0.1     0.1    0.1
      decodeUtf8With'/isComplete Data.Text.Lazy.Encoding         3998        7676   0.0    0.0     0.0    0.0
      certMatchDN Network.TLS.Extra.Certificate                  3987       12426   0.1    0.0     0.1    0.0
      mapBuilder Data.Serialize.Builder                          3985      290738   0.1    0.1     0.1    0.1
      flush Data.Serialize.Builder                               3984      290738   0.1    0.1     0.1    0.1
      decodeLenient Data.ByteString.Base64                       3973      581476   0.7    1.3    18.1   34.3
       decodeLenient/fill Data.ByteString.Base64                 3974    23093720   8.6    8.9    17.4   33.0
        poke8 Data.ByteString.Base64                             3983    13827326   0.8    0.8     0.8    0.8
        dValue Data.ByteString.Base64                            3982    27673576   0.5    1.7     0.5    1.7
        dNext Data.ByteString.Base64                             3981    18443756   0.6    2.4     0.6    2.4
        decodeLenient/look Data.ByteString.Base64                3975    55354296   5.2   14.7     6.9   19.2
         peek8 Data.ByteString.Base64                            3976    36902864   1.7    4.5     1.7    4.5

incorrect decoding on GHC 9.0.1

Originally reported as decoding errors in frasertweedale/hs-jose#102.

Incorrect decoding behavior when sequencing decodes (via Applicative or Monad).
Observed with GHC 9.0.1 on Linux x86-64 and Mac OS X x86-64.

ghci> import qualified Data.ByteString as B
ghci> import qualified Data.ByteString.Base64.URL as B64U
ghci> :set -XOverloadedStrings
ghci> emptyObj = "e30" :: B.ByteString  -- base64url encoding of "{}"
ghci> (,) <$> B64U.decodeUnpadded emptyObj <*> B64U.decodeUnpadded emptyObj :: Either String (B.ByteString, B.ByteString)
Right (" \161","{}")

Use lazy ByteStrings

I would like to be able to use decodeLenient in a streaming style, so that the whole string doesn't need to be in memory at one time.

However, the strict ByteString type is used, which means the whole string must be in memory at once.

Please use lazy bytestrings instead.

Looming regression with 8.10

Encoding takes a serious performance hit with the release candidate of 8.10. See https://gitlab.haskell.org/ghc/ghc/issues/17653

While the root cause seems to be a regression within GHC, it's easily fixed by adding a few bangs to encodeWith (see also the ticket above). Given that a fix might not make it into 8.10, working around it seems reasonable.

Suggestion: introduce a Base64 newtype

It'd be nice if base64-bytestring exported a canonical type

newtype Base64 = Base64 { toByteString :: ByteString }
    deriving ( Eq, Ord, Show, IsString )

where a regular, full-binary string is stored but the type suggests a base64 serialization.

import qualified Data.ByteString.Base64 as S64

instance ToJSON S64.Base64 where
  toJSON (S64.Base64 bs) = toJSON (S64.encode bs)

I bring this up because I repeatedly reinvent this type when writing parsers and printers and would prefer to have a single source for it.

relax bytestring version constraint

Hey, I'm testing builds with GHC 7.5, and things build fine if the
bytestring version constraint is relaxed to >= 0.9.0 rather than == 0.9.*

Fails When String Includes Byte Order Mark

I'm working with an app where users are expected to upload a CSV file. When the file contains a BOM, decoding the base64 string fails. This example base64 string includes a byte order mark: 77u/SGVhZGVyIDEsSGVhZGVyIDIsSGVhZGVyIDMNCkRhdGEgMS4xLERhdGEgMS4yLERhdGEgMS4zDQpEYXRhIDIuMSxEYXRhIDIuMixEYXRhIDIuMw0K. The data should look like this:

Header 1,Header 2,Header 3
Data 1.1,Data 1.2,Data 1.3
Data 2.1,Data 2.2,Data 2.3

But when I try to decode it, I get this result: Left "invalid character at offset: 3".
If I try to decode it without the BOM, it works.
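
The reported offset 3 points at the "/" in the leading "77u/", which is the encoding of the BOM bytes EF BB BF. "/" belongs to the standard alphabet but not to the URL-safe one, so one possible explanation, which the report does not confirm, is that a URL-safe decoder was used on standard-alphabet input. Under that assumption, a minimal sketch of translating the input first (toUrlAlphabet is a hypothetical name):

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: map the two characters that differ between
-- the standard and URL-safe alphabets ('+' -> '-', '/' -> '_').
-- Assumption: the failing decoder expects the URL-safe alphabet.
toUrlAlphabet :: BC.ByteString -> BC.ByteString
toUrlAlphabet = BC.map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c   = c
```

If that assumption holds, decoding with the standard-alphabet module (Data.ByteString.Base64) would avoid the translation step entirely.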
