haskell / base64-bytestring

Fast base64 encoding and decoding for Haskell.

Home Page: http://hackage.haskell.org/package/base64-bytestring

License: Other

Haskell 99.51% Python 0.49%

base64-bytestring's Issues

Add support for base64url with no padding

RFC4648 Section 3.2:

Padding of Encoded Data

In some circumstances, the use of padding ("=") in base-encoded data
is not required or used. In the general case, when assumptions about
the size of transported data cannot be made, padding is required to
yield correct decoded data.

Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.

An example of such a specification is RFC 7049 (Section 2.4.4.2), which contains this snippet:

These three tag types suggest conversions to three of the base data
encodings defined in [RFC4648]. For base64url encoding, padding is
not used (see Section 3.2 of RFC 4648); that is, all trailing equals
signs ("=") are removed from the base64url-encoded string.

So to support such specifications it would be convenient for the Base64.URL modules to provide variants of encode and decode that produce and expect no padding. It's actually especially useful to have the decoder variant since working around the lack of direct support on the encode side is easy, but adding back the correct amount of padding is more involved and expensive.
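
The workaround described above can be sketched as follows. This is only an illustration built on the existing padded `encode` and `decode` from `Data.ByteString.Base64.URL`; the names `encodeNoPad` and `decodeNoPad` are hypothetical and not part of the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Base64.URL as B64U

-- | Hypothetical helper: base64url-encode, then drop the trailing '=' bytes.
encodeNoPad :: ByteString -> ByteString
encodeNoPad = fst . B8.spanEnd (== '=') . B64U.encode

-- | Hypothetical helper: restore the padding implied by the input length,
-- then run the ordinary padded decoder. A length of 1 modulo 4 can never
-- be produced by an encoder, so it is rejected up front.
decodeNoPad :: ByteString -> Either String ByteString
decodeNoPad bs = case B.length bs `mod` 4 of
  0 -> B64U.decode bs
  2 -> B64U.decode (bs <> "==")
  3 -> B64U.decode (bs <> "=")
  _ -> Left "invalid unpadded base64url length"
```

With these definitions, `encodeNoPad "{}"` gives `"e30"` and `decodeNoPad "e30"` gives `Right "{}"`. The decode side copies the input in order to append the padding, which is the extra cost the request mentions. This sketch also does not reject input that already carries padding.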

Massive performance slowdown in 0.1.1.0

Hi Bryan,

I got a report from Kirill of a massive performance problem in Yesod.
If sessions were turned on, on my system, req/sec went from 6000 to
200. I checked clientsession, and found it to be the culprit: encoding
a minimal payload requires a few milliseconds. Felipe checked the
recent changes, and localized it to the most recent release of
base64-bytestring. I put together a simple benchmark:

import Data.ByteString.Base64
import Data.ByteString.Char8 (pack)
import Criterion.Main

main :: IO ()
main = defaultMain
   [ bench "encode" $ whnf encode $ pack "qwerty"
   ]

On version 0.1.0.3, this takes 229.4312 ns. On 0.1.1.0, it takes
3.556598 ms. It looks like the problem is coming from the recent
addition of URL encoding
(f1916d8).

As a temporary workaround, I'm planning on adding an upper bound on
the base64-bytestring dependency in clientsession, so we shouldn't
have any immediate issues, but obviously it would be best if we didn't
have to put restrictive upper bounds in.

Thanks,
Michael

Rename `decode` to `decodeStrict`, and deprecate `decode`?

I only looked into the RFC after I ran into a production issue, and learned that base64 does not allow line breaks unless another specification that builds on it explicitly permits them. To help people like me in the future, I would like to do any of the following (please check the ones you'd accept as PRs):

add a line of haddocks explaining that decode does not allow non-alphabet characters, not even line breaks, and users should consider using decodeLenient.
implement decodeStrict as an alias to decode.
deprecate decode.

I'm in favor of making decodeLenient the default in a distant future because I don't see any security problems. I'm not sure about which of the two available options is faster, but lenient decoding has the benefit of not allocating the input as a strict bytestring.

Thanks!
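
For readers comparing the two decoders mentioned in this request, here is a small sketch of how they treat a line break. It uses only `decode` and `decodeLenient` from `Data.ByteString.Base64`; the names `wrapped`, `strictResult` and `lenientResult` are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Base64 as B64

-- "foobar" encoded as base64, with a line break inserted in the middle.
wrapped :: ByteString
wrapped = "Zm9v\nYmFy"

-- The strict decoder rejects the input because of the '\n'.
strictResult :: Either String ByteString
strictResult = B64.decode wrapped

-- The lenient decoder skips characters outside the alphabet.
lenientResult :: ByteString
lenientResult = B64.decodeLenient wrapped
```

Here `strictResult` is a `Left` and `lenientResult` is `"foobar"`.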

Support for rejecting non-canonical encodings

Consider the following Base64-encoded string: "ZE==". What is the correct result of decoding the string "ZE=="?

Answer: It is not valid Base64, but it still satisfies the decoder's understanding of Base64 encoded data. Unfortunately, there is no way to construct such a result from binary, which leads to confusion - the decoder in base64-bytestring is not smart enough to differentiate such data. In fact, this value never round trips:

П> decode "ZE=="
Right "d"
П> encode "d"
"ZA=="
П> fmap encode (decode "ZE==")
Right "ZA=="

A more correct implementation should fail with an "invalid input" error. Or we can leave it as is and leave a note about the support status for "impossible by construction" inputs to the decoder.
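
Until the decoder itself rejects such input, a caller could enforce canonical encodings with a round-trip check. The sketch below is one possible approach, not the library's method; `decodeCanonical` is a hypothetical name, and the check costs an extra encode of the result.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Base64 as B64

-- | Hypothetical helper: accept the input only if re-encoding the
-- decoded bytes reproduces it exactly.
decodeCanonical :: ByteString -> Either String ByteString
decodeCanonical bs = do
  raw <- B64.decode bs
  if B64.encode raw == bs
    then Right raw
    else Left "non-canonical base64 encoding"
```

With this wrapper, `decodeCanonical "ZA=="` gives `Right "d"`, while `decodeCanonical "ZE=="` gives a `Left`.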

Add Head validations for correct padding

The code that validates the correctness of padding in the last two chars of Base64Url-encoded bytestring needs a refactor, and we must make sure all bases are covered so that the following invariant holds:

\x -> ((e2m $ B64.decodePadded x) <|> (e2m $ B64.decodeUnpadded x)) == (e2m $ B64.decode x)

where

e2m = either (const Nothing) Just

disable padding for Base64.URL?

How can I disable padding in the function encode of Data.ByteString.Base64.URL?

1.1.0.0 release planning

I propose to cut the next minor release once the fixes for #18 (for which there's a PR) and #24 (which @hvr is working on - his patch actually has a bit bigger scope than described in that ticket) are merged.

Since there will be API additions, version number should be 1.0.1.0.

Refactor and Expand Test Coverage

The tests are looking a little grody after the recent coverage hackathon. I'd like to refactor these and modernize both the property-checking code, as well as the unit tests.

`joinWith` does not always terminate the input

Hello,

I just noticed that joinWith only terminates the input when its length is a multiple of the chunk size. Here is an example:

ghci> unpack $ joinWith (pack [0]) 64 $ pack [1]
[1]

Notice that there is no 0 at the end. I am not sure if this was intentional but, if so, then we should clarify the documentation.

As a data point, in my use case I was hoping that the input would always be terminated, even if the last chunk is shorter than the rest.

-Iavor
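
A caller who needs the terminated form could wrap `joinWith` as sketched below. This assumes the behaviour shown in the report (no separator after a short final chunk); `joinWithTerminated` is a hypothetical name, not a library function.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import           Data.ByteString.Base64 (joinWith)

-- | Hypothetical wrapper: like 'joinWith', but also appends the separator
-- when the final chunk is shorter than the chunk size.
-- Assumes joinWith already terminates input whose length is a multiple
-- of the chunk size, as described in the report.
joinWithTerminated :: ByteString -> Int -> ByteString -> ByteString
joinWithTerminated brk every bs
  | B.null bs || B.length bs `mod` every == 0 = joined
  | otherwise                                 = joined <> brk
  where
    joined = joinWith brk every bs
```

Applied to the example from the report, `B.unpack (joinWithTerminated (B.pack [0]) 64 (B.pack [1]))` gives `[1,0]` under that assumption.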

Improve Performance of Encoding Loop

The current encoding loop can be made more efficient. See: the implementation in base64

Build needs UndecidableInstances with GHC 6.12.x

Data/String/UTF8.hs:56:0:
    Variable occurs more often in a constraint than in the instance head
      in the constraint: UTF8Bytes string index
    (Use -XUndecidableInstances to permit this)
    In the instance declaration for `Show (UTF8 string)'

The complete build log is at http://hydra.cryp.to/build/135947/nixlog/1/raw.

Poor performance

decodeLenient might be performing badly. Here's a profile:

  checkCerts Network.HTTP.Conduit.Manager                        3903          19   0.0    0.0    29.5   59.4
   defaultCheckCerts Network.HTTP.Conduit.Manager                3909          19   0.0    0.0    29.5   59.4
    certificateVerifyChain Network.TLS.Extra.Certificate         3912          19   0.0    0.0    29.5   59.4
     certificateVerifyChain_ Network.TLS.Extra.Certificate       3914          38  10.9   24.7    29.5   59.4
      certificateVerifyAgainst Network.TLS.Extra.Certificate     4583          38   0.0    0.0     0.1    0.1
       verifyF Network.TLS.Extra.Certificate                     4584          38   0.0    0.0     0.1    0.1
        rsaVerify Network.TLS.Extra.Certificate                  4586          38   0.1    0.1     0.1    0.1
      decodeUtf8With'/isComplete Data.Text.Lazy.Encoding         3998        7676   0.0    0.0     0.0    0.0
      certMatchDN Network.TLS.Extra.Certificate                  3987       12426   0.1    0.0     0.1    0.0
      mapBuilder Data.Serialize.Builder                          3985      290738   0.1    0.1     0.1    0.1
      flush Data.Serialize.Builder                               3984      290738   0.1    0.1     0.1    0.1
      decodeLenient Data.ByteString.Base64                       3973      581476   0.7    1.3    18.1   34.3
       decodeLenient/fill Data.ByteString.Base64                 3974    23093720   8.6    8.9    17.4   33.0
        poke8 Data.ByteString.Base64                             3983    13827326   0.8    0.8     0.8    0.8
        dValue Data.ByteString.Base64                            3982    27673576   0.5    1.7     0.5    1.7
        dNext Data.ByteString.Base64                             3981    18443756   0.6    2.4     0.6    2.4
        decodeLenient/look Data.ByteString.Base64                3975    55354296   5.2   14.7     6.9   19.2
         peek8 Data.ByteString.Base64                            3976    36902864   1.7    4.5     1.7    4.5

incorrect decoding on GHC 9.0.1

Originally reported as decoding errors in frasertweedale/hs-jose#102.

Incorrect decoding behavior when sequencing decodes (via Applicative or Monad).
Observed with GHC 9.0.1 on Linux x86-64 and Mac OS X x86-64.

ghci> import qualified Data.ByteString as B
ghci> import qualified Data.ByteString.Base64.URL as B64U
ghci> :set -XOverloadedStrings
ghci> emptyObj = "e30" :: B.ByteString  -- base64url encoding of "{}"
ghci> (,) <$> B64U.decodeUnpadded emptyObj <*> B64U.decodeUnpadded emptyObj :: Either String (B.ByteString, B.ByteString)
Right (" \161","{}")

Use lazy ByteStrings

I would like to be able to use decodeLenient in a streaming style, so that the whole string doesn't need to be in memory at one time.

However, the strict ByteString type is used, which means the whole string must be in memory at once.

Please use lazy bytestrings instead.
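
For comparison, the package also has a `Data.ByteString.Base64.Lazy` module that works on lazy ByteStrings. The sketch below shows how lenient decoding might look in that style; `decodeStream` is a hypothetical name for the example, and whether this module is available depends on the installed version.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Base64.Lazy as BL64

-- | Hypothetical helper: lenient decoding over a lazy ByteString, so the
-- input can be consumed chunk by chunk.
decodeStream :: BL.ByteString -> BL.ByteString
decodeStream = BL64.decodeLenient
```

A program could then use `BL.interact decodeStream` to decode standard input to standard output without loading the whole input first.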

Looming regression with 8.10

Encoding takes a serious performance hit with the release candidate of 8.10. See https://gitlab.haskell.org/ghc/ghc/issues/17653

While the root cause seems to be a regression within GHC, it's easily fixed by adding a few bangs to encodeWith (see also the ticket linked above). Given that a fix might not make it into 8.10, working around it seems reasonable.

Suggestion: introduce a Base64 newtype

It'd be nice if base64-bytestring exported a canonical type

newtype Base64 = Base64 { toByteString :: ByteString }
    deriving ( Eq, Ord, Show, IsString )

where a regular, full-binary string is stored but the type suggests a base64 serialization.

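import Data.Aeson (ToJSON (..))
import qualified Data.ByteString.Base64 as S64
import qualified Data.Text.Encoding as T

-- S64.Base64 is the newtype proposed above; the library does not export it yet.
instance ToJSON S64.Base64 where
  -- aeson has no ToJSON instance for ByteString, so go through Text.
  toJSON (S64.Base64 bs) = toJSON (T.decodeUtf8 (S64.encode bs))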

I bring this up because I repeatedly reinvent this type when writing parsers and printers and would prefer to have a single source for it.

Test suite failure on Linux/x86_64

The complete build log is available at http://hydra.cryp.to/build/129918/nixlog/1/raw.

RFC: Should Stringly-typed messages be replaced with Ints?

Per @hvr, parsers may want an integer offset rather than a string so that they can emit src-location positions. Thoughts and comments?
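
To make the proposal concrete, here is a rough sketch of what an offset-carrying error could look like. Everything in it is hypothetical: `DecodeError`, `InvalidCharacter` and `firstInvalid` are illustrative names, and the check only covers the standard alphabet, treating `=` as acceptable anywhere for brevity.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B8
import           Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- | Hypothetical structured error; not part of the library.
newtype DecodeError
  = InvalidCharacter Int  -- byte offset of the offending character
  deriving (Eq, Show)

-- | Report the offset of the first character outside the standard
-- base64 alphabet. Padding ('=') is accepted anywhere in this sketch.
firstInvalid :: ByteString -> Maybe DecodeError
firstInvalid = fmap InvalidCharacter . B8.findIndex (not . ok)
  where
    ok c = isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "+/="
```

A parser could then map the `Int` to a source location itself; for instance, `firstInvalid (B8.pack "Zm9v*mFy")` gives `Just (InvalidCharacter 4)`.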

relax bytestring version constraint

Hey, I'm testing building stuff with GHC 7.5, and things build fine if the
bytestring version constraint is relaxed to >= 0.9.0 rather than == 0.9.*

Fails When String Includes Byte Order Mark

I'm working with an app where users are expected to upload a CSV file. When the file contains a BOM, decoding the base64 string fails. This example base64 string includes a byte order mark: 77u/SGVhZGVyIDEsSGVhZGVyIDIsSGVhZGVyIDMNCkRhdGEgMS4xLERhdGEgMS4yLERhdGEgMS4zDQpEYXRhIDIuMSxEYXRhIDIuMixEYXRhIDIuMw0K. The data should look like this:

Header 1,Header 2,Header 3
Data 1.1,Data 1.2,Data 1.3
Data 2.1,Data 2.2,Data 2.3

But when I try to decode it, I get this result: Left "invalid character at offset: 3".
If I try to decode it without the BOM, it works.
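
One possible explanation, which the report does not confirm, is that the string was passed to the URL-safe decoder. The BOM bytes EF BB BF encode to `77u/` in the standard alphabet, and the `/` at offset 3 is not part of the base64url alphabet. The sketch below compares the two decoders on that prefix and shows a hypothetical `stripBom` helper for removing the BOM after decoding.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Base64.URL as B64U

-- First four characters of the payload in the report.
bomPrefix :: ByteString
bomPrefix = "77u/"

-- The standard decoder accepts '/'; the URL-safe decoder does not.
standardResult, urlResult :: Either String ByteString
standardResult = B64.decode bomPrefix
urlResult      = B64U.decode bomPrefix

-- The UTF-8 byte order mark.
utf8Bom :: ByteString
utf8Bom = B.pack [0xEF, 0xBB, 0xBF]

-- | Hypothetical helper: drop a leading UTF-8 BOM if one is present.
stripBom :: ByteString -> ByteString
stripBom bs = maybe bs id (B.stripPrefix utf8Bom bs)
```

Here `standardResult` is `Right` of the three BOM bytes, while `urlResult` is a `Left`. If that matches the failing setup, decoding with `Data.ByteString.Base64` and then applying `stripBom` would be one way to get at the CSV data.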
