haskell / base64-bytestring
Fast base64 encoding and decoding for Haskell.
Home Page: http://hackage.haskell.org/package/base64-bytestring
License: Other
Padding of Encoded Data
In some circumstances, the use of padding ("=") in base-encoded data
is not required or used. In the general case, when assumptions about
the size of transported data cannot be made, padding is required to
yield correct decoded data.
Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.
An example of such a specification is RFC 7049 (section 2.4.4.2), which contains this snippet:
These three tag types suggest conversions to three of the base data
encodings defined in [RFC4648]. For base64url encoding, padding is
not used (see Section 3.2 of RFC 4648); that is, all trailing equals
signs ("=") are removed from the base64url-encoded string.
So, to support such specifications, it would be convenient for the Base64.URL modules to provide variants of encode and decode that produce and expect no padding. The decoder variant is especially useful: working around the lack of direct support on the encode side is easy, but adding back the correct amount of padding before decoding is more involved and expensive.
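To make the "adding back padding" workaround concrete, here is a minimal sketch of what a caller has to write in the meantime, using only the bytestring package. The name repad is hypothetical and not part of base64-bytestring. Note that it copies the whole input just to append at most two bytes, which is the expense mentioned above.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper (not part of base64-bytestring): restore the '='
-- padding that an unpadded base64url producer stripped, so the result can be
-- handed to a decoder that insists on padding.
repad :: BC.ByteString -> Maybe BC.ByteString
repad bs = case BC.length bs `mod` 4 of
  0 -> Just bs                             -- already a whole number of quanta
  2 -> Just (bs `BC.append` BC.pack "==")  -- final quantum carries one byte
  3 -> Just (bs `BC.append` BC.pack "=")   -- final quantum carries two bytes
  _ -> Nothing                             -- length 1 (mod 4) is never valid
```

For example, repad applied to "e30" (the unpadded encoding of "{}") gives Just "e30=".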
Hi Bryan,
I got a report from Kirill of a massive performance problem in Yesod.
If sessions were turned on, on my system, req/sec went from 6000 to
200. I checked clientsession, and found it to be the culprit: encoding
a minimal payload requires a few milliseconds. Felipe checked the
recent changes, and localized it to the most recent release of
base64-bytestring. I put together a simple benchmark:
import Data.ByteString.Base64
import Data.ByteString.Char8 (pack)
import Criterion.Main
main :: IO ()
main = defaultMain
    [ bench "encode" $ whnf encode $ pack "qwerty"
    ]
On version 0.1.0.3, this takes 229.4312 ns. On 0.1.1.0, it takes
3.556598 ms. It looks like the problem is coming from the recent
addition of URL encoding (f1916d8).
As a temporary workaround, I'm planning on adding an upper bound on
the base64-bytestring dependency in clientsession, so we shouldn't
have any immediate issues, but obviously it would be best if we didn't
have to put restrictive upper bounds in.
Thanks,
Michael
I only looked into the RFC after I ran into a production issue, and learned that base64 does not allow line breaks unless another specification that refers to it explicitly permits them. To help people like me in the future, I would like to do any of these (please check the ones you'd accept as PRs):
- decode does not allow non-alphabet characters, not even line breaks, and users should consider using decodeLenient.
- decodeStrict as an alias to decode.
- decode.
I'm in favor of making decodeLenient the default in a distant future because I don't see any security problems. I'm not sure which of the two available options is faster, but lenient decoding has the benefit of not allocating the input as a strict bytestring.
Thanks!
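For anyone hitting the same production issue before any of this lands, a middle ground between decode and decodeLenient is to strip only line breaks and leave every other check to the strict decoder. This is a minimal sketch using only the bytestring package; stripLineBreaks is a hypothetical name, not something the library exports.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical pre-processing step (not part of base64-bytestring): drop
-- CR and LF only, so that any other non-alphabet character still reaches the
-- strict decoder and is rejected there.
stripLineBreaks :: BC.ByteString -> BC.ByteString
stripLineBreaks = BC.filter (\c -> c /= '\r' && c /= '\n')
```

The intended use would be along the lines of decode (stripLineBreaks input). The filter copies the input, so it is not free.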
Consider the following Base64-encoded string: "ZE==". What is the correct result of decoding it?
Answer: It is not valid Base64, but it still satisfies the decoder's understanding of Base64-encoded data. Unfortunately, there is no way to produce such a string by encoding binary data, which leads to confusion: the decoder in base64-bytestring is not smart enough to reject such input. In fact, this value never round-trips:
П> decode "ZE=="
Right "d"
П> encode "d"
"ZA=="
П> fmap encode (decode "ZE==")
Right "ZA=="
A more correct implementation should fail with an "invalid input" error. Or we can leave it as is and leave a note about the support status for "impossible by construction" inputs to the decoder.
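One way such a check could look, as a sketch that uses only the bytestring package and is independent of the library's internals: in a padded final quantum, the bits of the last data character that do not map to an output byte must be zero. The name hasCanonicalPadding is hypothetical, and the standard alphabet is assumed; the URL-safe variant would swap the last two characters.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Bits ((.&.))

-- Standard base64 alphabet (RFC 4648, section 4).
alphabet :: BC.ByteString
alphabet = BC.pack
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- | Hypothetical check (not part of base64-bytestring): True when the unused
-- trailing bits of the final quantum are all zero, i.e. when the input could
-- have been produced by an encoder.
hasCanonicalPadding :: BC.ByteString -> Bool
hasCanonicalPadding bs
  | n == 0                          = True
  | n `mod` 4 /= 0                  = False
  | BC.pack "==" `BC.isSuffixOf` bs = lowBitsZero 0x0f (BC.index bs (n - 3))
  | BC.pack "="  `BC.isSuffixOf` bs = lowBitsZero 0x03 (BC.index bs (n - 2))
  | otherwise                       = True
  where
    n = BC.length bs
    -- The last data character carries 4 spare bits before "==" and 2 before "=".
    lowBitsZero mask c = case BC.elemIndex c alphabet of
      Just v  -> v .&. mask == 0
      Nothing -> False
```

With this, "ZE==" is rejected because 'E' has value 4 and its low four bits are not zero, while "ZA==" passes.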
The code that validates the correctness of padding in the last two chars of a Base64Url-encoded bytestring needs a refactor, and we must make sure all bases are covered so that the following invariant holds:
\x -> ((e2m $ B64.decodePadded x) <|> (e2m $ B64.decodeUnpadded x)) == (e2m $ B64.decode x)
where
e2m = either (const Nothing) Just
How can I disable padding in the function encode of Data.ByteString.Base64.URL?
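Without relying on any particular library version, one workaround is to strip the trailing '=' characters from the encoder's output. This is a minimal sketch using only the bytestring package; stripPadding is a hypothetical helper name.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper (not part of base64-bytestring): drop the trailing
-- '=' padding from already-encoded output. spanEnd splits off the longest
-- suffix of '=' characters, and the prefix is what we keep.
stripPadding :: BC.ByteString -> BC.ByteString
stripPadding = fst . BC.spanEnd (== '=')
```

It would be applied to the result of encode, for example turning "e30=" into "e30".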
The tests are looking a little grody after the recent coverage hackathon. I'd like to refactor these and modernize both the property-checking code and the unit tests.
Hello,
I just noticed that joinWith only terminates the input when its length is a multiple of the chunk length. Here is an example:
ghci> unpack $ joinWith (pack [0]) 64 $ pack [1]
[1]
Notice that there is no 0 at the end. I am not sure if this was intentional but, if so, then we should clarify the documentation.
As a data point, in my use case I was hoping that the input would always be terminated, even if the last chunk is shorter than the rest.
-Iavor
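For comparison, the always-terminating behaviour described above can be sketched with the bytestring package alone. joinWithAlways is a hypothetical name, and this is not how the library's joinWith is implemented; it only illustrates the expected output.

```haskell
import qualified Data.ByteString as B

-- | Hypothetical variant (not part of base64-bytestring): split the input
-- into chunks of `every` bytes and append the separator after every chunk,
-- including a final chunk that is shorter than the rest.
joinWithAlways :: B.ByteString -> Int -> B.ByteString -> B.ByteString
joinWithAlways brk every bs
  | every <= 0 = bs                 -- nothing sensible to do; leave input as is
  | otherwise  = B.concat (go bs)
  where
    go s
      | B.null s  = []
      | otherwise = let (chunk, rest) = B.splitAt every s
                    in chunk : brk : go rest
```

With the example above, unpack $ joinWithAlways (pack [0]) 64 $ pack [1] gives [1,0].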
The current encoding loop can be made more efficient. See the implementation in base64.
Data/String/UTF8.hs:56:0:
Variable occurs more often in a constraint than in the instance head
in the constraint: UTF8Bytes string index
(Use -XUndecidableInstances to permit this)
In the instance declaration for `Show (UTF8 string)'
The complete build log is at http://hydra.cryp.to/build/135947/nixlog/1/raw.
decodeLenient might be performing badly. Here's a profile:
                                                                             individual      inherited
COST CENTRE                 MODULE                         no.    entries  %time %alloc  %time %alloc
checkCerts                  Network.HTTP.Conduit.Manager   3903        19    0.0    0.0   29.5   59.4
defaultCheckCerts           Network.HTTP.Conduit.Manager   3909        19    0.0    0.0   29.5   59.4
certificateVerifyChain      Network.TLS.Extra.Certificate  3912        19    0.0    0.0   29.5   59.4
certificateVerifyChain_     Network.TLS.Extra.Certificate  3914        38   10.9   24.7   29.5   59.4
certificateVerifyAgainst    Network.TLS.Extra.Certificate  4583        38    0.0    0.0    0.1    0.1
verifyF                     Network.TLS.Extra.Certificate  4584        38    0.0    0.0    0.1    0.1
rsaVerify                   Network.TLS.Extra.Certificate  4586        38    0.1    0.1    0.1    0.1
decodeUtf8With'/isComplete  Data.Text.Lazy.Encoding        3998      7676    0.0    0.0    0.0    0.0
certMatchDN                 Network.TLS.Extra.Certificate  3987     12426    0.1    0.0    0.1    0.0
mapBuilder                  Data.Serialize.Builder         3985    290738    0.1    0.1    0.1    0.1
flush                       Data.Serialize.Builder         3984    290738    0.1    0.1    0.1    0.1
decodeLenient               Data.ByteString.Base64         3973    581476    0.7    1.3   18.1   34.3
decodeLenient/fill          Data.ByteString.Base64         3974  23093720    8.6    8.9   17.4   33.0
poke8                       Data.ByteString.Base64         3983  13827326    0.8    0.8    0.8    0.8
dValue                      Data.ByteString.Base64         3982  27673576    0.5    1.7    0.5    1.7
dNext                       Data.ByteString.Base64         3981  18443756    0.6    2.4    0.6    2.4
decodeLenient/look          Data.ByteString.Base64         3975  55354296    5.2   14.7    6.9   19.2
peek8                       Data.ByteString.Base64         3976  36902864    1.7    4.5    1.7    4.5
Originally reported as decoding errors in frasertweedale/hs-jose#102.
Incorrect decoding behavior when sequencing decodes (via Applicative or Monad).
Observed with GHC 9.0.1 on Linux x86-64 and Mac OS X x86-64.
ghci> import qualified Data.ByteString as B
ghci> import qualified Data.ByteString.Base64.URL as B64U
ghci> :set -XOverloadedStrings
ghci> emptyObj = "e30" :: B.ByteString -- base64url encoding of "{}"
ghci> (,) <$> B64U.decodeUnpadded emptyObj <*> B64U.decodeUnpadded emptyObj :: Either String (B.ByteString, B.ByteString)
Right (" \161","{}")
I would like to be able to use decodeLenient in a streaming style, so that the whole string doesn't need to be in memory at one time.
However, the strict ByteString type is used, which means the whole string must be in memory at once.
Please use lazy bytestrings instead.
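Until then, a caller-side workaround is to regroup a lazy bytestring into strict chunks whose length is a multiple of 4, since every 4 encoded characters decode to 3 bytes independently of their neighbours. The sketch below uses only the bytestring package, and regroup is a hypothetical name. It assumes the input contains alphabet characters only; with lenient decoding, skipped characters would shift the quantum boundaries, so they would have to be filtered out first.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- | Hypothetical helper (not part of base64-bytestring): lazily regroup the
-- input into strict chunks of 4 * quantaPerChunk bytes. Only the last chunk
-- can be shorter, so only the last chunk can contain padding.
regroup :: Int64 -> BL.ByteString -> [BC.ByteString]
regroup quantaPerChunk = go
  where
    size = 4 * max 1 quantaPerChunk   -- guard against zero or negative sizes
    go lbs
      | BL.null lbs = []
      | otherwise   = let (chunk, rest) = BL.splitAt size lbs
                      in BL.toStrict chunk : go rest
```

Each resulting chunk could then be passed to the existing strict decoder, and the outputs concatenated or consumed one at a time.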
Encoding takes a serious performance hit with the release candidate of GHC 8.10. See https://gitlab.haskell.org/ghc/ghc/issues/17653.
While the root cause seems to be a regression within GHC, it's easily fixed by adding a few bangs to encodeWith (see also the ticket above). Given that a fix might not make it into 8.10, working around it seems reasonable.
It'd be nice if base64-bytestring exported a canonical type
newtype Base64 = Base64 { toByteString :: ByteString }
  deriving ( Eq, Ord, Show, IsString )
where a regular, full-binary string is stored but the type suggests a base64 serialization.
import qualified Data.ByteString.Base64 as S64
instance ToJSON S64.Base64 where
  toJSON (S64.Base64 bs) = toJSON (S64.encode bs)
I bring this up because I repeatedly reinvent this type when writing parsers and printers and would prefer to have a single source for it.
The complete build log is available at http://hydra.cryp.to/build/129918/nixlog/1/raw.
Per @hvr, parsers may want an integer offset rather than a string so that they can emit src-location positions. Thoughts and comments?
Hey, I'm testing builds with GHC 7.5, and things build fine if the bytestring version constraint is relaxed to >= 0.9.0 rather than == 0.9.*
I'm working with an app where users are expected to upload a CSV file. When the file contains a BOM, decoding the base64 string fails. This example base64 string includes a byte order mark: 77u/SGVhZGVyIDEsSGVhZGVyIDIsSGVhZGVyIDMNCkRhdGEgMS4xLERhdGEgMS4yLERhdGEgMS4zDQpEYXRhIDIuMSxEYXRhIDIuMixEYXRhIDIuMw0K. The data should look like this:
Header 1,Header 2,Header 3
Data 1.1,Data 1.2,Data 1.3
Data 2.1,Data 2.2,Data 2.3
But when I try to decode it, I get this result: Left "invalid character at offset: 3".
If I try to decode it without the BOM, it works.
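One thing that can be checked without the library: the UTF-8 BOM bytes EF BB BF encode to "77u/", and the character at offset 3 is '/', which belongs to the standard base64 alphabet but not to the URL-safe one. The report does not say which module's decode was called, so treat it as an assumption that the input went to Data.ByteString.Base64.URL. The sketch below, using only base and bytestring, finds the first character a URL-safe decoder would not accept; firstNonUrlSafe is a hypothetical name.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- | Hypothetical diagnostic (not part of base64-bytestring): offset of the
-- first character outside the URL-safe base64 alphabet ('=' padding is
-- treated as acceptable here).
firstNonUrlSafe :: BC.ByteString -> Maybe Int
firstNonUrlSafe = BC.findIndex (not . isUrlSafe)
  where
    isUrlSafe c =
      isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-_="
```

Applied to the sample string above it returns Just 3, matching the reported offset. Without the BOM the sample happens to contain neither '/' nor '+', which would be consistent with the observation that decoding then works.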