05nelsonm / encoding

A Kotlin Multiplatform library for configurable, streamable, efficient and extensible Encoding/Decoding with support for base16/32/64.

License: Apache License 2.0

encoding

[badges: license, latest release, Kotlin version]

[badges: platforms – Android, JVM, JS, JS (Node), WASM, Linux, macOS, iOS, tvOS, watchOS, Windows; support for Android Native, Apple Silicon, JS-IR, Linux-ARM]

Configurable, streamable, efficient and extensible Encoding/Decoding for Kotlin Multiplatform.

Base16 (a.k.a. "hex")

Base32

Base64

A full list of kotlin-components projects can be found HERE

Usage

Configure EncoderDecoder(s) to your needs

val base16 = Base16 {
    // Ignore whitespace and new lines when decoding
    isLenient = true

    // Insert line breaks every X characters of encoded output
    lineBreakInterval = 10

    // Use lowercase instead of uppercase characters when encoding
    encodeToLowercase = true
}

// Shortcuts
val base16StrictSettings = Base16(strict = true)
val base16DefaultSettings = Base16()

// Alternatively, use the static instance with its default settings
Base16
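As a quick sketch of what the configured `base16` instance above produces (using the extension functions shown later in this README, and assuming `lineBreakInterval = 10` inserts a line break after every 10th character of encoded output, per the builder comment):

```kotlin
// "Hello!" -> bytes 0x48 0x65 0x6C 0x6C 0x6F 0x21
val encoded = "Hello!".encodeToByteArray().encodeToString(base16)
println(encoded)
// 48656c6c6f
// 21
```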
val base32Crockford = Base32Crockford {
    isLenient = true
    encodeToLowercase = false

    // Insert hyphens every X characters of encoded output
    hyphenInterval = 5

    // Optional data integrity check unique to the Crockford spec
    checkSymbol('*')

    // Only apply the checkSymbol & reset hyphen interval counter
    // when Encoder.Feed.doFinal is called (see builder docs for
    // more info) 
    finalizeWhenFlushed = false
}

// Alternatively, use the static instance with its default settings
Base32.Crockford

val base32Default = Base32Default {
    isLenient = true
    lineBreakInterval = 64
    encodeToLowercase = true
    
    // Skip padding of the encoded output
    padEncoded = false
}

// Alternatively, use the static instance with its default settings
Base32.Default

val base32Hex = Base32Hex {
    isLenient = true
    lineBreakInterval = 64
    encodeToLowercase = false
    padEncoded = true
}

// Alternatively, use the static instance with its default settings
Base32.Hex

// NOTE: Base64 can _decode_ both Default and UrlSafe, no matter what
// encodeToUrlSafe is set to.
val base64 = Base64 {
    isLenient = true
    lineBreakInterval = 64
    encodeToUrlSafe = false
    padEncoded = true
}

// Alternatively, use the static instance with its default settings
Base64.Default

// Inherit settings from another EncoderDecoder's Config
val base64UrlSafe = Base64(base64.config) {
    encodeToUrlSafe = true
    padEncoded = false
}

// Alternatively, use the static instance with its default settings
Base64.UrlSafe

Encoding/Decoding Extension Functions

val text = "Hello World!"
val bytes = text.encodeToByteArray()

// Choose the output type that suits your needs
// without having to perform unnecessary intermediate
// transformations (can be useful for security 
// purposes, too, as you are able to clear Arrays
// before they are de-referenced).
val encodedString = bytes.encodeToString(Base64.Default)
val encodedChars = bytes.encodeToCharArray(Base32.Default)
val encodedBytes = bytes.encodeToByteArray(Base16)

val decodedString = try {
    encodedString.decodeToByteArray(Base64.Default)
} catch (e: EncodingException) {
    Log.e("Example", "Something went terribly wrong", e)
    null
}
// Swallow `EncodingException`s by using the `*OrNull` variants
val decodedChars = encodedChars.decodeToByteArrayOrNull(Base32.Default)
val decodedBytes = encodedBytes.decodeToByteArrayOrNull(Base16)

Encoding/Decoding Feed(s) (i.e. Streaming)

Feeds are a new concept which enables some pretty awesome things. They break the encoding/decoding process into its individual parts, such that the medium the data is coming from or going to can be anything; Feeds only care about Byte(s) and Char(s)!

// e.g. Concatenate multiple encodings
val sb = StringBuilder()

// Use our own LineBreakOutFeed so we can add a delimiter between
// encodings while preserving the line break counter.
val out = LineBreakOutFeed(interval = 64) { char -> sb.append(char) }

Base64.Default.newEncoderFeed(out).use { feed ->
    "Hello World 1!".forEach { c -> feed.consume(c.code.toByte())  }
    feed.flush()
    out.output('.')
    "Hello World 2!".forEach { c -> feed.consume(c.code.toByte())  }
}

println(sb.toString())
// SGVsbG8gV29ybGQgMSE=.SGVsbG8gV29ybGQgMiE=

// e.g. Writing encoded data to a File in Java.
// NOTE: try/catch omitted for this example.

file.outputStream().use { oStream ->
    Base64.Default.newEncoderFeed { encodedChar ->
        // As encoded data comes out of the feed,
        // write it to the file.
        oStream.write(encodedChar.code)
    }.use { feed ->

        // Push data through the feed.
        //
        // There are NO size/length limitations with `Feed`s.
        // You are only limited by the medium you use to store
        // the output (e.g. the maximum size of a ByteArray is
        // Int.MAX_VALUE).
        //
        // The `Feed.use` extension function calls `doFinal`
        // automatically, which closes the `Encoder.Feed`
        // and performs finalization of the operation (such as
        // adding padding).
        "Hello World!".forEach { c ->
            feed.consume(c.code.toByte())
        }
    }
}

As Feeds are a new concept, they can be "bulky" to use (as you will see in the example below). This is due to a lack of extension functions for them, but that's something I hope can be built out over time with your help (PRs and feature requests are always welcome)!

// e.g. Reading encoded data from a File in Java.
// NOTE: try/catch omitted for this example.

// Pre-calculate the output size for the given encoding
// spec; in this case, Base64.
val size = Base64.Default.config.decodeOutMaxSize(file.length())

// Since we will be storing the data in a StringBuilder,
// we need to check if the output size would exceed
// StringBuilder's maximum capacity.
if (size > Int.MAX_VALUE.toLong()) {
    // Alternatively, one could fall back to chunking, but that
    // is beyond the scope of this example.
    throw EncodingSizeException(
        "File contents would be too large after decoding to store in a StringBuilder"
    )
}

val sb = StringBuilder(size.toInt())

file.inputStream().reader().use { iStreamReader ->
    Base64.Default.newDecoderFeed { decodedByte ->
        // As decoded data comes out of the feed,
        // update the StringBuilder.
        sb.append(decodedByte.toInt().toChar())
    }.use { feed ->

        val buffer = CharArray(4096)
        while (true) {
            val read = iStreamReader.read(buffer)
            if (read == -1) break
            
            // Push encoded data from the file through the feed.
            //
            // The `Feed.use` extension function calls `doFinal`
            // automatically, which closes the `Decoder.Feed`
            // and performs finalization of the operation.
            for (i in 0 until read) {
                feed.consume(buffer[i])
            }
        }
    }
}

println(sb.toString())

Alternatively, create your own EncoderDecoder(s) using the abstractions provided by encoding-core!

Sample

See sample project

Get Started

// build.gradle.kts
dependencies {
    val encoding = "2.2.1"
    implementation("io.matthewnelson.encoding:base16:$encoding")
    implementation("io.matthewnelson.encoding:base32:$encoding")
    implementation("io.matthewnelson.encoding:base64:$encoding")

    // Only necessary if you just want the abstractions to create your own EncoderDecoder(s)
    implementation("io.matthewnelson.encoding:core:$encoding")
}

Alternatively, you can use the BOM.

// build.gradle.kts
dependencies {
    // define the BOM and its version
    implementation(platform("io.matthewnelson.encoding:bom:2.2.1"))

    // define artifacts without version
    implementation("io.matthewnelson.encoding:base16")
    implementation("io.matthewnelson.encoding:base32")
    implementation("io.matthewnelson.encoding:base64")

    // Only necessary if you just want the abstractions to create your own EncoderDecoder(s)
    implementation("io.matthewnelson.encoding:core")
}

encoding's People

Contributors

05nelsonm, asemy


encoding's Issues

Use `explicitApi`

Use explicitApi() for library modules

// build.gradle.kts

plugins {
    id(pluginId.kmp.configuration)
    id(pluginId.kmp.publish)
}

kmpConfiguration {
    setupMultiplatform(
        // ...
        kotlin = {
            explicitApi()
        }
    )
}

[Security Improvement] Add extension method override for filling interim arrays.

Extension functions should give the option via method override to fill the interim arrays if desired such that upon garbage collection they will not reveal their contents.

Ex:

@Suppress("nothing_to_inline")
inline fun String.decodeBase16ToArray(fill: Char? = null): ByteArray? {
    return toCharArray().let { chars ->
        chars.decodeBase16ToArray()
            .also { if (fill != null) chars.fill(fill) }
    }
}

Rework `Decoder.decodeToArray` extension functions

The Decoder extension function names should be changed to decodeToByteArray and decodeToByteArrayOrNull to follow the Encoder extension function naming convention.

Need to also think about adding extension functions decodeToCharArray and decodeToString.
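A sketch of the rename (the "current" name is shown for illustration; the point is mirroring the Encoder-side convention):

```kotlin
// Current naming:
val before: ByteArray? = encoded.decodeToArrayOrNull(Base16)

// Proposed, matching encodeToByteArray/encodeToCharArray on the Encoder side:
val after: ByteArray? = encoded.decodeToByteArrayOrNull(Base16)
```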

Deprecate old code

Part 6 of #36

Deprecate old code in modules:

  • encoding-base16
  • encoding-base32
  • encoding-base64

Adapt `encoding-test` to running tests via `EncoderDecoder`

Part 7 of #36

Publishing the encoding-test for others to utilize would be fantastic, such that they can easily run through their own implementations of the Encoder/Decoder to validate correctness.

Will require refactoring it so that it is a test suite. I think making it depend on kotlin("test") using compileOnly might work as a good catch if someone depends on it in their main source sets.

Migrate `encoding-base32` to use `encoding-core`

Part 4 of #36

Migrate the base32 module to utilize the encoding-core module. This is an opportunity to write the Base32 Decoder/Encoder in package io.matthewnelson.encoding.base32 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #29

Initialize `encoding-core` module

Part 1 of #36

Project modules need some refactoring in order to get encoding-core initially set up. Will need to:

  • Move the following modules to the new library directory:
    • encoding-base16
    • encoding-base32
    • encoding-base64
    • encoding-test
  • Initialize the encoding-core module and add as dependency for above modules (excluding encoding-test)

Add ability to insert line breaks when encoding

When encoding data, it is sometimes preferable to insert line breaks every X number of characters of output. All EncoderDecoder.Config's should be configurable to do this (like Crockford's ability to insert hyphens) in order to properly pre-calculate the encoded output size.
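This is the lineBreakInterval builder option shown in the Usage section above; a minimal sketch (`data` is a placeholder ByteArray):

```kotlin
val base64 = Base64 {
    // Insert a line break every 64 characters of encoded output
    lineBreakInterval = 64
}
val encoded = data.encodeToString(base64)
// Each line of `encoded` is at most 64 characters long.
```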

`Base32` is broken

When working on 05nelsonm/kmp-tor#274 updating to 1.2.0 results in test failures.

expected:<OBTD[DB6TEGTGVPNR2XDA5X5DEB4YXGLEHJHNGIIVBGF33]S2HNNRQ====> but was:<OBTD[FCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5]S2HNNRQ====>
Expected :OBTDDB6TEGTGVPNR2XDA5X5DEB4YXGLEHJHNGIIVBGF33S2HNNRQ====
Actual   :OBTDFCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5S2HNNRQ====


The correct value should be OBTDFCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5S2HNNRQ====

This is attributed to the change in the base32 bitwise operations.

`EncoderDecoder.Config.isLenient` should be nullable

Some encoders might want to do something with those characters. The base abstraction should be Boolean? such that, when set to null, the EncoderDecoder.Feed will do nothing and pass the character along to the implementation.
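A sketch of the proposed three-state handling inside the Feed (hypothetical; `consumeProtected` is assumed to be the implementation hook):

```kotlin
// For a space or new-line character `c` arriving at the Feed:
when (config.isLenient) {
    true -> { /* skip the character */ }
    false -> throw EncodingException("Spaces and new lines are forbidden when isLenient[false]")
    null -> consumeProtected(c) // pass it along to the implementation
}
```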

Fix encode/decode out calculations

  1. The EncoderDecoder.Config methods for pre-calculating the encoded and decoded out size should accept a Long instead of an Int, and return a Long.
  2. decodeOutMaxSizeOrFail should accept null input so that it is usable with streams.
    • A stream won't have access to the entirety of what is being decoded and thus cannot provide a DecoderInput.
  3. DecoderInput should check for a negative return value of decodeOutMaxSizeOrFail and throw an EncodingException.
  4. DecoderInput should check if the returned Long is greater than Int.MAX_VALUE and throw an EncodingException.

There's a possibility of the EncoderDecoder.Config implementation having an overflow issue; that should be checked for in the base abstraction, which should throw so that we can guarantee a positive value is always returned.

`Base16` and `Base32` should be case insensitive

Currently published code does not allow for decoding of lowercase in either Base16 or any of the Base32 variants.

With the refactor to EncoderDecoder and Feeds, a configuration option was added: acceptLowercase. This is incorrect per RFC 4648; the decoding process should be case insensitive by default and accept both upper and lower case letters.


Regarding the refactor and pointing the old extension functions to the new EncoderDecoder implementations: anyone currently using them will be doing something like encodedData.uppercase().decodeBase16ToArray(), so modifying them to also accept lowercase should not be an issue at all in terms of decoding failures.

If someone were to downgrade the dependency to an older version (1.1.5 or lower) while still using the old extension functions, and without calling encodedData.uppercase(), decoding would fail. So, a super rare edge case.

Binary compatibility is still preserved, but the functionality will now automatically accept lowercase letters.

This issue serves as documentation to point to if any issues arise from consumers that fall into the aforementioned category of downgrading dependency versions and encountering decoding failures.
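In other words, after the change both of the following should decode to the same bytes (using the extension functions from this README):

```kotlin
val upper = "48656C6C6F".decodeToByteArrayOrNull(Base16)
val lower = "48656c6c6f".decodeToByteArrayOrNull(Base16)
// Both are expected to yield the bytes of "Hello".
```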

Optimize `Base32`

There's a lot of logic in the base32 implementation's when statement. This could be better broken out into separate methods, which will optimize parsing.

Migrate `Base32` and `Base16` to use `FeedBuffer`

FeedBuffer was introduced in #62 as a final buffer product to replace core.internal.buffer.Buffer. This ticket is for:

  • Converting Base32 implementation over to use it
  • Converting Base16 implementation over to use it
  • Deleting the core.internal.buffer directory and all its contents.

Add `encoding-core` module

Add a base module (i.e. encoding-core) to:

  • Move all common code to it
  • Have common extension functions that utilize the Encoder/Decoder abstraction
    • So a new encoder (not implemented by this library) can be utilized if a library consumer creates it.
    • Optimization, because right now there is a lot of unnecessary array creation. It would be great if the
      extension functions for each type (String, CharArray, ByteArray) were able to instantiate those types
      up front and then fill them in as decoding/encoding occurs by converting the returned Byte.
  • Enable ability to have Encoder and Decoder classes which can be passed around
  • Enable ability to stream bytes/chars in or out, and receive encoded/decoded bytes on the other end
  • Encoder/Decoder configurations
  • Give people the ability to easily create their own Encoder/Decoders

`DecoderInput` should not call `EncoderDecoder.Config.decodeOutMaxSizeOrFail`

Currently the DecoderInput utility class takes in an EncoderDecoder.Config and calls its decodeOutMaxSizeOrFail method from the DecoderInput.init block.

This should be a separate step. The EncoderDecoder.Config should have 4 methods:

  • A: public method that accepts DecoderInput and returns an Int
  • B: protected abstract method that accepts DecoderInput and the size of the input up to the last relevant character
  • C: public method that accepts a Long and returns a Long
  • D: protected abstract method that accepts a Long and returns a Long
public class DecoderInput private constructor(private val input: Any, internal val size: Int) {

    @Throws(EncodingException::class)
    public operator fun get(index: Int): Char {
        try {
            return when (input) {
                is CharSequence -> input[index]
                is CharArray -> input[index]
                is ByteArray -> input[index].toInt().toChar()
                else -> throw EncodingException("Unsupported input type")
            }
        } catch (e: IndexOutOfBoundsException) {
            throw EncodingException("Index out of bounds", e)
        }
    }
}
public sealed class EncoderDecoder(config: Config): Encoder(config) {

    // ...

    public abstract class Config(
        @JvmField
        public val isLenient: Boolean?,
        @JvmField
        public val paddingByte: Byte?,
    ) {
        @Throws(EncodingException::class)
        protected abstract fun decodeOutMaxSizeOrFailProtected(lastRelevantCharacter: Int, input: DecoderInput): Int

        @Throws(EncodingException::class)
        public fun decodeOutMaxSizeOrFail(input: DecoderInput): Int {
            var lastRelevantChar = input.size
            while (lastRelevantChar > 0) {
                val c = input[lastRelevantChar - 1]

                if (isLenient != null && c.isSpaceOrNewLine) {
                    if (isLenient) {
                        lastRelevantChar--
                        continue
                    } else {
                        throw EncodingException("...")
                    }
                }

                if (c.code.toByte() == paddingByte) {
                    lastRelevantChar--
                    continue
                }

                break
            }

            if (lastRelevantChar == 0) return 0
            val maxSize = decodeOutMaxSizeOrFailProtected(lastRelevantChar, input)
            if (maxSize < 0) throw EncodingSizeException("...")
            return maxSize
        }

        // ...

    }

    // ...
}

Add a `BOM` publication

Add a Bill of Materials publication.

Update the README but comment out that block until next release.

Migrate `encoding-base64` to use `encoding-core`

Part 5 of #36

Migrate the base64 module to utilize the encoding-core module. This is an opportunity to write the Base64 Decoder/Encoder in package io.matthewnelson.encoding.base64 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #26

Migrate `encoding-base16` to use `encoding-core`

Part 3 of #36

Migrate the base16 module to utilize the encoding-core module. This is an opportunity to write the Base16 Decoder/Encoder in package io.matthewnelson.encoding.base16 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #28

Use generics with `EncoderDecoder` to pass `Config` type

Accessing the EncoderDecoder.Config always requires a cast because there is no type specified for EncoderDecoder. This also occurs within a Feed now that #70 made EncoderDecoder.Feed.config public.

public sealed class Decoder<C: EncoderDecoder.Config>(public val config: C) {

    public abstract fun newDecoderFeed(out: OutFeed): Decoder<C>.Feed

    public inner class Feed: EncoderDecoder.Feed<C>(config) {
        // ...
    }

    // ...
}

public sealed class Encoder<C: EncoderDecoder.Config>(config: C): Decoder<C>(config) {

    public abstract fun newEncoderFeed(out: OutFeed): Encoder<C>.Feed

    public inner class Feed: EncoderDecoder.Feed<C>(config) {
        // ...
    }

    // ...
}

public abstract class EncoderDecoder<C: Config>(config: C): Encoder<C>(config) {
    public abstract class Config(
        // ...
    ) {
        // ...
    }

    public sealed class Feed<C: Config>(public val config: C) {
        // ...
    }
}

Need to think on this more because it affects user experience with the library. NOT doing it and requiring a cast all the time also affects user experience... so could go both ways.

Clean up unnecessary conversions

Currently EncoderDecoder.Feed.consume is utilized for both Decoder.Feed and Encoder.Feed, and accepts a Byte for both operations. This results in a lot of conversions between Byte to Char and Char to Byte.

Because we can't specify the Char type using generics (JS does not do Chars), consume and consumeProtected must become wholly separate methods implemented on each Decoder.Feed and Encoder.Feed with the correct input types, Char and Byte respectively.

Furthermore, OutFeed should also become separate interfaces for each operation: Encoder.OutFeed, which outputs a Char, and Decoder.OutFeed, which outputs a Byte.
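A sketch of the proposed split (hypothetical shape; actual nesting and modifiers may differ):

```kotlin
public sealed class Encoder(/* ... */) {
    // Encoder.Feed emits encoded Chars
    public fun interface OutFeed {
        public fun output(encoded: Char)
    }
}

public sealed class Decoder(/* ... */) {
    // Decoder.Feed emits decoded Bytes
    public fun interface OutFeed {
        public fun output(decoded: Byte)
    }
}
```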

Move common logic to `encoding-core`

Part 2 of #36

Build out the abstractions needed for commonizing the implementations. Low level stuff should go here and remain agnostic of the encoding specification.
