05nelsonm / encoding

A Kotlin Multiplatform library for configurable, streamable, efficient and extensible Encoding/Decoding with support for base16/32/64.

License: Apache License 2.0

encoding

[badges: license, latest release, Kotlin version]

[badges: platforms – Android, JVM, JS, JS (Node), WASM, Linux, macOS, iOS, tvOS, watchOS, Windows; support for Android Native, Apple Silicon, JS-IR, Linux-ARM]

Configurable, streamable, efficient and extensible Encoding/Decoding for Kotlin Multiplatform.

Base16 (a.k.a. "hex")

Base32

Base64

A full list of kotlin-components projects can be found HERE

Usage

Configure EncoderDecoder(s) to your needs

val base16 = Base16 {
    // Ignore whitespace and new lines when decoding
    isLenient = true

    // Insert line breaks every X characters of encoded output
    lineBreakInterval = 10

    // Use lowercase instead of uppercase characters when encoding
    encodeToLowercase = true
}

// Shortcuts
val base16StrictSettings = Base16(strict = true)
val base16DefaultSettings = Base16()

// Alternatively, use the static instance with its default settings
Base16
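As a quick sketch of what the configured `base16` instance above produces (using the extension functions shown later in this README, and assuming `lineBreakInterval = 10` inserts a line break after every 10th character of encoded output, per the builder comment):

```kotlin
// "Hello!" -> bytes 0x48 0x65 0x6C 0x6C 0x6F 0x21
val encoded = "Hello!".encodeToByteArray().encodeToString(base16)
println(encoded)
// 48656c6c6f
// 21
```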
val base32Crockford = Base32Crockford {
    isLenient = true
    encodeToLowercase = false

    // Insert hyphens every X characters of encoded output
    hyphenInterval = 5

    // Optional data integrity check unique to the Crockford spec
    checkSymbol('*')

    // Only apply the checkSymbol & reset hyphen interval counter
    // when Encoder.Feed.doFinal is called (see builder docs for
    // more info) 
    finalizeWhenFlushed = false
}

// Alternatively, use the static instance with its default settings
Base32.Crockford

val base32Default = Base32Default {
    isLenient = true
    lineBreakInterval = 64
    encodeToLowercase = true
    
    // Skip padding of the encoded output
    padEncoded = false
}

// Alternatively, use the static instance with its default settings
Base32.Default

val base32Hex = Base32Hex {
    isLenient = true
    lineBreakInterval = 64
    encodeToLowercase = false
    padEncoded = true
}

// Alternatively, use the static instance with its default settings
Base32.Hex

// NOTE: Base64 can _decode_ both Default and UrlSafe, no matter what
// encodeToUrlSafe is set to.
val base64 = Base64 {
    isLenient = true
    lineBreakInterval = 64
    encodeToUrlSafe = false
    padEncoded = true
}

// Alternatively, use the static instance with its default settings
Base64.Default

// Inherit settings from another EncoderDecoder's Config
val base64UrlSafe = Base64(base64.config) {
    encodeToUrlSafe = true
    padEncoded = false
}

// Alternatively, use the static instance with its default settings
Base64.UrlSafe

Encoding/Decoding Extension Functions

val text = "Hello World!"
val bytes = text.encodeToByteArray()

// Choose the output type that suits your needs
// without having to perform unnecessary intermediate
// transformations (can be useful for security 
// purposes, too, as you are able to clear Arrays
// before they are de-referenced).
val encodedString = bytes.encodeToString(Base64.Default)
val encodedChars = bytes.encodeToCharArray(Base32.Default)
val encodedBytes = bytes.encodeToByteArray(Base16)

val decodedString = try {
    encodedString.decodeToByteArray(Base64.Default)
} catch (e: EncodingException) {
    Log.e("Example", "Something went terribly wrong", e)
    null
}
// Swallow `EncodingException`s by using the `*OrNull` variants
val decodedChars = encodedChars.decodeToByteArrayOrNull(Base32.Default)
val decodedBytes = encodedBytes.decodeToByteArrayOrNull(Base16)

Encoding/Decoding Feed(s) (i.e. Streaming)

Feeds are a new concept which enables some pretty awesome things. They break the encoding/decoding process into its individual parts, such that the medium the data is coming from or going to can be anything; Feeds only care about Byte(s) and Char(s)!

// e.g. Concatenate multiple encodings
val sb = StringBuilder()

// Use our own LineBreakOutFeed so we can add a delimiter between
// encodings while preserving the line break counter.
val out = LineBreakOutFeed(interval = 64) { char -> sb.append(char) }

Base64.Default.newEncoderFeed(out).use { feed ->
    "Hello World 1!".forEach { c -> feed.consume(c.code.toByte())  }
    feed.flush()
    out.output('.')
    "Hello World 2!".forEach { c -> feed.consume(c.code.toByte())  }
}

println(sb.toString())
// SGVsbG8gV29ybGQgMSE=.SGVsbG8gV29ybGQgMiE=

// e.g. Writing encoded data to a File in Java.
// NOTE: try/catch omitted for this example.

file.outputStream().use { oStream ->
    Base64.Default.newEncoderFeed { encodedChar ->
        // As encoded data comes out of the feed,
        // write it to the file.
        oStream.write(encodedChar.code)
    }.use { feed ->

        // Push data through the feed.
        //
        // There are NO size/length limitations with `Feed`s.
        // You are only limited by the medium you use to store
        // the output (e.g. the maximum size of a ByteArray is
        // Int.MAX_VALUE).
        //
        // The `Feed.use` extension function calls `doFinal`
        // automatically, which closes the `Encoder.Feed`
        // and performs finalization of the operation (such as
        // adding padding).
        "Hello World!".forEach { c ->
            feed.consume(c.code.toByte())
        }
    }
}

As Feeds are a new concept, they can be "bulky" to use (as you will see in the example below). This is due to a lack of extension functions for them, but that's something I hope can be built out over time with your help (PRs and feature requests are always welcome)!

// e.g. Reading encoded data from a File in Java.
// NOTE: try/catch omitted for this example.

// Pre-calculate the output size for the given encoding
// spec; in this case, Base64.
val size = Base64.Default.config.decodeOutMaxSize(file.length())

// Since we will be storing the data in a StringBuilder,
// we need to check if the output size would exceed
// StringBuilder's maximum capacity.
if (size > Int.MAX_VALUE.toLong()) {
    // Alternatively, one could fall back to chunking, but that
    // is beyond the scope of this example.
    throw EncodingSizeException(
        "File contents would be too large after decoding to store in a StringBuilder"
    )
}

val sb = StringBuilder(size.toInt())

file.inputStream().reader().use { iStreamReader ->
    Base64.Default.newDecoderFeed { decodedByte ->
        // As decoded data comes out of the feed,
        // update the StringBuilder.
        sb.append(decodedByte.toInt().toChar())
    }.use { feed ->

        val buffer = CharArray(4096)
        while (true) {
            val read = iStreamReader.read(buffer)
            if (read == -1) break
            
            // Push encoded data from the file through the feed.
            //
            // The `Feed.use` extension function calls `doFinal`
            // automatically, which closes the `Decoder.Feed`
            // and performs finalization of the operation.
            for (i in 0 until read) {
                feed.consume(buffer[i])
            }
        }
    }
}

println(sb.toString())

Alternatively, create your own EncoderDecoder(s) using the abstractions provided by encoding-core!

Sample

See sample project

Get Started

// build.gradle.kts
dependencies {
    val encoding = "2.2.1"
    implementation("io.matthewnelson.encoding:base16:$encoding")
    implementation("io.matthewnelson.encoding:base32:$encoding")
    implementation("io.matthewnelson.encoding:base64:$encoding")

    // Only necessary if you just want the abstractions to create your own EncoderDecoder(s)
    implementation("io.matthewnelson.encoding:core:$encoding")
}

Alternatively, you can use the BOM.

// build.gradle.kts
dependencies {
    // define the BOM and its version
    implementation(platform("io.matthewnelson.encoding:bom:2.2.1"))

    // define artifacts without version
    implementation("io.matthewnelson.encoding:base16")
    implementation("io.matthewnelson.encoding:base32")
    implementation("io.matthewnelson.encoding:base64")

    // Only necessary if you just want the abstractions to create your own EncoderDecoder(s)
    implementation("io.matthewnelson.encoding:core")
}

encoding's People

Contributors

05nelsonm, asemy


encoding's Issues

Use `explicitApi`

Use explicitApi() for library modules

// build.gradle.kts

plugins {
    id(pluginId.kmp.configuration)
    id(pluginId.kmp.publish)
}

kmpConfiguration {
    setupMultiplatform(
        // ...
        kotlin = {
            explicitApi()
        }
    )
}

[Security Improvement] Add extension method override for filling interim arrays.

Extension functions should give the option via method override to fill the interim arrays if desired such that upon garbage collection they will not reveal their contents.

Ex:

@Suppress("nothing_to_inline")
inline fun String.decodeBase16ToArray(fill: Char? = null): ByteArray? {
    return toCharArray().let { chars ->
        chars.decodeBase16ToArray()
            .also { if (fill != null) chars.fill(fill) }
    }
}

Rework `Decoder.decodeToArray` extension functions

The Decoder extension function names should be changed to decodeToByteArray and decodeToByteArrayOrNull to follow the Encoder extension function naming convention.

Need to also think about adding extension functions decodeToCharArray and decodeToString.
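A sketch of the rename (the "current" name is shown for illustration; the point is mirroring the Encoder-side convention):

```kotlin
// Current naming:
val before: ByteArray? = encoded.decodeToArrayOrNull(Base16)

// Proposed, matching encodeToByteArray/encodeToCharArray on the Encoder side:
val after: ByteArray? = encoded.decodeToByteArrayOrNull(Base16)
```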

Deprecate old code

Part 6 of #36

Deprecate old code in modules:

  • encoding-base16
  • encoding-base32
  • encoding-base64

Adapt `encoding-test` to running tests via `EncoderDecoder`

Part 7 of #36

Publishing the encoding-test for others to utilize would be fantastic, such that they can easily run through their own implementations of the Encoder/Decoder to validate correctness.

Will require refactoring it so that it is a test suite. I think making it depend on kotlin("test") using compileOnly might work as a good catch if someone depends on it in their main source sets.

Migrate `encoding-base32` to use `encoding-core`

Part 4 of #36

Migrate the base32 module to utilize the encoding-core module. This is an opportunity to write the Base32 Decoder/Encoder in package io.matthewnelson.encoding.base32 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #29

Initialize `encoding-core` module

Part 1 of #36

Project modules need some refactoring in order to get encoding-core initially set up. Will need to:

  • Move the following modules to the new library directory:
    • encoding-base16
    • encoding-base32
    • encoding-base64
    • encoding-test
  • Initialize the encoding-core module and add as dependency for above modules (excluding encoding-test)

Add ability to insert line breaks when encoding

When encoding data, it is sometimes preferable to insert line breaks every X number of characters of output. All EncoderDecoder.Config's should be configurable to do this (like Crockford's ability to insert hyphens) in order to properly pre-calculate the encoded output size.
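This is the lineBreakInterval builder option shown in the Usage section above; a minimal sketch (`data` is a placeholder ByteArray):

```kotlin
val base64 = Base64 {
    // Insert a line break every 64 characters of encoded output
    lineBreakInterval = 64
}
val encoded = data.encodeToString(base64)
// Each line of `encoded` is at most 64 characters long.
```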

`Base32` is broken

When working on 05nelsonm/kmp-tor#274 updating to 1.2.0 results in test failures.

expected:<OBTD[DB6TEGTGVPNR2XDA5X5DEB4YXGLEHJHNGIIVBGF33]S2HNNRQ====> but was:<OBTD[FCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5]S2HNNRQ====>
Expected :OBTDDB6TEGTGVPNR2XDA5X5DEB4YXGLEHJHNGIIVBGF33S2HNNRQ====
Actual   :OBTDFCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5S2HNNRQ====


The correct value should be OBTDFCGTEKTGXPVR23DA7YFDEB5IZGLEHJH5GIIVBKGL5S2HNNRQ====

This is attributed to the change in the base32 bitwise operations.

`EncoderDecoder.Config.isLenient` should be nullable

Some encoders might want to do something with those characters. The base abstraction should be Boolean? such that, when set to null, the EncoderDecoder.Feed will do nothing and pass the character along to the implementation.
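A sketch of the proposed three-state handling inside the Feed (hypothetical; `consumeProtected` is assumed to be the implementation hook):

```kotlin
// For a space or new-line character `c` arriving at the Feed:
when (config.isLenient) {
    true -> { /* skip the character */ }
    false -> throw EncodingException("Spaces and new lines are forbidden when isLenient[false]")
    null -> consumeProtected(c) // pass it along to the implementation
}
```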

Fix encode/decode out calculations

  1. The EncoderDecoder.Config methods for pre-calculating the encoded and decoded out size should accept a Long instead of an Int, and return a Long.
  2. decodeOutMaxSizeOrFail should accept null input so that it is usable with streams.
    • A stream won't have access to the entirety of what is being decoded and thus cannot provide a DecoderInput.
  3. DecoderInput should check for a negative return value of decodeOutMaxSizeOrFail and throw an EncodingException.
  4. DecoderInput should check if the returned Long is greater than Int.MAX_VALUE and throw an EncodingException.

There's a possibility of the EncoderDecoder.Config implementation having an overflow issue; that should be checked for in the base abstraction, which should throw so that we can guarantee a positive value is always returned.

`Base16` and `Base32` should be case insensitive

Currently published code does not allow for decoding of lowercase in either Base16 or any of the Base32 variants.

With the refactor to EncoderDecoder and Feeds, a configuration option was added: acceptLowercase. This is incorrect per RFC 4648; the decoding process should be case insensitive by default and accept both upper and lower case letters.


Regarding the refactor and pointing the old extension functions to the new EncoderDecoder implementations: anyone currently using them will be doing something like encodedData.uppercase().decodeBase16ToArray(), so modifying them to also accept lowercase should not be an issue at all in terms of decoding failures.

If someone were to downgrade the dependency to an older version (1.1.5 or lower) while still using the old extension functions, and without calling encodedData.uppercase(), decoding would fail. So, a super rare edge case.

Binary compatibility is still preserved, but the functionality will now automatically accept lowercase letters.

This issue serves as documentation to point to if any issues arise from consumers that fall into the aforementioned category of downgrading dependency versions and encountering decoding failures.
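In other words, after the change both of the following should decode to the same bytes (using the extension functions from this README):

```kotlin
val upper = "48656C6C6F".decodeToByteArrayOrNull(Base16)
val lower = "48656c6c6f".decodeToByteArrayOrNull(Base16)
// Both are expected to yield the bytes of "Hello".
```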

Optimize `Base32`

There's a lot of logic in the base32 implementation's when statement. This could be better broken out into separate methods, which will optimize parsing.

Migrate `Base32` and `Base16` to use `FeedBuffer`

FeedBuffer was introduced in #62 as a final buffer product to replace core.internal.buffer.Buffer. This ticket is for:

  • Converting Base32 implementation over to use it
  • Converting Base16 implementation over to use it
  • Deleting the core.internal.buffer directory and all its contents.

Add `encoding-core` module

Add a base module (i.e. encoding-core) to:

  • Move all common code to it
  • Have common extension functions that utilize the Encoder/Decoder abstraction
    • So a new encoder (not implemented by this library) can be utilized if a library consumer creates it.
    • Optimization, because right now there is a lot of unnecessary array creation. It would be great if the
      extension functions for each type (String, CharArray, ByteArray) were able to instantiate those types
      up front and then fill them in as decoding/encoding occurs by converting the returned Byte.
  • Enable ability to have Encoder and Decoder classes which can be passed around
  • Enable ability to stream bytes/chars in or out, and receive encoded/decoded bytes on the other end
  • Encoder/Decoder configurations
  • Give people the ability to easily create their own Encoder/Decoders

`DecoderInput` should not call `EncoderDecoder.Config.decodeOutMaxSizeOrFail`

Currently the DecoderInput utility class takes in an EncoderDecoder.Config and calls its decodeOutMaxSizeOrFail method from the DecoderInput.init block.

This should be a separate step. The EncoderDecoder.Config should have 4 methods:

  • A: public method that accepts DecoderInput and returns an Int
  • B: protected abstract method that accepts DecoderInput and the size of the input up to the last relevant character
  • C: public method that accepts a Long and returns a Long
  • D: protected abstract method that accepts a Long and returns a Long
public class DecoderInput private constructor(private val input: Any, internal val size: Int) {

    @Throws(EncodingException::class)
    public operator fun get(index: Int): Char {
        try {
            return when (input) {
                is CharSequence -> input[index]
                is CharArray -> input[index]
                is ByteArray -> input[index].toInt().toChar()
                else -> throw EncodingException("Unsupported input type")
            }
        } catch (e: IndexOutOfBoundsException) {
            throw EncodingException("Index out of bounds", e)
        }
    }
}
public sealed class EncoderDecoder(config: Config): Encoder(config) {

    // ...

    public abstract class Config(
        @JvmField
        public val isLenient: Boolean?,
        @JvmField
        public val paddingByte: Byte?,
    ) {
        @Throws(EncodingException::class)
        protected abstract fun decodeOutMaxSizeOrFailProtected(lastRelevantCharacter: Int, input: DecoderInput): Int

        @Throws(EncodingException::class)
        public fun decodeOutMaxSizeOrFail(input: DecoderInput): Int {
            var lastRelevantChar = input.size
            while (lastRelevantChar > 0) {
                val c = input[lastRelevantChar - 1]

                if (isLenient != null && c.isSpaceOrNewLine) {
                    if (isLenient) {
                        lastRelevantChar--
                        continue
                    } else {
                        throw EncodingException("...")
                    }
                }

                if (c.code.toByte() == paddingByte) {
                    lastRelevantChar--
                    continue
                }

                break
            }

            if (lastRelevantChar == 0) return 0
            val maxSize = decodeOutMaxSizeOrFailProtected(lastRelevantChar, input)
            if (maxSize < 0) throw EncodingSizeException("...")
            return maxSize
        }

        // ...

    }

    // ...
}

Add a `BOM` publication

Add a Bill of Materials publication.

Update the README but comment out that block until next release.

Migrate `encoding-base64` to use `encoding-core`

Part 5 of #36

Migrate the base64 module to utilize the encoding-core module. This is an opportunity to write the Base64 Decoder/Encoder in package io.matthewnelson.encoding.base64 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #26

Migrate `encoding-base16` to use `encoding-core`

Part 3 of #36

Migrate the base16 module to utilize the encoding-core module. This is an opportunity to write the Base16 Decoder/Encoder in package io.matthewnelson.encoding.base16 and remove component, then have all the current method bodies simply use it so as not to disturb the current APIs and remain backwards compatible.

Related to #28

Use generics with `EncoderDecoder` to pass `Config` type

Accessing the EncoderDecoder.Config always requires a cast because there is no type specified for EncoderDecoder. This also occurs within a Feed now that #70 made EncoderDecoder.Feed.config public.

public sealed class Decoder<C: EncoderDecoder.Config>(public val config: C) {

    public abstract fun newDecoderFeed(out: OutFeed): Decoder<C>.Feed

    public inner class Feed: EncoderDecoder.Feed<C>(config) {
        // ...
    }

    // ...
}

public sealed class Encoder<C: EncoderDecoder.Config>(config: C): Decoder<C>(config) {

    public abstract fun newEncoderFeed(out: OutFeed): Encoder<C>.Feed

    public inner class Feed: EncoderDecoder.Feed<C>(config) {
        // ...
    }

    // ...
}

public abstract class EncoderDecoder<C: Config>(config: C): Encoder<C>(config) {
    public abstract class Config(
        // ...
    ) {
        // ...
    }

    public sealed class Feed<C: Config>(public val config: C) {
        // ...
    }
}

Need to think on this more because it affects user experience with the library. NOT doing it and requiring a cast all the time also affects user experience... so could go both ways.

Clean up unnecessary conversions

Currently EncoderDecoder.Feed.consume is utilized for both Decoder.Feed and Encoder.Feed, and accepts a Byte for both operations. This results in a lot of conversions between Byte to Char and Char to Byte.

Because we can't specify the Char type using generics (JS does not do Chars), consume and consumeProtected must become wholly separate methods implemented on each Decoder.Feed and Encoder.Feed with the correct input types, Char and Byte respectively.

Furthermore, OutFeed should also become separate interfaces for each operation: Encoder.OutFeed, which outputs a Char, and Decoder.OutFeed, which outputs a Byte.
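A sketch of the proposed split (hypothetical shape; actual nesting and modifiers may differ):

```kotlin
public sealed class Encoder(/* ... */) {
    // Encoder.Feed emits encoded Chars
    public fun interface OutFeed {
        public fun output(encoded: Char)
    }
}

public sealed class Decoder(/* ... */) {
    // Decoder.Feed emits decoded Bytes
    public fun interface OutFeed {
        public fun output(decoded: Byte)
    }
}
```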

Move common logic to `encoding-core`

Part 2 of #36

Build out the abstractions needed for commonizing the implementations. Low level stuff should go here and remain agnostic of the encoding specification.
