twitter / bijection

Reversible conversions between types

License: Apache License 2.0

Scala 78.26% Java 21.73% Shell 0.01%

bijection's Issues

ImplicitBijection has some issues.

reverse should not use lazy val, just val.
should not extend Function (unwanted implicit conversion).
there should be an implicit conversion from Bijection, so if someone explicitly passes a Bijection, it works without doing ImplicitBijection.forward(bij)

Injection should extend A => B

Or should it? It seems like it would be pretty natural, and would apply to cases where people just need one way. Thoughtz?

use sbt source generator for running code gen.

One thing I found a little annoying when refactoring parts of this library was having to run the codegen by hand. Since the project already uses sbt as its build tool, let's let sbt do the work for us! I have experience with code generation in sbt, so I'll grab this one.

Add BigInteger to numeric bijections and injections

Tuple* => Iterable

Something like

class Tuple2ToColl[A,B,C](implicit a2c: Bijection[A,C], b2c: Bijection[B,C])
extends Bijection[(A,B),Iterable[C]] {
  override def apply(tuple: (A,B)) = {
    val (a,b) = tuple
    Iterable(a2c(a), b2c(b))
  }
  override def invert(iter: Iterable[C]) = {
    // Expects exactly two elements; any other size throws a MatchError
    val Seq(a, b) = iter.toSeq
    (a2c.invert(a), b2c.invert(b))
  }
}

Bijection[Iterable[Array[Byte]], Array[Byte]]

Something like writing the length of the Array[Byte], then the bytes themselves.

This could be combined with Bijection[T,Array[Byte]] to do any Bijection[Iterable[T], Array[Byte]] using toContainer.

I guess you can do: Bijection[Iterable[Array[T]], Array[U]] if you have Bijection[Int, Array[U]] and Bijection[T, Array[U]]

We can loosen up the types from Array/Iterable using CanBuildFrom.

Then we get something like C1[C2[T]] <=> C3[U] if you have the right CanBuildFrom, Int <=> C3[U], and T <=> C3[U].
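
The length-prefix idea above can be sketched roughly as follows. This is only an illustration, not the library's implementation: the `LengthPrefixed` object, its method names, and the choice of a 4-byte big-endian length are all assumptions made for the example.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of bijection.
object LengthPrefixed {
  // Encode each chunk as a 4-byte big-endian length followed by its bytes.
  def encode(chunks: Iterable[Array[Byte]]): Array[Byte] = {
    val total = chunks.foldLeft(0)((n, c) => n + 4 + c.length)
    val buf = ByteBuffer.allocate(total)
    chunks.foreach { c =>
      buf.putInt(c.length)
      buf.put(c)
    }
    buf.array()
  }

  // Decode by reading (length, payload) pairs until the input is exhausted.
  // Malformed input throws, so this only inverts the image of encode.
  def decode(bytes: Array[Byte]): List[Array[Byte]] = {
    val buf = ByteBuffer.wrap(bytes)
    val out = List.newBuilder[Array[Byte]]
    while (buf.hasRemaining) {
      val chunk = new Array[Byte](buf.getInt())
      buf.get(chunk)
      out += chunk
    }
    out.result()
  }
}
```

Since not every byte array is a valid encoding, this pair behaves more like an injection than a bijection unless the target type is tagged as a representation.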

Remove maven.twttr.com

To quote @sritchie, "Once science bumps, we can get all this jank out"

bijection-edn module

Bijections to and from Extensible Data Notation will allow for solid interop with Clojure and Clojurescript.

This code will be very similar to the json parser.

https://github.com/edn-format/edn

Don't use Bijection/Bufferable.build on base implementation

Use of .build creates two anonymous functions.

improve the tests

The roundTrip test only goes from A -> B -> A. I think we should have: isInjective[A,B] and isSurjective[A,B], and then isBijective[A,B] = isInjective[A,B] && isSurjective[A,B]
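
One possible shape for these checks, written against plain functions and finite samples, is sketched below. The `Laws` object and its signatures are assumptions for illustration; a real version would presumably draw its samples from ScalaCheck generators rather than explicit collections.

```scala
// Hypothetical test helpers, checked over finite sample sets only.
object Laws {
  // True when f maps distinct sample inputs to distinct outputs.
  def isInjective[A, B](as: Iterable[A])(f: A => B): Boolean = {
    val inputs = as.toSeq.distinct
    inputs.map(f).distinct.size == inputs.size
  }

  // True when every expected output in bs is hit by some sample input.
  def isSurjective[A, B](as: Iterable[A], bs: Iterable[B])(f: A => B): Boolean = {
    val image: Set[B] = as.map(f).toSet
    bs.forall(image.contains)
  }

  // Bijective on the samples means both properties hold.
  def isBijective[A, B](as: Iterable[A], bs: Iterable[B])(f: A => B): Boolean =
    isInjective(as)(f) && isSurjective(as, bs)(f)
}
```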

Add immutable map -> JMap etc to CollectionBijections

Use toMap on the way out.

Also, should we be doing an allocation on the way to JMap with new JHashMap(m.asJava)?

Bufferable using Builder is not (thread)safe

The .newBuilder code is very risky as it is creating hidden mutable state that could be a problem. Bufferables (and everything) should be immutable and referentially transparent.

CastInjection's type signature is wrong

should be

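import scala.util.control.Exception.allCatch

object CastInjection {
  def of[A <: B, B]: Injection[A, B] = new AbstractInjection[A, B] {
    def apply(a: A) = a.asInstanceOf[B] // Always succeeds, since A <: B
    // Note: A is erased at runtime, so this cast is unchecked here
    override def invert(b: B) = allCatch.opt(b.asInstanceOf[A])
  }
}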

Should we add a "bijection" target that contains every subproject?

Might be useful.

Java support and tests

We should have some test java code, and perhaps some shims to make it easier to use this code from Java.

I'm concerned about the @@ Rep[T] types particularly, but since those are generally applied to encoding cases, we can possibly side step.

Add a Bijection[Map[A,B],Map[C,D]]

using implicit Bijection[A,C], Bijection[B,D]
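
A rough sketch of how such a derived instance might look is given below. To keep it self-contained it uses a minimal stand-in trait `Bij` rather than the library's `Bijection`; `Bij`, `UserId`, `Name`, and all member names are hypothetical.

```scala
object MapBijectionSketch {
  // Minimal stand-in for the library's Bijection trait (hypothetical).
  trait Bij[A, B] {
    def apply(a: A): B
    def invert(b: B): A
  }

  // Wrapper types used only for this example.
  final case class UserId(value: Int)
  final case class Name(value: String)

  implicit val userIdBij: Bij[UserId, Int] = new Bij[UserId, Int] {
    def apply(u: UserId): Int = u.value
    def invert(i: Int): UserId = UserId(i)
  }

  implicit val nameBij: Bij[Name, String] = new Bij[Name, String] {
    def apply(n: Name): String = n.value
    def invert(s: String): Name = Name(s)
  }

  // Map[A, B] <=> Map[C, D], derived from a key and a value bijection.
  implicit def mapBij[A, B, C, D](implicit k: Bij[A, C], v: Bij[B, D]): Bij[Map[A, B], Map[C, D]] =
    new Bij[Map[A, B], Map[C, D]] {
      def apply(m: Map[A, B]): Map[C, D] = m.map { case (a, b) => (k(a), v(b)) }
      def invert(m: Map[C, D]): Map[A, B] = m.map { case (c, d) => (k.invert(c), v.invert(d)) }
    }

  // Resolved implicitly from userIdBij and nameBij.
  val users: Bij[Map[UserId, Name], Map[Int, String]] =
    implicitly[Bij[Map[UserId, Name], Map[Int, String]]]
}
```

Because the key conversion is itself a bijection, distinct keys stay distinct and no entries are lost in either direction.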

DeflaterBijection

Bijection that moves between deflated and inflated bytes.

Here's the idea (please forgive my lack of wrapper classes):

import java.util.zip.{ Deflater, Inflater, InflaterInputStream, DeflaterOutputStream }
import java.io.{ ByteArrayInputStream, ByteArrayOutputStream }

class DeflaterBijection(deflater: Deflater, inflater: Inflater) extends AbstractBijection[Array[Byte], Array[Byte]] {
  def apply(bytes: Array[Byte]) = {
    val baos = new ByteArrayOutputStream
    val dos = new DeflaterOutputStream(baos, deflater)
    dos.write(bytes)
    dos.close
    baos.toByteArray
  }
  override def invert(bytes: Array[Byte]) = {
    val baos = new ByteArrayOutputStream
    StreamUtils.copy(new InflaterInputStream(new ByteArrayInputStream(bytes), inflater), baos)
    baos.toByteArray
  }
}
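
For comparison, here is a self-contained sketch of the same round trip that uses only java.util.zip and creates a fresh Deflater and Inflater per call, which avoids sharing their internal state between calls. The `DeflateSketch` object and its method names are assumptions for illustration and do not depend on StreamUtils.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{DataFormatException, Deflater, Inflater}

// Hypothetical helper, not part of bijection.
object DeflateSketch {
  // Compress the whole input with a fresh Deflater.
  def deflate(bytes: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    try {
      deflater.setInput(bytes)
      deflater.finish()
      val out = new ByteArrayOutputStream
      val buf = new Array[Byte](1024)
      while (!deflater.finished()) {
        val n = deflater.deflate(buf)
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally deflater.end()
  }

  // Decompress with a fresh Inflater; throws DataFormatException on bad input.
  def inflate(bytes: Array[Byte]): Array[Byte] = {
    val inflater = new Inflater()
    try {
      inflater.setInput(bytes)
      val out = new ByteArrayOutputStream
      val buf = new Array[Byte](1024)
      while (!inflater.finished()) {
        val n = inflater.inflate(buf)
        // No progress and nothing left to read means the stream is truncated.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
          throw new DataFormatException("truncated or unsupported input")
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally inflater.end()
  }
}
```

Not every byte array is a valid deflate stream, so inflate is only defined on the output of deflate.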

Bijection should build its own thrift, scrooge, protobuf for testing

Using

Can Injection[A,B] be a subtype of Function1[A, B]?

Twitter Util Bijections

It would be nice to have Future, Try, Return bijections, probably in a separate build target.

Set jdk 6 compiler option before publishing

bijection-protobuf should rt to Json

inverse of Injection

Want a function like:

def inverse[A, B](inj: Injection[A, B]): Injection[B, Either[B, A]]

which inverts B to A when possible, else it returns Left of the input.

invert has to be careful to return None for Left(b) if inj.invert(b).isDefined
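
A minimal sketch of this function follows, assuming an Option-returning `invert`. The stand-in trait `Inj` and the sample `intToString` injection are hypothetical and exist only to make the example self-contained.

```scala
import scala.util.Try

object InverseSketch {
  // Minimal stand-in for the library's Injection trait (hypothetical).
  trait Inj[A, B] {
    def apply(a: A): B
    def invert(b: B): Option[A]
  }

  def inverse[A, B](inj: Inj[A, B]): Inj[B, Either[B, A]] =
    new Inj[B, Either[B, A]] {
      // Invert to A when possible, otherwise keep the input on the Left.
      def apply(b: B): Either[B, A] = inj.invert(b) match {
        case Some(a) => Right(a)
        case None    => Left(b)
      }
      // Left(b) is only in the image when b itself cannot be inverted.
      def invert(e: Either[B, A]): Option[B] = e match {
        case Right(a) => Some(inj(a))
        case Left(b)  => if (inj.invert(b).isDefined) None else Some(b)
      }
    }

  // Sample injection: only strings that print back identically are inverted.
  val intToString: Inj[Int, String] = new Inj[Int, String] {
    def apply(i: Int): String = i.toString
    def invert(s: String): Option[Int] = Try(s.toInt).toOption.filter(_.toString == s)
  }
}
```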

Add implicit from Bijection[A, B @@ Rep[A]] to Injection[A, B]

have to be careful with cycles in the implicit resolution.

PartialBijection

We should add a class to represent cases where we can't reverse:

trait PartialBijection[A,B] extends ((A) => Option[B]) {
  def apply(a: A): Option[B]
  def invert(b: B): Option[A]
  def andThen[C](p: PartialBijection[B,C]): PartialBijection[A,C]
  def orElse(bij: Bijection[A,B]): Bijection[A,B]
}

Create Injection[String,Long] by lifting code from Scalding DateOps

I am really missing the DateOps object in SummingBird-Storm. Basically the issue is that TimeExtractor() expects a Long and I am getting dates encoded as Strings in various formats. In Scalding, DateOps.stringToRichDate does a reasonably good job of converting String to RichDate. Would it be useful to steal code from DateOps/DateParser and create an Injection that converts a String to a Long by guessing the date format?

Let me know if it makes sense and I can take a stab at creating the Injection.

bijection-avro module

Add java specific docs

https://github.com/typesafehub/genjavadoc

Create a bijection in -util for twitter.util.Future to scala Future.

Would be useful until twitter futures extend scala Future.

Specify version of SBT

Please specify the version of SBT used to build bijection. I have tried a few versions of SBT and the build kept failing due to the sbt bug referenced below

sbt/sbt#728

bijection-protobuf + JSON

Check out this JSON<->Protobuf conversion package: https://github.com/turn/shapeshifter
Would be neat to have a Bijection that wrapped this.

Create a new Trait for one way morphism

I would like a way to encode one way hash functions like md5, sha1 from Guava (http://docs.guava-libraries.googlecode.com/git-history/v11.0/javadoc/com/google/common/hash/Hashing.html). I thought about encoding them as injections and always returning failure in invert method but it feels like a code smell and also strictly speaking md5 etc are not injective due to the possibility of collision and more than one element in A mapping to an element in B.

I am not sure what this trait would be called, but it would be nice to have a trait that can be used to encode these one-way transformations.

Add more heap space for Travis-CI's runners

Something like:

before_script:
 - "echo $JAVA_OPTS"
 - "export JAVA_OPTS=-Xmx512m"

Inversion failure for TEnumCodec

Scrooge generated ThriftEnum does extend the apache TEnum, e.g. https://github.com/twitter/scrooge/blob/master/scrooge-core/src/main/scala/com/twitter/scrooge/ThriftEnum.scala#L5

sealed trait EngagementType extends ThriftEnum with Serializable

But the com.twitter.bijection.thrift.TEnumCodec.toBinary[T <: TEnum] throws when used with scrooge TEnum.

Maybe the solution is to simply create a com.twitter.bijection.scrooge.ThriftEnumCodec, though it would be nice to catch this at compile time.

scala> import com.twitter.discover.summingbird.common.thriftscala.EngagementType
import com.twitter.discover.summingbird.common.thriftscala.EngagementType

scala> import EngagementType._
import EngagementType._

scala> val inj = com.twitter.bijection.thrift.TEnumCodec.toBinary[EngagementType]
inj: com.twitter.bijection.Injection[com.twitter.discover.summingbird.common.thriftscala.EngagementType,Array[Byte]] = com.twitter.bijection.Injection$$anon$1@12141052

scala> inj(Click)
res0: Array[Byte] = Array(0, 0, 0, 1)

scala> inj.invert(res0).get
java.lang.NoSuchMethodException: com.twitter.discover.summingbird.common.thriftscala.EngagementType.findByValue(int)
    at java.lang.Class.getMethod(Class.java:1622)
    at com.twitter.bijection.thrift.TEnumCodec.findByValue(ThriftCodecs.scala:117)
    at com.twitter.bijection.thrift.TEnumCodec$$anonfun$invert$3.apply(ThriftCodecs.scala:121)
    at com.twitter.bijection.thrift.TEnumCodec$$anonfun$invert$3.apply(ThriftCodecs.scala:121)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:176)
    at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:45)
    at com.twitter.bijection.thrift.TEnumCodec.invert(ThriftCodecs.scala:121)
    at com.twitter.bijection.thrift.TEnumCodec.invert(ThriftCodecs.scala:114)
    at com.twitter.bijection.Injection$$anon$1$$anonfun$invert$1.apply(Injection.scala:41)
    at com.twitter.bijection.Injection$$anon$1$$anonfun$invert$1.apply(Injection.scala:41)
    at scala.util.Success.flatMap(Try.scala:199)
    at com.twitter.bijection.Injection$$anon$1.invert(Injection.scala:41)
    at .<init>(<console>:14)
    at .<clinit>(<console>)
    at .<init>(<console>:11)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
    at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
    at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
    at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
    at java.lang.Thread.run(Thread.java:722)

Bufferable should take a type parameter on Buffer

So we can do ByteBuffer or netty Buffers, or potentially Kryo Output/Input objects.

The idea is to write the code for the primitives, and then just leverage those in the containers.

JavaConverters use wrappers, don't map

https://github.com/twitter/bijection/blob/develop/bijection-core/src/main/scala/com/twitter/bijection/CollectionBijections.scala

I think doing the map here is really slowing things down. JavaConverters don't require copies, just wrappers (I am fairly sure).

I think we need to have connect[Set[T], Set[U], JSet[U]] rather than connect[Set[T], JSet[U]] to make it clear the costs involved. I'd rather remove the implicits using map with .as.
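
The wrapper behaviour can be checked with a small sketch like the one below: a mutation made after the conversion is visible through the Java view, which would not be the case for a copy. `WrapperSketch` is a hypothetical name, and `scala.collection.JavaConverters` is used here although it is deprecated from Scala 2.13 on in favour of `scala.jdk.CollectionConverters`.

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable

// Hypothetical demo object, not part of bijection.
object WrapperSketch {
  def demo(): Int = {
    val buf = mutable.ArrayBuffer(1, 2, 3)
    // asJava wraps the buffer; no elements are copied.
    val jlist: java.util.List[Int] = buf.asJava
    // A later change to the Scala buffer shows up in the Java view.
    buf += 4
    jlist.size
  }
}
```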

0.5.2 release

The README file claims that both bijection-avro and bijection-hbase are released as part of 0.5.2, but I can't find them on Maven Central. Am I doing something wrong, or is the documentation incorrect?

Proposal: Add Encoder and Decoder

Something like:

trait Encoder[T, U] {
  def apply(t: T): U
  // and andThen, toFn, 
}

trait Decoder[U, T] {
  def apply(u: U): Try[T]
}

// composition might be clearer than inheritance here.
trait Injection[T, U] {
  def encoder: Encoder[T, U]
  def decoder: Decoder[U, T]
}

trait Bijection[T, U] {
  def encoder: Encoder[T, U]
  def inverseEncoder: Encoder[U, T]
}

Injection[Boolean, Array[Byte]]

bijection-netty's ChannelBufferBijection is mispackaged

labeled as storehaus.

JODA Time Bijection Module

Fix TBinaryProtocol bugs addressed by Elephant-Bird update

We should copy this new TBinaryProtocol override:

https://github.com/rangadi/elephant-bird/blob/a92210fa9f365728a109f5dbee60781b4371d902/core/src/main/java/com/twitter/elephantbird/thrift/ThriftBinaryDeserializer.java

This will speed up errors in existing Storm jobs.

Should Bijection/Injection extend Function? (I think not)

So, bijection was about making conversion explicit and following a typesafe pattern, but to find an ImplicitBijection, you need to have the Bijection marked implicit. If you customize and put a local implicit Bijection, due to extending Function, it becomes an implicit conversion again. Not what we intended:

scala> trait Danger[A, B] extends (A => B)
defined trait Danger

scala> implicit val danger = new Danger[String, Int] { def apply(s: String) = s.toInt }
danger: java.lang.Object with Danger[String,Int] = <function1>

scala> def test(i: Int) = 3 + i
test: (i: Int)Int

scala> test("4")
res5: Int = 7

Proposal: remove the extending of Function1 and put an implicit conversion from Bijection/Injection to Function1 in the companion objects. This breaks binary compatibility, but not source level compatibility.
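
The proposal could look something like the sketch below. It uses a stand-in trait `Bij` rather than the library's types, and the `Meters` wrapper and all member names are hypothetical.

```scala
import scala.language.implicitConversions

object NoFunctionSketch {
  // Stand-in trait (hypothetical): note that it does NOT extend A => B.
  trait Bij[A, B] {
    def apply(a: A): B
    def invert(b: B): A
  }

  object Bij {
    // Lives in the companion, so it is found whenever a Bij is passed
    // where a function is expected.
    implicit def toFunction[A, B](bij: Bij[A, B]): A => B = a => bij(a)
  }

  final case class Meters(value: Double)

  implicit val metersBij: Bij[Meters, Double] = new Bij[Meters, Double] {
    def apply(m: Meters): Double = m.value
    def invert(d: Double): Meters = Meters(d)
  }

  // Explicitly using the bijection as a function still works.
  val asFn: Meters => Double = metersBij

  def half(d: Double): Double = d / 2
  // half(Meters(3.0)) does not compile: metersBij is not a Function1,
  // so marking it implicit no longer creates an implicit conversion.
}
```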

Add "connect" and thrift implicit to library

object BijectionImplicits {
  def connect[A, B, C](implicit bij: Bijection[A, B], bij2: Bijection[B, C]): Bijection[A, C] = bij andThen bij2
  def connect[A, B, C, D](implicit bij: Bijection[A, B], bij2: Bijection[B, C], bij3: Bijection[C, D]): Bijection[A, D] = connect[A, B, C] andThen bij3
  def connect[A, B, C, D, E](implicit bij: Bijection[A, B],
                             bij2: Bijection[B, C],
                             bij3: Bijection[C, D],
                             bij4: Bijection[D, E]): Bijection[A, E] =
    connect[A, B, C, D] andThen bij4
  implicit def thrift2Bytes[T <: TBase[_,_]: Manifest]: Bijection[T, Array[Byte]] = BinaryThriftCodec.apply[T]
}

What a Bijection is not

On the front page, the definition for bijection is given:

A Bijection is an invertible function that converts back and forth between two different types, with the contract that a round-trip through the Bijection will bring back the original object.

This is not true. At first I thought, this is just a matter of correcting the definition. However, what I saw next were code examples like:

scala> Bijection[Int, String](100)
res2: String = 100

This is when I fell over.

At least I can see that the code has taken this incorrect definition seriously! I have picked myself up now, so let me clarify things.

First, a bijection is always injective and surjective. The code above is not a bijection, because it is not even a surjection. In fact, it is not possible to produce a surjection from Int to String, let alone a bijection. However, in this case, there is an injection from Int to String and I expect this is the implementation. Note that this is an injective, non-surjective (and therefore, non-bijective) implementation.
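
To make the distinction concrete, here is a small sketch with plain functions (not the library's code; `IntStringSketch` and its method names are assumptions). The forward direction is total and injective, while the reverse direction is only defined on part of String.

```scala
import scala.util.Try

// Hypothetical illustration, not part of bijection.
object IntStringSketch {
  // Injective: distinct Ints always print as distinct Strings.
  def toStr(i: Int): String = i.toString

  // Partial inverse: defined only for strings in the image of toStr.
  def fromStr(s: String): Option[Int] =
    Try(s.toInt).toOption.filter(_.toString == s)
}
```

Strings such as "abc" or "007" have no preimage under toStr, which is exactly why the mapping is not surjective.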

Further, there are open issues such as "integration with Lens" #4. While there is indeed an arrow (homomorphism) from Bijection to Lens[1], much of what is implemented here is not a bijection, so any attempt to implement this transformation will result in "not a lens." It is just asking for even more trouble.

My primary suggestion is that all values of the type Bijection are bijective. This should include the removal of all non-bijective values of the type Bijection. Also, the written definition of bijection requires correction as it is very misleading. Finally, it might be worth considering a project split into "injection" and "surjection."

[1] Example of an implementation here https://github.com/scalaz/scalaz/blob/scalaz-seven/core/src/main/scala/scalaz/BijectionT.scala#L24

bijection-guava should handle immutable collections

Remove simple-json and only use jackson.

as should be upper bounded on B

I think we want:

def as[B](a: A)(implicit bij: Bijection[A, _ <: B]): B = bij(a)

But I'm not sure the implicit resolution works in that case (no reason it shouldn't, just haven't tested).

Integrate with the notion of Lens

http://twanvl.nl/blog/haskell/isomorphism-lenses

I think something like:

// Type inference should work without having to specify R in the below unless there are multiple valid bijections
object Lens {
  def get[A,B](a: A)(implicit bij: Bijection[A,(B,_)]): B = bij(a)._1
  def modify[A,B,R](a: A)(update: B => B)(implicit bij: Bijection[A,(B,R)]): A = {
    val decons = bij(a)
    bij.invert((update(decons._1), decons._2))
  }
  def set[A,B,R](a: A, b: B)(implicit bij: Bijection[A,(B,R)]): A = {
    val decons = bij(a)
    bij.invert((b, decons._2))
  } 
}

This relies on having good bijections from A into each of its parts. It would be great if we could have a way to signify that (B,R) could really be any tuple of B with any other items (e.g. (B,R1,R2,R3...) would work).