twitter / bijection
Reversible conversions between types
License: Apache License 2.0
Or should it? It seems like it would be pretty natural, and would apply to cases where people just need one way. Thoughtz?
One thing I found a little annoying when refactoring some of this library was having to run the codegen code by hand. Since the project already uses sbt as its build tool, let's let sbt do the work for us! I've had experience with code generation in sbt before. I'll grab this one.
Something like
class Tuple2ToColl[A,B,C](implicit a2c: Bijection[A,C], b2c: Bijection[B,C])
extends Bijection[(A,B),Iterable[C]] {
override def apply(tuple: (A,B)) = {
val (a,b) = tuple
Iterable(a2c(a), b2c(b))
}
override def invert(iter: Iterable[C]) = {
val Seq(a, b) = iter.toSeq
(a2c.invert(a), b2c.invert(b))
}
}
Something like writing the length of the Array[Byte], then the bytes themselves.
This could be combined with Bijection[T,Array[Byte]] to do any Bijection[Iterable[T], Array[Byte]] using toContainer.
I guess you can do: Bijection[Iterable[Array[T]], Array[U]] if you have Bijection[Int, Array[U]] and Bijection[T, Array[U]]
We can loosen up the types from Array/Iterable using CanBuildFrom.
Then get something like: C1[C2[T]] <=> C3[U] if you have the right CanBuildFrom, Int <=> C3[U], and T <=> C3[U].
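The length-prefix idea above can be sketched roughly as follows. This is a minimal illustration, not library code: `LengthPrefixed`, `encode` and `decode` are hypothetical names, and the 4-byte big-endian length is an assumption.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of bijection: frames each chunk as
// [4-byte length][bytes] and concatenates the frames.
object LengthPrefixed {
  def encode(chunks: Iterable[Array[Byte]]): Array[Byte] = {
    val size = chunks.iterator.map(c => 4 + c.length).sum
    val buf = ByteBuffer.allocate(size)
    chunks.foreach { c =>
      buf.putInt(c.length) // length first
      buf.put(c)           // then the bytes themselves
    }
    buf.array()
  }

  def decode(bytes: Array[Byte]): List[Array[Byte]] = {
    val buf = ByteBuffer.wrap(bytes)
    val out = List.newBuilder[Array[Byte]]
    // Read frames until the input is exhausted.
    while (buf.hasRemaining) {
      val chunk = new Array[Byte](buf.getInt())
      buf.get(chunk)
      out += chunk
    }
    out.result()
  }
}
```

Combined with a T <=> Array[Byte] conversion per element, this gives the Iterable[T] <=> Array[Byte] direction described above.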
To quote @sritchie, "Once science bumps, we can get all this jank out"
Bijections to and from Extensible Data Notation will allow for solid interop with Clojure and Clojurescript.
This code will be very similar to the json parser.
Use of .build creates two anonymous functions.
The roundTrip test only goes from A -> B -> A. I think we should have: isInjective[A,B] and isSurjective[A,B], and then isBijective[A,B] = isInjective[A,B] && isSurjective[A,B]
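A rough sketch of what those three checks could look like, written against plain functions and explicit samples rather than the project's actual test helpers (the object name and signatures here are assumptions):

```scala
// Hypothetical law checks over finite samples; a property-based test would
// generate the samples instead of taking them as arguments.
object BijectionLaws {
  // A -> B -> A returns the original value for every sample a.
  def isInjective[A, B](to: A => B, from: B => A)(as: Iterable[A]): Boolean =
    as.forall(a => from(to(a)) == a)

  // B -> A -> B returns the original value for every sample b,
  // so every sample b is reached by `to`.
  def isSurjective[A, B](to: A => B, from: B => A)(bs: Iterable[B]): Boolean =
    bs.forall(b => to(from(b)) == b)

  def isBijective[A, B](to: A => B, from: B => A)(as: Iterable[A], bs: Iterable[B]): Boolean =
    isInjective(to, from)(as) && isSurjective(to, from)(bs)
}
```

For example, Int to Long via `toLong`/`toInt` passes the first check but fails the second for any Long outside the Int range.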
Use toMap on the way out.
Also, should we be doing an allocation on the way to JMap with new JHashMap(m.asJava)?
The .newBuilder code is very risky as it is creating hidden mutable state that could be a problem. Bufferables (and everything) should be immutable and referentially transparent.
should be
object CastInjection {
def of[A <: B, B]: Injection[A, B] = new AbstractInjection[A, B] {
def apply(a: A) = a.asInstanceOf[B] // Always succeeds
override def invert(b: B) = allCatch.opt(b.asInstanceOf[A])
}
}
Might be useful.
We should have some test java code, and perhaps some shims to make it easier to use this code from Java.
I'm concerned about the @@ Rep[T] types particularly, but since those are generally applied to encoding cases, we can possibly side step.
using implicit Bijection[A,C], Bijection[B,D]
Bijection that moves between deflated and inflated bytes.
Here's the idea (please forgive my lack of wrapper classes):
import java.util.zip.{ Deflater, Inflater, InflaterInputStream, DeflaterOutputStream }
import java.io.{ ByteArrayInputStream, ByteArrayOutputStream }
class DeflaterBijection(deflater: Deflater, inflater: Inflater) extends AbstractBijection[Array[Byte], Array[Byte]] {
def apply(bytes: Array[Byte]) = {
val baos = new ByteArrayOutputStream
val dos = new DeflaterOutputStream(baos, deflater)
dos.write(bytes)
dos.close
baos.toByteArray
}
override def invert(bytes: Array[Byte]) = {
val baos = new ByteArrayOutputStream
StreamUtils.copy(new InflaterInputStream(new ByteArrayInputStream(bytes), inflater), baos)
baos.toByteArray
}
}
It would be nice to have Future, Try, Return bijections, probably in a separate build target.
Want a function like:
def inverse[A, B](inj: Injection[A, B]): Injection[B, Either[B, A]]
which inverts B to A when possible, else it returns Left of the input.
invert has to be careful to return None for Left(b) if inj.invert(b).isDefined.
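One way this might look, using a minimal stand-in trait (`Inj`, hypothetical) instead of the library's Injection so the sketch is self-contained:

```scala
object InverseSketch {
  // Minimal stand-in for Injection with an Option-returning invert.
  trait Inj[A, B] {
    def apply(a: A): B
    def invert(b: B): Option[A]
  }

  def inverse[A, B](inj: Inj[A, B]): Inj[B, Either[B, A]] =
    new Inj[B, Either[B, A]] {
      // Invert when possible, else hand the input back on the Left.
      def apply(b: B): Either[B, A] = inj.invert(b) match {
        case Some(a) => Right(a)
        case None    => Left(b)
      }

      def invert(e: Either[B, A]): Option[B] = e match {
        case Right(a) => Some(inj(a))
        // Left(b) is only produced by apply when b has no preimage,
        // so a Left holding an invertible b must be rejected.
        case Left(b) => if (inj.invert(b).isDefined) None else Some(b)
      }
    }
}
```

Note this only behaves well if `inj.invert` is strict, that is, it returns None for every value outside the image of `inj`.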
We have to be careful with cycles in the implicit resolution.
We should add a class to represent cases where we can't reverse:
trait PartialBijection[A,B] extends ((A) => Option[B]) {
def apply(a: A): Option[B]
def invert(b: B): Option[A]
def andThen[C](p: PartialBijection[B,C]): PartialBijection[A,C]
def orElse(bij: Bijection[A,B]): Bijection[A,B]
}
I am really missing the DateOps object in SummingBird-Storm. Basically the issue is that TimeExtractor() expects a Long and I am getting dates encoded as Strings in various formats. In Scalding, DateOps.stringToRichDate does a reasonably good job of converting a String to a RichDate. Would it be useful to steal code from DateOps/DateParser and create an Injection that converts a String to a Long by guessing the date format?
Let me know if it makes sense and I can take a stab at creating the Injection.
Would be useful until twitter futures extend scala Future.
Please specify the version of SBT used to build bijection. I have tried a few versions of SBT and the build kept failing due to a bug in sbt referenced below
Check out this JSON<->Protobuf conversion package: https://github.com/turn/shapeshifter
Would be neat to have a Bijection that wrapped this.
I would like a way to encode one-way hash functions like md5 and sha1 from Guava (http://docs.guava-libraries.googlecode.com/git-history/v11.0/javadoc/com/google/common/hash/Hashing.html). I thought about encoding them as injections and always returning failure in the invert method, but it feels like a code smell. Also, strictly speaking, md5 etc. are not injective, because collisions are possible: more than one element in A can map to the same element in B.
I am not sure what this trait would be called, but it would be nice to have a trait that can be used to encode these one-way transformations.
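As a rough sketch of such a trait: `OneWay` is a made-up name, and the example uses `java.security.MessageDigest` from the JDK rather than Guava so that it stands alone.

```scala
import java.security.MessageDigest

object OneWaySketch {
  // Hypothetical trait: a conversion with no invert at all, so nobody can
  // expect a round trip from it.
  trait OneWay[A, B] {
    def apply(a: A): B

    def andThen[C](next: OneWay[B, C]): OneWay[A, C] = {
      val self = this
      new OneWay[A, C] { def apply(a: A): C = next(self(a)) }
    }
  }

  val md5: OneWay[Array[Byte], Array[Byte]] = new OneWay[Array[Byte], Array[Byte]] {
    // MessageDigest instances are stateful, so create one per call.
    def apply(bytes: Array[Byte]): Array[Byte] =
      MessageDigest.getInstance("MD5").digest(bytes)
  }

  // Lower-case hex rendering, only here to make the digest printable.
  val hex: OneWay[Array[Byte], String] = new OneWay[Array[Byte], String] {
    def apply(bytes: Array[Byte]): String =
      bytes.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Leaving invert off the type entirely avoids the "always fail" invert, and makes no injectivity claim that a hash cannot honour.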
Something like:
before_script:
- "echo $JAVA_OPTS"
- "export JAVA_OPTS=-Xmx512m"
Scrooge generated ThriftEnum does extend the apache TEnum, e.g. https://github.com/twitter/scrooge/blob/master/scrooge-core/src/main/scala/com/twitter/scrooge/ThriftEnum.scala#L5
sealed trait EngagementType extends ThriftEnum with Serializable
But the com.twitter.bijection.thrift.TEnumCodec.toBinary[T <: TEnum] throws when used with scrooge TEnum.
Maybe the solution is to simply create a com.twitter.bijection.scrooge.ThriftEnumCodec, though it would be nice to catch this at compile time.
scala> import com.twitter.discover.summingbird.common.thriftscala.EngagementType
import com.twitter.discover.summingbird.common.thriftscala.EngagementType
scala> import EngagementType._
import EngagementType._
scala> val inj = com.twitter.bijection.thrift.TEnumCodec.toBinary[EngagementType]
inj: com.twitter.bijection.Injection[com.twitter.discover.summingbird.common.thriftscala.EngagementType,Array[Byte]] = com.twitter.bijection.Injection$$anon$1@12141052
scala> inj(Click)
res0: Array[Byte] = Array(0, 0, 0, 1)
scala> inj.invert(res0).get
java.lang.NoSuchMethodException: com.twitter.discover.summingbird.common.thriftscala.EngagementType.findByValue(int)
at java.lang.Class.getMethod(Class.java:1622)
at com.twitter.bijection.thrift.TEnumCodec.findByValue(ThriftCodecs.scala:117)
at com.twitter.bijection.thrift.TEnumCodec$$anonfun$invert$3.apply(ThriftCodecs.scala:121)
at com.twitter.bijection.thrift.TEnumCodec$$anonfun$invert$3.apply(ThriftCodecs.scala:121)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:176)
at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:45)
at com.twitter.bijection.thrift.TEnumCodec.invert(ThriftCodecs.scala:121)
at com.twitter.bijection.thrift.TEnumCodec.invert(ThriftCodecs.scala:114)
at com.twitter.bijection.Injection$$anon$1$$anonfun$invert$1.apply(Injection.scala:41)
at com.twitter.bijection.Injection$$anon$1$$anonfun$invert$1.apply(Injection.scala:41)
at scala.util.Success.flatMap(Try.scala:199)
at com.twitter.bijection.Injection$$anon$1.invert(Injection.scala:41)
at .<init>(<console>:14)
at .<clinit>(<console>)
at .<init>(<console>:11)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
at java.lang.Thread.run(Thread.java:722)
So we can do ByteBuffer or netty Buffers, or potentially Kryo Output/Input objects.
The idea is to write the code for the primitives, and then just leverage those in the containers.
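A minimal sketch of that layering, assuming a hypothetical `Bufferable` type class over `java.nio.ByteBuffer` where the caller allocates a large enough buffer (the real design may well return buffers instead of mutating one):

```scala
import java.nio.ByteBuffer

object BufferableSketch {
  // Hypothetical type class: put writes a value, get reads it back.
  trait Bufferable[T] {
    def put(buf: ByteBuffer, t: T): Unit
    def get(buf: ByteBuffer): T
  }

  // Primitives are written by hand...
  implicit val intBufferable: Bufferable[Int] = new Bufferable[Int] {
    def put(buf: ByteBuffer, i: Int): Unit = { buf.putInt(i); () }
    def get(buf: ByteBuffer): Int = buf.getInt()
  }

  implicit val longBufferable: Bufferable[Long] = new Bufferable[Long] {
    def put(buf: ByteBuffer, l: Long): Unit = { buf.putLong(l); () }
    def get(buf: ByteBuffer): Long = buf.getLong()
  }

  // ...and containers just reuse them: a size prefix, then each element.
  implicit def listBufferable[T](implicit elem: Bufferable[T]): Bufferable[List[T]] =
    new Bufferable[List[T]] {
      def put(buf: ByteBuffer, ts: List[T]): Unit = {
        intBufferable.put(buf, ts.size)
        ts.foreach(t => elem.put(buf, t))
      }
      def get(buf: ByteBuffer): List[T] = {
        val n = intBufferable.get(buf)
        List.fill(n)(elem.get(buf))
      }
    }
}
```

Nested containers such as List[List[Long]] then resolve implicitly from the same three definitions.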
I think doing the map here is really slowing things down. JavaConverters don't require copies, just wrappers (I am fairly sure).
I think we need to have connect[Set[T], Set[U], JSet[U]] rather than connect[Set[T], JSet[U]] to make it clear the costs involved. I'd rather remove the implicits using map with .as.
The README file claims that both bijection-avro and bijection-hbase are released as part of 0.5.2, but I can't find them on Maven Central. Am I doing something wrong, or is the documentation incorrect?
Something like:
trait Encoder[T, U] {
def apply(t: T): U
// and andThen, toFn,
}
trait Decoder[U, T] {
def apply(u: U): Try[T]
}
// composition might be clearer than inheritance here.
trait Injection[T, U] {
def encoder: Encoder[T, U]
def decoder: Decoder[U, T]
}
trait Bijection[T, U] {
def encoder: Encoder[T, U]
def inverseEncoder: Encoder[U, T]
}
labeled as storehaus.
We should copy this new TBinaryProtocol override:
This will speed up errors in existing Storm jobs.
So, bijection was about making conversion explicit and following a typesafe pattern, but to find an ImplicitBijection, you need to have the Bijection marked implicit. If you customize and put a local implicit Bijection, due to extending Function, it becomes an implicit conversion again. Not what we intended:
scala> trait Danger[A, B] extends (A => B)
defined trait Danger
scala> implicit val danger = new Danger[String, Int] { def apply(s: String) = s.toInt }
danger: java.lang.Object with Danger[String,Int] = <function1>
scala> def test(i: Int) = 3 + i
test: (i: Int)Int
scala> test("4")
res5: Int = 7
Proposal: stop extending Function1 and put an implicit conversion from Bijection/Injection to Function1 in the companion objects. This breaks binary compatibility, but not source-level compatibility.
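A sketch of how the proposal could look, with a stand-in trait `Bij` (hypothetical) in place of the real Bijection:

```scala
import scala.language.implicitConversions

object NoFunctionSketch {
  // Stand-in for Bijection that deliberately does NOT extend Function1.
  trait Bij[A, B] {
    def apply(a: A): B
    def invert(b: B): A
  }

  object Bij {
    // Lives in the companion, so it is found only when a Bij value is
    // supplied where a function is expected.
    implicit def toFunction[A, B](bij: Bij[A, B]): A => B = a => bij(a)
  }

  implicit val string2Int: Bij[String, Int] = new Bij[String, Int] {
    def apply(s: String): Int = s.toInt
    def invert(i: Int): String = i.toString
  }

  def test(i: Int): Int = 3 + i

  // test("4") should no longer compile here, because the implicit
  // string2Int is not itself a String => Int.

  // Using the Bij as a function still works through the companion conversion.
  val asFunction: String => Int = string2Int
  val parsed: List[Int] = List("1", "2").map(asFunction)
}
```

The implicit value stays available for ImplicitBijection-style lookups, while the accidental String-to-Int view from the REPL session above goes away.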
object BijectionImplicits {
def connect[A, B, C](implicit bij: Bijection[A, B], bij2: Bijection[B, C]): Bijection[A, C] = bij andThen bij2
def connect[A, B, C, D](implicit bij: Bijection[A, B], bij2: Bijection[B, C], bij3: Bijection[C, D]): Bijection[A, D] = connect[A, B, C] andThen bij3
def connect[A, B, C, D, E](implicit bij: Bijection[A, B],
bij2: Bijection[B, C],
bij3: Bijection[C, D],
bij4: Bijection[D, E]): Bijection[A, E] =
connect[A, B, C, D] andThen bij4
implicit def thrift2Bytes[T <: TBase[_,_]: Manifest]: Bijection[T, Array[Byte]] = BinaryThriftCodec.apply[T]
}
On the front page, the definition for bijection is given:
A Bijection is an invertible function that converts back and forth between two different types, with the contract that a round-trip through the Bijection will bring back the original object.
This is not true. At first I thought, this is just a matter of correcting the definition. However, what I saw next were code examples like:
scala> Bijection[Int, String](100)
res2: String = 100
This is when I fell over.
At least I can see that the code has taken this incorrect definition seriously! I have picked myself up now, so let me clarify things.
First, a bijection is always injective and surjective. The code above is not a bijection, because it is not even a surjection. In fact, it is not possible to produce a surjection from Int to String, let alone a bijection. However, in this case, there is an injection from Int to String and I expect this is the implementation. Note that this is an injective, non-surjective (and therefore, non-bijective) implementation.
Further, there are open issues such as "integration with Lens" #4. While there is indeed an arrow (homomorphism) from Bijection to Lens[1], much of what is implemented here is not a bijection, so any attempt to implement this transformation will result in "not a lens." It is just asking for even more trouble.
My primary suggestion is that all values of the type Bijection are bijective. This should include the removal of all non-bijective values of the type Bijection. Also, the written definition of bijection requires correction as it is very misleading. Finally, it might be worth considering a project split into "injection" and "surjection."
[1] Example of an implementation here https://github.com/scalaz/scalaz/blob/scalaz-seven/core/src/main/scala/scalaz/BijectionT.scala#L24
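To make the Int/String point concrete, here is a small sketch in plain Scala (no library types; the names are illustrative only). Every Int survives the trip through String, but most Strings have no Int preimage, so the reverse direction can only be partial:

```scala
import scala.util.Try

object IntStringSketch {
  val toStr: Int => String = _.toString

  // Partial inverse: only strings in the image of toStr map back.
  val fromStr: String => Option[Int] =
    s => Try(s.toInt).toOption.filter(_.toString == s)
}
```

With these, `fromStr(toStr(i))` is `Some(i)` for every Int, while a string such as "hello" maps to None, which is the injective-but-not-surjective shape described above.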
I think we want:
def as[B](a: A)(implicit bij: Bijection[A, _ <: B]): B = bij(a)
But I'm not sure the implicit resolution works in that case (no reason it shouldn't, just haven't tested).
http://twanvl.nl/blog/haskell/isomorphism-lenses
I think something like:
// Type inference should work without having to specify R in the below unless there are multiple valid bijections
object Lens {
def get[A,B](a: A)(implicit bij: Bijection[A,(B,_)]): B = bij(a)._1
def modify[A,B,R](a: A)(update: B => B)(implicit bij: Bijection[A,(B,R)]): A = {
val decons = bij(a)
bij.invert((update(decons._1), decons._2))
}
def set[A,B,R](a: A, b: B)(implicit bij: Bijection[A,(B,R)]): A = {
val decons = bij(a)
bij.invert((b, decons._2))
}
}
This relies on having good bijections from A into each of its parts. It would be great if we could have a way to signify that (B,R) could really be any tuple of B with any other items (e.g. (B,R1,R2,R3...) would work).