Comments (4)
This is a very interesting bug!
@Test
fun `describe twice 1`() {
    val df = dataFrameOf("a", "b")(1, 2, 3, 4)
    val desc1 = df.describe()
    val desc2 = desc1.describe()
    desc2::class shouldBe DataFrameImpl::class
}
works fine, but
@Test
fun `describe twice 2`() {
    val df = dataFrameOf("a", "b")(1, "foo", 3, "bar")
    val desc1 = df.describe()
    val desc2 = desc1.describe()
    desc2::class shouldBe DataFrameImpl::class
}
breaks.
I suspect this is due to columns being created with type Comparable<*>/Comparable<Nothing> after running describe():
   name:String  type:Any  count:Int  unique:Int  nulls:Int  top:Comparable<*>  freq:Int  mean:Double?  std:Double?  min:Comparable<*>  median:Comparable<*>  max:Comparable<*>
0  a            Int       2          2           0          1                  1         2.0           1.414214     1                  2                     3
1  b            String    2          2           0          foo                1         null          null         bar                bar                   foo
If you now run another describe() on this table, it will try to find the min of columns like top and compare Int and String. However, these two are incomparable, as we can see by the exception.
Our current implementation only checks if a column AnyCol.isComparable() = isSubtypeOf<Comparable<*>?>(), not whether the type T != Nothing. I'm not sure we can actually.
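To make the failure mode concrete, here is a minimal sketch in plain Kotlin, independent of DataFrame. `unsafeMin` and `minFailsOnMixedTypes` are hypothetical helpers written only for this illustration. They are not the library's implementation, but they show what happens on the JVM when values known only as Comparable<*> turn out to be an Int and a String:

```kotlin
// Hypothetical helper, NOT part of DataFrame: computes a minimum over values
// that are only known to be Comparable<*> at compile time.
@Suppress("UNCHECKED_CAST")
fun unsafeMin(values: List<Comparable<*>>): Comparable<*> =
    values.reduce { acc, v ->
        // The unchecked cast compiles, but compareTo may still fail at runtime
        // when the two values have unrelated runtime types.
        if ((acc as Comparable<Any?>).compareTo(v) <= 0) acc else v
    }

// Hypothetical helper: reports whether unsafeMin throws a ClassCastException.
fun minFailsOnMixedTypes(values: List<Comparable<*>>): Boolean =
    try {
        unsafeMin(values)
        false
    } catch (e: ClassCastException) {
        true
    }

fun main() {
    // All values share one runtime type, so the comparison is well defined.
    println(unsafeMin(listOf(1, 3)))
    // Int and String both satisfy Comparable<*>, yet cannot be compared with each other.
    println(minFailsOnMixedTypes(listOf(1, "foo")))
}
```

The static type Comparable<*> says nothing about what a value can be compared with, which is why a subtype check against it cannot rule out a column like top above.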
> Our current implementation only checks if a column AnyCol.isComparable() = isSubtypeOf<Comparable<*>?>(), not whether the type T != Nothing. I'm not sure we can actually.
Maybe typeOf<Comparable<Any?>?>() will work as expected here. I suspect this code was written with the assumption that * means Any?, but Comparable has in variance, so Comparable<*> == Comparable<Nothing>, and from a type-system perspective you can't compare two Comparable<Nothing> values.
@koperagen thanks for the tip! But unfortunately:
typeOf<Int>().isSubtypeOf(typeOf<Comparable<Any?>>()) == false
typeOf<Int>().isSubtypeOf(typeOf<Comparable<Any>>()) == false
variance is fun :)
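For anyone following along, the two results above can also be seen at compile time without reflection. This is a small sketch in plain Kotlin. `AlwaysEqual`, `contravarianceDemo` and `starProjectionDemo` are made-up names for illustration only. Because Comparable is declared as Comparable<in T>, subtyping runs in the opposite direction from what one might expect:

```kotlin
// Hypothetical object for illustration: something that can be compared with anything.
object AlwaysEqual : Comparable<Any?> {
    override fun compareTo(other: Any?): Int = 0
}

fun contravarianceDemo(): Int {
    // Allowed: with `in` variance, Comparable<Any?> is a SUBTYPE of Comparable<Int>.
    val forInts: Comparable<Int> = AlwaysEqual
    return forInts.compareTo(42)
}

fun starProjectionDemo(): Comparable<*> {
    // Allowed: Int implements Comparable<Int>, which is a subtype of Comparable<*>.
    val star: Comparable<*> = 1

    // Does not compile: through Comparable<*> the parameter type of compareTo is Nothing.
    // star.compareTo(2)

    // Does not compile: Int is Comparable<Int>, which is not a subtype of Comparable<Any?>.
    // val wide: Comparable<Any?> = 1

    return star
}

fun main() {
    println(contravarianceDemo())
    println(starProjectionDemo())
}
```

So checking against Comparable<Any?> is too strict (it rejects Int), while checking against Comparable<*> is too loose (it accepts columns whose values cannot be compared with each other).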
It can be fixed like:
/**
 * Returns `true` if [this] column is comparable, i.e. its type is a subtype of [Comparable] and its
 * type argument is not [Nothing].
 */
public fun AnyCol.isComparable(): Boolean =
    isSubtypeOf<Comparable<*>?>() &&
        type().projectTo(Comparable::class).arguments[0].let {
            it != KTypeProjection.STAR &&
                it.type?.isNothing != true
        }
I'll probably make a PR later :)