Comments (5)
Thanks! You both convinced me to have Arquero throw an error when there are no shared columns. I've also updated the PR to (1) ensure proper handling of natural left, right, and full joins, and (2) suppress duplicated key column output.
The handling of full outer joins is a bit subtle. In SQL, natural full joins are not allowed. In dplyr, they are allowed but cause the "duplicate" key columns to be merged, using the value of whichever column is non-empty. I've followed the dplyr semantics here in Arquero when de-duplicating shared key columns in a full outer join.
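To make the key-merging rule concrete, here is a minimal sketch in plain JavaScript over arrays of row objects. It is not Arquero's implementation: `fullJoinCoalesce` and its parameters are hypothetical names used only for illustration, and the sketch assumes the key columns are passed in explicitly. Each shared key appears once in the output, taking its value from whichever side has a row.

```javascript
// Hypothetical helper (not part of Arquero): full outer join of two arrays of
// row objects on shared key columns, emitting each key column only once.
function fullJoinCoalesce(left, right, keys) {
  const leftOnly = Object.keys(left[0] ?? {}).filter(c => !keys.includes(c));
  const rightOnly = Object.keys(right[0] ?? {}).filter(c => !keys.includes(c));
  const keyOf = row => JSON.stringify(keys.map(k => row[k]));

  // Build one output row; l or r may be null for unmatched rows.
  const makeRow = (l, r) => {
    const row = {};
    // Merge the "duplicate" key columns: use whichever side is present.
    for (const k of keys) row[k] = l ? l[k] : r[k];
    for (const c of leftOnly) row[c] = l ? l[c] : undefined;
    for (const c of rightOnly) row[c] = r ? r[c] : undefined;
    return row;
  };

  // Index right-hand rows by their key values.
  const index = new Map();
  right.forEach((row, i) => {
    const k = keyOf(row);
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(i);
  });

  const matched = new Set();
  const out = [];
  for (const l of left) {
    const hits = index.get(keyOf(l)) ?? [];
    if (hits.length === 0) out.push(makeRow(l, null)); // left row with no match
    for (const i of hits) {
      matched.add(i);
      out.push(makeRow(l, right[i]));
    }
  }
  // Right rows with no match: their key values come from the right side.
  right.forEach((r, i) => {
    if (!matched.has(i)) out.push(makeRow(null, r));
  });
  return out;
}

const t1 = [{ state: 'MA', count: 1 }, { state: 'NY', count: 4 }];
const t2 = [{ state: 'NY', pop: 400 }, { state: 'CA', pop: 900 }];
const joined = fullJoinCoalesce(t1, t2, ['state']);
```

In this example the unmatched `CA` row keeps `state: 'CA'` from the right-hand table instead of leaving the key empty, which is the dplyr-style behavior described above.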
from arquero.
Thanks for the suggestion! Rather than add a new method, I'm wondering if instead this should be the default behavior of join when given only a table as input. That sounds like it matches dplyr's behavior, as well as both your expectation and @juba's. Any thoughts?
I have a candidate PR for this issue here: #28
Rather than throw an error if there are no shared keys, the result is a Cartesian product (akin to `cross`), as this matches SQL natural join semantics.
I'd be happy to have this be default behavior for `join`, just thought this behavior might have been dictated by some standard.
That pull request works when `on` is not specified, but it's also sensible to have this behavior if there is an `on` condition, as sometimes happens with generic column names. E.g.:
const t1 = aq.table({'state': ["MA", "NY"], 'count': [1, 4]})
const t2 = aq.table({'state': ["MA", "NY"], 'count': [100, 400]})
t1.join(t2, [['state'], ['state']])
could return the equivalent of
t1.join(t2, [['state'], ['state']], [aq.all(), aq.not("state")])
My fear about cross joins is that frequently when I'm being dumb, I try to join tables before properly aligning the column names and end up trying to allocate an ungodly number of rows in MySQL. dplyr supplies this message in the equivalent case: "Error: `by` must be supplied when `x` and `y` have no common variables. Use `by = character()` to perform a cross-join."
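A guard along these lines could look like the following sketch. It is only an illustration in plain JavaScript: `naturalJoinKeys` is a hypothetical helper, not an Arquero function, and the error wording is made up here.

```javascript
// Hypothetical helper (not part of Arquero): find the column names two tables
// share, and refuse to continue when there are none, so a Cartesian product
// never happens by accident.
function naturalJoinKeys(leftCols, rightCols) {
  const right = new Set(rightCols);
  const shared = leftCols.filter(name => right.has(name));
  if (shared.length === 0) {
    throw new Error(
      'Natural join requires at least one shared column name; ' +
      'request a cross join explicitly if a Cartesian product is intended.'
    );
  }
  return shared;
}

// Shared column found: the join keys are ['state'].
const sharedKeys = naturalJoinKeys(['state', 'count'], ['state', 'pop']);
```

With misaligned names such as `['State', 'count']` and `['state', 'pop']`, the call throws instead of silently producing every pairing of rows.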
I think this would be great, and I also agree with @bmschmidt's caution about the danger of a Cartesian product that was not explicitly asked for: I have ended up with a hung R session several times doing this sort of thing, too.