Comments (5)
from dfply.
The only issue I can see is when by is a list of length 2, e.g. by=['x1', 'x2']. It is not clear whether you want to use x1 from the left table and x2 from the right one, or both columns from both tables. I would suggest using a list for multiple columns and a tuple for different names of the same column. A third option is to use a dictionary (which is, by the way, the closest to the implementation in dplyr).
List (two columns to join by, same names in both data frames):
a >> inner_join(b, by=['x1', 'x2'])
a.merge(b, left_on=['x1', 'x2'], right_on=['x1', 'x2'])
Tuple (single column, different names):
a >> inner_join(b, by=('x1', 'x2'))
a.merge(b, left_on='x1', right_on='x2')
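To make the proposed convention concrete, here is a minimal sketch of how the three forms of by could be translated into pandas merge arguments. The helper resolve_by and the sample data frames are hypothetical and only illustrate the suggestion above; this is not dfply's actual implementation.

```python
import pandas as pd


def resolve_by(by):
    """Hypothetical helper (not part of dfply): map `by` to merge kwargs."""
    if isinstance(by, str):
        # Single column with the same name in both tables.
        return {"left_on": [by], "right_on": [by]}
    if isinstance(by, tuple):
        # Tuple: one column with different names (left name, right name).
        left, right = by
        return {"left_on": [left], "right_on": [right]}
    if isinstance(by, dict):
        # Dict: {left name: right name}, similar in spirit to dplyr.
        return {"left_on": list(by.keys()), "right_on": list(by.values())}
    # List: several columns, same names in both tables.
    cols = list(by)
    return {"left_on": cols, "right_on": cols}


# Made-up sample data, for illustration only.
a = pd.DataFrame({"x1": [1, 2, 3], "x2": [1, 5, 3], "va": ["a", "b", "c"]})
b = pd.DataFrame({"x1": [1, 2, 4], "x2": [1, 2, 3], "vb": ["p", "q", "r"]})

# List: rows must agree on both x1 and x2.
both = a.merge(b, **resolve_by(["x1", "x2"]))

# Tuple: a.x1 is matched against b.x2.
cross = a.merge(b, **resolve_by(("x1", "x2")))
```

With this reading, the list form joins on both columns in both tables, while the tuple and the equivalent dictionary {'x1': 'x2'} join the left x1 against the right x2 without renaming anything first.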
from dfply.
I agree that by=['x1', 'x2'] should use both columns in both tables. But having to rename a column before a join is annoying. That's why I would use tuples for that case.
See this commit:
https://github.com/jankislinger/dfply/commit/2d892186eeda4f837e0f46a63ef45434b5c5b502
from dfply.
Cool, thank you. Will merge the PR now.
from dfply.