Comments (9)
Hi @deepyaman, thanks a lot for pointing this out. It's something I've thought of proposing to fix quite a few times, but it was never a priority.
Another solution to consider would be to have `default_save_args` and `default_load_args` as class variables (as this is what they really are). (Edit: this is actually what you did as well, sorry, somehow I missed that 🤦‍♂️)
```python
class Base:
    default_save_args = {}
    default_load_args = {}

    @property
    def _load_args(self):
        return ({**self.default_load_args, **self._load_args_}
                if hasattr(self, '_load_args_') else {**self.default_load_args})

    @property
    def _save_args(self):
        return ({**self.default_save_args, **self._save_args_}
                if hasattr(self, '_save_args_') else {**self.default_save_args})

    @_load_args.setter
    def _load_args(self, load_args):
        self._load_args_ = load_args if load_args is not None else {}

    @_save_args.setter
    def _save_args(self, save_args):
        self._save_args_ = save_args if save_args is not None else {}


class Child(Base):
    default_save_args = {'index': False}

    def __init__(self, load_args=None, save_args=None):
        self._load_args = load_args
        self._save_args = save_args
```
So that:

```python
In [7]: c = Child({'hi': 'there'}, {'extra': 1})

In [8]: c._load_args
Out[8]: {'hi': 'there'}

In [9]: c._save_args
Out[9]: {'index': False, 'extra': 1}

In [10]: c.default_save_args
Out[10]: {'index': False}
```
This would avoid defining `__init__` on the parent class, remove the `pylint: disable=super-init-not-called`, or simplify the code for classes that don't make use of defaults.
I also like your proposal of making the `default_*_args` a public attribute; it's kind of hidden in the constructor at the moment, so less "magic" for our users.

By the way, I'm not sure if what I suggested is the right way. Maybe wait and see what others also say? @idanov @tolomea
from kedro.
Thank you so much for this @deepyaman! We'll await feedback from @idanov on this and will get back to you.
@tsanikgr Thanks for the feedback! I believe the property implementations can be further simplified as:

```python
@property
def _load_args(self):
    return {**self.default_load_args, **getattr(self, '_load_args_', {})}
```

(and the analogous property for `_save_args`)
👏 Indeed! Unfortunately, we can't avoid the setters; otherwise `self._load_args = load_args` in the constructors of child classes won't work.
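To make the failure mode concrete, here is a minimal sketch (the class names are hypothetical, not from the thread): assigning through a property that has no setter raises `AttributeError` in the child's constructor.

```python
class NoSetterBase:
    default_load_args = {}

    @property
    def _load_args(self):
        # Read-only: merges class-level defaults with any instance overrides.
        return {**self.default_load_args, **getattr(self, "_load_args_", {})}


class NoSetterChild(NoSetterBase):
    def __init__(self, load_args=None):
        self._load_args = load_args  # no setter defined -> AttributeError


try:
    NoSetterChild({"sep": ","})
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

Defining the `@_load_args.setter` (as in the original snippet) is what allows the plain assignment in `__init__` to go through.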
I'm fine with the above approach and happy to make the changes if others agree. @idanov @tolomea
Thanks @deepyaman for raising this. We've had internal discussions, and this has come up a few times already. However, I don't think this is something that should be added: we should leave each implementation to handle default arguments as it sees fit; an implementation might not even have default arguments at all.
The dataset abstraction is deliberately not very prescriptive, in order to accommodate heterogeneous datasets that each differ in their own way. Adding more structure to the abstraction would make that functionality useful for only some datasets and not others, which would undermine the abstraction: abstract classes are not meant to remove code repetition, but to create useful, general abstractions. Moreover, in this case we are talking about a very small amount of repetition, which does not justify the price of ending up with a more complex and worse abstraction.
Having heterogeneous classes means the code will remain federated by nature, which will inevitably result in some code repetition here and there, and that's not such a bad thing.
> Duplication is far cheaper than the wrong abstraction

See https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction
@idanov I understand your point. That being said, there are options (from the *Possible Alternatives* section in the original issue) that leave the ABC as is. Copied from above:

- Create a mixin, so as not to modify `AbstractDataSet`, that applies only to datasets with `load_args`/`save_args` (e.g. `DefaultArgsMixin`)
- Move default argument handling into a utility function and call it from each individual `__init__` method (not preferred)

Finally, from the *Context* provided in the original issue:
> When I want to implement a new dataset, I look at existing datasets as a baseline for implementing my own. However, there are inconsistencies between these datasets [...]. I don't know which one to follow to maintain consistency across the codebase.
At the very minimum, I feel like it would be an improvement to standardize this functionality across existing datasets, even if the code is identical/copy-pasted in each.
Thoughts?
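`DefaultArgsMixin` is only named in this thread, never defined; a minimal sketch of what such a mixin might look like (the attribute names, constructor signature, and the `MyCSVDataSet` class are all assumptions for illustration):

```python
from copy import deepcopy


class DefaultArgsMixin:
    """Hypothetical mixin: merges class-level defaults with per-instance args."""

    default_load_args: dict = {}
    default_save_args: dict = {}

    def __init__(self, load_args=None, save_args=None):
        # deepcopy so instances can never mutate the shared class-level defaults
        self._load_args = {**deepcopy(self.default_load_args), **(load_args or {})}
        self._save_args = {**deepcopy(self.default_save_args), **(save_args or {})}


class MyCSVDataSet(DefaultArgsMixin):  # illustrative dataset, not a real kedro class
    default_save_args = {"index": False}


ds = MyCSVDataSet(save_args={"sep": ";"})
print(ds._save_args)  # {'index': False, 'sep': ';'}
```

Datasets that want this behaviour would opt in by inheriting the mixin, leaving `AbstractDataSet` itself untouched.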
@deepyaman I agree with your last point, so I suppose we can do at least that. I like your idea for `DefaultArgsMixin`; however, we need to find more benefits to justify paying the price in terms of added complexity. For the time being, we can do that for `contrib` datasets only, while the built-in datasets will just carry an identical copy of the defaults-setting code.
@idanov Sounds great! I've updated #15 with the above changes.