Comments (15)
@omesser, thanks for the precise feedback! print-index and print-state are purely technical commands. They're not supposed to be exposed to the user, so I hide them from gto --help.
from gto.
My 2 cents on show, by the way: my initial reaction was to suggest changing it to list. show has a different meaning in git (it shows an object; if you need to list things, you use different commands). Same in DVC - show is not about listing things.
from gto.
@shcheklein - wrt list/show - dvc list = dvc ls, so that's more akin to Linux's ls, where it's implied you are listing a well-known central resource (fs-like entries, revs?). That's quite different from what we have here (displaying an internal object to the user), in my opinion.
git show - git show [<options>] [<object>...] - shows info about different types of objects (but one specific object at a time), and the object defaults to the HEAD commit. So gto show state|status|registry|whatever is actually similar in nature - describing a single object, not listing a collection of objects, right? The problem is "audit", but if we treat audit as just a display option of our state/registry, it would make sense.
Here is another pass of suggestions, let me know what you folks think:
- gto show -> gto show env|state|status (whatever is decided on #68; we would probably also want gto show labels for the generic stuff)
- gto print-state -> gto show registry -f yaml|json (because state can be ambiguous if we have status/state above)
- gto audit [resource] -> gto show state --audit -ft <tabulate tableformats>
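A minimal sketch of how the suggested gto show <target> layout could be parsed, using Python's standard argparse. This is not gto's actual CLI code: the parser, the list of targets, and the flag wiring are assumptions taken only from the suggestions above, and the -ft table-format option is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: one `show` command that describes a single object.
    parser = argparse.ArgumentParser(prog="gto")
    commands = parser.add_subparsers(dest="command", required=True)

    show = commands.add_parser("show", help="describe one registry object")
    # Targets are taken from the proposal; the final set was still undecided.
    show.add_argument("target", choices=["env", "state", "status", "registry"])
    show.add_argument("-f", "--format", choices=["yaml", "json"], default="yaml")
    # `--audit` is treated as a display option of the state, as proposed.
    show.add_argument("--audit", action="store_true")
    return parser


args = build_parser().parse_args(["show", "registry", "-f", "json"])
print(args.command, args.target, args.format, args.audit)
# prints: show registry json False
```

With this shape, the old print-state and audit commands become options of one command instead of separate top-level commands.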
from gto.
I've implemented the gto ls command (WIP): #98
Usage example
$ gto ls pytest-cache/test_api0
features
nn
rf
$ gto ls pytest-cache/test_api0 --json
[
    {
        "type": "dataset",
        "name": "features",
        "path": "datasets/features.csv",
        "virtual": true
    },
    {
        "type": "model",
        "name": "nn",
        "path": "models/neural-network.pkl",
        "virtual": false
    },
    {
        "type": "model",
        "name": "rf",
        "path": "models/random-forest.pkl",
        "virtual": false
    }
]
$ gto ls pytest-cache/test_api0 --table
╒═════════╤══════════╤═══════════════════════════╤═══════════╕
│ type    │ name     │ path                      │ virtual   │
╞═════════╪══════════╪═══════════════════════════╪═══════════╡
│ dataset │ features │ datasets/features.csv     │ True      │
│ model   │ nn       │ models/neural-network.pkl │ False     │
│ model   │ rf       │ models/random-forest.pkl  │ False     │
╘═════════╧══════════╧═══════════════════════════╧═══════════╛
$ gto ls
$ gto ls --table
No artifacts found
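To make the three output modes above concrete, here is a small sketch that renders the same index entries as names, JSON, or a plain table. It is an illustration only, not gto's implementation: the render function and its fmt values are hypothetical, the table is a simple space-padded layout rather than the boxed format shown above, and the JSON output for an empty index is an assumption.

```python
import json

# Index entries copied from the usage example above.
ARTIFACTS = [
    {"type": "dataset", "name": "features", "path": "datasets/features.csv", "virtual": True},
    {"type": "model", "name": "nn", "path": "models/neural-network.pkl", "virtual": False},
    {"type": "model", "name": "rf", "path": "models/random-forest.pkl", "virtual": False},
]


def render(artifacts, fmt="names"):
    """Render index entries; `fmt` is one of "names", "json", "table"."""
    if fmt == "json":
        return json.dumps(artifacts, indent=4)
    if fmt == "table":
        if not artifacts:
            # Mirrors the message shown for `gto ls --table` on an empty index.
            return "No artifacts found"
        headers = list(artifacts[0])
        rows = [[str(a[h]) for h in headers] for a in artifacts]
        widths = [max(len(h), *(len(r[i]) for r in rows)) for i, h in enumerate(headers)]
        lines = [
            "  ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()
            for cells in [headers, *rows]
        ]
        return "\n".join(lines)
    # Default mode: one artifact name per line, nothing at all when empty.
    return "\n".join(a["name"] for a in artifacts)


print(render(ARTIFACTS))
print(render([], "table"))
```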
Note that gto show output looks different now - it has versions and promotions:
$ gto show
╒══════════════╤═══════════╤══════════════════╤═══════════════╕
│ name         │ version   │ env/production   │ env/staging   │
╞══════════════╪═══════════╪══════════════════╪═══════════════╡
│ nn           │ v0.0.1    │ -                │ v0.0.1        │
│ rf           │ v1.0.1    │ v1.0.0           │ v1.0.1        │
│ features     │ -         │ -                │ -             │
╘══════════════╧═══════════╧══════════════════╧═══════════════╛
One thing to note here is that gto ls . --rev HEAD will show the same list of artifacts as gto show.
We can add latest versions and promotions to gto ls, but while it does make sense for --rev HEAD, it doesn't make sense when --rev is omitted (because gto ls should read from artifacts.yaml directly and skip using Git), and it doesn't make sense for --rev some-previous-commit, because at that moment the latest version and what was promoted were different (so should we also parse the repo and try to reconstruct those?).
I would be careful with implementing this, because the actual state of the repo could be different. E.g. you may delete some git tags, branches, etc., so while we can say for sure what was there in some specific reference (git tag or commit), we can't tell what the whole repo state was at that moment.
And even if we implement this, plain gto ls still won't show anything about latest versions and stages, compared to gto ls --rev something. That could look a bit strange.
So, while we don't plan to implement "tell me the repo state as of revision REV" functionality, I would keep gto ls and gto show separate. That also makes sense from the command groups:
- there is a group about managing index (add, rm, ls),
- there is a group about managing versions/promotions (register, promote, deprecate),
- there is a group that manages this together (show, audit, history).
@omesser @shcheklein Considering this, do you think the ls and show commands should be separate?
from gto.
I've implemented one more command that can give us some perspective on the question:
$ gto ls-versions pytest-cache/test_api0 rf
v1.2.3
v1.2.4
$ gto ls-versions pytest-cache/test_api0 rf --json
[
    {
        "artifact": {
            "type": "model",
            "name": "rf",
            "path": "models/random-forest.pkl",
            "virtual": false
        },
        "name": "v1.2.3",
        "creation_date": "2022-03-29T17:16:58",
        "author": "Alexander Guschin",
        "commit_hexsha": "1166bf9eb1c9a2b70e72b3e4be2514e43adca225",
        "deprecated_date": null
    },
    {
        "artifact": {
            "type": "model",
            "name": "rf",
            "path": "models/random-forest.pkl",
            "virtual": false
        },
        "name": "v1.2.4",
        "creation_date": "2022-03-29T17:17:00",
        "author": "Alexander Guschin",
        "commit_hexsha": "78593535bba6e043feec53cb26caec9e47bd1641",
        "deprecated_date": null
    }
]
$ gto ls-versions pytest-cache/test_api0 rf --table
╒════════════╤════════╤═════════════════════╤═══════════════════╤══════════════════════════════════════════╤═══════════════════╕
│ artifact   │ name   │ creation_date       │ author            │ commit_hexsha                            │ deprecated_date   │
╞════════════╪════════╪═════════════════════╪═══════════════════╪══════════════════════════════════════════╪═══════════════════╡
│ rf         │ v1.2.3 │ 2022-03-29 17:16:58 │ Alexander Guschin │ 1166bf9eb1c9a2b70e72b3e4be2514e43adca225 │ -                 │
│ rf         │ v1.2.4 │ 2022-03-29 17:17:00 │ Alexander Guschin │ 78593535bba6e043feec53cb26caec9e47bd1641 │ -                 │
╘════════════╧════════╧═════════════════════╧═══════════════════╧══════════════════════════════════════════╧═══════════════════╛
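As an illustration of how a CI job or downstream script might consume the --json output above, the sketch below picks the newest non-deprecated version from a payload of that shape. The newest_version helper is hypothetical, and the sample is a trimmed copy of the output shown above with the nested artifact fields reduced to the name.

```python
import json
from datetime import datetime

# Trimmed copy of the `gto ls-versions ... --json` output shown above.
SAMPLE = """
[
    {"artifact": {"name": "rf"}, "name": "v1.2.3",
     "creation_date": "2022-03-29T17:16:58",
     "commit_hexsha": "1166bf9eb1c9a2b70e72b3e4be2514e43adca225",
     "deprecated_date": null},
    {"artifact": {"name": "rf"}, "name": "v1.2.4",
     "creation_date": "2022-03-29T17:17:00",
     "commit_hexsha": "78593535bba6e043feec53cb26caec9e47bd1641",
     "deprecated_date": null}
]
"""


def newest_version(raw: str) -> dict:
    """Return the most recently created version that is not deprecated.

    Raises ValueError if every version in the payload is deprecated.
    """
    versions = json.loads(raw)
    active = [v for v in versions if v["deprecated_date"] is None]
    return max(active, key=lambda v: datetime.fromisoformat(v["creation_date"]))


newest = newest_version(SAMPLE)
print(newest["name"], newest["commit_hexsha"][:7])
# prints: v1.2.4 7859353
```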
from gto.
Summary of our discussion with @shcheklein and @omesser:
- Some commands act upon a single revision, others on the registry (repo). Probably (?) one command (e.g. show) shouldn't act on a revision in some cases and on the whole repo in others.
- If ls is a technical command, it may be better to give it a less well-known name and use the name ls for show.
- It's good to be similar to the git CLI experience, but most importantly, don't contradict it.
- Group commands into human-friendly groups (4):
  - manage index: add, rm, ls
  - manage register/promotion: register, promote
  - view the registry state: audit, history, show, latest, which, print-stages
  - helper commands for CI and downstream systems: check-ref
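The grouping above can be written down as data, which makes it easy to check that every command belongs to exactly one group. This is only a sketch of the idea: the COMMAND_GROUPS mapping and the group_of helper are hypothetical, and the command names are the ones listed in the summary.

```python
# Command groups as listed in the summary above.
COMMAND_GROUPS = {
    "manage index": ["add", "rm", "ls"],
    "manage register/promotion": ["register", "promote"],
    "view the registry state": ["audit", "history", "show", "latest", "which", "print-stages"],
    "helper commands for CI and downstream systems": ["check-ref"],
}


def group_of(command: str) -> str:
    """Return the name of the group that contains `command`."""
    matches = [group for group, cmds in COMMAND_GROUPS.items() if command in cmds]
    if len(matches) != 1:
        raise ValueError(f"{command!r} must belong to exactly one group, found {matches}")
    return matches[0]


print(group_of("ls"))
print(group_of("show"))
```

A help formatter could iterate over such a mapping to print commands under group headings.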
So far I've kept ls to print the index, and show to print the registry state. To print artifact versions, show myartifact. I'll think this over a bit later.
from gto.
To print artifact versions, show myartifact. I'll think this over a bit later.
In case it's helpful, my take is that ls and show make sense, but I find the various history commands confusing. The differences between gto audit and gto history aren't obvious to me, and the command names don't help differentiate them. From what I can tell, I would expect gto audit myartifact to print artifact versions (and promotions).
from gto.
- In #109 I've added command groups
- audit is removed and replaced by history
- ls is removed for now, as users don't need to list enrichments yet. In the future we can add this to show somehow or introduce this command once again.
Other questions discussed here have also been taken into account, I believe. Therefore I'm closing the issue. Thanks everyone for the great discussion!
from gto.
But we definitely need a command to show the index at some ref, like $ gto show index HEAD.
So maybe the current $ gto show should be $ gto show state?
from gto.
"They're not supposed to be exposed to the user,"
But they are very useful; I'm not sure it's necessary to hide them here, tbh.
The tool is already "low level" compared to mlem, for example.
from gto.
It's good (better), but we are sacrificing one of the most common commands (list all available models + some additional info) by making it a subcommand, right? From my git habits I would expect:
- gto show instead of gto audit (if I understand correctly what audit does)
- gto list instead of gto show
actually is similar in nature - describing single object, not listing a collection of objects, right?
That's not how I see it. We have a collection that we are listing. Why collection? Because, for example, we have commands like add, delete - etc. So, it's a list of objects. (granted, you could consider the whole list as a single item, but in this case by introducing add, etc we are already exposing its structure).
displaying an internal object to the user
I'm not sure I understand this, tbh. What is the intention of the command (I mean gto show; how would we describe it in the docs)? Can it be described like: "Prints the list of all registered models"? Or is there another command that serves that purpose?
from gto.
@shcheklein I agree with everything you've written, but I think we're talking about different commands.
Adding gto list artifacts, for example, does not replace (IMO) the need to show index/state, which is what this issue was about originally.
from gto.
Hmm, not sure.
Are we still talking primarily about gto show?
the need to show index/state
Yes, what I suggest is to use gto list for that and not use gto show at all.
from gto.
I checked out how MLflow does this, and their list is similar to gto show - because they're not bound to Git and revisions.
from gto.
while it does make sense for --rev HEAD
Just realized that I'm wrong, it doesn't make sense, because --rev is about a single commit, not about the whole repo. So, here's the contradiction: gto ls must have a --rev option. But at the same time, showing latest versions and what's promoted means that it's not about any revision in particular, it's about all the revisions in the repo. => gto ls cannot show latest versions and what's promoted.
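The distinction can be illustrated with a toy model: listing the index is a lookup in one commit, while computing latest versions scans every version tag in the repo. Everything in this sketch is hypothetical (the VersionTag class, the commit ids, the data) and it does not touch Git; it only shows why the two results depend on different inputs, and why deleting a tag changes one but not the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionTag:
    artifact: str
    version: tuple  # (major, minor, patch)
    commit: str


# Toy repo: index contents per commit, plus version tags across the repo.
INDEX_BY_COMMIT = {"c1": ["rf"], "c2": ["rf", "nn"]}
TAGS = [
    VersionTag("rf", (1, 0, 0), "c1"),
    VersionTag("rf", (1, 0, 1), "c2"),
    VersionTag("nn", (0, 0, 1), "c2"),
]


def ls(rev: str) -> list:
    """Per-revision view: depends only on the index stored in that commit."""
    return sorted(INDEX_BY_COMMIT[rev])


def latest_versions(tags: list) -> dict:
    """Repo-wide view: depends on all tags that currently exist."""
    latest = {}
    for tag in tags:
        current = latest.get(tag.artifact)
        if current is None or tag.version > current.version:
            latest[tag.artifact] = tag
    return {name: "v" + ".".join(map(str, t.version)) for name, t in latest.items()}


print(ls("c1"))                    # ['rf']
print(latest_versions(TAGS))       # {'rf': 'v1.0.1', 'nn': 'v0.0.1'}
# Deleting tags changes the repo-wide answer but not the per-commit listing.
print(latest_versions(TAGS[:1]))   # {'rf': 'v1.0.0'}
```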
from gto.