Comments (15)

aguschin commented on September 27, 2024

@omesser, thanks for the precise feedback!
print-index and print-state are purely technical commands. They're not supposed to be exposed to the user, so I hide them from gto --help.

from gto.

shcheklein commented on September 27, 2024

My 2 cents on show, by the way: my initial reaction was to suggest changing it to list. show has a different meaning in git (it shows an object, and if you need to list things you would use different commands). Same in DVC: it's not about listing things.

omesser commented on September 27, 2024

@shcheklein - wrt list/show - dvc list = dvc ls, so that's more akin to Linux's ls, where it's implied you're listing a well-known central resource (fs-like entries, revs?). That's quite different from what we have here (displaying an internal object to the user), in my opinion.

git show has the form git show [<options>] [<object>...], so it shows info about different types of objects (but one specific object), and the object defaults to the HEAD commit. So gto show state|status|registry|whatever is actually similar in nature: it describes a single object rather than listing a collection of objects, right? The problem is "audit", but if we treat audit as just a display option of our state/registry, it would make sense.

Here is another pass of suggestions, let me know what you folks think:
gto show -> gto show env|state|status (whatever is decided on #68, we would also want gto show labels for the generic stuff probably)
gto print-state -> gto show registry -f yaml|json (because state can be ambiguous if we have status/state above)
gto audit [resource] -> gto show state --audit -ft <tabulate tableformats>

aguschin commented on September 27, 2024

I've implemented gto ls command (WIP): #98
Usage example

$ gto ls pytest-cache/test_api0
features
nn
rf

$ gto ls pytest-cache/test_api0 --json
[
    {
        "type": "dataset",
        "name": "features",
        "path": "datasets/features.csv",
        "virtual": true
    },
    {
        "type": "model",
        "name": "nn",
        "path": "models/neural-network.pkl",
        "virtual": false
    },
    {
        "type": "model",
        "name": "rf",
        "path": "models/random-forest.pkl",
        "virtual": false
    }
]

$ gto ls pytest-cache/test_api0 --table
╒═════════╤══════════╤═══════════════════════════╤═══════════╕
│ type    │ name     │ path                      │ virtual   │
╞═════════╪══════════╪═══════════════════════════╪═══════════╡
│ dataset │ features │ datasets/features.csv     │ True      │
│ model   │ nn       │ models/neural-network.pkl │ False     │
│ model   │ rf       │ models/random-forest.pkl  │ False     │
╘═════════╧══════════╧═══════════════════════════╧═══════════╛

$ gto ls

$ gto ls --table
No artifacts found
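
The three output modes shown above (plain names, --json, --table) could be sketched roughly as follows. This is only an illustration, not the gto implementation: the function names are hypothetical, and the table is drawn with plain ASCII separators instead of the box-drawing style in the real output.

```python
import json
from typing import Dict, List

# One index entry, e.g. {"type": "model", "name": "rf", "path": "...", "virtual": False}
Artifact = Dict[str, object]


def format_names(artifacts: List[Artifact]) -> str:
    """Default output: one artifact name per line (empty string if none)."""
    return "\n".join(str(a["name"]) for a in artifacts)


def format_json(artifacts: List[Artifact]) -> str:
    """--json output: the full records, indented by four spaces."""
    return json.dumps(artifacts, indent=4)


def format_table(artifacts: List[Artifact]) -> str:
    """--table output: a plain-text table, or a message when the index is empty."""
    if not artifacts:
        return "No artifacts found"
    headers = list(artifacts[0].keys())
    rows = [[str(a[h]) for h in headers] for a in artifacts]
    # Each column is as wide as its longest cell (header included).
    widths = [
        max(len(h), *(len(row[i]) for row in rows))
        for i, h in enumerate(headers)
    ]

    def render(cells: List[str]) -> str:
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([render(headers), separator] + [render(r) for r in rows])
```

With this split, the command itself only decides which formatter to call, so adding another output format would not touch the code that reads the index.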

Note that gto show output looks different now - it has versions and promotions:

$ gto show
╒══════════════╤═══════════╤══════════════════╤═══════════════╕
│ name         │ version   │ env/production   │ env/staging   │
╞══════════════╪═══════════╪══════════════════╪═══════════════╡
│ nn           │ v0.0.1    │ -                │ v0.0.1        │
│ rf           │ v1.0.1    │ v1.0.0           │ v1.0.1        │
│ features     │ -         │ -                │ -             │
╘══════════════╧═══════════╧══════════════════╧═══════════════╛

One thing to note here is that gto ls . --rev HEAD will show the same list of artifacts as gto show.

We can add latest versions and promotions to gto ls, but while that makes sense for --rev HEAD, it doesn't make sense when --rev is omitted (because gto ls should read from artifacts.yaml directly and skip using Git), and it doesn't make sense for --rev some-previous-commit, because at that moment the latest version and what was promoted were different (so should we also parse the repo and try to reconstruct those?).
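
To make the two read paths concrete, here is a minimal sketch, assuming the index lives in artifacts.yaml at the repo root. The function name read_index_text is hypothetical and this is not how gto is actually implemented. Without a revision the file is read from the working tree and Git is never touched; with a revision the file content is taken from that commit via git show <rev>:<path>.

```python
import subprocess
from pathlib import Path
from typing import Optional

INDEX_FILE = "artifacts.yaml"  # assumed to sit at the repo root


def read_index_text(repo: Path, rev: Optional[str] = None) -> str:
    """Return the raw index file content.

    rev=None: read the working-tree file directly, skipping Git.
    rev=<ref>: read the file as it was committed at that revision.
    """
    if rev is None:
        return (repo / INDEX_FILE).read_text(encoding="utf-8")
    result = subprocess.run(
        ["git", "show", f"{rev}:{INDEX_FILE}"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if the ref or the file does not exist
    )
    return result.stdout
```

Either way the result describes one snapshot of the index, which is why information that depends on all tags in the repo does not fit naturally into this function.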

I would be careful with implementing this, because the actual state of the repo could be different. E.g. you may delete some git tags, branches, etc., so while we can say for sure what was there in some specific reference (git tag or commit), we can't tell what the whole repo state was at that moment.

And even if we implement this, plain gto ls still won't have anything about latest versions and stages, compared to gto ls --rev something. That could look a bit strange.

So, while we don't plan to implement "tell me the repo state as of revision REV" functionality, I would keep gto ls and gto show separate. That also makes sense from the command groups:

  1. there is a group about managing index (add, rm, ls),
  2. there is a group about managing versions/promotions (register, promote, deprecate),
  3. there is a group that manages this together (show, audit, history).

@omesser @shcheklein Considering this, do you think ls and show commands should be separate?

aguschin commented on September 27, 2024

I've implemented one more command that can give us some perspective on the question:

$ gto ls-versions pytest-cache/test_api0 rf
v1.2.3
v1.2.4

$ gto ls-versions pytest-cache/test_api0 rf --json
[
    {
        "artifact": {
            "type": "model",
            "name": "rf",
            "path": "models/random-forest.pkl",
            "virtual": false
        },
        "name": "v1.2.3",
        "creation_date": "2022-03-29T17:16:58",
        "author": "Alexander Guschin",
        "commit_hexsha": "1166bf9eb1c9a2b70e72b3e4be2514e43adca225",
        "deprecated_date": null
    },
    {
        "artifact": {
            "type": "model",
            "name": "rf",
            "path": "models/random-forest.pkl",
            "virtual": false
        },
        "name": "v1.2.4",
        "creation_date": "2022-03-29T17:17:00",
        "author": "Alexander Guschin",
        "commit_hexsha": "78593535bba6e043feec53cb26caec9e47bd1641",
        "deprecated_date": null
    }
]

$ gto ls-versions pytest-cache/test_api0 rf --table
╒════════════╤════════╤═════════════════════╤═══════════════════╤══════════════════════════════════════════╤═══════════════════╕
│ artifact   │ name   │ creation_date       │ author            │ commit_hexsha                            │ deprecated_date   │
╞════════════╪════════╪═════════════════════╪═══════════════════╪══════════════════════════════════════════╪═══════════════════╡
│ rf         │ v1.2.3 │ 2022-03-29 17:16:58 │ Alexander Guschin │ 1166bf9eb1c9a2b70e72b3e4be2514e43adca225 │ -                 │
│ rf         │ v1.2.4 │ 2022-03-29 17:17:00 │ Alexander Guschin │ 78593535bba6e043feec53cb26caec9e47bd1641 │ -                 │
╘════════════╧════════╧═════════════════════╧═══════════════════╧══════════════════════════════════════════╧═══════════════════╛

aguschin commented on September 27, 2024

Summary of our discussion with @shcheklein and @omesser:

  1. Some commands act upon a single revision, others on the registry (repo). Probably (?) one command (e.g. show) shouldn't act on a revision in some cases and on the whole repo in others.
  2. If ls is a technical command, it may be better to give it a less well-known name and use the name ls for show.
  3. It's good to be similar to the git CLI experience, but most importantly, don't contradict it.
  4. Group commands by human-friendly groups (4)
  • manage index:
    add, rm, ls
  • manage register/promotion:
    register, promote,
  • view the registry state:
    audit, history, show, latest, which, print-stages
  • helper commands for CI and downstream systems:
    check-ref
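
As an illustration of the grouping above, the command groups could be kept in a single mapping that both the help output and any lookups are derived from. This is a hypothetical sketch, not the gto implementation; the names COMMAND_GROUPS, render_help and group_of are made up for the example, while the group titles and commands are taken from the list above.

```python
from typing import Dict, List, Optional

# Group title -> commands, in the order they should appear in help output.
COMMAND_GROUPS: Dict[str, List[str]] = {
    "manage index": ["add", "rm", "ls"],
    "manage register/promotion": ["register", "promote"],
    "view the registry state": [
        "audit", "history", "show", "latest", "which", "print-stages",
    ],
    "helper commands for CI and downstream systems": ["check-ref"],
}


def render_help(groups: Dict[str, List[str]]) -> str:
    """Render a help listing with commands placed under their group title."""
    lines: List[str] = []
    for title, commands in groups.items():
        lines.append(f"{title}:")
        lines.extend(f"  {name}" for name in commands)
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip()


def group_of(command: str, groups: Dict[str, List[str]]) -> Optional[str]:
    """Return the group a command belongs to, or None if it is ungrouped."""
    for title, commands in groups.items():
        if command in commands:
            return title
    return None
```

Keeping the groups in one place means that moving or renaming a command (for example swapping ls and show) is a one-line change that the help text picks up automatically.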

So far I've kept ls to print the index, and show to print the registry state. To print artifact versions, show myartifact. I'll think this over a bit later.

dberenbaum commented on September 27, 2024

"To print artifact versions, show myartifact. I'll think this over a bit later."

In case it's helpful, my take is that ls and show make sense, but I find the various history commands confusing. The differences between gto audit and gto history aren't obvious to me, and the command names don't help differentiate them. From what I can tell, I would expect gto audit myartifact to print artifact versions (and promotions).

aguschin commented on September 27, 2024
  1. In #109 I've added command groups
  2. audit is removed and replaced by history
  3. ls is removed for now, as users don't need to list enrichments yet. In the future we can add this to show somehow or reintroduce the command.

Other questions discussed here are also taken into account, I believe. Therefore, closing the issue. Thanks everyone for the great discussion!

aguschin commented on September 27, 2024

But we definitely need a command to show the index at some ref, like $ gto show index HEAD.
So maybe the current $ gto show should be $ gto show state?

omesser commented on September 27, 2024

@aguschin

"They're not supposed to be exposed to the user,"

But they are very useful, I'm not sure it's necessary to hide it here tbh.
The tool is already "low level" compared to mlem for example

shcheklein commented on September 27, 2024

It's good (better), but we are sacrificing one of the most common commands (list all available models + some additional info) by making it a subcommand, right? From my git habits I would expect:

gto show instead of gto audit (if I understand correctly what audit does)
gto list instead of gto show

"actually is similar in nature - describing single object, not listing a collection of objects, right?"

That's not how I see it. We have a collection that we are listing. Why collection? Because, for example, we have commands like add, delete - etc. So, it's a list of objects. (granted, you could consider the whole list as a single item, but in this case by introducing add, etc we are already exposing its structure).

"displaying an internal object to the user"

I'm not sure I understand this tbh. What is the intention of the command (I mean gto show, how would we be describing it in the docs?). Can it be described like: "Prints the list of all registered models"? Or is there another command that serves that purpose?

omesser commented on September 27, 2024

@shcheklein I agree with everything you've written, but I think we're talking about different commands.
Adding gto list artifacts for example is not replacing (IMO) the need to show index/state, which is what this issue was about originally

shcheklein commented on September 27, 2024

Hmm, not sure.

Are we still talking about the gto show primarily?

"the need to show index/state"

Yes, what I suggest is to use gto list for that and not use gto show at all.

aguschin commented on September 27, 2024

I checked out how MLflow does this, and their list is similar to gto show - because they're not bound to Git and revisions.

aguschin commented on September 27, 2024

"while it does make sense for --rev HEAD"

Just realized that I'm wrong, it doesn't make sense. Because --rev is about a single commit, not about the whole repo. So, here's the contradiction: gto ls must have a --rev option. But at the same time, showing latest versions and what's promoted means that it's not about any revision in particular, it's about all the revisions in the repo. => gto ls cannot show latest versions and what's promoted.
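
A small sketch of why "latest version" is a repo-wide property: it can only be computed from the full set of registered versions, not from any single commit. The input shape (pairs of artifact name and version string) and the function names are assumptions made for this illustration, not the gto data model, and versions are assumed to look like v1.2.3.

```python
from typing import Dict, Iterable, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'v1.2.3' into (1, 2, 3) so versions compare numerically."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest_versions(registered: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each artifact to its highest version across ALL registrations.

    `registered` must cover every version in the repo; passing only the
    versions attached to one commit would give a different answer.
    """
    latest: Dict[str, str] = {}
    for artifact, version in registered:
        current = latest.get(artifact)
        if current is None or parse_version(version) > parse_version(current):
            latest[artifact] = version
    return latest
```

Dropping any one pair from the input can change the result, which is the sense in which this information belongs to the whole repo rather than to a --rev.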
