Hello, I am seeking to optimize the storage usage in my wandb accoun

[Q] Questions about freeing up storage. about wandb HOT 16 CLOSED

nzl-thu commented on September 26, 2024

[Q] Questions about freeing up storage.

from wandb.

Comments (16)

Riccorl commented on September 26, 2024

> Hey guys, apologies you are seeing this behavior. To confirm, both @nzl-thu and @Adamits are deleting their data using the UI and @Riccorl is deleting it using the API?
>
> @Riccorl, could you send me the toy script you used to delete your data?
>
> Could you guys also all send me your usernames so I can look into your accounts as well, and potentially escalate this to our engineering team?

Here is the script I used:

```python
import argparse
import wandb

if __name__ == "__main__":
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument("project_name", type=str, help="Name of the project")
    arg_parser.add_argument(
        "--dry_run", action="store_true", help="If true, don't delete anything"
    )
    args = arg_parser.parse_args()

    dry_run = args.dry_run

    project_name = args.project_name

    api = wandb.Api(overrides={"project": project_name, "entity": "riccorl"})
    runs = api.runs(project_name)

    print("Deleting checkpoints and models in runs")
    for run in runs:
        if run.state != "finished":
            continue
        for f in run.files():
            if any(
                keyword in f.name
                for keyword in ("ckpt", "pt", "hf_model", "retriever", "document_index", "model")
            ):
                print(f"DELETING {run.id}/{f.name}")
                if not dry_run:
                    f.delete()
                else:
                    print("DRY RUN: NOT DELETING")

    print("Deleting models in artifacts")
    project = api.project(project_name)
    for artifact_type in project.artifacts_types():
        for artifact_collection in artifact_type.collections():
            for version in api.artifacts(artifact_type.type, artifact_collection.name):
                if artifact_type.type == "model":
                    print(f"DELETING {version.name}")
                    if not dry_run:
                        version.delete(delete_aliases=True)
                    else:
                        print("DRY RUN: NOT DELETING")
```

Just FYI, I can see the prints ("DELETING ...") the first time I run the script on a project, but it doesn't print that line anymore after that.
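
The observation above suggests the matching files were already gone from the run listings on the second pass, even though the storage figure on the account had not changed. A minimal sketch for checking this per run, assuming `run.files()` yields objects with `.name` and `.size` (in bytes) as in the W&B public API: it counts checkpoint-like files still listed and their total size. The suffix list and the `report` helper are hypothetical. Matching on suffixes also avoids the over-broad substring test `"pt" in f.name`, which also matches names such as `script.py`.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical suffix list; adjust to the checkpoint formats you actually log.
CHECKPOINT_SUFFIXES = (".ckpt", ".pt", ".pth")


def is_checkpoint(name: str) -> bool:
    """Match on the file suffix (case-insensitive) instead of a substring."""
    return name.lower().endswith(CHECKPOINT_SUFFIXES)


def remaining(files: Iterable[Tuple[str, Optional[int]]]) -> Tuple[int, int]:
    """Return (count, total_bytes) of checkpoint files among (name, size) pairs."""
    count = total = 0
    for name, size in files:
        if is_checkpoint(name):
            count += 1
            total += size or 0  # treat a missing size as 0 bytes
    return count, total


def report(entity: str, project: str) -> None:
    """Hypothetical helper: print, per run, the checkpoint files still listed."""
    import wandb  # imported here so the helpers above work without wandb installed

    api = wandb.Api()
    for run in api.runs(f"{entity}/{project}"):
        count, total = remaining((f.name, f.size) for f in run.files())
        print(f"{run.id}: {count} checkpoint files, {total / 1e6:.1f} MB still listed")
```

If the deletions registered, runs should report zero matching files here even while the usage shown in the account is still catching up.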

ArtsiomWB commented on September 26, 2024

@Riccorl, apologies it's taking so long to resolve this. Could you please write to [email protected] so we can talk privately about your account status? I can potentially help you with that there.

Adamits commented on September 26, 2024

I am seeing exactly the same behavior. I would have thought that some caching mechanism could be causing this, but several days have passed and the app still says I have no more storage space.

This is a pretty large problem as it makes my account unusable.

nzl-thu commented on September 26, 2024

> I am seeing exactly the same behavior. I would have thought that some caching mechanism could be causing this, but several days have passed and the app still says I have no more storage space.
>
> This is a pretty large problem as it makes my account unusable.

Yes! This is frustrating...

Riccorl commented on September 26, 2024

I've been deleting files through the Python API for a week but still see no changes in the web UI. I want to access my runs at some point...

ArtsiomWB commented on September 26, 2024

Hey guys, apologies you are seeing this behavior. To confirm, both @nzl-thu and @Adamits are deleting their data using the UI and @Riccorl is deleting it using the API?

@Riccorl, could you send me the toy script you used to delete your data?

Could you guys also all send me your usernames so I can look into your accounts as well, and potentially escalate this to our engineering team?

Adamits commented on September 26, 2024

Hi @ArtsiomWB

Mine seems to have started working sometime in the last few hours. I might still have a script inadvertently syncing model data, so I suspect I may need to mass-delete artifacts again. In case it is still useful, my username on W&B is also adamits. My profile is at https://wandb.ai/adamits

Thanks!

nzl-thu commented on September 26, 2024

Hi @ArtsiomWB

My profile is at https://wandb.ai/thu-n.
Meanwhile, could you please answer my first question as well?

Thank you!

ArtsiomWB commented on September 26, 2024

Apologies for taking so long to get back to you guys. We are currently seeing some unexpected behavior around freeing up space, and we are sincerely sorry for the inconvenience. Right now, when you free up space, the cleanup job is added to a queue, and because a very large number of people are cleaning up their accounts at the moment, it takes longer than usual for the storage shown in the account to update.

Regarding @nzl-thu's question: I just tried it on my side, and once you delete a run from that page, option (a) is what happens:

a) the entire run is removed, including the logged data (e.g., training loss).
So no metrics, artifacts, or media files are kept.

Since it has been some time since I got back to you guys, is everyone still seeing this behavior?

Riccorl commented on September 26, 2024

Thanks for the update!

> Since it has been some time since I got back to you guys, is everyone still seeing this behavior?

Yes, I still can't access my runs because of the storage limit.

nzl-thu commented on September 26, 2024

Hi @ArtsiomWB

Thank you for your response! In fact, a more urgent requirement for me is finding an efficient way to delete millions of saved images without impacting any logged data, such as training loss.

I initially considered using the web UI to quickly remove entire folders. However, removing a run folder this way also deletes the logged data, while iterating over every image with the Python API is frustratingly slow, so I am a bit stuck.

Could you please suggest any possible solutions? Thank you!
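
Since the slow part is issuing one delete request per file, one possible speed-up is to issue those requests concurrently. A minimal sketch, assuming (1) logged images are listed in `run.files()` under a `media/images/` prefix and (2) the W&B client tolerates several `File.delete()` calls in parallel; neither is confirmed in this thread, and `purge_run_images` is a hypothetical helper name. It deletes files rather than runs, so unlike removing the run folder it is not expected to remove scalar metrics such as training loss, an assumption worth checking on a single run first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

# Assumption: logged wandb.Image files are listed under this prefix.
IMAGE_PREFIX = "media/images/"


def is_logged_image(name: str) -> bool:
    """Select only files under the image prefix; config/summary files are skipped."""
    return name.startswith(IMAGE_PREFIX)


def delete_concurrently(
    items: Iterable[Tuple[str, Callable[[], None]]],
    should_delete: Callable[[str], bool],
    workers: int = 8,
) -> int:
    """Call the delete callback of every (name, delete) pair that passes the filter.

    Matches are collected before any deletion starts. If a delete call fails,
    its exception is re-raised here (the remaining calls still run to completion).
    Returns the number of deletions issued.
    """
    targets = [delete for name, delete in items if should_delete(name)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda delete: delete(), targets))  # wait and surface errors
    return len(targets)


def purge_run_images(run, workers: int = 8) -> int:
    """Hypothetical helper: delete the logged images of one public-API Run."""
    return delete_concurrently(
        ((f.name, f.delete) for f in run.files()), is_logged_image, workers
    )
```

If requests start failing under load, lowering `workers` is the first knob to try.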

ArtsiomWB commented on September 26, 2024

@Riccorl, looking at your code, to confirm: you are trying to delete checkpoints and models in runs within a single project, right?

ArtsiomWB commented on September 26, 2024

@nzl-thu, you could use a script like this:

```python
import wandb

# Initialize the W&B API
api = wandb.Api()

# Replace <entity> with your actual entity name
entity = "<entity>"

# Define the file extensions you want to delete
image_extensions = [".png", ".jpg", ".jpeg", ".bmp", ".gif"]
media_extensions = [".mp4", ".mp3", ".wav", ".avi", ".mov"]
extensions_to_delete = image_extensions + media_extensions

# Iterate over all projects
for project in api.projects(entity):
    print(f"Processing project: {project.name}")

    # Iterate over all runs in the project
    for run in api.runs(f"{entity}/{project.name}"):
        print(f" - Processing run: {run.id}")

        # Get all files in the run
        files = run.files()

        # Delete files with the specified extensions
        for file in files:
            if any(file.name.endswith(ext) for ext in extensions_to_delete):
                print(f"   - Deleting file: {file.name}")
                file.delete()
```

Just be careful: it goes over every single project in your entity and deletes all of the matching media files from each one.
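
A more cautious variant of the script above, sketched under a few assumptions: it is limited to one `entity/project` instead of every project, it matches extensions case-insensitively (the original `endswith` checks would skip a file ending in `.PNG`), it only prints by default (`dry_run=True`), and it collects matches before deleting so the file listing is not modified while it is being paged through (a precaution, not a documented requirement). `purge_project` is a hypothetical name.

```python
MEDIA_EXTENSIONS = (
    ".png", ".jpg", ".jpeg", ".bmp", ".gif",  # images
    ".mp4", ".mp3", ".wav", ".avi", ".mov",   # audio/video
)


def is_media(name: str) -> bool:
    """Case-insensitive extension check; str.endswith accepts a tuple."""
    return name.lower().endswith(MEDIA_EXTENSIONS)


def purge_project(entity: str, project: str, dry_run: bool = True) -> int:
    """Delete media files in a single project; returns how many matched."""
    import wandb  # imported here so is_media() can be used without wandb

    api = wandb.Api()
    matched = 0
    for run in api.runs(f"{entity}/{project}"):
        # Collect first, then delete, so the listing is not mutated mid-iteration.
        targets = [f for f in run.files() if is_media(f.name)]
        for f in targets:
            print(f"{'[dry run] ' if dry_run else ''}{run.id}/{f.name}")
            if not dry_run:
                f.delete()
        matched += len(targets)
    return matched
```

Running it once with the default `dry_run=True` and reviewing the printed list before passing `dry_run=False` limits the damage from a mistake in the extension list.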

Riccorl commented on September 26, 2024

> @Riccorl, looking at your code, to confirm: you are trying to delete checkpoints and models in runs within a single project, right?

Yep, I confirm

Riccorl commented on September 26, 2024

Given the current issues, would it be possible to restore access to my runs in the meantime? I haven't been able to access my account for a month now.

Riccorl commented on September 26, 2024

> @Riccorl, apologies it's taking so long to resolve this. Could you please write to [email protected] so we can talk privately about your account status? I can potentially help you with that there.

Sure, thanks for the help!