Comments (3)
Jason Davenport commented:
When fetching a large number of runs, hitting rate limits (HTTP 429 errors) is a common issue. Here are some strategies to handle this more efficiently:
- Batch Requests: Instead of querying histories sequentially, use batching to minimize the number of API calls.
- Retry Logic with Exponential Backoff: Implement a retry mechanism that waits for progressively longer periods before retrying a request.
- Throttle Requests: Implement a throttle mechanism to ensure you stay within the API rate limits.
Here’s an example implementation:
import wandb
import time
import pandas as pd
from wandb.apis.public import Api
from requests.exceptions import HTTPError

# Initialize W&B API
api = Api()

# Fetch the history of a single run with retries and exponential backoff
def fetch_run_history(run, max_retries=5, backoff_factor=1):
    for attempt in range(max_retries):
        try:
            return run.history()
        except HTTPError as e:
            if e.response.status_code == 429:
                # Too many requests, wait before retrying
                wait = backoff_factor * (2 ** attempt)
                print(f"Rate limit exceeded. Retrying in {wait} seconds...")
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")

# Fetch histories of all runs in a project
def fetch_all_histories(project_name, max_retries=5, backoff_factor=1, batch_size=100):
    runs = api.runs(project_name)
    histories = []
    for i in range(0, len(runs), batch_size):
        batch = runs[i:i + batch_size]
        for run in batch:
            try:
                history = fetch_run_history(run, max_retries, backoff_factor)
                histories.append((run.name, history))
            except Exception as e:
                print(f"Failed to fetch history for run {run.name}: {e}")
    return histories

# Fetch all histories for the project
project_name = "your_project_name"
histories = fetch_all_histories(project_name)

# Combine histories into a single DataFrame
combined_histories = []
for run_name, history in histories:
    history['run_name'] = run_name
    combined_histories.append(history)
df_combined = pd.concat(combined_histories, ignore_index=True)

# Save to CSV or handle as needed
df_combined.to_csv("combined_histories.csv", index=False)
The fetch_run_history function includes a retry mechanism with exponential backoff: if a rate limit error (HTTP 429) occurs, it waits progressively longer before retrying. The fetch_all_histories function processes runs in batches, so a failure in one run doesn't abort the whole download. After fetching, the histories are combined into a single DataFrame.
This approach should help you download run histories more efficiently without excessively hitting API rate limits.
Jason Davenport commented:
Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.