Comments (8)
I'm using Neptune 1.9.1 and PyTorch Lightning 1.9.5 with `ModelCheckpoint`. They use the `version` property from `NeptuneLogger`, as in this line, to construct the checkpoint path.
However, I also just realized that `NeptuneLogger` is a wrapper class from PyTorch Lightning and not from Neptune.
Probably a false alarm, thanks for your response. I think I need to look into their wrapper class to see where the bug comes from.
from neptune-client.
@SiddhantSadangi sorry for the late reply. I've been busy the last few days upgrading my code to Lightning 2.2, which makes this issue no longer applicable to me, as it happens with `ModelCheckpoint` in Lightning 1.9.x only.
Actually, my workflow is the opposite: I disabled uploading model checkpoints to the cloud because my models are quite large. I also don't need to store them in the cloud, because the checkpoints are mostly for local evaluation and deployment. Only the configuration and results need to be stored on Neptune servers.
Perfect 🎉
Looks like the issue was on Lightning's end, not ours.
I am closing this thread, but please feel free to reach out if you need any further support 🤗
Hey @AlexTo 👋
Can you help me understand what the issue here is?
The folder created to store uploaded model checkpoints is always `model/checkpoints`, and is not related to `run_short_id`.
Also, the run ID is created only once the run has been initialized in the `sync`/`async` mode.
Is this not what you are expecting?
Thanks for the update!
As seen here, the path where model checkpoints are uploaded to Neptune is hardcoded to `model/checkpoints`, so you should not be seeing the checkpoints being uploaded to the `None` folder. Please let me know if this is the case though.
As mentioned above, I'm using `ModelCheckpoint`, so I guess it is a bit different. From the code snippet in my comment, here is how `ModelCheckpoint` constructs the checkpoint path.
So, for me, the folder created looks like this: `.neptune/model_name/version_None/checkpoints`, because `trainer.loggers[0].version`, which is `NeptuneLogger.version`, returns `None`.
I'll debug the `NeptuneLogger` in the next 1 or 2 days and report here.
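For anyone following along, the path construction described above can be sketched roughly as follows. This is a simplified stand-in and not Lightning's actual code: the function name `resolve_ckpt_dir` and its parameters are hypothetical, and the logic only assumes what this comment reports, namely that the logger's `version` is formatted into a `version_...` folder name unless it is already a string.

```python
import os
from typing import Optional, Union


def resolve_ckpt_dir(save_dir: str, name: str, version: Optional[Union[int, str]]) -> str:
    """Hypothetical sketch of how a checkpoint directory is built from logger fields."""
    # A string version is used as-is; anything else (an int, or None) is
    # formatted with a "version_" prefix, so None becomes "version_None".
    version_part = version if isinstance(version, str) else f"version_{version}"
    return os.path.join(save_dir, str(name), version_part, "checkpoints")


# A logger whose version is still None yields the folder reported above.
print(resolve_ckpt_dir(".neptune", "model_name", None))

# A string version (a made-up run ID here) would be used unchanged.
print(resolve_ckpt_dir(".neptune", "model_name", "RUN-42"))
```

If this matches what happens in your setup, passing an explicit `dirpath` to `ModelCheckpoint` should sidestep the logger's `version` entirely, since the directory then no longer needs to be derived from the logger.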
Oh, you are referring to the local folder, not the folder created in the Neptune web app! Sorry for the confusion.
Could you share a code snippet for me to reproduce the issue?
I'd preferably need the snippets where you initialize `ModelCheckpoint`, `NeptuneLogger`, and `Trainer`.
Also, if you are syncing the runs with the Neptune servers, should it really matter where the models are saved locally pending upload?
Just curious