Comments (5)
On the free Colab K80 I set a batch size of 8. Training runs for 3-4 hours, then the session breaks off. How would we continue?
from morpheus.
@Landers125 Great question!!! I was thinking about implementing a state save so that training can be continued after a failure. It would be quite useful indeed.
Unfortunately, it is not very straightforward to do, so at the moment it is not possible AFAIK. Sorry :(
I will definitely post an update to the implementation if I ever do it...
For now, you can try paperspace.com (they give free runtimes of around 6 hours), use a smaller dataset or model size, or find an inexpensive GPU plan.
Hope this answers your question.
Alex
number_of_batches = 14
Thanks! Kaggle has launched a training session.
You are doing a very useful thing!
@Landers125 Thank you. I am happy that you enjoy my work. It means a lot to me :)
Yes, Kaggle and some other companies like Paperspace offer GPU plans or free GPUs that are better than Google's. I am happy you found a good solution for your needs.
Alex
@Landers125 Btw, you can technically restart the training after a failure by loading the last checkpoint and the original dataset.
You can even set the final learning rate in the training section of the code.
The problem is that it will start training from the beginning of the dataset, which is redundant and not very effective.
I hope to look into it some more soon, and I will add it to the implementation if it turns out to be possible.
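One possible way around the "starts from the beginning of the dataset" problem is to store the index of the next batch inside the checkpoint and skip ahead on restart. The sketch below is not the Morpheus code: it uses only the Python standard library, and `train`, `save_checkpoint`, `load_checkpoint`, the JSON checkpoint format, and the `stop_after` parameter (which simulates the session being cut off) are all hypothetical names made up for illustration. It also assumes the batches are produced in the same order on every run.

```python
import itertools
import json
import os
import tempfile


def make_batches(dataset, batch_size):
    """Split a list into consecutive batches (same order on every run)."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]


def save_checkpoint(path, weights, next_batch):
    """Store the model state together with the position in the dataset."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"weights": weights, "next_batch": next_batch}, f)
    os.replace(tmp, path)  # replace in one step so a crash cannot leave a half-written file


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"weights": 0.0, "next_batch": 0}
    with open(path) as f:
        return json.load(f)


def train(dataset, batch_size, ckpt_path, stop_after=None):
    """Train from the last checkpoint; stop_after simulates a cut-off session."""
    state = load_checkpoint(ckpt_path)
    weights = state["weights"]
    batches = make_batches(dataset, batch_size)
    processed = []
    # Skip the batches that were already consumed before the interruption.
    remaining = itertools.islice(enumerate(batches), state["next_batch"], None)
    for idx, batch in remaining:
        if stop_after is not None and len(processed) >= stop_after:
            break
        weights += sum(batch)  # stand-in for a real optimizer step
        processed.append(idx)
        save_checkpoint(ckpt_path, weights, idx + 1)
    return weights, processed


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    data = list(range(10))
    print(train(data, 2, ckpt, stop_after=2))  # first session, interrupted early
    print(train(data, 2, ckpt))                # second session resumes at batch 2
```

In a real training loop the checkpoint would also need the optimizer state and the current learning rate, and saving after every batch would probably be too slow, so saving every N batches is a more realistic choice.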