Comments (6)
I'm assuming that you mean running the model as a service in the cloud.
Due to the sheer size of the GPT-2 model and the lack of multi-GPU support, I don't think a cloud deployment like that is feasible. Hell, we had to start serving the model over torrents instead of a CDN download because it was too expensive. (See #41 )
The GPT-2 Large model requires 12 GB of VRAM to run on a GPU. So a cloud deployment would probably involve multiple instances of the application, each with its own GPU with at least 12 GB of VRAM.
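As a back-of-envelope check on that figure (the parameter count and fp32 storage below are assumptions, not from this thread), the weights alone are only part of the 12 GB; the rest is activations, the attention cache, and framework overhead:

```python
# Back-of-envelope memory estimate for the GPT-2 Large weights.
# 774M parameters and fp32 storage are assumed; activations, the
# attention cache, and framework overhead make up the rest of the
# ~12 GB observed in practice.
PARAMS = 774_000_000      # approximate GPT-2 Large parameter count
BYTES_PER_PARAM = 4       # 32-bit floats

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~2.9 GB
```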
from aidungeon.
@dyc3 Thanks. Yeah, running it as a service. Having an API endpoint, or an AWS lambda, that can send a command to the model and receive a response.
I have been following some of the discussion about the size of the model and the issues with downloading it, and was curious what running it requires. 12 GB of VRAM is definitely cost-prohibitive for running this on a server as an API; running even one instance of that size 24/7 would cost thousands of dollars a month.
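To put a rough number on that (the hourly rate is an assumed 2019-era on-demand price; e.g. an AWS p3.2xlarge with a 16 GB V100 was around $3.06/hour):

```python
# Rough monthly cost of one always-on GPU instance.
# $3.06/hr is an assumed 2019 on-demand rate (AWS p3.2xlarge class).
HOURLY_RATE_USD = 3.06
HOURS_PER_MONTH = 24 * 30

monthly_cost = HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"~${monthly_cost:,.0f}/month per instance")  # ~$2,203
```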
Does it run decently on a multicore CPU? I saw someone mention that it is not yet optimized for CPU. Multicore CPU instances are definitely more affordable than instances with dedicated VRAM.
Any idea what is currently being considered for deploying and scaling this?
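A minimal sketch of the endpoint idea above, using only the Python standard library; `generate()` is a stub standing in for the actual GPT-2 call, and the port is made up:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    """Stub for the expensive model call (the real GPT-2 inference)."""
    return prompt + " ..."

class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read a JSON body of the form {"prompt": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate(body.get("prompt", ""))
        payload = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8080) -> None:
    # One process holds one copy of the model and serves all clients.
    HTTPServer(("", port), CommandHandler).serve_forever()
```

A Lambda-style deployment would be harder: the model is far too large to cold-start per invocation, so a long-lived process like this is the more realistic shape.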
Thanks!
from aidungeon.
I've run it locally on a Ryzen 7 2700X. It's significantly slower than running it on a GPU.
from aidungeon.
I'm running this on a SkySilk VPS and wrote a Discord bot wrapper (so people in the server can interactively play AI Dungeon 2 together), and it works fine on a 2-vCPU server with 8 GB of RAM. It's definitely not fast, though: each response generally takes between 1 and 2 minutes, but it gets the job done for what I'm using it for.
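For anyone wiring up a similar bot: since a single generation can take a minute or two, concurrent players can't each get their own model run on a small VPS. One pattern (a sketch, not the bot's actual code; `slow_generate` is a stub for the model call) is to serialize requests through a single worker thread:

```python
import queue
import threading

def slow_generate(prompt: str) -> str:
    """Stand-in for the 1-2 minute model call."""
    return prompt.upper()

class ModelWorker:
    """Feeds queued prompts to one model instance, one at a time."""

    def __init__(self) -> None:
        self.jobs: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            prompt, result_box = self.jobs.get()
            result_box.put(slow_generate(prompt))

    def ask(self, prompt: str) -> str:
        result_box: queue.Queue = queue.Queue()
        self.jobs.put((prompt, result_box))
        return result_box.get()  # blocks until the worker reaches this job
```

This keeps only one generation in flight at a time, which matches how a single CPU-bound model instance behaves anyway.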
from aidungeon.
I have a colleague running it locally with a 24-thread AMD CPU, and he gets a result in a few seconds, which is definitely playable. Running it on my 8700K, I get a reply in about 40-50 seconds. It uses about 8-10 GB of RAM depending on the length of the story.
from aidungeon.
@ethanspitz That is good to know. I have been told that they are working on a multicore CPU solution, which should open up scalable deployment solutions.
@JorgeER That is good to hear. For now, I would be happy with a response rate like that, even just for my development process before it hits production. I do wonder about the scalability, though: is each individual story stored in RAM, i.e. does each user require 8-10 GB of RAM? That would be a little crazy.
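For what it's worth, most of the 8-10 GB should be the model itself, which a single process can share across all stories; per-story state is just the accumulated text, and GPT-2 only attends to a 1024-token context window anyway. A rough estimate (4 chars/token is an assumed average for English, not a measured figure):

```python
# Rough per-story memory footprint, assuming the story text fed to
# the model is trimmed to GPT-2's 1024-token context window.
CONTEXT_TOKENS = 1024
AVG_CHARS_PER_TOKEN = 4   # assumed rough average for English text

story_bytes = CONTEXT_TOKENS * AVG_CHARS_PER_TOKEN
print(f"~{story_bytes // 1024} KB of text per active story")  # ~4 KB
```

So memory should scale with the number of loaded model instances, not with the number of users.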
Thanks again for the responses! Very helpful for me.
from aidungeon.