
Comments (6)

dyc3 avatar dyc3 commented on August 28, 2024

I'm assuming that you mean running the model as a service in the cloud.

Due to the sheer magnitude of the GPT-2 model and the lack of multi-GPU support, I don't think a cloud deployment like that is feasible. Hell, we had to start serving the model over torrents instead of a CDN download because it was too expensive. (See #41 )

The GPT-2 Large model requires 12 GB of VRAM to run on a GPU, so a cloud deployment would probably involve multiple instances of the application, each with its own GPU with at least 12 GB of VRAM.
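For a rough sense of where that 12 GB goes, here is a back-of-envelope estimate (an editorial sketch, assuming GPT-2 Large's published ~774M parameter count and fp32 weights; activations, the attention cache, and framework overhead account for the gap between the weight size and the 12 GB practical requirement):

```python
# Back-of-envelope VRAM estimate for GPT-2 Large (~774M parameters).
# fp32 weights take 4 bytes each; the rest of the 12 GB figure goes to
# activations, the attention key/value cache, and framework overhead.
PARAMS = 774_000_000
BYTES_PER_PARAM = 4  # fp32

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")  # ~2.9 GiB
```

So the raw weights are only about a quarter of the quoted requirement; the working memory during generation dominates.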

from aidungeon.

michaelsharpe avatar michaelsharpe commented on August 28, 2024

@dyc3 Thanks. Yeah, running it as a service: an API endpoint, or an AWS Lambda, that can send a command to the model and receive a response.
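A minimal sketch of what such an endpoint could look like, using only the Python standard library (the `generate` function here is a hypothetical stand-in for the actual model call, and the route and port are made up for illustration):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the real model inference call."""
    return prompt + " ..."

class PromptHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body: {"prompt": "..."}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate(body.get("prompt", ""))
        payload = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("", 8000), PromptHandler).serve_forever()
```

Note that a Lambda would be a poor fit here in practice: the model is far too large for the deployment-package limits, and cold starts would mean reloading it per invocation.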

I have been following some of the discussion about the size of the model and the issues with downloading it, and I was curious what running it requires. 12 GB of VRAM is definitely cost-prohibitive for running this on a server as an API; running even one instance of that size 24/7 would run into the thousands of dollars a month.

Does it run decently on a multicore CPU? I saw someone mention that it isn't optimized for CPU yet. Multicore CPU instances are definitely more affordable than instances with dedicated VRAM.
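One common knob for multicore CPU inference is the BLAS/OpenMP thread count, which most ML backends read from the environment at import time (a generic sketch; these environment variables are standard for MKL/OpenMP-based backends, but whether this particular model's stack honors them is an assumption):

```python
import os

# Pin the BLAS/OpenMP thread pools BEFORE importing an ML framework;
# most CPU inference backends read these variables once, at import time.
n_threads = os.cpu_count() or 1
os.environ.setdefault("OMP_NUM_THREADS", str(n_threads))
os.environ.setdefault("MKL_NUM_THREADS", str(n_threads))

print(f"using up to {n_threads} threads for CPU inference")
```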

Any idea what is currently being considered for deploying and scaling this?

Thanks!


dyc3 avatar dyc3 commented on August 28, 2024

I've run it locally on a Ryzen 7 2700X. It's significantly slower than running it on a GPU.


ethanspitz avatar ethanspitz commented on August 28, 2024

I'm running this on a SkySilk VPS and wrote a Discord bot wrapper (so people in the server can interactively play AI Dungeon 2 together), and it works fine on a 2-vCPU server with 8 GB of RAM. It's definitely not fast, though: each response generally takes between 1 and 2 minutes, but it gets the job done for what I'm using it for.


fumbleforce avatar fumbleforce commented on August 28, 2024

I have a colleague running it locally on a 24-thread AMD CPU, and he gets a result in a few seconds, definitely playable at that point. Running it on my 8700K, I get a reply in about 40–50 seconds. It uses about 8–10 GB of RAM depending on the length of the story.


michaelsharpe avatar michaelsharpe commented on August 28, 2024

@ethanspitz That is good to know. I have been told that they are working on a multicore CPU solution, which should open up scalable deployment solutions.

@JorgeER That is good to hear. For now, I would be happy with a response rate like that, even just for my development process before it hits production. I wonder about the scalability, though: does each individual story get stored in RAM, i.e. does each user require 8–10 GB of RAM to run? That would be a little crazy.
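On the per-user question: in a typical setup the 8–10 GB is the model itself, loaded once and shared across all users, while each session only needs to hold its story text, which is tiny by comparison (a hypothetical sketch of that split, not how AI Dungeon actually stores sessions):

```python
# The model (the gigabytes) is loaded once and shared; per-user state is
# just the story history, a few bytes per turn.
sessions: dict[str, list[str]] = {}

def record_turn(user_id: str, text: str) -> None:
    """Append one turn of story text to a user's session."""
    sessions.setdefault(user_id, []).append(text)

record_turn("alice", "You enter the dungeon.")
record_turn("alice", "> draw sword")

story_bytes = sum(len(s.encode()) for s in sessions["alice"])
print(f"per-user story state: {story_bytes} bytes")  # bytes, not gigabytes
```

The real scaling cost is therefore not RAM per user but inference throughput: each concurrent generation request ties up the shared model for its full 40-second-to-2-minute run.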

Thanks again for the responses! Very helpful for me.

