
Comments (6)

dyc3 avatar dyc3 commented on August 28, 2024

I'm assuming that you mean running the model as a service in the cloud.

Due to the sheer magnitude of the GPT-2 model and the lack of multi-GPU support, I don't think a cloud deployment like that is feasible. Hell, we had to start serving the model over torrents instead of a CDN download because it was too expensive. (See #41 )

The GPT-2 Large model requires 12 GB of VRAM to run on a GPU, so a cloud deployment would probably involve multiple instances of the application, each with its own GPU with at least 12 GB of VRAM.
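For a rough sense of where that 12 GB goes, here is a back-of-envelope estimate (an editorial sketch, assuming GPT-2 Large's published ~774M parameter count and fp32 weights; activations, the attention cache, and framework overhead account for the gap between the weight size and the 12 GB practical requirement):

```python
# Back-of-envelope VRAM estimate for GPT-2 Large (~774M parameters).
# fp32 weights take 4 bytes each; the rest of the 12 GB figure goes to
# activations, the attention key/value cache, and framework overhead.
PARAMS = 774_000_000
BYTES_PER_PARAM = 4  # fp32

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")  # ~2.9 GiB
```

So the raw weights are only about a quarter of the quoted requirement; the working memory during generation dominates.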

from aidungeon.

michaelsharpe avatar michaelsharpe commented on August 28, 2024

@dyc3 Thanks. Yeah, running it as a service: an API endpoint, or an AWS Lambda, that can send a command to the model and receive a response.
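A minimal sketch of what such an endpoint could look like, using only the Python standard library (the `generate` function here is a hypothetical stand-in for the actual model call, and the route and port are made up for illustration):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the real model inference call."""
    return prompt + " ..."

class PromptHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body: {"prompt": "..."}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate(body.get("prompt", ""))
        payload = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("", 8000), PromptHandler).serve_forever()
```

Note that a Lambda would be a poor fit here in practice: the model is far too large for the deployment-package limits, and cold starts would mean reloading it per invocation.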

I have been following some of the discussion about the size of the model and the issues with downloading it, and I was curious what running it requires. 12 GB of VRAM is definitely cost-prohibitive for running this on a server as an API; running even one instance of that size 24/7 would run into the thousands of dollars a month.

Does it run decently on a multicore CPU? I saw someone mention that it isn't optimized for CPU yet. Multicore CPU instances are definitely more affordable than instances with dedicated VRAM.
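One common knob for multicore CPU inference is the BLAS/OpenMP thread count, which most ML backends read from the environment at import time (a generic sketch; these environment variables are standard for MKL/OpenMP-based backends, but whether this particular model's stack honors them is an assumption):

```python
import os

# Pin the BLAS/OpenMP thread pools BEFORE importing an ML framework;
# most CPU inference backends read these variables once, at import time.
n_threads = os.cpu_count() or 1
os.environ.setdefault("OMP_NUM_THREADS", str(n_threads))
os.environ.setdefault("MKL_NUM_THREADS", str(n_threads))

print(f"using up to {n_threads} threads for CPU inference")
```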

Any idea what is currently being considered for deploying and scaling this?

Thanks!


dyc3 avatar dyc3 commented on August 28, 2024

I've run it locally on a Ryzen 7 2700X. It's significantly slower than running it on a GPU.


ethanspitz avatar ethanspitz commented on August 28, 2024

I'm running this on a SkySilk VPS and wrote a Discord bot wrapper (so people in the server can interactively play AI Dungeon 2 together), and it works fine on a 2-vCPU server with 8 GB of RAM. It's definitely not fast, though: each response generally takes between 1 and 2 minutes, but it gets the job done for what I'm using it for.


fumbleforce avatar fumbleforce commented on August 28, 2024

I have a colleague running it locally on a 24-thread AMD CPU, and he gets a result in a few seconds, definitely playable at that point. Running it on my 8700K, I get a reply in about 40–50 seconds. It uses about 8–10 GB of RAM depending on the length of the story.


michaelsharpe avatar michaelsharpe commented on August 28, 2024

@ethanspitz That is good to know. I have been told that they are working on a multicore CPU solution, which should open up scalable deployment solutions.

@JorgeER That is good to hear. For now, I would be happy with a response rate like that, even just for my development process before it hits production. I wonder about the scalability, though: does each individual story get stored in RAM, i.e. does each user require 8–10 GB of RAM to run? That would be a little crazy.
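On the per-user question: in a typical setup the 8–10 GB is the model itself, loaded once and shared across all users, while each session only needs to hold its story text, which is tiny by comparison (a hypothetical sketch of that split, not how AI Dungeon actually stores sessions):

```python
# The model (the gigabytes) is loaded once and shared; per-user state is
# just the story history, a few bytes per turn.
sessions: dict[str, list[str]] = {}

def record_turn(user_id: str, text: str) -> None:
    """Append one turn of story text to a user's session."""
    sessions.setdefault(user_id, []).append(text)

record_turn("alice", "You enter the dungeon.")
record_turn("alice", "> draw sword")

story_bytes = sum(len(s.encode()) for s in sessions["alice"])
print(f"per-user story state: {story_bytes} bytes")  # bytes, not gigabytes
```

The real scaling cost is therefore not RAM per user but inference throughput: each concurrent generation request ties up the shared model for its full 40-second-to-2-minute run.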

Thanks again for the responses! Very helpful for me.

