viperx7 / alpaca-turbo Goto Github PK

Web UI to run alpaca model locally

License: GNU Affero General Public License v3.0

Python 82.44% HTML 4.35% Dockerfile 0.54% Shell 0.03% TypeScript 8.05% CSS 4.58%

alpaca-turbo's Introduction

Alpaca-Turbo

Alpaca-Turbo is a frontend to use large language models that can be run locally without much setup required. It is a user-friendly web UI for the llama.cpp , with unique features that make it stand out from other implementations. The goal is to provide a seamless chat experience that is easy to configure and use, without sacrificing speed or functionality.

📝 Example views

demo.mp4

📦 Installation Steps

📺 Video Instructions

ToDo
ToDo

🐳 Using Docker (only Linux is supported with docker)

ToDo

🪟 Using Windows (standalone or miniconda) AND Mac M1/M2 (using miniconda)

For Windows users we have a oneclick standalone launcher - Alpaca-Turbo.exe.

Links for installing miniconda:
- Windows
- Mac M1/M2
  - Install for all users
  - Make sure to add c:\ProgramData\miniconda3\condabin to your environment variables
Download the latest alpaca-turbo.zip from the release page.
Extract Alpaca-Turbo.zip to Alpaca-Turbo

Make sure you have enough space for the models in the extracted location

Copy your alpaca models to alpaca-turbo/models/ directory.
Open cmd as Admin and type
```
conda init
```
close that window

open a new cmd window in your Alpaca-Turbo dir and type

conda create -n alpaca_turbo python=3.10 -y
conda activate alpaca_turbo
pip install -r requirements.txt
python app.py

ready to interact

Directly installing with Pip

just get the latest release unzip and then run

pip install -r requirements.txt
python app.py

💁 Contributing

As an open source project in a rapidly developing field, I am open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.

For detailed information on how to contribute.

🙌 Credits

ggerganov/LLaMA.cpp For their amazing cpp library
antimatter15/alpaca.cpp For initial versions of their chat app
cocktailpeanut/dalai For the Inspiration
MetaAI for the LLaMA models
Stanford for the alpaca models

🌟 History

alpaca-turbo's People

Contributors

Stargazers

Watchers

Forkers

beqa777 atorresmi nisaaragharia henkster72 soumyakants4 rowbot1 sotokisehiro mindrages alexanderatallah b1sounours pratik-behera razor950 jonmatthis mravtechinfo awmalka nemtos jblane1000 rohanmuz2 birdup000 melodysdreamj nasterfy digitspinner rafaelestevamreis ptonewreckin cossackx msgpo ehottak shuidong conerwei samni728 etherealheim oblik-io bnodnarb shangobashi khantmgmg rocarodev93 plqminh timurista frontenda nkaligin ken88ling techthiyanes deniska83 kfdslsope crowlsyong kandy22 janzxgit techventurebuilder lyrl tanawootc jesusoctavioas bendeguzszkalka octag0no fblgit kavasdavas quanmeta guidolbrrcn hhy5277 nick777-pixel utkarshx lucoo01 frostkiwi deceptivescorpion phanindhraios chrissc6 jmolinahn kennethjefferson alex-vonallmen eudaimoniatech lucheezou lightningralf vu2lid heyimkyu wandernei itsvrl eaagteam talesmousinho ekmixon metantonio razaci openeli kishan-sin1 charleneleong-ai webclinic017 jfontestad max-nderitu moustapha00 sky-dust-intelligence oscar810429 bitewell alitrack diegotluz hoscousa regholl2023

alpaca-turbo's Issues

Why running executable binaries?

This app, which is supposed to be a Flask app, is trying to run an executable binary named "main". It has versions for Windows (main.exe) and Linux. This seems suspect. Could you elaborate on why such a file is in the zipped distribution? Why are there no instructions to install it with git?

wich model i should download?

Support reverse prompting (-r flag in llama.cpp)

Particularly for Vicuna you need to be able to set the reverse prompt so it doesn't continue blathering on after it's already answered you. Default llama.cpp instruct mode (-ins) uses "### Instruction:
" as reverse prompt, but Vicuna needs "### Human:".

I'm not sure that reverse prompts are implemented at all right now, please correct if wrong and I'm not using it right. If author is not interested in doing it, I will have 3 weeks of free time coming up and I'd be more than happy to do a PR when I can dig into it.

Localhost Only?

Trying to figure out how to use this without it launching for anyone online to use.

Prompts not fully working

So i was playing with prompts and seems like if i add one inside the prompts.json it ignore all. It works if i change the prompt inside alpaca_turbo.py tho

Copied the release Alpaca-Turbo-beta_v0.6

copied all files from Alpaca-Turbo-beta_v0.6 to Alpaca-Turbo
downloaded ggml-vicuna-13b-1.1 to Alpaca-Turbo/models folder inside have (blobs,refs,snapshots) files
also tried download ggml-alpaca-7b-q4.bin into models folder as well.
run python api.py

[on the web localhost:7887 ]
Ask the bot:
aaa
Submit
Bot response:

History:

console
127.0.0.1 - - [15/Apr/2023 16:13:49] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [15/Apr/2023 16:13:58] "POST /completions HTTP/1.1" 405 -< upon click submit error
also didn't have the select model page.
tried run in gradio_impl folder, also tried run alpaca_turbo.py. found downloadH and fixed. but still have above error 405.
Noticed , it is much harder to get it running on Window compared on ubuntu (only couple of tried and fix can run)
Thank for the guide.

How to load leaked LLaMA weights?

Hi,

Your project looks very promising. I'm curious how I can leverage the leaked LLaMA weights with Alpaca-Turbo, specifically the 65B model. Does anyone have any idea what the correct process is?

I have the following files:

tokenizer_checklist.chk
tokenizer.model
65B/params.json
65B/checklist.chk
65B/consolidated.0X.pth (8 consolidated .pth files, numbered 00-07)

Thanks in advance,

Minor response issues

It might just be the nature of the model but based on the initial Dalai release they had similar issues with random characters. Here it is similar when the thread gets longer and the history grows there seems to be a tendancy for random incoherant charecters to appear

Additionally take note at the ignored hello message initially.

Minor UI/UX optimization

Scaling / zoom is cumbersom. The chat box is abnormally large. Zoom to 70% is needed so the screen can properly display all elements.

UI is weird in dark mode

as title says the bottom bar is white when in dark mode :

Not able to locate model

Error Saving Settings
Can't locate the model @ /Users/nikunj.goel/dalai/alpaca/models/7B/ggml-model-q4_0.bin
Set the model path in settings
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
starting with ['bin/mac', '--seed', '888777', '-t', '4', '--top_k', '40', '--top_p', '0.9', '--repeat_last_n', '64', '--repeat_penalty', '1.3', '-m', '/Users/nikunj.goel/dalai/alpaca/models/7B/', '--interactive-start']
Traceback (most recent call last):
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/gradio/blocks.py", line 1069, in process_api
    result = await self.call_function(
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/gradio/blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "webui.py", line 52, in bot
    for out in resp:
  File "/Users/nikunj.goel/Alpaca-Turbo/alpaca_turbo.py", line 213, in ask_bot
    _ = self.prep_model() if not self.is_ready else None
  File "/Users/nikunj.goel/Alpaca-Turbo/alpaca_turbo.py", line 202, in prep_model
    self.program = process(self.command, timeout=600)
  File "/Users/nikunj.goel/Alpaca-Turbo/interact.py", line 26, in __init__
    super().__init__(
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/site-packages/pexpect/popen_spawn.py", line 53, in __init__
    self.proc = subprocess.Popen(cmd, **kwargs)
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/nikunj.goel/anaconda3/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 86] Bad CPU type in executable: 'bin/mac'

I can confirm that model is at that location.

running with GPU

Hi, is it possible to run the program using the GPU? It would be a very nice feature.

support google colab

can you make a google colab version ? i don't have a good pc to install the Alpaca Turbo

Alpaca Install EXE detected as a trojan

Windows defender points out that the one click install contains "win32/Wacatac.B!ml"

No models shown in the 'Choose a model' dropdown

When I access the server from http://127.0.0.1:5000 I can see all the models in the choose a model dropdown.

When I access the server from another device on my network, e.g from http://192.168.0.8:5000, I cannot see any models in the dropdown.

However, when I access http://192.168.0.8:5000/list_models all the models are listed AND http://192.168.0.8:5000/load_model/4 (the model I want to use sends back a success response - but no model appears to be loaded...

Where to download models?

title says...

character encoding of non-english characters seem to be faulty

while using the "translator" mode, translating some english into german, i noticed that german special characters (ö, ä, ü, ß) show up "garbled" in the output of the UI.

for example, translating the following short text taken from a cnn.com article:

The State Emergency Service for the Kyiv region has told CNN that the number of people killed in a Russian drone strike that hit a residential building in the town of Rzhyshchiv Tuesday night has risen to seven.

results in this garbled output:

Der Staatliche Notfalldienst fÃ¼r die Region Kiew hat CNN mitgeteilt, dass der Anzahl an Menschen getÃ¶tet durch einen russischen Drohnenangriff auf ein WohngebÃ¤ude in dem Ort Rzhyshchiw gestern Abend auf sieben angestiegen ist.

the correct display would be:

Der Staatliche Notfalldienst für die Region Kiew hat CNN mitgeteilt, dass der Anzahl an Menschen getötet durch einen russischen Drohnenangriff auf ein Wohngebäude in dem Ort Rzhyshchiw gestern Abend auf sieben angestiegen ist.

it's probably some character encoding related issue. the "garbled" presentation looks to me like a UTF8 encoded string that is not decoded using UTF8 but some other encoding, maybe ANSI (because i'm on a german Windows machine and maybe the UI uses some sort of system default if not explicitly told what encoding to use?)

No model is shown in web-ui with docker

OS: 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Docker version:

Client:
 Version:           20.10.5+dfsg1
 API version:       1.41
 Go version:        go1.15.15
 Git commit:        55c4c88
 Built:             Mon May 30 18:34:49 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.5+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.15
  Git commit:       363e9a8
  Built:            Mon May 30 18:34:49 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.13~ds1
  GitCommit:        1.4.13~ds1-1~deb11u3
 runc:
  Version:          1.0.0~rc93+ds1
  GitCommit:        1.0.0~rc93+ds1-5+deb11u2
 docker-init:
  Version:          0.19.0
  GitCommit:

I've followed the installation guide for docker, and tried adding the following models in the models directory:

But in the webui no model is shown
Image webui

No error log with docker logs

Fade text in? (Suggestion)

It would be nice to have the text fade in as it is processing? Unless multiple scripts are being written and the best is chosen. Otherwise waiting on a blank loading screen is significantly less engaging than watching the words appear.

On mac: wrong architecture?

I'm getting a couple of errors off the get go when trying to run on Mac:
First warns on start:

❯ python webui.py
/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/utils.py:951: UserWarning: Expected 1 arguments for function <bound method ChatBotUI.on_select of <UI.ChatBotUI object at 0x10ad2ed60>>, received 0.
  warnings.warn(
/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/utils.py:955: UserWarning: Expected at least 1 arguments for function <bound method ChatBotUI.on_select of <UI.ChatBotUI object at 0x10ad2ed60>>, received 0.
  warnings.warn(
/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/utils.py:951: UserWarning: Expected 1 arguments for function <bound method ChatBotUI.on_select of <UI.PromptPlayUI object at 0x1230ac0d0>>, received 0.
  warnings.warn(
/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/utils.py:955: UserWarning: Expected at least 1 arguments for function <bound method ChatBotUI.on_select of <UI.PromptPlayUI object at 0x1230ac0d0>>, received 0.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

And then error directly when prompting anything:

starting with ['bin/mac', '--seed', '888777', '-t', '4', '--top_k', '40', '--top_p', '0.9', '--repeat_last_n', '64', '--repeat_penalty', '1.3', '-m', '/Users/lanski/dalai/alpaca/models/7B/ggml-model-q4_0.bin', 
'--interactive-start']
Traceback (most recent call last):
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/blocks.py", line 1069, in process_api
    result = await self.call_function(
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/UI.py", line 115, in bot
    for out in resp:
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/alpaca_turbo.py", line 198, in ask_bot
    _ = self.prep_model() if not self.is_ready else None
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/alpaca_turbo.py", line 178, in prep_model
    self.program = process(self.command, timeout=600)
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/interact.py", line 26, in __init__
    super().__init__(
  File "/Users/lanski/Documents/Code/alpacaGPT/Alpaca-Turbo/venv/lib/python3.9/site-packages/pexpect/popen_spawn.py", line 53, in __init__
    self.proc = subprocess.Popen(cmd, **kwargs)
  File "/usr/local/Cellar/[email protected]/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/Cellar/[email protected]/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 86] Bad CPU type in executable: 'bin/mac'

I asked chatGPT and his response was:

you can ignore the warns
However, the second error is more critical and indicates that there is a problem with the subprocess.Popen() function call. Specifically, the error message indicates that the CPU type in the executable is bad, which suggests that the program is trying to execute a binary file that was compiled for a different CPU architecture.

Based on the error message, it seems that the program is trying to execute the file bin/mac which is likely a compiled binary file for macOS. However, the CPU architecture of your machine might not be compatible with this binary file. One possible solution is to recompile the program from source code on your machine, which should generate a binary file that is compatible with your CPU architecture. Alternatively, you can try to obtain a compatible binary file for your machine or modify the program to use a different binary file that is compatible with your machine's CPU architecture.

My mac is an Intel. Is the mac architecture for M1/2?

Error while starting the webui

Running "python webui.py" gives the following error:

Traceback (most recent call last):
File "E:\Alpaca-Turbo\webui.py", line 13, in
gptui = ChatBotUI(ASSISTANT)
File "E:\Alpaca-Turbo\UI.py", line 24, in init
self.personas = Personas("./prompts.json")
File "E:\Alpaca-Turbo\prompts.py", line 14, in init
self.load()
File "E:\Alpaca-Turbo\prompts.py", line 18, in load
self.bots = json.load(f)
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\json_init.py", line 293, in load
return loads(fp.read(),
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\encodings\cp1251.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 452: character maps to

Stuck at loading?

can't load the settings file continuing with defaults
Loading Model ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7% -:--:--

Clarify Open Source License

Can you clarify under what license this code is provided?😃
Thanks

win.exe crash at some point. web-ui show endless "thinking"

At first I used 30B model, and think that is is because of not enough RAM.
But 7b model crash too.

Problem installing

Installation on Ubuntu 22.04

Install runs without errors, but when accessing the interface there's no reaction to 'hi', the webui.py script gives the following console output.

Any idea what's wrong?

[]
[]
[]
/home/alpaca/.local/lib/python3.10/site-packages/gradio/utils.py:951: UserWarning: Expected 1 arguments for function <bound method ChatBotUI.opast_chat_select of <UI.ChatBotUI object at 0x7ff58723ad10>>, received 0.
  warnings.warn(
/home/alpaca/.local/lib/python3.10/site-packages/gradio/utils.py:955: UserWarning: Expected at least 1 arguments for function <bound method ChatBotUI.opast_chat_select of <UI.ChatBotUI object at 0x7ff58723ad10>>, received 0.
  warnings.warn(
/home/alpaca/.local/lib/python3.10/site-packages/gradio/utils.py:951: UserWarning: Expected 1 arguments for function <bound method ChatBotUI.opast_chat_select of <UI.PromptPlayUI object at 0x7ff587260c10>>, received 0.
  warnings.warn(
/home/alpaca/.local/lib/python3.10/site-packages/gradio/utils.py:955: UserWarning: Expected at least 1 arguments for function <bound method ChatBotUI.opast_chat_select of <UI.PromptPlayUI object at 0x7ff587260c10>>, received 0.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://3932fadb3f87e656d9.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
[]
Set the model path in settings
Traceback (most recent call last):
  File "/home/alpaca/.local/lib/python3.10/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/alpaca/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1069, in process_api
    result = await self.call_function(
  File "/home/alpaca/.local/lib/python3.10/site-packages/gradio/blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/alpaca/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/alpaca/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/alpaca/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/alpaca/.local/lib/python3.10/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "/home/alpaca/Alpaca-Turbo/UI.py", line 150, in bot
    for out in resp:
  File "/home/alpaca/Alpaca-Turbo/alpaca_turbo.py", line 233, in ask_bot
    for char in opt_stream:
  File "/home/alpaca/Alpaca-Turbo/alpaca_turbo.py", line 209, in streamer
    self.program.recvuntil(">")
AttributeError: 'Assistant' object has no attribute 'program'

TemplateNotFound jinja2.exceptions.TemplateNotFound: index.html

in Window Server 2019 environment, running as admin.

File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 2551, in call
return self.wsgi_app(environ, start_response)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask_socketio_init_.py", line 43, in call
return super(_SocketIOMiddleware, self).call(environ,
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\engineio\middleware.py", line 74, in call
return self.wsgi_app(environ, start_response)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 2531, in wsgi_app
response = self.handle_exception(e)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "C:\Users\Administrator\Alpaca-Turbo\api.py", line 234, in index
return render_template("index.html")
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\templating.py", line 146, in render_template
template = app.jinja_env.get_or_select_template(template_name_or_list)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\jinja2\environment.py", line 1081, in get_or_select_template
return self.get_template(template_name_or_list, parent, globals)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\jinja2\environment.py", line 1010, in get_template
return self._load_template(name, globals)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\jinja2\environment.py", line 969, in _load_template
template = self.loader.load(self, name, self.make_globals(globals))
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\jinja2\loaders.py", line 126, in load
source, filename, uptodate = self.get_source(environment, name)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\templating.py", line 62, in get_source
return self._get_source_fast(environment, template)
File "C:\ProgramData\miniconda3\envs\alpaca_turbo\lib\site-packages\flask\templating.py", line 98, in _get_source_fast
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: index.html
The debugger caught an exception in your WSGI application. You can now look at the traceback which led to the error.
To switch between the interactive traceback and the plaintext one, you can click on the "Traceback" headline. From the text traceback you can also create a paste of it. For code execution mouse-over the frame you want to debug and click on the console icon on the right side.

You can execute arbitrary Python code in the stack frames and there are some extra helpers available for introspection:

dump() shows all variables in the frame
dump(obj) dumps all that's known about the object

When win.exe finished (cpu load is 0%) python webui.py is at 100% one thread and text appears slowly

When win.exe finished (cpu load is 0%) python webui is 100% and text appear slowly.

Also python.exe webui.py even after answer, in idle mode takes othe thread 100%

Error Loading model

CONTEXT:
I'm running in zorinOS(an ubuntu spinoff but what isn't these days..) installation was successful and the web ui is responsive on 127.1:7887

I've downloaded: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/blob/main/pytorch_model-00001-of-00003.bin
and copied the .bin to the appropriate folder, which shows in the gui under "Load model"

When I click submit I receive an error in terminal:

alpaca_1  | [
alpaca_1  |     '/main',
alpaca_1  |     '-i',
alpaca_1  |     '--seed',
alpaca_1  |     '888777',
alpaca_1  |     '-ins',
alpaca_1  |     '-t',
alpaca_1  |     '4',
alpaca_1  |     '-b',
alpaca_1  |     '256',
alpaca_1  |     '--top_k',
alpaca_1  |     '200',
alpaca_1  |     '--top_p',
alpaca_1  |     '0.99',
alpaca_1  |     '--repeat_last_n',
alpaca_1  |     '512',
alpaca_1  |     '--repeat_penalty',
alpaca_1  |     '1',
alpaca_1  |     '--temp',
alpaca_1  |     '0.7',
alpaca_1  |     '--n_predict',
alpaca_1  |     '1000',
alpaca_1  |     '-m',
alpaca_1  |     'models/pytorch_model-00001-of-00003.bin',
alpaca_1  |     '--interactive-first'
alpaca_1  | ]
alpaca_1  | ERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoRERRoR

any feedback or insights would be greatly appreciated please, or an alternate model for me to attempt please

can't not load model..

wait 1 hours, but did not load model..

in windows10

Interpreting HTML

Hello! I was looking to have my AI write HTML code for me and it appears that Alpaca-Turbo is interpreting the code, I think at least.

Within the CMD it provided this response to make a calculator:
HTML:

<div>
  <button>7</button>
  <button>8</button>
  <button>9</button>
  <button>Clear</button>
  <input type="text" name="result">
</div>

CSS:

.calculator {
  display: flex;
  justify-content: center;
  align-items: center;
}

.button {
  width: 30px;
  height: 30px;
  border: none;
  background: #000000;
  color: #

However displayed was th
![HTML code](https://user-images.githubusercontent.com/127322566/230821995-517c27ee-29da-448e-a344-cb1786c6e9e6.png)
is:

Can't run in Hyper-V VM

Thanks in advance.
Running on home PC just fine, followed instructions, no issues besides the normal learning curve.
I've got a Dell Poweredge R720 with Windows Server 2016 Standard with the Hyper-V role installed.
The guest OS is currently Windows 11 Pro w/ 48 cores & 32 GB RAM allocated. 150GB VHDX file for storage. (NVMe drive on server) Fully updated. Already tried with Windows Server 2019 and Windows Server 2022, no luck, I thought it was an OS issue, but everything works just fine on home PC running Windows 11 Pro.
Fresh Install on guest VM - Win 11 Pro. Unzipped Alpaca-Turbo to C:\AI\Alpaca-Turbo. Grabbed ggml-model-q4_1.bin from Huggingface. Pi3141/alpaca-7b-native-enhanced, specifically. I installed Python 3.8 from 64-bit installer, enabled long path. Installed Miniconda3 Windows 64-bit. Conda init, added condabin to environmental variables, created environment, etc, followed instructions, no errors. The web interface loads fine. I load model, ggml-model-q4_1.bin is in the dropdown list. I click load and it just sits there. In task manager, the command window is using about 2% CPU. There's disk activity. It slowly ramps up to about 62MB RAM usage and stops. (Home PC only uses 30MB of RAM while model is fully loaded) CPU and disk activity stay at 2%. One error in the command prompt window that says: "127.0.0.1 - - [18/Apr/2023 11:42:31] "GET /load_model/undefined HTTP/1.1" 404 -" Then just never ending rows of ""GET /status HTTP/1.1" 200 -"
I'm pretty sure it has something to do with the fact that it's running in Hyper-V. All other variables are accounted for.
Possibilities:

Miniconda is a virtualized environment and a VM can't run inside another VM?
I installed Python by downloading the install .exe instead of letting Conda get it?
The guest OS does not have graphics acceleration.
That's all I got. One note, running on home PC, there is no Python item below the command prompt in task manager, whereas in the VM, there is two. One dormant, and one using 2-4% CPU. Did I mess up by installing Python manually instead of letting Conda install it?

Unasked questions and responses

Why this appearing?

Stuck at loading model...

Just installed Alpaca-Turbo and compiled alpaca.cpp for my Intel Mac, all smooth.

Stuck at loading model:

Tried couple from huggingface:

alpaca-7b-native-enhanced-ggml
alpaca-lora-7B-ggml
gpt4-x-alpaca-native-13B-ggml
gpt4all-lora-quantized.bin

No success. No indication (cpu usage wise) that something is happening.

Works great on my Win10 setup.

Your instructions are completely wrong

its all messed up

Wrong port in docker-compose.yml file

Alpaca-Turbo/docker-compose.yaml

Line 6 in 25b3218

- 5000:5000

should be 7887:7887, not 5000:5000

win.exe crashes when trying to load a model

Problem:

After UI is launched I tried to load alpaca 7b model, but when it's 100%, win.exe crashes and I can't generate anything.

My specs:

Windows 11 x64
AMD A8-6500
RAM 16GB 1666

alpaca_turbo.py typo on line 224 directoryh

in 0.6 release from an hour ago, in alpaca_turbo.py line 224:

there is a typo here:
return f"loaded successfully {self.list_available_models(self.models_directoryh)[self.model_idx]}"

fix seems working:
return f"loaded successfully {self.list_available_models(self.models_directory)[self.model_idx]}"

thanks for making this software and releasing it, appreciate it.

No option to clear thread

On the build no option to clear thread and history as earlier builds.

jinja2.exceptions.TemplateNotFound: index.html

Running the docker-compose version. After installing and accessing the URL http://127.0.0.1:7887/

alpaca_1  | 172.18.0.1 - - [21/Apr/2023 02:01:36] "GET / HTTP/1.1" 500 -
alpaca_1  | Traceback (most recent call last):
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2551, in __call__
alpaca_1  |     return self.wsgi_app(environ, start_response)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask_socketio/__init__.py", line 43, in __call__
alpaca_1  |     return super(_SocketIOMiddleware, self).__call__(environ,
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/engineio/middleware.py", line 74, in __call__
alpaca_1  |     return self.wsgi_app(environ, start_response)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2531, in wsgi_app
alpaca_1  |     response = self.handle_exception(e)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
alpaca_1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2528, in wsgi_app
alpaca_1  |     response = self.full_dispatch_request()
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1825, in full_dispatch_request
alpaca_1  |     rv = self.handle_user_exception(e)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
alpaca_1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1823, in full_dispatch_request
alpaca_1  |     rv = self.dispatch_request()
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1799, in dispatch_request
alpaca_1  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
alpaca_1  |   File "/app/api.py", line 234, in index
alpaca_1  |     return render_template("index.html")
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 146, in render_template
alpaca_1  |     template = app.jinja_env.get_or_select_template(template_name_or_list)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1081, in get_or_select_template
alpaca_1  |     return self.get_template(template_name_or_list, parent, globals)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1010, in get_template
alpaca_1  |     return self._load_template(name, globals)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 969, in _load_template
alpaca_1  |     template = self.loader.load(self, name, self.make_globals(globals))
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/loaders.py", line 126, in load
alpaca_1  |     source, filename, uptodate = self.get_source(environment, name)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 62, in get_source
alpaca_1  |     return self._get_source_fast(environment, template)
alpaca_1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 98, in _get_source_fast
alpaca_1  |     raise TemplateNotFound(template)
alpaca_1  | jinja2.exceptions.TemplateNotFound: index.html

Long generations are cut off in the webui

with standard settings and both models I have - alpaca 7b and 13b-gpt-x longer generations are cut off in the webui.

after a while text stops appearing, the debug console shows only status messages, no more "polling" messages, but CPU usage stays up and the UI also shows the "stop generating" button

upon pressing that button a minute later, the console shows the much longer message (it was still generating), the message is not shown in the web interface

I followed all the settings on my m1 mac, but I can't select the model.

windows 11 problem

Fresh gitclone outputs this error

Unsafe PID removal

Hi,

After an ungraceful exit (Restart of the computer or ), the pid file stays in the path, but there is no process under the PID.
When attempting to start alpaca_turbo, it tries to kill the non-existent PID in the file and crashes due to unhandled exception.

suggest the following fix for alpaca_turbo.py starting with line 310:

    if os.path.exists("./pid"):
        log.fatal("Already running another instance or dirty exit last time")
        with open("./pid") as file:
            pid = int(file.readline())
        log.info("Attempting to kill the process")
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass
        os.remove("./pid")
        log.info("Fixed the Issue Now Retry running")

This will make sure that the PID file is removed even when the process was not killed/available.
Alternately, you can first check for the PID by sending

    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        os.remove("./pid")

And then try to kill the PID.

Sorry for the smartassing - love your project :)

Can you add some more how to setup with like Portainer or in proxmox

Is there anyone that can add some more guides how to set this up in other environments for us that is not that good at docker?
I have for example a proxmox server setup and a docker/portainer machine running in that environment and would really like to get Alpaca Turbo setup.

Extremely slow respone

After getting rid of all the other issues (see other issue tickets for "models could not be loaded due to localhost issue" and "only a specific model can be used") I finally managed to get alpaca-turbo running.

But if I type I question, it takes over 130 seconds to reply with only the fraction of a word. After around 210 seconds the first sentence was finally completed.

The docker image is running on a server with 32GB of RAM and 16 CPU cores. They are far from being stressed or so. (RAM usage 2,9 GB, CPU 25%).

What could be the issue?

alpaca_turbo.exe got virus when download and scan by antivirus

virus detected

Unable to Find index.html file on Running Application through Docker Compose Up

Problem Description

Upon running the application through docker compose up, the application is unable to find the index.html file. The following traceback error is generated:

alpaca-turbo-alpaca-1  | Traceback (most recent call last):
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2551, in __call__
alpaca-turbo-alpaca-1  |     return self.wsgi_app(environ, start_response)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask_socketio/__init__.py", line 43, in __call__
alpaca-turbo-alpaca-1  |     return super(_SocketIOMiddleware, self).__call__(environ,
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/engineio/middleware.py", line 74, in __call__
alpaca-turbo-alpaca-1  |     return self.wsgi_app(environ, start_response)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2531, in wsgi_app
alpaca-turbo-alpaca-1  |     response = self.handle_exception(e)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
alpaca-turbo-alpaca-1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2528, in wsgi_app
alpaca-turbo-alpaca-1  |     response = self.full_dispatch_request()
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1825, in full_dispatch_request
alpaca-turbo-alpaca-1  |     rv = self.handle_user_exception(e)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
alpaca-turbo-alpaca-1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1823, in full_dispatch_request
alpaca-turbo-alpaca-1  |     rv = self.dispatch_request()
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1799, in dispatch_request
alpaca-turbo-alpaca-1  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
alpaca-turbo-alpaca-1  |   File "/app/api.py", line 218, in index
alpaca-turbo-alpaca-1  |     return render_template("index.html")
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 146, in render_template
alpaca-turbo-alpaca-1  |     template = app.jinja_env.get_or_select_template(template_name_or_list)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1081, in get_or_select_template
alpaca-turbo-alpaca-1  |     return self.get_template(template_name_or_list, parent, globals)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1010, in get_template
alpaca-turbo-alpaca-1  |     return self._load_template(name, globals)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 969, in _load_template
alpaca-turbo-alpaca-1  |     template = self.loader.load(self, name, self.make_globals(globals))
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/jinja2/loaders.py", line 126, in load
alpaca-turbo-alpaca-1  |     source, filename, uptodate = self.get_source(environment, name)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 62, in get_source
alpaca-turbo-alpaca-1  |     return self._get_source_fast(environment, template)
alpaca-turbo-alpaca-1  |   File "/usr/local/lib/python3.8/site-packages/flask/templating.py", line 98, in _get_source_fast
alpaca-turbo-alpaca-1  |     raise TemplateNotFound(template)
alpaca-turbo-alpaca-1  | jinja2.exceptions.TemplateNotFound: index.html

Solution

The issue is caused by the application's inability to locate the index.html file. To resolve this issue, please ensure that the index.html file is located in the correct file path and is accessible by the application.

Error using docker-compose

When trying the docker-compose, I get this:

 => [internal] load build definition from Dockerfile                                                                                                        0.0s
 => => transferring dockerfile: 668B                                                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                           0.0s
 => => transferring context: 2B                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/python:3.8-slim                                                                                          0.3s
 => [ 1/10] FROM docker.io/library/python:3.8-slim@sha256:f4efb39d02df8cdc44485a0956ea62e63aab6bf2a1dcfb12fb5710bf95583e72                                  0.0s
 => [internal] load build context                                                                                                                           0.0s
 => => transferring context: 37B                                                                                                                            0.0s
 => CACHED [ 2/10] RUN apt-get update &&     apt-get install -y --no-install-recommends cmake &&     apt-get clean                                          0.0s
 => CACHED [ 3/10] RUN apt-get install -y --no-install-recommends curl wget vim git gcc make libc6-dev g++ unzip                                            0.0s
 => CACHED [ 4/10] RUN mkdir -p /app/models                                                                                                                 0.0s
 => CACHED [ 5/10] RUN git clone https://github.com/ViperX7/llama.cpp /llama.cpp                                                                            0.0s
 => CACHED [ 6/10] RUN cd /llama.cpp && make                                                                                                                0.0s
 => ERROR [ 7/10] RUN mv /llama.cpp/main /main                                                                                                              0.3s
------
 > [ 7/10] RUN mv /llama.cpp/main /main:
#0 0.272 mv: cannot move '/llama.cpp/main' to a subdirectory of itself, '/main'
------
failed to solve: process "/bin/sh -c mv /llama.cpp/main /main" did not complete successfully: exit code: 1
root@docker-containers:~/alpaca#

It's running Debian 11(updated)

Short responses

I'm getting short and truncated responses. The generated answer stops in the middle of a sentence, usually after 60 - 90 words. CPU usage drops to 0, and I can see in the log that the response in the web interface is indeed identical to the output generated. I'm using the 13B model and I'm running on windows 10. So far, I have tried to change the temperature to 0.8, and the self_repeat to 128, with no apparent effect on the behaviour.
I've got 32gb of RAM in this machine, and the memory isn't full when the generator is running.

Anyone got a hint for me how to fix this? TIA.