Comments (6)
This should be fixed on main
and in the latest versions. Note that running models larger than 7 GB will still most likely run into this issue on 8 GB Macs.
from ollama.
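The "larger than 7 GB won't fit on an 8 GB Mac" rule of thumb above can be sanity-checked with simple arithmetic. A minimal sketch, under one assumption not stated in the thread: macOS caps the Metal working set at roughly 2/3 of unified memory, which matches the 5461.34 MB that `ggml_metal_init` reports on the 8 GB machines in the logs below.

```python
def metal_fits(data_mb, kv_mb, eval_mb, scratch_mb, ram_gb=8):
    # Assumed cap: ~2/3 of unified memory, consistent with the
    # recommendedMaxWorkingSetSize = 5461.34 MB seen on 8 GB Macs.
    cap_mb = ram_gb * 1024 * 2 / 3
    needed_mb = data_mb + kv_mb + eval_mb + scratch_mb
    return needed_mb <= cap_mb

# Buffer sizes taken from the two logs later in this thread:
fits_3b = metal_fits(1839.12, 652.00, 512.00, 256.00 + 256.00)   # 3B model
fits_7b = metal_fits(3616.08, 1026.00, 776.00, 384.00 + 512.00)  # 7B model
```

With these numbers the 3B model fits comfortably (about 3.5 GB needed) while the 7B one overshoots the cap by roughly 850 MB, which lines up with the `status 5` crash reported further down.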
Mostly because it's out of memory?
Thanks @chsasank for submitting this. May I ask which model you were running? It does look like there isn't enough memory, and Ollama tried to allocate more memory than was available.
Ran llama2
Orca worked fine though.
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 3200
llama_model_load_internal: n_mult = 240
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 26
llama_model_load_internal: n_rot = 100
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 8640
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size = 0.06 MB
llama_model_load_internal: mem required = 2862.72 MB (+ 682.00 MB per state)
llama_new_context_with_model: kv self size = 650.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/sasank/code/llama/ollama/ggml-metal.metal'
ggml_metal_init: loaded kernel_add 0x157107260
ggml_metal_init: loaded kernel_mul 0x157107900
ggml_metal_init: loaded kernel_mul_row 0x157107f30
ggml_metal_init: loaded kernel_scale 0x157108450
ggml_metal_init: loaded kernel_silu 0x157108970
ggml_metal_init: loaded kernel_relu 0x157108e90
ggml_metal_init: loaded kernel_gelu 0x1571093b0
ggml_metal_init: loaded kernel_soft_max 0x157109a60
ggml_metal_init: loaded kernel_diag_mask_inf 0x15710a0c0
ggml_metal_init: loaded kernel_get_rows_f16 0x15710a740
ggml_metal_init: loaded kernel_get_rows_q4_0 0x15710adc0
ggml_metal_init: loaded kernel_get_rows_q4_1 0x15710b5b0
ggml_metal_init: loaded kernel_get_rows_q2_K 0x15710bc30
ggml_metal_init: loaded kernel_get_rows_q3_K 0x15710c2b0
ggml_metal_init: loaded kernel_get_rows_q4_K 0x15710c930
ggml_metal_init: loaded kernel_get_rows_q5_K 0x15710cfb0
ggml_metal_init: loaded kernel_get_rows_q6_K 0x155f04a60
ggml_metal_init: loaded kernel_rms_norm 0x155f05310
ggml_metal_init: loaded kernel_norm 0x155f059c0
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x155f064b0
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x15710d570
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x15710dc50
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x15710e330
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x15710ebb0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x15710f290
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x15710f970
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x157110050
ggml_metal_init: loaded kernel_rope 0x157110b40
ggml_metal_init: loaded kernel_alibi_f32 0x157111400
ggml_metal_init: loaded kernel_cpy_f32_f16 0x157111c90
ggml_metal_init: loaded kernel_cpy_f32_f32 0x157112520
ggml_metal_init: loaded kernel_cpy_f16_f16 0x157112db0
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: max tensor size = 54.93 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 1839.12 MB, ( 1839.52 / 5461.34)
ggml_metal_add_buffer: allocated 'eval ' buffer, size = 512.00 MB, ( 2351.52 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 652.00 MB, ( 3003.52 / 5461.34)
ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 256.00 MB, ( 3259.52 / 5461.34)
ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 256.00 MB, ( 3515.52 / 5461.34)
llama_print_timings: load time = 5199.96 ms
llama_print_timings: sample time = 6.94 ms / 31 runs ( 0.22 ms per token, 4465.57 tokens per second)
llama_print_timings: prompt eval time = 1579.53 ms / 39 tokens ( 40.50 ms per token, 24.69 tokens per second)
llama_print_timings: eval time = 1119.88 ms / 30 runs ( 37.33 ms per token, 26.79 tokens per second)
llama_print_timings: total time = 2748.18 ms
ggml_metal_free: deallocating
[GIN] 2023/07/19 - 12:09:46 | 200 | 7.978695084s | 127.0.0.1 | POST "/api/generate"
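The log above can be checked mechanically: each `ggml_metal_add_buffer` line reports a running total against `recommendedMaxWorkingSetSize`. A hedged sketch that parses those lines and flags allocations past the budget (`over_budget` is a hypothetical helper, not part of ollama or llama.cpp):

```python
import re

def over_budget(log_text):
    """Return (buffer_name, running_total_mb) for each ggml_metal_add_buffer
    line whose running total exceeds recommendedMaxWorkingSetSize."""
    cap = None
    offenders = []
    for line in log_text.splitlines():
        m = re.search(r"recommendedMaxWorkingSetSize\s*=\s*([\d.]+) MB", line)
        if m:
            cap = float(m.group(1))
            continue
        m = re.search(r"allocated '(\S+)\s*' buffer.*\(\s*([\d.]+) /", line)
        if m and cap is not None:
            name, total = m.group(1), float(m.group(2))
            if total > cap:
                offenders.append((name, total))
    return offenders
```

Run against the 3B log above this returns nothing; run against the 7B log further down it would flag the `scr0` and `scr1` buffers, the same two lines that carry the "greater than the recommended max working set size" warning.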
The same thing is happening to me on an M2 Air with 8 GB. It probably needs more RAM.
./ollama serve
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.Serve.func1 (4 handlers)
[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (4 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (4 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (4 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (4 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (4 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (4 handlers)
2023/07/23 13:59:25 routes.go:260: Listening on 127.0.0.1:11434
llama.cpp: loading model from /Users/gabriel/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 5287.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 1024.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/gabriel/Documents/Llama2/ollama/ggml-metal.metal'
ggml_metal_init: loaded kernel_add 0x13c610890
ggml_metal_init: loaded kernel_mul 0x13c611de0
ggml_metal_init: loaded kernel_mul_row 0x13c613300
ggml_metal_init: loaded kernel_scale 0x13c613600
ggml_metal_init: loaded kernel_silu 0x13c613e00
ggml_metal_init: loaded kernel_relu 0x13c612380
ggml_metal_init: loaded kernel_gelu 0x13c6147d0
ggml_metal_init: loaded kernel_soft_max 0x13c615750
ggml_metal_init: loaded kernel_diag_mask_inf 0x13c616ab0
ggml_metal_init: loaded kernel_get_rows_f16 0x13c616d10
ggml_metal_init: loaded kernel_get_rows_q4_0 0x13c615fc0
ggml_metal_init: loaded kernel_get_rows_q4_1 0x13c6176d0
ggml_metal_init: loaded kernel_get_rows_q2_K 0x13c618a40
ggml_metal_init: loaded kernel_get_rows_q3_K 0x13c617f20
ggml_metal_init: loaded kernel_get_rows_q4_K 0x13c619180
ggml_metal_init: loaded kernel_get_rows_q5_K 0x13c619ac0
ggml_metal_init: loaded kernel_get_rows_q6_K 0x13c61a400
ggml_metal_init: loaded kernel_rms_norm 0x13c61aee0
ggml_metal_init: loaded kernel_norm 0x13c61b9b0
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x13c61cc80
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x13c61d680
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x13c61e080
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x13c61ea90
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x13c61f4b0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x13c61ffd0
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x13c620ab0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x13c6214a0
ggml_metal_init: loaded kernel_rope 0x13c621c90
ggml_metal_init: loaded kernel_alibi_f32 0x13c6229c0
ggml_metal_init: loaded kernel_cpy_f32_f16 0x13c623820
ggml_metal_init: loaded kernel_cpy_f32_f32 0x13c6243a0
ggml_metal_init: loaded kernel_cpy_f16_f16 0x13c624f00
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: max tensor size = 70.31 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 3616.08 MB, ( 3616.53 / 5461.34)
ggml_metal_add_buffer: allocated 'eval ' buffer, size = 776.00 MB, ( 4392.53 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1026.00 MB, ( 5418.53 / 5461.34)
ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 384.00 MB, ( 5802.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, ( 6314.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1023: false
SIGABRT: abort
PC=0x198a84724 m=3 sigcode=0
signal arrived during cgo execution
goroutine 19 [syscall]:
runtime.cgocall(0x10286948c, 0x14000123278)
/usr/local/go/src/runtime/cgocall.go:157 +0x54 fp=0x14000123240 sp=0x14000123200 pc=0x1023556a4
github.com/jmorganca/ollama/llama._Cfunc_llama_eval(0x13d01b000, 0x14000409ef8, 0x1, 0x0, 0x8)
_cgo_gotypes.go:210 +0x38 fp=0x14000123270 sp=0x14000123240 pc=0x102856388
github.com/jmorganca/ollama/llama.New.func4(0x102a92f00?, {0x14000409ef8, 0x1, 0x0?}, {0xffffffffffffffff, 0x0, 0x800, 0x200, 0x1, 0x0, ...})
/Users/gabriel/Documents/Llama2/ollama/llama/llama.go:141 +0x7c fp=0x140001232c0 sp=0x14000123270 pc=0x1028571ac
github.com/jmorganca/ollama/llama.New({0x14000224e00, 0x6b}, {0xffffffffffffffff, 0x0, 0x800, 0x200, 0x1, 0x0, 0x0, 0x1, ...})
/Users/gabriel/Documents/Llama2/ollama/llama/llama.go:141 +0x288 fp=0x14000123480 sp=0x140001232c0 pc=0x102856f68
github.com/jmorganca/ollama/server.GenerateHandler(0x14000432500)
/Users/gabriel/Documents/Llama2/ollama/server/routes.go:56 +0x5c0 fp=0x140001236e0 sp=0x14000123480 pc=0x102862c40
github.com/gin-gonic/gin.(*Context).Next(...)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0x14000432500)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:102 +0x7c fp=0x14000123730 sp=0x140001236e0 pc=0x10284b40c
github.com/gin-gonic/gin.(*Context).Next(...)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0x14000432500)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/logger.go:240 +0xac fp=0x140001238e0 sp=0x14000123730 pc=0x10284a68c
github.com/gin-gonic/gin.(*Context).Next(...)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0x140003e4d00, 0x14000432500)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620 +0x54c fp=0x14000123a70 sp=0x140001238e0 pc=0x10284979c
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0x140003e4d00, {0x102ae52a0?, 0x140003ed420}, 0x14000432400)
/Users/gabriel/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576 +0x1d4 fp=0x14000123ab0 sp=0x14000123a70 pc=0x1028490a4
net/http.serverHandler.ServeHTTP({0x102ae3230?}, {0x102ae52a0, 0x140003ed420}, 0x14000432400)
/usr/local/go/src/net/http/server.go:2936 +0x2d8 fp=0x14000123b60 sp=0x14000123ab0 pc=0x1025d2dd8
net/http.(*conn).serve(0x140001387e0, {0x102ae5918, 0x14000434240})
/usr/local/go/src/net/http/server.go:1995 +0x560 fp=0x14000123fa0 sp=0x14000123b60 pc=0x1025cead0
net/http.(*Server).Serve.func3()
/usr/local/go/src/net/http/server.go:3089 +0x30 fp=0x14000123fd0 sp=0x14000123fa0 pc=0x1025d3600
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x14000123fd0 sp=0x14000123fd0 pc=0x1023b8b24
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3089 +0x520
goroutine 1 [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x1400031f700 sp=0x1400031f6e0 pc=0x102388924
runtime.netpollblock(0x1400031f798?, 0x243c754?, 0x1?)
/usr/local/go/src/runtime/netpoll.go:527 +0x158 fp=0x1400031f740 sp=0x1400031f700 pc=0x102381e48
internal/poll.runtime_pollWait(0x12a568b18, 0x72)
/usr/local/go/src/runtime/netpoll.go:306 +0xa0 fp=0x1400031f770 sp=0x1400031f740 pc=0x1023b26f0
internal/poll.(*pollDesc).wait(0x14000412600?, 0x0?, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400031f7a0 sp=0x1400031f770 pc=0x102437d98
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14000412600)
/usr/local/go/src/internal/poll/fd_unix.go:614 +0x250 fp=0x1400031f850 sp=0x1400031f7a0 pc=0x10243c840
net.(*netFD).accept(0x14000412600)
/usr/local/go/src/net/fd_unix.go:172 +0x28 fp=0x1400031f910 sp=0x1400031f850 pc=0x10247bda8
net.(*TCPListener).accept(0x140000c6d38)
/usr/local/go/src/net/tcpsock_posix.go:148 +0x28 fp=0x1400031f940 sp=0x1400031f910 pc=0x1024913a8
net.(*TCPListener).Accept(0x140000c6d38)
/usr/local/go/src/net/tcpsock.go:297 +0x2c fp=0x1400031f980 sp=0x1400031f940 pc=0x10249051c
net/http.(*onceCloseListener).Accept(0x140001387e0?)
<autogenerated>:1 +0x30 fp=0x1400031f9a0 sp=0x1400031f980 pc=0x1025f6d80
net/http.(*Server).Serve(0x14000338ff0, {0x102ae5090, 0x140000c6d38})
/usr/local/go/src/net/http/server.go:3059 +0x304 fp=0x1400031fad0 sp=0x1400031f9a0 pc=0x1025d32a4
github.com/jmorganca/ollama/server.Serve({0x102ae5090, 0x140000c6d38})
/Users/gabriel/Documents/Llama2/ollama/server/routes.go:265 +0x4e0 fp=0x1400031fca0 sp=0x1400031fad0 pc=0x102864e40
github.com/jmorganca/ollama/cmd.RunServer(0x140003c7200?, {0x1028ba248?, 0x0?, 0x0?})
/Users/gabriel/Documents/Llama2/ollama/cmd/cmd.go:406 +0x114 fp=0x1400031fd20 sp=0x1400031fca0 pc=0x1028685f4
github.com/spf13/cobra.(*Command).execute(0x140003c7200, {0x102f5e450, 0x0, 0x0})
/Users/gabriel/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x5c8 fp=0x1400031fe60 sp=0x1400031fd20 pc=0x102679528
github.com/spf13/cobra.(*Command).ExecuteC(0x140003c6900)
/Users/gabriel/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x35c fp=0x1400031ff20 sp=0x1400031fe60 pc=0x102679c7c
github.com/spf13/cobra.(*Command).Execute(...)
/Users/gabriel/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(0x1400005c768?, {0x102ae58a8?, 0x140000b8010?})
/Users/gabriel/go/pkg/mod/github.com/spf13/[email protected]/command.go:985 +0x50 fp=0x1400031ff40 sp=0x1400031ff20 pc=0x102679810
main.main()
/Users/gabriel/Documents/Llama2/ollama/main.go:10 +0x34 fp=0x1400031ff70 sp=0x1400031ff40 pc=0x102869254
runtime.main()
/usr/local/go/src/runtime/proc.go:250 +0x248 fp=0x1400031ffd0 sp=0x1400031ff70 pc=0x1023884f8
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x1400031ffd0 sp=0x1400031ffd0 pc=0x1023b8b24
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x1400005cfa0 sp=0x1400005cf80 pc=0x102388924
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:305 +0xb8 fp=0x1400005cfd0 sp=0x1400005cfa0 pc=0x102388768
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x1400005cfd0 sp=0x1400005cfd0 pc=0x1023b8b24
created by runtime.init.6
/usr/local/go/src/runtime/proc.go:293 +0x24
goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x1400005d760 sp=0x1400005d740 pc=0x102388924
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
/usr/local/go/src/runtime/mgcsweep.go:278 +0xa4 fp=0x1400005d7b0 sp=0x1400005d760 pc=0x102375604
runtime.gcenable.func1()
/usr/local/go/src/runtime/mgc.go:178 +0x28 fp=0x1400005d7d0 sp=0x1400005d7b0 pc=0x10236a118
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x1400005d7d0 sp=0x1400005d7d0 pc=0x1023b8b24
created by runtime.gcenable
/usr/local/go/src/runtime/mgc.go:178 +0x74
goroutine 4 [GC scavenge wait]:
runtime.gopark(0x14000038070?, 0x1029b8678?, 0x1?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x1400005df50 sp=0x1400005df30 pc=0x102388924
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x102ea2f20)
/usr/local/go/src/runtime/mgcscavenge.go:400 +0x5c fp=0x1400005df80 sp=0x1400005df50 pc=0x10237347c
runtime.bgscavenge(0x0?)
/usr/local/go/src/runtime/mgcscavenge.go:628 +0x44 fp=0x1400005dfb0 sp=0x1400005df80 pc=0x1023739f4
runtime.gcenable.func2()
/usr/local/go/src/runtime/mgc.go:179 +0x28 fp=0x1400005dfd0 sp=0x1400005dfb0 pc=0x10236a0b8
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x1400005dfd0 sp=0x1400005dfd0 pc=0x1023b8b24
created by runtime.gcenable
/usr/local/go/src/runtime/mgc.go:179 +0xb8
goroutine 18 [finalizer wait]:
runtime.gopark(0x1a0?, 0x102ea3960?, 0x80?, 0x26?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x1400005c580 sp=0x1400005c560 pc=0x102388924
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:193 +0x10c fp=0x1400005c7d0 sp=0x1400005c580 pc=0x1023691ac
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x1400005c7d0 sp=0x1400005c7d0 pc=0x1023b8b24
created by runtime.createfing
/usr/local/go/src/runtime/mfinal.go:163 +0x84
goroutine 20 [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x1023cb340?)
/usr/local/go/src/runtime/proc.go:381 +0xe4 fp=0x14000058540 sp=0x14000058520 pc=0x102388924
runtime.netpollblock(0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/netpoll.go:527 +0x158 fp=0x14000058580 sp=0x14000058540 pc=0x102381e48
internal/poll.runtime_pollWait(0x12a568a28, 0x72)
/usr/local/go/src/runtime/netpoll.go:306 +0xa0 fp=0x140000585b0 sp=0x14000058580 pc=0x1023b26f0
internal/poll.(*pollDesc).wait(0x14000412800?, 0x14000434341?, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x140000585e0 sp=0x140000585b0 pc=0x102437d98
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000412800, {0x14000434341, 0x1, 0x1})
/usr/local/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x14000058680 sp=0x140000585e0 pc=0x102439100
net.(*netFD).Read(0x14000412800, {0x14000434341?, 0x0?, 0x0?})
/usr/local/go/src/net/fd_posix.go:55 +0x28 fp=0x140000586d0 sp=0x14000058680 pc=0x10247a108
net.(*conn).Read(0x140000c8d08, {0x14000434341?, 0x0?, 0x0?})
/usr/local/go/src/net/net.go:183 +0x34 fp=0x14000058720 sp=0x140000586d0 pc=0x102488714
net.(*TCPConn).Read(0x0?, {0x14000434341?, 0x0?, 0x0?})
<autogenerated>:1 +0x2c fp=0x14000058750 sp=0x14000058720 pc=0x10249ac5c
net/http.(*connReader).backgroundRead(0x14000434330)
/usr/local/go/src/net/http/server.go:674 +0x44 fp=0x140000587b0 sp=0x14000058750 pc=0x1025c8f84
net/http.(*connReader).startBackgroundRead.func2()
/usr/local/go/src/net/http/server.go:670 +0x28 fp=0x140000587d0 sp=0x140000587b0 pc=0x1025c8ea8
runtime.goexit()
/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0x140000587d0 sp=0x140000587d0 pc=0x1023b8b24
created by net/http.(*connReader).startBackgroundRead
/usr/local/go/src/net/http/server.go:670 +0xcc
r0 0x0
r1 0x0
r2 0x0
r3 0x0
r4 0x0
r5 0x16eaa2c00
r6 0xa
r7 0x0
r8 0x58a7f816d970080f
r9 0x58a7f817b7dbb80f
r10 0x2
r11 0xfffffffd
r12 0x10000000000
r13 0x0
r14 0x0
r15 0x0
r16 0x148
r17 0x1f85b8f60
r18 0x0
r19 0x6
r20 0x16eabb000
r21 0x1903
r22 0x16eabb0e0
r23 0x8
r24 0x7
r25 0x8
r26 0x1f3a97720
r27 0x1028aafc0
r28 0x1029d10f0
r29 0x16eaa2bb0
lr 0x198abbc28
sp 0x16eaa2b90
pc 0x198a84724
fault 0x198a84724
I don't think this is something that can be fixed. I built the executable using the README instructions.
Maybe a warning message would be good, since someone arriving at the repo and trying this model first could lose a lot of time trying to figure out what the problem is.
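The warning suggested above could be a simple pre-load check against the "mem required" figure that llama.cpp already prints. A minimal sketch, assuming the same ~2/3-of-RAM working-set cap seen in the logs; `preflight_warning` is a hypothetical helper, not an ollama API:

```python
def preflight_warning(mem_required_mb, state_mb, ram_gb):
    # Assumed Metal working-set budget: ~2/3 of unified memory,
    # matching recommendedMaxWorkingSetSize = 5461.34 MB on 8 GB Macs.
    budget_mb = ram_gb * 1024 * 2 / 3
    total_mb = mem_required_mb + state_mb
    if total_mb > budget_mb:
        return (f"warning: model needs ~{total_mb:.0f} MB but the GPU "
                f"working set on a {ram_gb} GB Mac is only ~{budget_mb:.0f} MB; "
                "expect out-of-memory failures")
    return None
```

Using the figures from the two logs, the 7B model (5287.72 + 1026.00 MB) would trigger the warning on an 8 GB machine, while the 3B model (2862.72 + 682.00 MB) would not, which matches what the reporters actually observed.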