The ollama-zig library is the easiest way to integrate your Zig project with the Ollama REST API.
Zig master version is required to use ollama-zig.
- Copy the full hash of the latest commit and replace `<COMMIT_HASH>` with it, then run:

  ```sh
  zig fetch --save https://github.com/tr1ckydev/ollama-zig/archive/<COMMIT_HASH>.tar.gz
  ```
- Add the dependency and module to your `build.zig`.

  ```zig
  const ollama_dep = b.dependency("ollama-zig", .{});
  const ollama_mod = ollama_dep.module("ollama-zig");
  exe.root_module.addImport("ollama-zig", ollama_mod);
  ```
- Import it inside your project (see the complete program sketch after this list).

  ```zig
  const Ollama = @import("ollama-zig");

  var ollama = Ollama.init(allocator, .{});
  defer ollama.deinit();

  const res = try ollama.generate(.{
      .model = "llama2",
      .prompt = "Why is the sky blue?",
      .stream = false,
  });
  std.debug.print("{s}", .{res.response});
  ```
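For reference, here is a minimal complete program assembled from the snippets above. It is a sketch: the `std` import, the `main` function, and the choice of `std.heap.page_allocator` are illustrative assumptions, not requirements of the library.

```zig
const std = @import("std");
const Ollama = @import("ollama-zig");

pub fn main() !void {
    // Any std.mem.Allocator works here; page_allocator just keeps the example short.
    const allocator = std.heap.page_allocator;

    var ollama = Ollama.init(allocator, .{});
    defer ollama.deinit();

    const res = try ollama.generate(.{
        .model = "llama2",
        .prompt = "Why is the sky blue?",
        .stream = false,
    });
    std.debug.print("{s}\n", .{res.response});
}
```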
Streaming responses are made easy through the `Streamable` iterator interface. Functions with the `Stream` suffix always return an iterator interface where each part is an object in the stream.
```zig
var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const stream = try ollama.generateStream(.{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
});
while (try stream.next()) |part| {
    std.debug.print("{s}", .{part.response});
}
```
Access the underlying Ollama API structs.
```zig
const req = Ollama.Type.GenerateRequest{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
    .stream = false,
};

var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const res = try ollama.generate(req);
std.debug.print("{s}", .{res.response});
```
Initialize a new Ollama client.
```zig
var ollama = Ollama.init(allocator, .{});
```
- `allocator`: The allocator to use in the client.
- `config`: The configuration for the client. (Pass an empty struct literal `.{}` to use the default configuration.)
  - `host`: The host URL to use for the Ollama API server.
  - `response_max_size`: The maximum size for a response in bytes. Set this to a higher value if you encounter `error.StreamTooLong`. Default is 4096.
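For example, a client that talks to a remote Ollama server with a larger response buffer could be configured as below. This is a sketch; the host URL is purely illustrative and the exact format the library expects for `host` is not documented in this README.

```zig
// Both config fields are optional; unset fields fall back to the defaults.
var ollama = Ollama.init(allocator, .{
    .host = "http://192.168.1.10:11434", // illustrative address of a remote Ollama server
    .response_max_size = 1024 * 100, // raise the limit to avoid error.StreamTooLong on large responses
});
defer ollama.deinit();
```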
Release all resources used by the client.
```zig
ollama.deinit();
```
Generate the next message in a chat with a provided model. This is not a streaming endpoint and returns a single response object. (Requires `.stream = false`.)
```zig
var msgs = std.ArrayList(Ollama.Type.Message).init(allocator);
defer msgs.deinit();
try msgs.append(.{ .role = "user", .content = "Why is the sky blue?" });

var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const res = try ollama.chat(.{
    .model = "llama2",
    .messages = msgs.items,
    .stream = false,
});
std.debug.print("{s}", .{res.message.content});
```
- `model`: The name of the model to use for the chat.
- `messages`: Array slice of `Message` objects representing the conversation history. (Ideally, you'd want to use an ArrayList to store the message history; see the multi-turn sketch after this list.)
  - `role`: The role of the message sender (`"system"` | `"user"` | `"assistant"`).
  - `content`: The content of the message.
- `stream`: (Optional) Should be explicitly set to `false` for functions without a `Stream` suffix. Default is `true`.
- `format`: (Optional) The format to return a response in. Currently the only accepted value is `"json"`.
- `options`: (Optional) Additional model parameters listed in the documentation for the Modelfile.
- `template`: (Optional) The prompt template to use. (Overrides what is defined in the `Modelfile`.)
- `keep_alive`: (Optional) Controls how long the model stays loaded in memory following the request. Accepts either a number or a string through a union (e.g. `.keep_alive = .{ .number = 5000000 }` or `.keep_alive = .{ .string = "5m" }`).
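A sketch of multi-turn usage: feed the assistant's reply back into the history before asking a follow-up. This assumes the `Message` value in `res.message` carries the same `role`/`content` fields described above.

```zig
var msgs = std.ArrayList(Ollama.Type.Message).init(allocator);
defer msgs.deinit();

var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

// First turn.
try msgs.append(.{ .role = "user", .content = "Why is the sky blue?" });
const first = try ollama.chat(.{ .model = "llama2", .messages = msgs.items, .stream = false });

// Append the reply so the model sees the full conversation on the next call.
try msgs.append(.{ .role = "assistant", .content = first.message.content });
try msgs.append(.{ .role = "user", .content = "Explain that like I'm five." });

const second = try ollama.chat(.{ .model = "llama2", .messages = msgs.items, .stream = false });
std.debug.print("{s}", .{second.message.content});
```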
Streamable version of the `chat` function above.
```zig
var msgs = std.ArrayList(Ollama.Type.Message).init(allocator);
defer msgs.deinit();
try msgs.append(.{ .role = "user", .content = "Why is the sky blue?" });

var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const stream = try ollama.chatStream(.{
    .model = "llama2",
    .messages = msgs.items,
});
while (try stream.next()) |part| {
    std.debug.print("{s}", .{part.message.content});
}
```
Generate a response for a given prompt with a provided model. This is not a streaming endpoint and returns a single response object. (Requires `.stream = false`.)
```zig
var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const res = try ollama.generate(.{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
    .stream = false,
});
std.debug.print("{s}", .{res.response});
```
- `model`: The name of the model to use for generating the response.
- `prompt`: The prompt to generate a response for.
- `stream`: (Optional) Should be explicitly set to `false` for functions without a `Stream` suffix. Default is `true`.
- `format`: (Optional) The format to return a response in. Currently the only accepted value is `"json"`.
- `options`: (Optional) Additional model parameters listed in the documentation for the Modelfile.
- `template`: (Optional) The prompt template to use. (Overrides what is defined in the `Modelfile`.)
- `keep_alive`: (Optional) Controls how long the model stays loaded in memory following the request. Accepts either a number or a string through a union (e.g. `.keep_alive = .{ .number = 5000000 }` or `.keep_alive = .{ .string = "5m" }`).
- `system`: (Optional) System message to override what is defined in the `Modelfile`.
- `context`: (Optional) The context parameter returned from a previous `generate()` call; this can be used to keep a short conversational memory. (See the sketch after this list.)
- `raw`: (Optional) If `true`, no formatting will be applied to the prompt if you are specifying a full templated prompt.
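A sketch of chaining `context` between calls, assuming the response struct exposes the `context` value returned by the API (as the parameter description above implies):

```zig
var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

// First request: no context yet.
const first = try ollama.generate(.{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
    .stream = false,
});

// Second request: pass the returned context back in to keep short-term memory.
const second = try ollama.generate(.{
    .model = "llama2",
    .prompt = "Summarize that answer in one sentence.",
    .context = first.context,
    .stream = false,
});
std.debug.print("{s}", .{second.response});
```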
Streamable version of the `generate` function above.
```zig
var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const stream = try ollama.generateStream(.{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
});
while (try stream.next()) |part| {
    std.debug.print("{s}", .{part.response});
}
```
List models that are available locally.
```zig
var ollama = Ollama.init(allocator, .{});
defer ollama.deinit();

const res = try ollama.list();
for (res.models) |model| {
    std.debug.print("{s}\n", .{model.model});
}
```
Show information about a model including details, modelfile, template, parameters, license, and system prompt.
```zig
var ollama = Ollama.init(allocator, .{ .response_max_size = 1024 * 100 });
defer ollama.deinit();

const res = try ollama.show(.{ .model = "llama2" });
std.debug.print("{s}", .{res.modelfile});
```
- `model`: The name of the model to show the information of.
- `system`: (Optional) System message to override what is defined in the `Modelfile`.
- `template`: (Optional) The prompt template to use. (Overrides what is defined in the `Modelfile`.)
- `options`: (Optional) Additional model parameters listed in the documentation for the Modelfile.
Generate embeddings from a model for a given prompt.
```zig
var ollama = Ollama.init(allocator, .{ .response_max_size = 1024 * 100 });
defer ollama.deinit();

const res = try ollama.embeddings(.{
    .model = "llama2",
    .prompt = "Why is the sky blue?",
});
std.debug.print("{any}", .{res.embedding});
```
- `model`: The name of the model to use for generating the embeddings.
- `prompt`: The prompt to generate embeddings for.
- `options`: (Optional) Additional model parameters listed in the documentation for the Modelfile.
- `keep_alive`: (Optional) Controls how long the model stays loaded in memory following the request. Accepts either a number or a string through a union (e.g. `.keep_alive = .{ .number = 5000000 }` or `.keep_alive = .{ .string = "5m" }`).
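As a usage sketch, embeddings are commonly compared with cosine similarity. The element type of `res.embedding` is not documented in this README, so the `f64` accumulators below are an assumption; treat the snippet as illustrative.

```zig
var ollama = Ollama.init(allocator, .{ .response_max_size = 1024 * 100 });
defer ollama.deinit();

const e1 = try ollama.embeddings(.{ .model = "llama2", .prompt = "Why is the sky blue?" });
const e2 = try ollama.embeddings(.{ .model = "llama2", .prompt = "What gives the sky its color?" });

// Cosine similarity between the two embedding vectors (assumes equal length).
var dot: f64 = 0;
var norm_a: f64 = 0;
var norm_b: f64 = 0;
for (e1.embedding, e2.embedding) |x, y| {
    dot += x * y;
    norm_a += x * x;
    norm_b += y * y;
}
const similarity = dot / (@sqrt(norm_a) * @sqrt(norm_b));
std.debug.print("similarity: {d}\n", .{similarity});
```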
This is a very early version and may contain bugs and unusual behaviors.
APIs not implemented: `pull`, `push`, `create`, `delete`, `copy`.
Known issues:
- Images aren't supported yet.
- Results in unknown errors if the provided `model` isn't found on the host system.
- Sometimes the `name` field from `list()` is empty.
- ...create an issue if you find more!
This repository uses the MIT License. See LICENSE for the full license text.