Yesterday Google released Gemma - an open LLM that folks can run locally on their machines (similarly to llama2). I was wondering how easy it would be to run Gemma on my computer, chat with it and interact with it from a Go program.
Turns out that - thanks to Ollama - it's extremely easy! Gemma was already added to Ollama, so all one has to do is run:
$ ollama run gemma
And wait for a few minutes while the model downloads. From this point on, my previous post about using Ollama locally in Go applies with pretty much no changes. Gemma becomes available through a REST API locally, and can be accessed from Ollama-aware libraries like LangChainGo.
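For example, with the ollama service running, the model can be queried over plain HTTP. Here's a quick sketch, assuming Ollama's default local port (11434); the prompt is just an illustration:

$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma",
  "prompt": "very briefly, why is the sky blue?",
  "stream": false
}'

The reply comes back as JSON, with the generated text in its "response" field.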
I went ahead and added a --model flag to all my code samples from that post, and they can all run with --model gemma now. It all just works, due to the magic of standard interfaces:
- Gemma is packaged in a standard format for inclusion in Ollama
- Ollama then presents a standardized REST API for this model, just like it does for other compatible models
- LangChainGo has an Ollama provider that lets us write code to interact with any model running through Ollama
So we can write code like:
package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
    modelName := flag.String("model", "", "ollama model name")
    flag.Parse()

    llm, err := ollama.New(ollama.WithModel(*modelName))
    if err != nil {
        log.Fatal(err)
    }

    query := flag.Args()[0]
    ctx := context.Background()
    completion, err := llms.GenerateFromSinglePrompt(ctx, llm, query)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Response:\n", completion)
}
And then run it as follows:
$ go run ollama-completion-arg.go --model gemma "what should be added to 91 to make -20?"
Response:
The answer is -111.
91 + (-111) = -20
Gemma seems relatively fast for a model running on a CPU. I find that the default 7B model, while much more capable than the default 7B llama2 according to published benchmarks, also runs about 30% faster on my machine.
Without LangChainGo
While LangChainGo offers a convenient API that's standardized across LLM providers, its use is by no means required for this sample. Ollama itself provides a Go API as part of its repository, and it can be used from external programs as well. Here's an equivalent sample that doesn't require LangChainGo:
package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "github.com/jmorganca/ollama/api"
)

func main() {
    modelName := flag.String("model", "", "ollama model name")
    flag.Parse()

    // ClientFromEnvironment finds the local Ollama service, honoring the
    // OLLAMA_HOST env var and falling back to the default local address.
    client, err := api.ClientFromEnvironment()
    if err != nil {
        log.Fatal(err)
    }

    req := &api.GenerateRequest{
        Model:  *modelName,
        Prompt: flag.Args()[0],
        Stream: new(bool), // disable streaming
    }

    ctx := context.Background()
    var response string

    // With streaming disabled, the callback is invoked once with the full response.
    respFunc := func(resp api.GenerateResponse) error {
        response = resp.Response
        return nil
    }

    err = client.Generate(ctx, req, respFunc)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Response:\n", response)
}
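Assuming the file is saved as, say, ollama-native-arg.go (the name is arbitrary), it runs just like the previous sample:

$ go run ollama-native-arg.go --model gemma "what should be added to 91 to make -20?"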