Yesterday Google released Gemma - an open LLM that folks can run locally on their machines (similar to llama2). I was wondering how easy it would be to run Gemma on my computer, chat with it and interact with it from a Go program.
Turns out that - thanks to Ollama - it's extremely easy! Gemma was already added to Ollama, so all one has to do is run:
$ ollama run gemma
And wait for a few minutes while the model downloads. From this point on, my previous post about using Ollama locally in Go applies with pretty much no changes. Gemma becomes available through a local REST API, and can be accessed from Ollama-aware libraries like LangChainGo.
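For instance, here's a minimal sketch of talking to that local REST API directly with Go's standard library; it assumes Ollama's default address (localhost:11434) and its /api/generate endpoint, and the prompt is just an illustration:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Request body for Ollama's /api/generate endpoint; stream=false asks
	// for a single JSON reply instead of a stream of chunks.
	reqBody, err := json.Marshal(map[string]any{
		"model":  "gemma",
		"prompt": "very briefly, what is the capital of France?",
		"stream": false,
	})
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(reqBody))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The reply is a JSON object; the generated text is in its "response" field.
	var reply struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply.Response)
}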
I went ahead and added a --model flag to all my code samples from that post, and they can all run with --model gemma now. It all just works, due to the magic of standard interfaces:
- Gemma is packaged in a standard format for inclusion in Ollama
- Ollama then presents a standardized REST API for this model, just like it does for other compatible models
- LangChainGo has an Ollama provider that lets us write code to interact with any model running through Ollama
So we can write code like:
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	modelName := flag.String("model", "", "ollama model name")
	flag.Parse()

	// Create a LangChainGo LLM client backed by the local Ollama server,
	// using whichever model was requested on the command line.
	llm, err := ollama.New(ollama.WithModel(*modelName))
	if err != nil {
		log.Fatal(err)
	}

	// The prompt is taken from the first positional argument.
	query := flag.Args()[0]
	ctx := context.Background()
	completion, err := llms.GenerateFromSinglePrompt(ctx, llm, query)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Response:\n", completion)
}
And then run it as follows:
$ go run ollama-completion-arg.go --model gemma "what should be added to 91 to make -20?"
Response:
The answer is -111.
91 + (-111) = -20
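Since the --model flag just forwards the name to Ollama, the same program should work with any Gemma tag Ollama packages; for example, to try the smaller 2B variant:

$ go run ollama-completion-arg.go --model gemma:2b "what should be added to 91 to make -20?"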
The Gemma runner is well optimized to run on CPUs: I find that the default 7B model, while much more capable than the default 7B llama2 based on published benchmarks, also runs about 30% faster on my machine.