Llamero models are created by using the `Llamero::BaseModel` class.

```crystal
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
```
- Models are a neat and easy way to organize which model you use and how you want it to behave.
- You can re-use the same model and change the settings in the `chat` method signature if the defaults don't work for your needs.
- Use the `quick_chat` method if you want to quickly test a model.
- Use the `chat` method with a `grammar_class` and a `prompt` to get a structured response back.
- Models must be initialized with a `model_name` that is the entire file name of the model.
- By default, models are found relative to the user's home directory, under a `models` folder. You can provide a different folder by setting the `model_root_path` named parameter.
```crystal
# How the default model folder is determined.
Path["/Users/#{`whoami`.strip}/models"]
```
Required parameters:

- A `Llamero::BasePrompt` (or a subclass of `Llamero::BasePrompt`), passed either as the first positional parameter or as the named parameter `prompt_chain`.
- `grammar_class`, as a named parameter. This should be a class instance of the expected structured response.
Optional parameters:
max_retries
default of 5, this is the maximum number of attempts to get a valid response from the modeltemperature
this is aFloat32?
, the default is0.9
which is fairly creative. Adjust this up or down to adjust the models creativitymax_tokens
this is the max tokens for the model to generate. Default is 2048, which includes the system prompt and the provided promptrepeat_penalty
this is aFloat32?
, the default is1.1
which prevents responses that are too repetative.top_k_sampling
this is anInt32
, the default is80
which means the model will only consider the top 80 tokens when generating the next token.n_predict
this is anInt32
, the default is512
which means the model will generate 512 tokens in response to the provided prompt.temperature
: aFloat
between 0 and 1. Defaults to 0.5.max_tokens
: anInt
. Defaults to 1024.
The `grammar_class` that is provided determines the return type of the `chat` method: you get back a parsed instance of that class.
```crystal
require "llamero"

# The grammar class defines the structure of the response.
class CapitalOfTheMoonStructuredResponse < Llamero::BaseGrammar
  property answer : String
end

prompt = Llamero::BasePrompt.new(
  system_prompt: "You are a helpful assistant.",
  user_prompt: "What is the capital of the moon?"
)

ai_model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
response = ai_model.chat(prompt, grammar_class: CapitalOfTheMoonStructuredResponse.from_json(%({})))

puts response.answer # response is a CapitalOfTheMoonStructuredResponse
```
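
To override the optional parameters listed earlier, pass them when calling the model. A minimal sketch reusing the model, prompt, and grammar class from the example above; that `chat` accepts each of these named parameters directly is an assumption to verify against the actual `chat` signature:

```crystal
# Hypothetical call-time overrides; parameter names and defaults are
# taken from the list above, but confirm chat accepts them in this form.
response = ai_model.chat(
  prompt,
  grammar_class: CapitalOfTheMoonStructuredResponse.from_json(%({})),
  temperature: 0.2_f32,    # much less creative than the 0.9 default
  repeat_penalty: 1.3_f32, # penalize repetition more strongly
  top_k_sampling: 40,      # consider fewer candidate tokens per step
  max_retries: 2           # give up sooner on invalid structured output
)
```

A lower temperature paired with a stronger repeat penalty trades creativity for consistency, which tends to suit structured-response extraction.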
- `quick_chat` accepts an `Array` of `NamedTuple`s, each containing a `role` and `content`.
- `quick_chat` returns a `String`.
- This method was designed for rapid and simple testing.
require "llamero"
model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf")
response = model.quick_chat([{ role: "user", content: "What is the capital of the moon?" }])
puts response