
Releases: mudler/LocalAI

v2.19.1

20 Jul 07:16
f9f8379

LocalAI 2.19.1 is out! 📣

TL;DR: Summary spotlight

  • 🖧 Federated Instances via P2P: LocalAI now supports federated instances with P2P, offering both load-balanced and non-load-balanced options.
  • 🎛️ P2P Dashboard: A new dashboard to guide and assist in setting up P2P instances with auto-discovery using shared tokens.
  • 🔊 TTS Integration: Text-to-Speech (TTS) is now included in the binary releases.
  • 🛠️ Enhanced Installer: The installer script now supports setting up federated instances.
  • 📥 Model Pulling: Models can now be pulled directly via URL.
  • 🖼️ WebUI Enhancements: Visual improvements and cleanups to the WebUI and model lists.
  • 🧠 llama-cpp Backend: The llama-cpp (grpc) backend now supports embeddings (https://localai.io/features/embeddings/#llamacpp-embeddings)
  • ⚙️ Tool Support: Small enhancements to tools with disabled grammars.

🖧 LocalAI Federation and AI swarms

LocalAI is revolutionizing the future of distributed AI workloads by making them simpler and more accessible. No complex setups, Docker, or Kubernetes configurations are required: LocalAI lets you create your own AI cluster with minimal friction. By auto-discovering your existing devices and sharing the workload or the model weights across them, LocalAI aims to scale both horizontally and vertically with ease.

How does it work?

Starting LocalAI with --p2p generates a shared token for connecting multiple instances, and that's all you need to create an AI cluster; no intricate network setup is required. Simply navigate to the "Swarm" section in the WebUI and follow the on-screen instructions.

For fully shared instances, start LocalAI with --p2p --federated and follow the guidance in the Swarm section. This feature is still experimental and should be considered a tech preview.

Federated LocalAI

Launch multiple LocalAI instances and cluster them together to share requests across the cluster. The "Swarm" tab in the WebUI provides one-liner instructions on connecting various LocalAI instances using a shared token. Instances will auto-discover each other, even across different networks.

Check out a demonstration video: Watch now

LocalAI P2P Workers

Distribute model weights across nodes by starting multiple LocalAI workers. This is currently available only for the llama.cpp backend, with plans to expand to other backends soon.

Check out a demonstration video: Watch now

What's Changed

Bug fixes 🐛

  • fix: make sure the GNUMake jobserver is passed to cmake for the llama.cpp build by @cryptk in #2697
  • Using exec when starting a backend instead of spawning a new process by @a17t in #2720
  • fix(cuda): downgrade default version from 12.5 to 12.4 by @mudler in #2707
  • fix: Lora loading by @vaaale in #2893
  • fix: short-circuit when nodes aren't detected by @mudler in #2909
  • fix: do not list txt files as potential models by @mudler in #2910

🖧 P2P area

  • feat(p2p): Federation and AI swarms by @mudler in #2723
  • feat(p2p): allow to disable DHT and use only LAN by @mudler in #2751

Exciting New Features 🎉

  • Allows to remove a backend from the list by @mauromorales in #2721
  • ci(Makefile): adds tts in binary releases by @mudler in #2695
  • feat: HF /scan endpoint by @dave-gray101 in #2566
  • feat(model-list): be consistent, skip known files from listing by @mudler in #2760
  • feat(models): pull models from urls by @mudler in #2750
  • feat(webui): show also models without a config in the welcome page by @mudler in #2772
  • feat(install.sh): support federated install by @mudler in #2752
  • feat(llama.cpp): support embeddings endpoints by @mudler in #2871
  • feat(functions): parse broken JSON when we parse the raw results, use dynamic rules for grammar keys by @mudler in #2912
  • feat(federation): add load balanced option by @mudler in #2915

🧠 Models

  • models(gallery): ⬆️ update checksum by @localai-bot in #2701
  • models(gallery): add l3-8b-everything-cot by @mudler in #2705
  • models(gallery): add hercules-5.0-qwen2-7b by @mudler in #2708
  • models(gallery): add llama3-8b-darkidol-2.2-uncensored-1048k-iq-imatrix by @mudler in #2710
  • models(gallery): add llama-3-llamilitary by @mudler in #2711
  • models(gallery): add tess-v2.5-gemma-2-27b-alpha by @mudler in #2712
  • models(gallery): add arcee-agent by @mudler in #2713
  • models(gallery): add gemma2-daybreak by @mudler in #2714
  • models(gallery): add L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF by @mudler in #2715
  • models(gallery): add qwen2-7b-instruct-v0.8 by @mudler in #2717
  • models(gallery): add internlm2_5-7b-chat-1m by @mudler in #2719
  • models(gallery): add gemma-2-9b-it-sppo-iter3 by @mudler in #2722
  • models(gallery): add llama-3_8b_unaligned_alpha by @mudler in #2727
  • models(gallery): add l3-8b-lunaris-v1 by @mudler in #2729
  • models(gallery): add llama-3_8b_unaligned_alpha_rp_soup-i1 by @mudler in #2734
  • models(gallery): add hathor_respawn-l3-8b-v0.8 by @mudler in #2738
  • models(gallery): add llama3-8b-instruct-replete-adapted by @mudler in #2739
  • models(gallery): add llama-3-perky-pat-instruct-8b by @mudler in #2740
  • models(gallery): add l3-uncen-merger-omelette-rp-v0.2-8b by @mudler in #2741
  • models(gallery): add nymph_8b-i1 by @mudler in #2742
  • models(gallery): add smegmma-9b-v1 by @mudler in #2743
  • models(gallery): add hathor_tahsin-l3-8b-v0.85 by @mudler in #2762
  • models(gallery): add replete-coder-instruct-8b-merged by @mudler in #2782
  • models(gallery): add arliai-llama-3-8b-formax-v1.0 by @mudler in #2783
  • models(gallery): add smegmma-deluxe-9b-v1 by @mudler in #2784
  • models(gallery): add l3-ms-astoria-8b by @mudler in #2785
  • models(gallery): add halomaidrp-v1.33-15b-l3-i1 by @mudler in #2786
  • models(gallery): add llama-3-patronus-lynx-70b-instruct by @mudler in #2788
  • models(gallery): add llamax3 by @mudler in #2849
  • models(gallery): add arliai-llama-3-8b-dolfin-v0.5 by @mudler in #2852
  • models(gallery): add tiger-gemma-9b-v1-i1 by @mudler in #2853
  • feat: models(gallery): add deepseek-v2-lite by @mudler in #2658
  • models(gallery): ⬆️ update checksum by @localai-bot in #2860
  • models(gallery): add phi-3.1-mini-4k-instruct by @mudler in #2863
  • models(gallery): ⬆️ update checksum by @localai-bot in #2887
  • models(gallery): add ezo model series (llama3, gemma) by @mudler in #2891
  • models(gallery): add l3-8b-niitama-v1 by @mudler in #2895
  • models(gallery): add mathstral-7b-v0.1-imat by @mudler in #2901
  • models(gallery): add MythicalMaid/EtherealMaid 15b by @mudler in #2902
  • models(gallery): add flammenai/Mahou-1.3d-mistral-7B by @mudler in #2903
  • models(gallery): add big-tiger-gemma-27b-v1 by @mudler in #2918
  • models(gallery): add phillama-3.8b-v0.1 by @mudler in #2920
  • models(gallery): add qwen2-wukong-7b by @mudler in #2921
  • models(gallery): add einstein-v4-7b by @mudler in #2922
  • models(gallery): add gemma-2b-translation-v0.150 by @mudler in #2923
  • models(gallery)...
Read more

v2.19.0

19 Jul 17:44
f19ee46

LocalAI 2.19.0 is out! 📣

The release notes are identical to those of v2.19.1 above.

v2.18.1

01 Jul 20:53
b941732

What's Changed

Bug fixes 🐛

  • fix(talk): identify the model by ID instead of name by @mudler in #2685
  • fix(initializer): do select backends that exist by @mudler in #2694

Exciting New Features 🎉

  • feat(backend): fallback with autodetect by @mudler in #2693

🧠 Models

👒 Dependencies

Full Changelog: v2.18.0...v2.18.1

v2.18.0

28 Jun 14:17
8d9a452

⭐ Highlights

Here’s a quick overview of what’s new in 2.18.0:

  • 🐳 Support for models in OCI registry (includes ollama)
  • 🌋 Support for llama.cpp with vulkan (container images only for now)
  • 🗣️ The transcription endpoint can now also translate, via the translate option
  • ⚙️ Adds repeat_last_n and properties_order as model configurations
  • ⬆️ CUDA 12.5 Upgrade: we are now tracking the latest CUDA version (12.5).
  • 💎 Gemma 2 model support!

🐋 Support for OCI Images and Ollama Models

You can now specify models using oci:// and ollama:// prefixes in your YAML config files. Here’s an example for Ollama models:

parameters:
  model: ollama://...

Start the Ollama model directly with:

local-ai run ollama://gemma:2b

Or download only the model by using:

local-ai models install ollama://gemma:2b

For standard OCI images, use the oci:// prefix. To build a compatible container image, you can use Docker, for example.

Your Dockerfile should look like this:

FROM scratch
COPY ./my_gguf_file.gguf /

You can also use it to store other model types (for instance, safetensors files for Stable Diffusion) and YAML config files!
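Under the hood, the prefix is just a URI scheme. Below is a minimal sketch of how such references can be told apart; this is illustrative only (the function name is hypothetical, and it is not LocalAI's actual parsing code):

```python
from urllib.parse import urlparse

def classify_model_ref(ref: str) -> tuple[str, str]:
    # Split a reference like "ollama://gemma:2b" into (scheme, model name).
    parsed = urlparse(ref)
    if parsed.scheme in ("oci", "ollama"):
        return parsed.scheme, parsed.netloc + parsed.path
    # Anything without a known prefix is treated as a local file path here.
    return "file", ref

print(classify_model_ref("ollama://gemma:2b"))  # ('ollama', 'gemma:2b')
```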

🌋 Vulkan Support for Llama.cpp

We’ve introduced Vulkan support for Llama.cpp! Check out our new image tags latest-vulkan-ffmpeg-core and v2.18.0-vulkan-ffmpeg-core.

🗣️ Transcription and Translation

Our transcription endpoint now supports translation! Simply add translate: true to your transcription requests to translate the transcription to English.

⚙️ Enhanced Model Configuration

We’ve added new configuration options repeat_last_n and properties_order to give you more control. Here’s how you can set them up in your model YAML file:

# Force JSON to return properties in the specified order
function:
   grammar:
      properties_order: "name,arguments"

And for setting repeat_last_n (specific to Llama.cpp):

parameters:
   repeat_last_n: 64
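To see what properties_order buys you: the grammar pins the key order of the emitted JSON, so a consumer can rely on the function name arriving before its arguments. A small illustration (the function call shown is a made-up example):

```python
import json

# With properties_order: "name,arguments", the model is constrained to emit
# the function name before the arguments object:
call = {"name": "get_weather", "arguments": {"city": "Rome"}}
text = json.dumps(call)  # dict insertion order is preserved

# A streaming consumer can therefore dispatch on "name" before the
# (potentially large) "arguments" object has fully arrived.
assert text.index('"name"') < text.index('"arguments"')
```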

💎 Gemma 2!

Google has just dropped Gemma 2 models (see their blog post); you can already install and run Gemma 2 models in LocalAI with:

local-ai run gemma-2-27b-it
local-ai run gemma-2-9b-it

What's Changed

Bug fixes 🐛

  • fix(install.sh): correctly handle systemd service installation by @mudler in #2627
  • fix(worker): use dynaload for single binaries by @mudler in #2620
  • fix(install.sh): fix version typo by @mudler in #2645
  • fix(install.sh): move ARCH detection so it works also for mac by @mudler in #2646
  • fix(cli): remove duplicate alias by @mudler in #2654

Exciting New Features 🎉

  • feat: Upgrade to CUDA 12.5 by @reneleonhardt in #2601
  • feat(oci): support OCI images and Ollama models by @mudler in #2628
  • feat(whisper): add translate option by @mudler in #2649
  • feat(vulkan): add vulkan support to the llama.cpp backend by @mudler in #2648
  • feat(ui): allow to select between all the available models in the chat by @mudler in #2657
  • feat(build): only build llama.cpp relevant targets by @mudler in #2659
  • feat(options): add repeat_last_n by @mudler in #2660
  • feat(grammar): expose properties_order by @mudler in #2662

🧠 Models

  • models(gallery): add l3-umbral-mind-rp-v1.0-8b-iq-imatrix by @mudler in #2608
  • models(gallery): ⬆️ update checksum by @localai-bot in #2607
  • models(gallery): add llama-3-sec-chat by @mudler in #2611
  • models(gallery): add llama-3-cursedstock-v1.8-8b-iq-imatrix by @mudler in #2612
  • models(gallery): add llama3-8b-darkidol-1.1-iq-imatrix by @mudler in #2613
  • models(gallery): add magnum-72b-v1 by @mudler in #2614
  • models(gallery): add qwen2-1.5b-ita by @mudler in #2615
  • models(gallery): add hermes-2-theta-llama-3-70b by @mudler in #2626
  • models(gallery): ⬆️ update checksum by @localai-bot in #2630
  • models(gallery): add dark-idol-1.2 by @mudler in #2663
  • models(gallery): add einstein v7 qwen2 by @mudler in #2664
  • models(gallery): add arcee-spark by @mudler in #2665
  • models(gallery): add gemma2-9b-it and gemma2-27b-it by @mudler in #2670

📖 Documentation and examples

👒 Dependencies

Other Changes

New Contributors

Full Changelog: v2.17.1...v2.18.0

v2.17.1

19 Jun 06:56
8142bdc

Highlights

This is a patch release to address issues with the Linux single-binary releases. It also adds support for Stable Diffusion 3!

Stable Diffusion 3

You can use Stable Diffusion 3 by installing the model from the gallery (stable-diffusion-3-medium) or by placing this YAML file in the model folder:

backend: diffusers
diffusers:
  cuda: true
  enable_parameters: negative_prompt,num_inference_steps
  pipeline_type: StableDiffusion3Pipeline
f16: false
name: sd3
parameters:
  model: v2ray/stable-diffusion-3-medium-diffusers
step: 25

You can then try generating an image:

curl http://localhost:9091/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter", "model": "sd3"
}'

What's Changed

Bug fixes 🐛

Exciting New Features 🎉

  • feat(sd-3): add stablediffusion 3 support by @mudler in #2591
  • feat(talk): display an informative box, better colors by @mudler in #2600

📖 Documentation and examples

👒 Dependencies

Other Changes

Full Changelog: v2.17.0...v2.17.1

v2.17.0

17 Jun 18:10
2f29797

Ahoj! This new release of LocalAI comes with tons of updates and enhancements behind the scenes!

🌟 Highlights TL;DR

  • Automatic identification of GGUF models
  • New WebUI page to talk with an LLM!
  • https://models.localai.io is live! 🚀
  • Better arm64 and Apple silicon support
  • More models to the gallery!
  • New quickstart installer script
  • Enhancements to mixed grammar support
  • Major improvements to transformers
  • Linux single binary now supports rocm, nvidia, and intel

🤖 Automatic model identification for llama.cpp-based models

Just drop your GGUF files into the model folders, and let LocalAI handle the configurations. YAML files are now reserved for those who love to tinker with advanced setups.

🔊 Talk to your LLM!

Introduced a new page that allows direct interaction with the LLM using audio transcription and TTS capabilities. This feature is a lot of fun: now you can talk with any LLM in just a couple of clicks.

🍏 Apple single-binary

Experience enhanced support for the Apple ecosystem with a comprehensive single binary that packs all necessary libraries, ensuring LocalAI runs smoothly on macOS and ARM64 architectures.

ARM64

Expanded our support for ARM64 with new Docker images and single binary options, ensuring better compatibility and performance on ARM-based systems.

Note: currently we support only arm core images, for instance: localai/localai:master-ffmpeg-core, localai/localai:latest-ffmpeg-core, localai/localai:v2.17.0-ffmpeg-core.

🐞 Bug Fixes and small enhancements

We’ve ironed out several issues, including image endpoint response types and other minor problems, boosting the stability and reliability of our applications. It is now also possible to enable CSRF when starting LocalAI, thanks to @dave-gray101.

🌐 Models and Galleries

Enhanced the model gallery with new additions like Mirai Nova and Mahou, along with several updates to existing models, ensuring better performance and accuracy.

You can now also browse new models at https://models.localai.io, without running LocalAI!

Installation and Setup

A new install.sh script is now available for quick and hassle-free installations, streamlining the setup process for new users.

curl https://localai.io/install.sh | sh

Installation can be configured with environment variables, for example:

curl https://localai.io/install.sh | VAR=value sh

List of the Environment Variables:

  • DOCKER_INSTALL: Set to "true" to enable the installation of Docker images.
  • USE_AIO: Set to "true" to use the all-in-one LocalAI Docker image.
  • API_KEY: Specify an API key for accessing LocalAI, if required.
  • CORE_IMAGES: Set to "true" to download core LocalAI images.
  • PORT: Specifies the port on which LocalAI will run (default is 8080).
  • THREADS: Number of processor threads the application should use. Defaults to the number of logical cores minus one.
  • VERSION: Specifies the version of LocalAI to install. Defaults to the latest available version.
  • MODELS_PATH: Directory path where LocalAI models are stored (default is /usr/share/local-ai/models).
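As a point of reference, the THREADS default (logical cores minus one) corresponds to something like the following sketch; this is not the installer's actual shell code, and the floor at 1 is an assumption for single-core machines:

```python
import os

# Logical cores minus one, floored at 1 (assumption) so single-core
# machines still get at least one thread.
threads = max(1, (os.cpu_count() or 2) - 1)
print(threads)
```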

We are looking into improving the installer, and as this is a first iteration any feedback is welcome! Open up an issue if something doesn't work for you!

Enhancements to mixed grammar support

Mixed grammar support continues to receive improvements behind the scenes.

🐍 Transformers backend enhancements

  • Temperature = 0 correctly handled as greedy search
  • Handles custom words as stop words
  • Implement KV cache
  • Phi 3 no longer requires the trust_remote_code: true flag

Shout-out to @fakezeta for these enhancements!

Install models with the CLI

Now the CLI can install models directly from the gallery. For instance:

local-ai run <model_name_in_gallery>

This command ensures the model is installed in the model folder at startup.

🐧 Linux single binary now supports rocm, nvidia, and intel

Single binaries for Linux now contain Intel, AMD GPU, and NVIDIA support. Note that you need to install the dependencies separately on the system to leverage these features. In upcoming releases, this requirement will be handled by the installer script.

📣 Let's Make Some Noise!

A gigantic THANK YOU to everyone who’s contributed—your feedback, bug squashing, and feature suggestions are what make LocalAI shine. To all our heroes out there supporting other users and sharing their expertise, you’re the real MVPs!

Remember, LocalAI thrives on community support—not big corporate bucks. If you love what we're building, show some love! A shoutout on social (@LocalAI_OSS and @mudler_it on twitter/X), joining our sponsors, or simply starring us on GitHub makes all the difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Thanks a ton, and.. enjoy this release!

What's Changed

Bug fixes 🐛

Exciting New Features 🎉

  • feat(images): do not install python deps in the core image by @mudler in #2425
  • feat(hipblas): extend default hipblas GPU_TARGETS by @mudler in #2426
  • feat(build): add arm64 core containers by @mudler in #2421
  • feat(functions): allow parallel calls with mixed/no grammars by @mudler in #2432
  • feat(image): support response_type in the OpenAI API request by @prajwalnayak7 in #2347
  • feat(swagger): update swagger by @localai-bot in #2436
  • feat(functions): better free string matching, allow to expect strings after JSON by @mudler in #2445
  • build(Makefile): add back single target to build native llama-cpp by @mudler in #2448
  • feat(functions): allow response_regex to be a list by @mudler in #2447
  • TTS API improvements by @blob42 in #2308
  • feat(transformers): various enhancements to the transformers backend by @fakezeta in #2468
  • feat(webui): enhance card visibility by @mudler in #2473
  • feat(default): use number of physical cores as default by @mudler in #2483
  • feat: fiber CSRF by @dave-gray101 in #2482
  • feat(amdgpu): try to build in single binary by @mudler in #2485
  • feat:OpaqueErrors to hide error information by @dave-gray101 in #2486
  • build(intel): bundle intel variants in single-binary by @mudler in #2494
  • feat(install): add install.sh for quick installs by @mudler in #2489
  • feat(llama.cpp): guess model defaults from file by @mudler in #2522
  • feat(ui): add page to talk with voice, transcription, and tts by @mudler in #2520
  • feat(arm64): enable single-binary builds by @mudler in #2490
  • feat(util): add util command to print GGUF informations by @mudler in #2528
  • feat(defaults): add defaults for Command-R models by @mudler in #2529
  • feat(detection): detect by template in gguf file, add qwen2, phi, mistral and chatml by @mudler in #2536
  • feat(gallery): show available models in website, allow local-ai models install to install from galleries by @mudler in #2555
  • feat(gallery): uniform download from CLI by @mudler in #2559
  • feat(guesser): identify gemma models by @mudler in #2561
  • feat(binary): support extracted bundled libs on darwin by @mudler in #2563
  • feat(darwin): embed grpc libs by @mudler in #2567
  • feat(build): bundle libs for arm64 and x86 linux binaries by @mudler in #2572
  • feat(libpath): refactor and expose functions for external library paths by @mudler in #2578

🧠 Models

Read more

v2.16.0

24 May 17:35


Welcome to LocalAI's latest update!

🎉🎉🎉 woot woot! So excited to share this release, a lot of new features are landing in LocalAI!!!!! 🎉🎉🎉

🌟 Introducing Distributed Llama.cpp Inferencing

Now it is possible to distribute the inferencing workload across different workers with llama.cpp models!

This feature has landed with #2324 and is based on the upstream work of @rgerganov in ggerganov/llama.cpp#6829.

How it works: a front-end server (LocalAI) handles requests compatible with the OpenAI API, and workers (llama.cpp) are used to distribute the workload. This makes it possible to run larger models split across different nodes!

How to use it

To start workers to offload the computation you can run:

local-ai llamacpp-worker <listening_address> <listening_port>

However, you can also follow the llama.cpp README and build the rpc-server (https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is still compatible with LocalAI.

When starting the LocalAI server, which is going to accept the API requests, you can set a list of workers IP/address by specifying the addresses with LLAMACPP_GRPC_SERVERS:

LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run

At this point, the workload hitting the LocalAI server should be distributed across the nodes!
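The value of LLAMACPP_GRPC_SERVERS is a plain comma-separated list; the following is a minimal sketch of splitting it into host/port pairs (illustrative only, with made-up addresses; the actual parsing happens inside LocalAI and llama.cpp):

```python
# Example value for LLAMACPP_GRPC_SERVERS (addresses are made up):
raw = "10.0.0.1:50052,10.0.0.2:50052"

# rsplit on the last ":" so hosts containing colons are not mangled.
workers = [tuple(entry.rsplit(":", 1)) for entry in raw.split(",") if entry]
print(workers)  # [('10.0.0.1', '50052'), ('10.0.0.2', '50052')]
```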

🤖 Peer2Peer llama.cpp

LocalAI is the first free, open-source AI project offering complete, decentralized, peer-to-peer, and private LLM inferencing on top of the libp2p protocol. There is no "public swarm" to offload the computation to; rather, it empowers you to build your own cluster of local and remote machines to distribute LLM computation.

This feature leverages llama.cpp's ability to distribute the workload, explained just above, together with features from one of my other projects, https://github.com/mudler/edgevpn.

LocalAI builds on top of the two and allows you to create a private peer-to-peer network between nodes, without centralizing connections or manually configuring IP addresses: it unlocks totally decentralized, private, peer-to-peer inferencing capabilities. It also works across different NAT-ted networks (using DHT and mDNS as discovery mechanisms).

How it works: A pre-shared token can be generated and shared between workers and the server to form a private, decentralized, p2p network.

How to use it

  1. Start the server with --p2p:
./local-ai run --p2p
# 1:02AM INF loading environment variables from file envFile=.env
# 1:02AM INF Setting logging to info
# 1:02AM INF P2P mode enabled
# 1:02AM INF No token provided, generating one
# 1:02AM INF Generated Token:
# XXXXXXXXXXX
# 1:02AM INF Press a button to proceed

A token is displayed; copy it and press Enter.

You can re-use the same token later restarting the server with --p2ptoken (or P2P_TOKEN).

  2. Start the workers. Now you can copy the local-ai binary to other hosts and run as many workers with that token:
TOKEN=XXX ./local-ai p2p-llama-cpp-rpc
# 1:06AM INF loading environment variables from file envFile=.env
# 1:06AM INF Setting logging to info
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:295","message":" go-libp2p resource manager protection enabled"}
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:409","message":"max connections: 100\n"}
# 1:06AM INF Starting llama-cpp-rpc-server on '127.0.0.1:34371'
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
# create_backend: using CPU backend
# Starting RPC server on 127.0.0.1:34371, backend memory: 31913 MB
# 2024/05/19 01:06:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). # See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
# {"level":"INFO","time":"2024-05-19T01:06:01.805+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJ7WQAbCWKfJgjw2oMMGGss9diw3Sov5hVWi8t4DMgx92"}
# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/44931 /ip4/127.0.0.1/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/127.0.0.1/udp/35660/quic-v1 /ip4/192.168.68.110/tcp/44931 /ip4/192.168.68.110/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/192.168.68.110/udp/35660/quic-v1 /ip6/::1/tcp/41289 /ip6/::1/udp/33160/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip6/::1/udp/35701/quic-v1]"}
# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}

(Note: you can also supply the token via command-line arguments.)

At this point, you should see messages in the server logs stating that new workers have been found.

  1. Now you can run inference as usual on the server (the node used in step 1).

Interested in trying it out? As we are still updating the documentation, you can read the full instructions here: #2343

📜 Advanced Function calling support with Mixed JSON Grammars

LocalAI gets better at function calling with mixed grammars!

With this release, LocalAI introduces a transformative capability: support for mixed JSON BNF grammars. This allows you to specify a grammar for the LLM that can output both structured JSON and free text.

How to use it:

To enable mixed grammars, set function.grammar.mixed_mode = true in the YAML configuration file, for example:

  function:
    # disable injecting the "answer" tool
    disable_no_action: true

    grammar:
      # This allows the grammar to also return messages
      mixed_mode: true

This feature significantly enhances LocalAI's ability to interpret and manipulate JSON data coming from the LLM through a more flexible and powerful grammar system. Users can now combine multiple grammar types within a single JSON structure, allowing for dynamic parsing and validation scenarios.
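On the client side nothing changes: a standard OpenAI-style tools request works as before, and with mixed mode the model may reply with free text, a tool call, or both. A minimal request-payload sketch (the model name and the get_weather function here are illustrative, not from this release):

```python
import json

# Illustrative OpenAI-style chat request body for LocalAI's
# /v1/chat/completions endpoint; model name and function are hypothetical.
payload = {
    "model": "hermes-2-pro-mistral",
    "messages": [
        {"role": "user", "content": "What's the weather like in Rome today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # send this as the POST body
```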

Grammars can also be turned off entirely, leaving it to the user to define how the LLM output is parsed so that LocalAI still returns responses compliant with the OpenAI REST spec.

For example, to interpret Hermes results, one can specify regexes in function.json_regex_match to extract the LLM response:

  function:
    grammar:
      disable: true
    # disable injecting the "answer" tool
    disable_no_action: true
    return_name_in_function_response: true

    json_regex_match:
    - "(?s)<tool_call>(.*?)</tool_call>"
    - "(?s)<tool_call>(.*?)"
  
    replace_llm_results:
    # Drop the scratchpad content from responses
    - key: "(?s)<scratchpad>.*</scratchpad>"
      value: ""
    replace_function_results:
    # Replace everything that is not JSON array or object, just in case.
    - key: '(?s)^[^{\[]*'
      value: ""
    - key: '(?s)[^}\]]*$'
      value: ""
    # Drop the scratchpad content from responses
    - key: "(?s)<scratchpad>.*</scratchpad>"
      value: ""
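The configuration above can be exercised standalone with Python's re module; a minimal sketch (the sample model output is made up for illustration):

```python
import json
import re

# Hypothetical raw output from a Hermes-style model
raw = (
    "<scratchpad>Let me look that up...</scratchpad>\n"
    'Some free text.<tool_call>{"name": "get_weather", '
    '"arguments": {"city": "Rome"}}</tool_call>'
)

# replace_llm_results: drop the scratchpad content
cleaned = re.sub(r"(?s)<scratchpad>.*</scratchpad>", "", raw)

# json_regex_match: extract the tool-call body
match = re.search(r"(?s)<tool_call>(.*?)</tool_call>", cleaned)
body = match.group(1)

# replace_function_results: strip anything that is not part of the
# JSON array/object, just in case
body = re.sub(r"(?s)^[^{\[]*", "", body)
body = re.sub(r"(?s)[^}\]]*$", "", body)

call = json.loads(body)
print(call["name"], call["arguments"]["city"])  # get_weather Rome
```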

Note that regexes can still be used when mixed grammars are enabled.

This is especially important for models which do not support grammars, such as transformers or OpenVINO models, which can now support function calling as well. As we update the docs, further documentation can be found in the PRs listed in the changelog below.

🚀 New Model Additions and Updates

local-ai-yi-updates

Our model gallery continues to grow with exciting new additions like Aya-35b, Mistral-0.3, Hermes-Theta and updates to existing models ensuring they remain at the cutting edge.

This release brings major enhancements to tool calling support. Besides working on making our default models in AIO images more performant, you can now try an enhanced out-of-the-box experience with function calling in the Hermes model family (Hermes-2-Pro-Mistral and Hermes-2-Theta-Llama-3).

Our LocalAI function model!

local-ai-functioncall-model

I have fine-tuned a function-calling model designed specifically to leverage LocalAI's grammar support; you can find it in the model gallery already, and on Hugging Face.

🔄 Single Binary Release: Simplified Deployment and Management

In our continuous effort to streamline the user experience and deployment process, LocalAI v2.16.0 proudly introduces a single binary release. This enha...

Read more

v2.15.0

09 May 17:20
f69de3b

local-ai-release

🎉 LocalAI v2.15.0! 🚀

Hey awesome people! I'm happy to announce the release of LocalAI version 2.15.0! This update introduces several significant improvements and features, enhancing usability, functionality, and user experience across the board. Dive into the key highlights below, and don't forget to check out the full changelog for more detailed updates.

🌍 WebUI Upgrades: Turbocharged!

🚀 Vision API Integration

The Chat WebUI now seamlessly integrates with the Vision API, making it easier for users to test image processing models directly through the browser interface - this is a very simple and hackable interface in less than 400 lines of code with Alpine.js and HTMX!

output

💬 System Prompts in Chat

System prompts can now be set in the WebUI chat, guiding users through interactions more intuitively and making our chat interface smarter and more responsive.

output

🌟 Revamped Welcome Page

New to LocalAI or haven't installed any models yet? No worries! The updated welcome page now guides users through the model installation process, ensuring you're set up and ready to go without any hassle. This is a great first step for newcomers - thanks for your precious feedback!

output

🔄 Background Operations Indicator

Don't get lost with our new background operations indicator on the WebUI, which shows when tasks are running in the background.

output

🔍 Filter Models by Tag and Category

As our model gallery balloons, you can now effortlessly sift through models by tag and category, making finding what you need a breeze.

output

🔧 Single Binary Release

LocalAI is expanding into offering single binary releases, simplifying the deployment process and making it easier to get LocalAI up and running on any system.

For the moment we have condensed the builds into one which disables the AVX and SSE instruction sets. We are also planning to include CUDA builds.

🧠 Expanded Model Gallery

This release introduces several exciting new models to our gallery, such as 'Soliloquy', 'tess', 'moondream2', 'llama3-instruct-coder' and 'aurora', enhancing the diversity and capability of our AI offerings. Our selection of one-click-install models is growing! We carefully pick models from the most trending ones on Hugging Face; feel free to submit your requests in a GitHub issue, hop into our Discord, or contribute by hosting your own gallery, or even by adding models directly to LocalAI!

local-ai-gallery
local-ai-gallery-new

Want to share your model configurations and customizations? See the docs: https://localai.io/docs/getting-started/customize-model/

📣 Let's Make Some Noise!

A gigantic THANK YOU to everyone who’s contributed—your feedback, bug squashing, and feature suggestions are what make LocalAI shine. To all our heroes out there supporting other users and sharing their expertise, you’re the real MVPs!

Remember, LocalAI thrives on community support—not big corporate bucks. If you love what we're building, show some love! A shoutout on social (@LocalAI_OSS and @mudler_it on twitter/X), joining our sponsors, or simply starring us on GitHub makes all the difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Thanks a ton, and.. enjoy this release!


What's Changed

Bug fixes 🐛

  • fix(webui): correct documentation URL for text2img by @mudler in #2233
  • fix(ux): fix small glitches by @mudler in #2265

Exciting New Features 🎉

  • feat: update ROCM and use smaller image by @cryptk in #2196
  • feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants by @mudler in #2232
  • fix(webui): display small navbar with smaller screens by @mudler in #2240
  • feat(startup): show CPU/GPU information with --debug by @mudler in #2241
  • feat(single-build): generate single binaries for releases by @mudler in #2246
  • feat(webui): ux improvements by @mudler in #2247
  • fix: OpenVINO winograd always disabled by @fakezeta in #2252
  • UI: flag trust_remote_code to users // favicon support by @dave-gray101 in #2253
  • feat(ui): prompt for chat, support vision, enhancements by @mudler in #2259

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

New Contributors

Full Changelog: v2.14.0...v2.15.0

v2.14.0

03 May 07:29
b58274b

🚀 AIO Image Update: llama3 has landed!

We're excited to announce that our AIO image has been upgraded with the latest LLM model, llama3, enhancing our capabilities with more accurate and dynamic responses. Behind the scenes it uses https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF, which is ready for function calling, yay!

💬 WebUI enhancements: Updates in Chat, Image Generation, and TTS

Chat TTS Image gen
chatui ttsui image

Our interfaces for Chat, Text-to-Speech (TTS), and Image Generation have finally landed. Enjoy streamlined and simple interactions thanks to the efforts of our team, led by @mudler, who has worked tirelessly to enhance your experience. The WebUI serves as a quick way to debug and assess models loaded in LocalAI - there is much to improve, but we now have a small, hackable interface!

🖼️ Many new models in the model gallery!

local-ai-gallery

The model gallery has received a substantial upgrade with numerous new models, including Einstein v6.1, SOVL, and several specialized Llama3 iterations. These additions are designed to cater to a broader range of tasks, making LocalAI more versatile than ever. Kudos to @mudler for spearheading these exciting updates - now you can select the model you like with a couple of clicks!

🛠️ Robust Fixes and Optimizations

This update brings a series of crucial bug fixes and security enhancements to ensure our platform remains secure and efficient. Special thanks to @dave-gray101, @cryptk, and @fakezeta for their diligent work in rooting out and resolving these issues 🤗

✨ OpenVINO and more

We're introducing OpenVINO acceleration, and many OpenVINO models in the gallery. You can now enjoy fast-as-hell speed on Intel CPU and GPUs. Applause to @fakezeta for the contributions!

📚 Documentation and Dependency Upgrades

We've updated our documentation and dependencies to keep you equipped with the latest tools and knowledge. These updates ensure that LocalAI remains a robust and dependable platform.

👥 A Community Effort

A special shout-out to our new contributors, @QuinnPiers and @LeonSijiaLu, who have enriched our community with their first contributions. Welcome aboard, and thank you for your dedication and fresh insights!

Each update in this release not only enhances our platform's capabilities but also ensures a safer and more user-friendly experience. We are excited to see how our users leverage these new features in their projects; feel free to drop us a line on Twitter or any other social network, we'd be happy to hear how you use LocalAI!

📣 Spread the word!

First off, a massive thank you (again!) to each and every one of you who've chipped in to squash bugs and suggest cool new features for LocalAI. Your help, kind words, and brilliant ideas are truly appreciated - more than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing. We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate sponsors behind it. It's all us, folks. So, if you've found value in what we're building together and want to keep the momentum going, consider showing your support. A little shoutout on your favorite social platforms using @LocalAI_OSS and @mudler_it or joining our sponsors can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and.. exciting times ahead with LocalAI!

What's Changed

Bug fixes 🐛

  • fix: config_file_watcher.go - root all file reads for safety by @dave-gray101 in #2144
  • fix: github bump_docs.sh regex to drop emoji and other text by @dave-gray101 in #2180
  • fix: undefined symbol: iJIT_NotifyEvent in import torch ##2153 by @fakezeta in #2179
  • fix: security scanner warning noise: error handlers part 2 by @dave-gray101 in #2145
  • fix: ensure GNUMake jobserver is passed through to whisper.cpp build by @cryptk in #2187
  • fix: bring everything onto the same GRPC version to fix tests by @cryptk in #2199

Exciting New Features 🎉

  • feat(gallery): display job status also during navigation by @mudler in #2151
  • feat: cleanup Dockerfile and make final image a little smaller by @cryptk in #2146
  • fix: swap to WHISPER_CUDA per deprecation message from whisper.cpp by @cryptk in #2170
  • feat: only keep the build artifacts from the grpc build by @cryptk in #2172
  • feat(gallery): support model deletion by @mudler in #2173
  • refactor(application): introduce application global state by @dave-gray101 in #2072
  • feat: organize Dockerfile into distinct sections by @cryptk in #2181
  • feat: OpenVINO acceleration for embeddings in transformer backend by @fakezeta in #2190
  • chore: update go-stablediffusion to latest commit with Make jobserver fix by @cryptk in #2197
  • feat: user defined inference device for CUDA and OpenVINO by @fakezeta in #2212
  • feat(ux): Add chat, tts, and image-gen pages to the WebUI by @mudler in #2222
  • feat(aio): switch to llama3-based for LLM by @mudler in #2225
  • feat(ui): support multilineand style ul by @mudler in #2226

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

Read more

🖼️ v2.13.0 - Model gallery edition

25 Apr 20:34
c9451cb

Hello folks, Ettore here - I'm happy to announce the v2.13.0 LocalAI release is out, with many features!

Below is a small breakdown of the hottest features introduced in this release - however, there are many other improvements (especially from the community) as well, so don't miss the changelog!

Check out the full changelog below to get an overview of all the changes that went into this release (this one is quite packed).

🖼️ Model gallery

This is the first release with the model gallery in the WebUI: you will now see a "Model" button in the WebUI which lands on a selection of models:

output

You can now choose between models such as stablediffusion, llama3, tts, embeddings and more! The gallery is growing steadily and is kept up-to-date.

The models are simple YAML files which are hosted in this repository: https://github.com/mudler/LocalAI/tree/master/gallery - you can host your own repository with your model index, or if you want you can contribute to LocalAI.

If you want to contribute adding models, you can by opening up a PR in the gallery directory: https://github.com/mudler/LocalAI/tree/master/gallery.

Rerankers

I'm excited to introduce a new backend for rerankers. LocalAI now implements the Jina API (https://jina.ai/reranker/#apiform) as a compatibility layer, so you can point existing Jina clients to the LocalAI address. Under the hood, it uses https://github.com/AnswerDotAI/rerankers.

output

You can test this by using container images with python (this does NOT work with core images) and a model config file like this, or by installing cross-encoder from the gallery in the UI:

name: jina-reranker-v1-base-en
backend: rerankers
parameters:
  model: cross-encoder

and test it with:

    curl http://localhost:8080/v1/rerank \
      -H "Content-Type: application/json" \
      -d '{
      "model": "jina-reranker-v1-base-en",
      "query": "Organic skincare products for sensitive skin",
      "documents": [
        "Eco-friendly kitchenware for modern homes",
        "Biodegradable cleaning supplies for eco-conscious consumers",
        "Organic cotton baby clothes for sensitive skin",
        "Natural organic skincare range for sensitive skin",
        "Tech gadgets for smart homes: 2024 edition",
        "Sustainable gardening tools and compost solutions",
        "Sensitive skin-friendly facial cleansers and toners",
        "Organic food wraps and storage solutions",
        "All-natural pet food for dogs with allergies",
        "Yoga mats made from recycled materials"
      ],
      "top_n": 3
    }'
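On the client side, the results can be mapped back to the original documents; a minimal sketch of handling the rerank response (the JSON shape follows the Jina reranker API that LocalAI emulates, and the scores below are made up for illustration):

```python
# The same document list sent in the request above
documents = [
    "Eco-friendly kitchenware for modern homes",
    "Biodegradable cleaning supplies for eco-conscious consumers",
    "Organic cotton baby clothes for sensitive skin",
    "Natural organic skincare range for sensitive skin",
    "Tech gadgets for smart homes: 2024 edition",
    "Sustainable gardening tools and compost solutions",
    "Sensitive skin-friendly facial cleansers and toners",
    "Organic food wraps and storage solutions",
    "All-natural pet food for dogs with allergies",
    "Yoga mats made from recycled materials",
]

# Hypothetical /v1/rerank response for top_n=3: each result carries the
# index of a document in the request plus a relevance score.
response = {
    "model": "jina-reranker-v1-base-en",
    "results": [
        {"index": 3, "relevance_score": 0.97},
        {"index": 6, "relevance_score": 0.91},
        {"index": 2, "relevance_score": 0.85},
    ],
}

# Map results back to the original document texts, best match first
ranked = [documents[r["index"]] for r in response["results"]]
print(ranked[0])  # Natural organic skincare range for sensitive skin
```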

Parler-tts

There is a new backend available for TTS now: parler-tts. It is possible to install and configure the model directly from the gallery: https://github.com/huggingface/parler-tts

🎈 Lot of small improvements behind the scenes!

Thanks to our outstanding community, we have enhanced the performance and stability of LocalAI across various modules. From backend optimizations to front-end adjustments, every tweak helps make LocalAI smoother and more robust.

📣 Spread the word!

First off, a massive thank you (again!) to each and every one of you who've chipped in to squash bugs and suggest cool new features for LocalAI. Your help, kind words, and brilliant ideas are truly appreciated - more than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing. We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate sponsors behind it. It's all us, folks. So, if you've found value in what we're building together and want to keep the momentum going, consider showing your support. A little shoutout on your favorite social platforms using @LocalAI_OSS and @mudler_it or joining our sponsors can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and here's to more exciting times ahead with LocalAI!

What's Changed

Bug fixes 🐛

  • fix(autogptq): do not use_triton with qwen-vl by @thiner in #1985
  • fix: respect concurrency from parent build parameters when building GRPC by @cryptk in #2023
  • ci: fix release pipeline missing dependencies by @mudler in #2025
  • fix: remove build path from help text documentation by @cryptk in #2037
  • fix: previous CLI rework broke debug logging by @cryptk in #2036
  • fix(fncall): fix regression introduced in #1963 by @mudler in #2048
  • fix: adjust some sources names to match the naming of their repositories by @cryptk in #2061
  • fix: move the GRPC cache generation workflow into it's own concurrency group by @cryptk in #2071
  • fix(llama.cpp): set -1 as default for max tokens by @mudler in #2087
  • fix(llama.cpp-ggml): fixup max_tokens for old backend by @mudler in #2094
  • fix missing TrustRemoteCode in OpenVINO model load by @fakezeta in #2114
  • Incl ocv pkg for diffsusers utils by @jtwolfe in #2115

Exciting New Features 🎉

  • feat: kong cli refactor fixes #1955 by @cryptk in #1974
  • feat: add flash-attn in nvidia and rocm envs by @golgeek in #1995
  • feat: use tokenizer.apply_chat_template() in vLLM by @golgeek in #1990
  • feat(gallery): support ConfigURLs by @mudler in #2012
  • fix: dont commit generated files to git by @cryptk in #1993
  • feat(parler-tts): Add new backend by @mudler in #2027
  • feat(grpc): return consumed token count and update response accordingly by @mudler in #2035
  • feat(store): add Golang client by @mudler in #1977
  • feat(functions): support models with no grammar, add tests by @mudler in #2068
  • refactor(template): isolate and add tests by @mudler in #2069
  • feat: fiber logs with zerlog and add trace level by @cryptk in #2082
  • models(gallery): add gallery by @mudler in #2078
  • Add tensor_parallel_size setting to vllm setting items by @Taikono-Himazin in #2085
  • Transformer Backend: Implementing use_tokenizer_template and stop_prompts options by @fakezeta in #2090
  • feat: Galleries UI by @mudler in #2104
  • Transformers Backend: max_tokens adherence to OpenAI API by @fakezeta in #2108
  • Fix cleanup sonarqube findings by @cryptk in #2106
  • feat(models-ui): minor visual enhancements by @mudler in #2109
  • fix(gallery): show a fake image if no there is no icon by @mudler in #2111
  • feat(rerankers): Add new backend, support jina rerankers API by @mudler in #2121

🧠 Models

  • models(llama3): add llama3 to embedded models by @mudler in #2074
  • feat(gallery): add llama3, hermes, phi-3, and others by @mudler in #2110
  • models(gallery): add new models to the gallery by @mudler in #2124
  • models(gallery): add more models by @mudler in #2129

📖 Documentation and examples

👒 Dependencies

  • deps: Update version of vLLM to add support of Cohere Command_R model in vLLM inference by @holyCowMp3 in #1975
  • ⬆️ Update ggerganov/llama.cpp by @localai-bot in #1991
  • build(deps): bump google.golang.org/protobuf from 1.31.0 to 1.33.0 by @dependabot in #1998
  • build(deps): bump github.com/docker/docker from 20.10.7+incompatible to 24.0.9+incompatible by @dependabot in #1999
  • build(deps): bump github.com/gofiber/fiber/v2 from 2.52.0 to 2.52.1 by @dependabot in #2001
  • build(deps): bump actions/checkout from 3 to 4 by @dependabot in #2002
  • build(deps): bump actions/setup-go from 4 to 5 by @dependabot in #2003
  • build(deps): bump peter-evans/create-pull-request from 5 to 6 by @dependabot in #2005
  • build(deps): bump actions/cache from ...
Read more