Experimental playground for Gemma 3 vision #12348
ngxson started this conversation in Show and tell
Replies: 1 comment
-
No need to build if you installed from brew, btw.
-
I'm mirroring the guide from #12344 here for more visibility.
To support the Gemma 3 vision model, a new binary `llama-gemma3-cli` was added to provide a playground, supporting both chat mode and simple completion mode.

**Important:** Please note that this is not intended to be a production-ready product; it mostly acts as a demo. Please refer to #11292 for the future plan of vision support.
How to try this?
Step 1: Get the text model
Download it from: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
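If you have `huggingface-cli` installed, the download can be scripted along these lines (the repo and file names below are assumptions for illustration; pick whichever model size and quantization you want from the collection page above):

```shell
# Fetch a quantized text model from the ggml-org collection into the
# current directory. Adjust the repo/file names to the variant you chose.
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF \
    gemma-3-4b-it-Q4_K_M.gguf --local-dir .
```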
Step 2: Get the mmproj (multi-modal projection) model

Option 1: Download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
(You must download both the text model and the `mmproj` file.)

Option 2: Convert it yourself

We will need the `model.gguf` generated from the `convert_hf_to_gguf.py` script above, plus the vision tower saved in `mmproj.gguf`.

Firstly, get the `mmproj.gguf` file:
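The conversion could be sketched as follows. The text-model step uses the `convert_hf_to_gguf.py` script mentioned above; the mmproj step is an assumption — check the llama.cpp examples directory for the encoder conversion script added in #12344, as its exact name and path may differ:

```shell
# Convert the Hugging Face text model to GGUF (path is a placeholder):
python convert_hf_to_gguf.py path/to/gemma-3-4b-it --outfile model.gguf

# Extract the vision tower into mmproj.gguf. The script name below is an
# assumption; look for the Gemma 3 encoder conversion script that ships
# with the llama.cpp examples.
python examples/llava/gemma3_convert_encoder_to_gguf.py path/to/gemma-3-4b-it
```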
Step 3: Compile and run
Clone this repo and compile `llama-gemma3-cli`:

```sh
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-gemma3-cli
```
Run it:
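A typical invocation might look like this (the file names are placeholders for the files obtained in steps 1 and 2; `-m` and `--mmproj` follow the usual llama.cpp example conventions):

```shell
# Start the interactive chat playground, pointing it at the text model
# and the multi-modal projection file:
./build/bin/llama-gemma3-cli -m model.gguf --mmproj mmproj.gguf
```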
Example output: