Experimental playground for Gemma 3 vision #12348
ngxson started this conversation in Show and tell
Replies: 1 comment
-
No need to build if you installed from brew, btw.
-
I'm mirroring the guide from #12344 here for more visibility.
To support the Gemma 3 vision model, a new binary `llama-gemma3-cli` was added to provide a playground, supporting both chat mode and simple completion mode.

**Important:** Please note that this is not intended to be a production-ready product; it mostly acts as a demo. Please refer to #11292 for the future plan of vision support.
How to try this?
Step 1: Get the text model
Download it from: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
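If you have `huggingface-cli` installed, the download can be scripted along these lines (the repo and file names below are assumptions for illustration; pick whichever model size and quantization you want from the collection page above):

```shell
# Fetch a quantized text model from the ggml-org collection into the
# current directory. Adjust the repo/file names to the variant you chose.
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF \
    gemma-3-4b-it-Q4_K_M.gguf --local-dir .
```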
Step 2: Get the mmproj (multi-modal projection) model

Option 1: Download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913
(You must download both the text model and the `mmproj` file.)

Option 2: Convert it yourself

We will need the `model.gguf` generated from the `convert_hf_to_gguf.py` script above, plus the vision tower saved in `mmproj.gguf`.

Firstly, get the `mmproj.gguf` file:
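The conversion could be sketched as follows. The text-model step uses the `convert_hf_to_gguf.py` script mentioned above; the mmproj step is an assumption — check the llama.cpp examples directory for the encoder conversion script added in #12344, as its exact name and path may differ:

```shell
# Convert the Hugging Face text model to GGUF (path is a placeholder):
python convert_hf_to_gguf.py path/to/gemma-3-4b-it --outfile model.gguf

# Extract the vision tower into mmproj.gguf. The script name below is an
# assumption; look for the Gemma 3 encoder conversion script that ships
# with the llama.cpp examples.
python examples/llava/gemma3_convert_encoder_to_gguf.py path/to/gemma-3-4b-it
```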
Step 3: Compile and run
Clone this repo and compile `llama-gemma3-cli`:

```sh
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-gemma3-cli
```
Run it:
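A typical invocation might look like this (the file names are placeholders for the files obtained in steps 1 and 2; `-m` and `--mmproj` follow the usual llama.cpp example conventions):

```shell
# Start the interactive chat playground, pointing it at the text model
# and the multi-modal projection file:
./build/bin/llama-gemma3-cli -m model.gguf --mmproj mmproj.gguf
```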
Example output: