
Fun with Gemma 4 on a laptop with Intel iGPU

Published on April 8, 2026 · Reading time: 5 minutes

This article is about generative AI, a rapidly moving and controversial domain of computer science. Some facts and opinions may quickly get outdated.

This article was finalized on April 7th.

“Gemma 4 does this,” “Gemma 4 does that”…

Fine. Let’s run Google’s latest model on my PC and test if it can answer two simple questions.

The “President Benchmark”

Facts:

Questions:

  1. “Kto jest prezydentem Polski i dlaczego?” (Who is the president of Poland, and why?)
  2. “Czy może być wybrany na trzecią kadencję?” (Can they be elected for a third term?)

Expectations:

Hardware:

Software:

Results

Well…

Based on my hands-on experience, Gemma 4 is useful for simple code generation. Its output contains small syntax errors or mismatched parameters, but I could easily find and fix them manually. For example, it used H instead of h when grouping data with pandas. However, if you want to run the model on Vulkan, the 31B variant barely fits in 16 GB of VRAM, and only if you lower the context size to as little as 4096 tokens. That’s not enough for vibe coding, no matter how small your project is. The CPU backend is slightly faster, but it still takes a while to produce an answer.
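As a concrete illustration of the kind of slip-up I mean, here’s a minimal sketch of hourly grouping in pandas (my own reconstruction, not the model’s actual output); pandas 2.2 deprecated the uppercase “H” frequency alias in favor of lowercase “h”:

```python
import pandas as pd

# Six half-hour samples spanning three hours.
idx = pd.date_range("2026-04-01 00:00", periods=6, freq="30min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Group by hour. Since pandas 2.2 the lowercase alias "h" is preferred;
# the uppercase "H" the model produced triggers a deprecation warning.
hourly = s.resample("h").sum()
print(hourly.tolist())  # [3, 7, 11]
```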

Gemma can also be helpful for basic research. I asked it how long a 1000 mAh Li-Po battery would last if my 3.3 V microcontroller draws 15 mA for thirty minutes a day. It gave me the answer I needed, but it also asked some legitimate questions. Is the battery fully charged? Is there an LDO or a buck-boost converter? What about the other 23.5 hours a day: is the device fully powered off, or in deep sleep? I told it I’m using a Raspberry Pi Pico, and it redid all the math for me. I’m not sure I fully trust such a response, but it’s better than nothing.
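The arithmetic behind that question is simple enough to check by hand. Here’s a minimal sketch; the deep-sleep current and regulator efficiency are my own placeholder assumptions, not figures from the model’s answer:

```python
# Rough battery-life estimate for the question I asked Gemma: a 1000 mAh
# Li-Po powering a 3.3 V microcontroller that draws 15 mA for thirty
# minutes a day. Sleep current and regulator efficiency are assumptions.
BATTERY_MAH = 1000.0
ACTIVE_MA = 15.0
ACTIVE_HOURS = 0.5             # thirty minutes a day
SLEEP_MA = 0.01                # assumed deep-sleep draw for the rest of the day
REGULATOR_EFFICIENCY = 0.85    # assumed LDO / buck-boost losses

daily_mah = ACTIVE_MA * ACTIVE_HOURS + SLEEP_MA * (24 - ACTIVE_HOURS)
daily_mah /= REGULATOR_EFFICIENCY
days = BATTERY_MAH / daily_mah
print(f"{daily_mah:.2f} mAh/day -> roughly {days:.0f} days")
```

Ignoring sleep current and conversion losses entirely, the naive upper bound is 1000 / 7.5 ≈ 133 days; the follow-up questions the model asked are exactly what pulls the real figure below that.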

Web UI hosted by llama-server

But if you want very precise answers, Gemma 4 31B is not the best solution. On the other hand, Gemma E2B is too slow on a consumer-grade laptop, so even if you had an MCP server with an optimized offline copy of Wikipedia, it might still be faster to read Wikipedia yourself.

Summary


(out of 10)          E2B (CPU)  E2B (Vulkan)  E4B (CPU)  E4B (Vulkan)  26B‑A4B (CPU)  26B‑A4B (Vulkan)  31B (CPU)  31B (Vulkan)
Completed attempts   10         10            10         10            10             10                10         10
Successful attempts  10         10            9          10            3              6                 10         8
Valid attempts       8          6             3          6             0              3                 10         8

A completed attempt is one that didn’t fail abruptly because of memory exhaustion, a llama.cpp crash, a system crash, etc.

A successful attempt is one that didn’t get stuck while reasoning or answering.

A valid attempt is one where the answer to question 2 is similar to: “they can’t be elected more than twice.”
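To put the summary in relative terms, here’s a throwaway snippet (numbers copied from the “Valid attempts” row above) that prints the valid-answer rate per configuration, best first:

```python
# Valid attempts out of 10 completed runs, copied from the summary table.
valid = {
    "E2B (CPU)": 8,     "E2B (Vulkan)": 6,
    "E4B (CPU)": 3,     "E4B (Vulkan)": 6,
    "26B-A4B (CPU)": 0, "26B-A4B (Vulkan)": 3,
    "31B (CPU)": 10,    "31B (Vulkan)": 8,
}
for model, n in sorted(valid.items(), key=lambda kv: -kv[1]):
    print(f"{model:>17}  {n * 10}% valid")
```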

Details


Run #  E2B (CPU)  E2B (Vulkan)  E4B (CPU)  E4B (Vulkan)  26B‑A4B (CPU)  26B‑A4B (Vulkan)  31B (CPU)  31B (Vulkan)
1      OK         OK            [1]        any           [2]            one               OK*        [3]
2      OK         OK            OK         any           any            OK                OK*        OK*
3      OK         OK            any        OK            stuck          stuck             OK         OK
4      OK         OK            any        OK            one            OK                OK*        OK*
5      any        one           any        OK            stuck          one               OK         OK
6      any        any           any        any           stuck          stuck             OK*        [3]
7      OK         any           any        OK            stuck          stuck             OK         OK
8      OK         OK            OK         any           one            OK                OK*        OK
9      OK         any           OK         OK            stuck          stuck             OK         OK
10     OK         OK            any        OK            stuck          one               OK*        OK

Legend:

Resource usage (average)


                                     E2B (CPU)  E2B (Vulkan)  E4B (CPU)  E4B (Vulkan)  26B‑A4B (CPU)  26B‑A4B (Vulkan)  31B (CPU)  31B (Vulkan)
Session time (mm:ss, estimated)      01:40      01:52         02:34      02:18         04:16          03:51             11:57      12:14
Start-up
Start-up time (mm:ss, estimated)     00:03      00:03         00:04      00:03         00:20          00:23             00:32      00:26
Question 1
Prompt processing time (mm:ss)       00:00      00:00         00:00      00:00         00:00          00:02             00:04      00:04
Prompt processing throughput (t/s)   73.2       65.9          49.2       40.7          38.5           11.7              5.8        6.4
Prompt tokens                        26         26            26         26            26             26                26         26
Prediction time (mm:ss)              00:38      00:48         00:53      00:52         00:54          01:28             06:03      06:10
Prediction throughput (t/s)          14.1       12.0          10.4       9.9           10.8           9.1               1.7        1.7
Prediction tokens                    542        586           554        516           592            803               610        616
Question 2
Prompt processing time (mm:ss)       00:02      00:01         00:06      00:02         00:05          00:04             01:01      00:16
Prompt processing throughput (t/s)   78.5       175.9         42.4       118.3         35.0           78.3              4.8        17.1
Prompt tokens                        191        211           255        241           192            319               296        287
Prediction time (mm:ss)              00:42      00:45         01:17      01:07         02:42          01:39             04:02      05:02
Prediction throughput (t/s)          13.4       11.5          9.8        9.3           9.6            8.4               1.6        1.6
Prediction tokens                    572        523           758        627           1443           838               399        487
Peak memory usage
During start-up (GB)                 5.2        5.85          5.79       6.37          17.22          17.86             22.07      22.78
Question 1 (GB)                      5.21       5.85          5.79       6.44          17.23          17.9              22.09      22.78
Question 2 (GB)                      5.27       5.85          5.88       6.47          17.44          18.15             22.99      23.55
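As a quick sanity check on these measurements, throughput should simply be tokens divided by time. Here’s the 31B (CPU) question 1 row recomputed:

```python
# Recompute prediction throughput for 31B (CPU), question 1:
# 610 tokens generated in 06:03.
tokens = 610
minutes, seconds = 6, 3
elapsed = minutes * 60 + seconds   # 363 s
throughput = tokens / elapsed
print(f"{throughput:.1f} t/s")     # matches the 1.7 t/s reported in the table
```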

Extra observations
