Making framebuf text 10x faster in CircuitPython

Published on December 23, 2024 · Reading time: 4 minutes

So, you have installed CircuitPython onto your development board, and you want to use the framebuf library (adafruit_framebuf) to interact with the display buffer directly. Maybe you are kinda forced to do so, because displayio is not available for your device. Either way, you try some demo code, and… it takes a whole second just to fill a little OLED screen with text.

Pretty bad, huh? Luckily, you can fix this by adding a few lines of code to your program, which will significantly improve the rendering time.

I have confirmed this code runs well on the Raspberry Pi Pico with a 128x64 SSD1306 OLED display connected via I2C. This may NOT work with color displays and pixel matrices.

[Photo: SSD1306 display connected to the Pi Pico W on a breadboard]

Why is font rendering slow?

The framebuf library supports multiple types of displays, so its drawing code goes through a unified API, and some performance is lost as a side effect. Every character you render has to be read from the font file (there is no cache), and then it is written to the buffer one pixel at a time.

It doesn’t sound bad on its own, but here’s the catch: instead of setting a single pixel, draw_char() draws a size × size rectangle (1×1 px at the default size) with framebuffer.fill_rect(). Every rectangle has to be transformed for the current rotation and checked against the screen bounds before the buffer data is actually updated. On top of that, this is a pure Python implementation, so no wonder it is so slow.

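# Inner loop of BitmapFont.draw_char() in adafruit_framebuf. `line` holds
# one column of the glyph read from the font file (bit 0 is the top row).
# Go through each row in the column byte.
for char_y in range(self.font_height):
    # Draw a pixel for each bit that's flipped on.
    if (line >> char_y) & 0x1:
        framebuffer.fill_rect(
            x + char_x * size, y + char_y * size, size, size, color
        )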
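A rough count shows how much work that is for one screen. This is back-of-the-envelope arithmetic, assuming the default 5x8 font, 1 px of spacing between characters and size=1; per_screen_work() is a made-up helper, not part of framebuf.

```python
def per_screen_work(width=128, height=64, font_width=5, font_height=8, spacing=1):
    """Rough count of inner-loop work for one screen full of text.

    Assumes fixed-width glyphs advanced by font_width + spacing pixels,
    text rows font_height pixels apart, and size=1. Only fully visible
    characters are counted.
    """
    chars_per_row = width // (font_width + spacing)
    rows = height // font_height
    chars = chars_per_row * rows
    columns = chars * font_width        # one font byte per glyph column
    bit_checks = columns * font_height  # the default path tests every bit and
                                        # calls fill_rect() for each set one
    return chars, columns, bit_checks


chars, columns, bit_checks = per_screen_work()
print(chars, columns, bit_checks)  # 168 840 6720
```

So a full 128x64 screen means up to 6720 fill_rect() calls, while the same text is only 840 column bytes.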

If you do not care about screen rotation, and you are sure you won’t go out of bounds, you can try to use a simpler implementation.

How can this be optimized?

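It just so happens that the display buffer and the font data are arranged in the same way. Monochrome displays with an SSD1306, SH1106 or ST7565 controller organize their pixels in pages, each 8 pixels tall, so every byte of the buffer holds one 8-pixel column of a page (in framebuf’s MVLSB format, the least significant bit is the top pixel). Inside a font file, each column of a glyph is also a single byte with bit 0 at the top, so when a character starts on a page boundary, that byte can simply be copied (ORed) into the buffer.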
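Here is a small desk-side sketch of that equivalence in plain Python. It models a page-organized buffer the way I understand adafruit_framebuf’s MVLSB format (used by adafruit_ssd1306 on CircuitPython) addresses pixels: byte index (y >> 3) * width + x, bit y & 7. The helper names and glyph bytes are made up for illustration.

```python
def set_pixel_mvlsb(buf, width, x, y):
    """Turn on pixel (x, y) in a page-organized (MVLSB) monochrome buffer.

    Each byte covers one column of an 8-pixel-tall page; bit 0 is the top row.
    """
    buf[(y >> 3) * width + x] |= 1 << (y & 7)


def draw_column_per_pixel(buf, width, x, y, line):
    """Default-style path: test each bit and set one pixel at a time."""
    for row in range(8):
        if (line >> row) & 1:
            set_pixel_mvlsb(buf, width, x, y + row)


def draw_column_bytewise(buf, width, x, y, line):
    """Fast path: only valid when y is a multiple of 8 (page-aligned)."""
    buf[(y >> 3) * width + x] |= line


WIDTH, HEIGHT = 128, 64
glyph = bytes([0x7C, 0x12, 0x11, 0x12, 0x7C])  # hypothetical 5x8 glyph columns

slow = bytearray(WIDTH * HEIGHT // 8)
fast = bytearray(WIDTH * HEIGHT // 8)
for i, column in enumerate(glyph):
    draw_column_per_pixel(slow, WIDTH, 10 + i, 16, column)
    draw_column_bytewise(fast, WIDTH, 10 + i, 16, column)
print(slow == fast)  # True

# Off a page boundary (y=20), one column straddles two buffer bytes,
# which is why the fast path only handles page-aligned text.
straddle = bytearray(WIDTH * HEIGHT // 8)
draw_column_per_pixel(straddle, WIDTH, 0, 20, 0xFF)
print(hex(straddle[2 * WIDTH]), hex(straddle[3 * WIDTH]))  # 0xf0 0xf
```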

The code responsible for drawing text is decoupled from the rest of the framebuffer implementation, so we can easily override the draw_char() method of the BitmapFont class:

import struct
from adafruit_framebuf import BitmapFont

class FastBitmapFont(BitmapFont):
    def draw_char(self, char, x, y, framebuffer, color, size=1):
        if (
            y % 8 != 0
            or size != 1
            or not color
            or framebuffer.rotation != 0
            or not 0 <= x <= framebuffer.width - self.font_width
            or not 0 <= y < framebuffer.height
        ):
            # Not aligned to a page, scaled, drawn with color 0, rotated or
            # partly off-screen: going back to the default (slower) implementation.
            return super().draw_char(char, x, y, framebuffer, color, size)

        # Index of the buffer byte under the glyph's first column.
        offset = framebuffer.width * (y >> 3) + x

        # Go through each column of the character.
        for char_x in range(self.font_width):
            # Grab the byte for the current column of font data.
            self._font.seek(2 + (ord(char) * self.font_width) + char_x)
            try:
                line = struct.unpack("B", self._font.read(1))[0]
            except RuntimeError:
                continue  # maybe the character isn't there? go to the next one

            # THIS SINGLE LINE REPLACES THE framebuffer.fill_rect() CALL
            framebuffer.buf[offset + char_x] |= line

Add this to your display initialization code:

import board, busio
from adafruit_ssd1306 import SSD1306_I2C
display = SSD1306_I2C(128, 64, busio.I2C(board.GP21, board.GP20))  # SCL, SDA
display._font = FastBitmapFont()

Now the same program needs only about 200 ms to complete.
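If you want to reproduce such measurements, a minimal timing helper could look like this. time_ms() is a hypothetical helper; it relies on time.monotonic_ns(), which exists in CPython and, as far as I know, in CircuitPython builds with long integer support such as the one for the Pico.

```python
import time


def time_ms(fn, *args, **kwargs):
    """Call fn once and return (its result, elapsed wall time in milliseconds)."""
    start = time.monotonic_ns()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.monotonic_ns() - start) / 1000000
    return result, elapsed_ms


result, ms = time_ms(sum, [1, 2, 3])
print(result)  # 6
```

On the board, you could time the drawing and the transfer separately, for example time_ms(display.text, "Hello", 0, 0, 1) and then time_ms(display.show), to see how much of the total is rendering and how much is spent on the bus.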

But wait, there’s more! Have you noticed that in each loop iteration, we read only one byte of glyph data? What if we read all necessary bytes at once and skip the struct library entirely?

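from adafruit_framebuf import BitmapFont

class FastBitmapFont(BitmapFont):
    def draw_char(self, char, x, y, framebuffer, color, size=1):
        if (
            y % 8 != 0
            or size != 1
            or not color
            or framebuffer.rotation != 0
            or not 0 <= x <= framebuffer.width - self.font_width
            or not 0 <= y < framebuffer.height
        ):
            # Not aligned to a page, scaled, drawn with color 0, rotated or
            # partly off-screen: going back to the default (slower) implementation.
            return super().draw_char(char, x, y, framebuffer, color, size)

        # Grab all bytes for the current glyph from font data in one read.
        self._font.seek(2 + (ord(char) * self.font_width))
        data = self._font.read(self.font_width)

        # Index of the buffer byte under the glyph's first column.
        offset = framebuffer.width * (y >> 3) + x

        # Copy each column of the character into the buffer. If the character
        # is not in the font, read() may return fewer bytes (or none), and the
        # loop simply draws less instead of raising an IndexError.
        for char_x, column in enumerate(data):
            framebuffer.buf[offset + char_x] |= column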
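The single read() works because each glyph’s columns sit next to each other in the font file. As far as I can tell from adafruit_framebuf, the file is a two-byte header (glyph width and height) followed by font_width bytes for each of 256 character codes; treat that layout as an assumption to check against your font. The sketch below builds a tiny font of that shape in memory with io.BytesIO (the glyph bytes are made up) and checks that per-byte and batched reads agree.

```python
import io
import struct

FONT_WIDTH, FONT_HEIGHT = 5, 8

# Hypothetical in-memory font: 2-byte header (width, height) followed by
# FONT_WIDTH column bytes for each of the 256 character codes.
glyphs = bytearray(256 * FONT_WIDTH)
start = ord("A") * FONT_WIDTH
glyphs[start:start + FONT_WIDTH] = bytes([0x7C, 0x12, 0x11, 0x12, 0x7C])
font = io.BytesIO(struct.pack("BB", FONT_WIDTH, FONT_HEIGHT) + glyphs)


def read_glyph_per_byte(font, char):
    """One seek() and one-byte read() per column, as in the first version."""
    columns = []
    for char_x in range(FONT_WIDTH):
        font.seek(2 + ord(char) * FONT_WIDTH + char_x)
        chunk = font.read(1)
        if chunk:
            columns.append(chunk[0])
    return bytes(columns)


def read_glyph_batched(font, char):
    """A single seek() and read() for the whole glyph, as in the second version."""
    font.seek(2 + ord(char) * FONT_WIDTH)
    return font.read(FONT_WIDTH)


print(read_glyph_per_byte(font, "A") == read_glyph_batched(font, "A"))  # True
print(read_glyph_batched(font, "\u0100"))  # b'' (past the end of the font)
```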

With this custom BitmapFont implementation, we are down to less than 100 ms. At this point, further gains have to come from somewhere else: caching glyphs (which did not seem to help), increasing the I2C or SPI frequency, coming up with another solution that does not rely on framebuf internals, or switching to another programming language.
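To get a feel for how much the bus frequency matters, here is a lower-bound estimate of how long pushing one full 128x64 frame over I2C takes. It assumes the 1024-byte buffer goes out in a single write preceded by an address byte and one control byte, and it ignores start/stop conditions and gaps between bytes; i2c_transfer_ms() is a hypothetical helper.

```python
def i2c_transfer_ms(payload_bytes, frequency_hz, overhead_bytes=2):
    """Lower-bound estimate of one I2C write, in milliseconds.

    Every byte on the bus takes 9 clock cycles (8 data bits + ACK).
    overhead_bytes is an assumption: the address byte plus one control
    byte in front of the pixel data. Start/stop conditions and any gaps
    between bytes are ignored.
    """
    bits = (payload_bytes + overhead_bytes) * 9
    return bits * 1000 / frequency_hz


frame = 128 * 64 // 8  # 1024 bytes for a 128x64 monochrome buffer
print(i2c_transfer_ms(frame, 100000))  # 92.34
print(i2c_transfer_ms(frame, 400000))  # 23.085
```

On the board, the bus speed is the frequency argument of busio.I2C, which defaults to 100 kHz; whether a particular display and wiring work reliably at 400 kHz is something to test.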
