Photo processing designs

This post is part of a series about my design process for a photo gallery service for GDG Hanoi. You can find the introduction post here: Introduction.

This worker is the critical component of our gallery service: it performs the write operations that persist a photo entity to the gallery's database and storage. It is the only component I had to build from scratch.

In this post, I would like to share how I designed and built it.

Functionalities

When triggered, the worker plays the role of a consumer in the system's pub/sub pattern. Its source code provides a client that pulls messages from the remote message queue and forwards them to handler methods.
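The pull-and-dispatch pattern can be sketched in Go. This is a minimal illustration, not the worker's actual code: the `Message`, `Handler`, and `Consumer` names are hypothetical, and a plain channel stands in for the remote queue client (which in production would be something like the Cloud Pub/Sub Go client, with ack/nack handling).

```go
package main

import "fmt"

// Message is a minimal stand-in for a pub/sub message; the real worker
// would receive these from a remote queue client.
type Message struct {
	Type    string // hypothetical event name, e.g. "photo.registered"
	PhotoID string
}

// Handler processes one message; the worker registers one per event type.
type Handler func(Message) error

// Consumer forwards pulled messages to the matching handler method.
type Consumer struct {
	handlers map[string]Handler
}

func NewConsumer() *Consumer {
	return &Consumer{handlers: make(map[string]Handler)}
}

// On registers a handler for one event type.
func (c *Consumer) On(eventType string, h Handler) {
	c.handlers[eventType] = h
}

// Consume drains the queue channel, dispatching each message to its
// handler. A real implementation would also ack/nack against the broker.
func (c *Consumer) Consume(queue <-chan Message) error {
	for msg := range queue {
		h, ok := c.handlers[msg.Type]
		if !ok {
			return fmt.Errorf("no handler for %q", msg.Type)
		}
		if err := h(msg); err != nil {
			return err
		}
	}
	return nil
}
```

Keeping dispatch behind a small registry like this makes it easy to add the register/deregister handlers independently.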

For registering a photo entity, the worker has to:

For deregistering a photo entity, it has to:

Implementation drafts

Sketch diagram

Photo processing diagram

Programming language

I chose Go; besides my interest in trying the language, it also provides many great utilities:

Image processing library

I chose libvips, which offers low execution time and low memory usage for image operations.

Concurrency keynotes

By choosing Go, I can delegate all application-level concurrency orchestration to goroutines and channels.

This is worth mentioning because libvips also ships its own concurrency model: I have to make sure the library does not multi-thread internally on top of my goroutines, otherwise the two layers of parallelism would compete for cores and thrash the CPU.

Another significant aspect is managing concurrency in message pulling. We can pull messages into multiple channels, matching the channel count to the number of CPU threads available on the underlying compute instance.

Processing metrics estimation

Memory consumption

I will calculate the maximum memory consumption per execution task (concurrent goroutine) based on this formula:

$$m_{perTask} = (m_{goRuntime} + m_{gcsBuffer} + m_{vipsOverhead}) \times (1 + r_{safeBuffer})$$

Where:

- $m_{goRuntime} \approx 20$ MB: the Go runtime footprint of one task
- $m_{gcsBuffer} \approx 5$ MB: the buffer for streaming to and from Google Cloud Storage
- $m_{vipsOverhead} \approx 100$ MB: libvips' working memory for one image
- $r_{safeBuffer} = 0.2$: a 20% safety margin

Therefore, the maximum memory consumption per execution task is:

$$(20 + 5 + 100) \times (1 + 0.2) = 150 \text{ MB}$$

With this per-task estimate, I can sketch a memory budget for the hosting machine:

| Items | Units | Quantities |
| --- | --- | --- |
| RAM per concurrent goroutine | MB | 150 |
| Max concurrent tasks per 8 GB pool | tasks | 53 |
| Max concurrent tasks per 2 GB pool | tasks | 13 |
| Concurrency assumption | channels | 16 |
| RAM reservation by concurrency assumption | MB | 2400 |
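The budget arithmetic is straightforward to encode. The snippet below is just the estimation formula from this section expressed in Go (values in MB; function names are mine, not from the worker's code):

```go
package main

// Inputs to the per-task memory formula above, in MB.
const (
	goRuntime    = 20.0  // Go runtime footprint
	gcsBuffer    = 5.0   // Cloud Storage streaming buffer
	vipsOverhead = 100.0 // libvips working memory
	safeBuffer   = 0.2   // 20% safety margin
)

// PerTaskMB returns the estimated peak memory of one concurrent task.
func PerTaskMB() float64 {
	return (goRuntime + gcsBuffer + vipsOverhead) * (1 + safeBuffer)
}

// MaxTasks returns how many concurrent tasks fit in a pool of poolMB.
func MaxTasks(poolMB float64) int {
	return int(poolMB / PerTaskMB())
}
```

Evaluating `MaxTasks(8000)` and `MaxTasks(2000)` reproduces the 53- and 13-task rows of the table, and 16 channels reserve roughly 2400 MB.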

Execution time

I did an experiment to measure the execution time of a single task on my local machine (Intel(R) Core(TM) i7-14700F CPU @ 5.40GHz, 28 threads).

#!/bin/bash

hyperfine -w 3 \
	"time vipsthumbnail test.jpg -o test.webp -s 320" \
	"time vipsthumbnail test.jpg -o test.webp -s 512" \
	"time vipsthumbnail test.jpg -o test.webp -s 768" \
	"time vipsthumbnail test.jpg -o test.webp -s 1024" \
	"time vipsthumbnail test.jpg -o test.webp -s 1280" \
	"time vipsthumbnail test.jpg -o test.webp -s 1600" \
	"time vipsthumbnail test.jpg -o test.webp -s 2048" \
	"time vipsthumbnail test.jpg -o test.webp -s 2560" \
	"time vips webpsave test.jpg test.webp"

It gave me the following mean metrics:

| Metrics | Values (approximate) |
| --- | --- |
| Photo size | 6 MB |
| Resized photos total size | 2.417 MB |
| Download time (235 Mbps) | 204 ms |
| Upload time (16 Mbps) | 1209 ms |
| Resize time (320 px) | 150 ms |
| Resize time (512 px) | 175 ms |
| Resize time (768 px) | 240 ms |
| Resize time (1024 px) | 270 ms |
| Resize time (1280 px) | 360 ms |
| Resize time (1600 px) | 400 ms |
| Resize time (2048 px) | 430 ms |
| Resize time (2560 px) | 500 ms |
| Convert time (original) | 850 ms |
| Total | 4788 ms |
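The "Total" row is simply the sum of the timed stages (download, upload, the eight resizes, and the original-size conversion). A quick check in Go, using the measured values from the table:

```go
package main

// stagesMS lists the per-stage timings (ms) from the benchmark table.
var stagesMS = []int{
	204,  // download (235 Mbps)
	1209, // upload (16 Mbps)
	150, 175, 240, 270, 360, 400, 430, 500, // resizes, 320 to 2560 px
	850, // original-size webp conversion
}

// TotalMS sums the stage timings into the end-to-end per-task time.
func TotalMS() int {
	total := 0
	for _, ms := range stagesMS {
		total += ms
	}
	return total
}
```

Summing the stages like this assumes a fully sequential pipeline; overlapping the network transfers with processing could shave the total further.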

Deployment

Following up on the estimation, I can now decide on the runner capacity for the photo processing worker.

As stated in the high-level design, I only consider deploying on Google Cloud services. So far, two options come to mind:

Deploying on GCE instances

These gatherings are usually one-day events, so it is fair to assume we only need to keep the worker alive for 24 hours.

Therefore, I can consider cherry-picking a high-end instance to host the worker.

On the Build with AI Hanoi 2026 event day, I chose a c2d-standard-4 spot instance to deploy the worker. This instance features:

- 4 vCPUs (3rd-gen AMD EPYC Milan)
- 16 GB of RAM

Why that much RAM?
  1. There are a lot of GCP credits left.
  2. I want to try a hack that mounts libvips' temporary directory onto a RAM-backed filesystem, so file-handling tasks run directly in memory instead of on disk, which boosts performance considerably.

For the c2d-standard-4 instance, I can estimate the image throughput per hour as:

$$t_{machine} = \frac{3600}{t_{oneTask} \times c_{vCPU}} = \frac{3600}{4.788 \times 4} \approx 188 \text{ photos/h}$$

Where:

- $t_{oneTask} = 4.788$ s: the total execution time of one task, measured above
- $c_{vCPU} = 4$: the number of vCPUs on the instance
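The capacity formula above, expressed in Go exactly as written in the post (the function name is mine):

```go
package main

import "math"

// PhotosPerHour reproduces the capacity estimate: seconds in an hour
// divided by (seconds per task times vCPU count), rounded to the
// nearest whole photo.
func PhotosPerHour(secondsPerTask float64, vCPUs int) int {
	return int(math.Round(3600 / (secondsPerTask * float64(vCPUs))))
}
```

Plugging in the measured 4.788 s per task and 4 vCPUs yields the roughly 188 photos/h figure above.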

Deploying on Cloud Run functions

The GCE approach is not really a good fit for long-term operation due to its cost, especially at the current usage and traffic levels. Lol, I'm still not rich enough to throw money away easily -_-.

In the future, when the GCP credits expire, I will consider moving to Cloud Run functions, which mostly charge for CPU time consumption. I'm not gonna confirm it's a dramatically cheaper choice, but it potentially is.

By setting up the functions, I can utilize:

At this time, I have not tried this approach in practice, so I won't go into more detail about the technical specifications.