Image vectorization refers to various processes for converting pixel-based image formats to line-based image formats — for example bitmaps to SVGs. Vectorization is useful because digital cameras record images in terms of pixels, but vector-based formats are much easier to manipulate, allowing you to warp or erase individual lines, seamlessly scale the image, or draw the image in sequence. This is an active field of research in computer vision, and I found most existing programs to be limited in their functionality. Here, I recap some of my explorations of custom processes for doing so (with thanks to Jeff Hara @ Stanford).
Most of the existing vectorization programs I found were applications with no command-line component, or only a limited one, such as Super Vectorizer and Inkscape. Existing command-line vectorizers such as Potrace work by tracing around the outline of colored pixels (tracing the outside contour of line strokes) rather than extracting the strokes themselves:
contour tracing with potrace
A different type of vectorization, centerline tracing, involves recognizing lines within a pixel-based image.
desired centerline tracing
I was specifically interested in centerline tracing because I wanted to convert PNGs to SVGs, that is, to stroke-based images. Theoretically, this would allow any PNG image database to be converted into a stroke-based database that could be plugged into a stroke-based neural network model. The ability to construct an SVG dataset from a PNG one would open the potential for smoother and more flexible — less machinic — image generation.
Hough transforms have also been used for line extraction, but they work best on clean, predefined shapes rather than arbitrary ones. Instead, we employ centerline tracing, which has previously been used to capture handwriting and to process contour maps.
To perform centerline tracing, we first apply Guo-Hall thinning, an efficient thinning algorithm that reduces the image to a one-pixel-wide skeleton, which approximates the centerline of each shape. We can then extract all the points of the skeleton.
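As a minimal sketch, the thinning step can be implemented directly from the Guo-Hall two-subiteration rules (in practice one would more likely call `cv2.ximgproc.thinning` with the `THINNING_GUOHALL` flag from opencv-contrib; the function name below is my own):

```python
import numpy as np

def guo_hall_thinning(img):
    """Thin a binary image (1 = ink, 0 = background) to a roughly
    one-pixel-wide skeleton using Guo-Hall's two-subiteration scheme."""
    img = np.pad(img.astype(np.uint8), 1)  # zero border so indexing is safe
    changed = True
    while changed:
        changed = False
        for sub in (0, 1):  # the two subiterations
            to_delete = []
            for y, x in zip(*np.nonzero(img)):
                # 8-neighbourhood, clockwise starting from the north pixel
                p2, p3, p4 = img[y-1, x], img[y-1, x+1], img[y, x+1]
                p5, p6, p7 = img[y+1, x+1], img[y+1, x], img[y+1, x-1]
                p8, p9 = img[y, x-1], img[y-1, x-1]
                C = ((not p2 and (p3 or p4)) + (not p4 and (p5 or p6)) +
                     (not p6 and (p7 or p8)) + (not p8 and (p9 or p2)))
                N1 = (p9 or p2) + (p3 or p4) + (p5 or p6) + (p7 or p8)
                N2 = (p2 or p3) + (p4 or p5) + (p6 or p7) + (p8 or p9)
                m = ((p6 or p7 or not p9) and p8) if sub == 0 \
                    else ((p2 or p3 or not p5) and p4)
                if C == 1 and 2 <= min(N1, N2) <= 3 and not m:
                    to_delete.append((y, x))
            for y, x in to_delete:  # delete in parallel after scanning
                img[y, x] = 0
                changed = True
    return img[1:-1, 1:-1]
```

The per-pixel Python loop is slow on large images; the OpenCV implementation vectorizes the same conditions.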
These points are unordered, but there is an optimal stroke that passes through them in a sensible order. This stroke may be vertical, horizontal, or any arbitrary shape, and we use a brute-force search to find the optimal path, the ordering that minimizes the total Euclidean distance between consecutive points. There may also be multiple disconnected strokes, so we use Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to group the points of each stroke together and find an optimal path per cluster.
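The brute-force ordering step can be sketched with `itertools.permutations`; `optimal_path` is an illustrative name, and the search is O(n!), so it is only feasible on small clusters (the clustering step would typically come from, e.g., `sklearn.cluster.DBSCAN`, not shown here):

```python
import math
from itertools import permutations

def optimal_path(points):
    """Exhaustively try every ordering of the points and return the one
    minimizing the total Euclidean distance between consecutive points."""
    def length(path):
        return sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return min(permutations(points), key=length)
```

For clusters with more than a dozen or so points, a greedy nearest-neighbour ordering is a common approximation to this exhaustive search.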
Finally, we take the optimal paths and apply the Ramer-Douglas-Peucker algorithm with an epsilon of 2, which simplifies lines by representing them with fewer points. We can convert the simplified lines to strokes simply by calculating the change in x and y between consecutive points. Now, we have the desired pen-stroke format.
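A sketch of the Ramer-Douglas-Peucker simplification and the point-to-stroke conversion (function names are illustrative, not from the original code):

```python
import math

def rdp(points, epsilon=2.0):
    """Ramer-Douglas-Peucker: recursively drop points that lie within
    `epsilon` of the chord joining the first and last point."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    def perp_dist(p):
        x0, y0 = p
        den = math.hypot(x2 - x1, y2 - y1)
        if den == 0:  # degenerate chord: fall back to point distance
            return math.dist(p, (x1, y1))
        return abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1) / den
    idx = max(range(1, len(points) - 1), key=lambda i: perp_dist(points[i]))
    if perp_dist(points[idx]) > epsilon:
        # keep the farthest point and simplify both halves around it
        return rdp(points[:idx + 1], epsilon)[:-1] + rdp(points[idx:], epsilon)
    return [points[0], points[-1]]

def to_strokes(points):
    """Convert a simplified polyline to pen strokes as (dx, dy) deltas."""
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(points, points[1:])]
```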
To quantitatively evaluate the success of this centerline tracing, I run the results through a classifier. I take a base dataset of SVG images (Dataset A), convert them to PNG, then back to SVG through centerline tracing (Dataset B). I train the classifier on the base dataset, then test it with the traced dataset.
The classifier achieved high accuracy (98.48%) when tested on Dataset A, but much lower accuracy (69.5%) when tested on the bitmap-to-vector converted images (Dataset B). In particular, the classifier miscategorized 45% of the vectorized ear images as noses.
test on 100 samples per class
This is likely because some curvature detail is lost in the vectorization process to noise or over-simplification of lines; ear images with reduced curvature may begin to resemble nose images. Comparing stroke-based inputs with vectorized bitmap inputs to the classifier, vectorization adds a 29.5% error rate.