Vision Cortex SymCode Impression Premise The Clustering Algorithm Simplification Segmentation The Implementation Impressionism VTracer

Impression

Researcher: Chris Tsang | Published: 2020-12-24

Impression is a family of algorithms for image simplification, segmentation and vectorization. Try the Demo Web App.

Premise

The fundamental information result from visual perception is shape. For example, we would describe ‘there is an orange on a basket’ or more conceptually ‘there is a circle above a triangle’.

If we regard simplification as a process of information reduction, we could imagine information being taken away from an image shape by shape, starting from the least important shapes. At the same time, we can also simplify each shape in the image to make it ‘less fractal’.

By controlling this reduction process, we would be able to control the amount of visual information in an image in a quantitative manner.

Under this understanding, segmentation is an extreme degree of simplification, while vectorization is a slight degree of simplification.

The Clustering Algorithm

The clustering algorithm is hierarchical in nature and the idea is best described by the following verse:

On an unclaimed land, there are villages. Villages ally with each other to form tribes. Tribes conquer each other and form Kingdoms. Kingdoms clashes with each other and form empires. Empires breakdown and finally the entire earth becomes one union.

Stage 1: clustering

Clustering is a process to connect pixels of similar colors into clusters. Each cluster have a unique position and shape. There is well known algorithm to perform clustering and Impression’s implementation is essentially same as Kopelman Algorithm.

Stage 2: hierarchical clustering

Hierarchical clustering is a process to build an image tree from the clusters of stage 1. Impression builds a binary tree bottom up. In preparation, clusters are sorted by their size. Starting from the smallest, in each iteration, a cluster is merged into the closest cluster. There are different formulas for ‘close’, but it generally involves comparing the average colors between neighboring clusters.

Stage 3: tree walking

In image vectorization as in VTracer, we would walk the tree from top to bottom, trace each layer, and stack the vector paths on the output canvas. As if a painter paints on an empty canvas, he would lay down background first and then overlay objects atop, and finally add on details.

Simplification

In image simplification, there are multiple dimensions of information which can be controlled quantitatively:

Shape details: proportional to how many points we use to outline each shape
Fidelity: proportional to how many nodes we retain from the image tree
Color levels: proportional to how densely we sample the image tree layers

Shape Simplification

Impression currently chose the Ramer-Douglas-Peucker algorithm for shape simplification. While the original algorithm is designed for open curves, Impression adapted it for closed shapes. We cut a closed path into 4 sections, simplify each, and stitch them back together after path simplification. We choose the north most, east most, south most and west most points to cut a given path. Effectively the simplest possible shape is a diamond with four points. However in practice, it is not desirable to have shapes with sharp corners, and so we would smooth them with a 4-point scheme as described in VTracer.

Left to right: gradual reduction of shape details

Fidelity

It has to be understood that statistically there are exponentially more nodes as we decrease the cluster size. There is one root node known as the image background, but as many leaf nodes as there are number of pixels. As such, fidelity is actually a cutoff which we discard all clusters smaller than a desired size. When this cutoff threshold is low, we are removing salt and pepper noise but in a true color sense. As we increase this cutoff, structures are being discarded. Further increasing this cutoff, the image would become more abstract, and in the end only one solid background will be left.

It is important to note that this background color is not the mean pixel color of the image, but is the average color of the largest cluster, which conceptually is the ‘base tone’ of the given image.

Left: low fidelity; Right: high fidelity

Color Levels

More color levels mean finer gradient. Color levels set to 256 means utilizing the full color precision of RGB888. Setting color levels to 32 meaning a cluster has to have at least a color difference of 8 in order to be considered a separate cluster with the cluster on the upper level. In effect, tuning down the color levels would create a retro 8-bit color look. While the number of colors in the palette is limited, the colors are still 24-bit and thus are faithful to the original image.

Left: less color levels; Right: more color levels

Segmentation

Segmentation follows stage 3 of the algorithm. Similar clusters are grouped together by the disjoint sets algorithm, where each set of the result represents a solid color patch. For each cluster, its neighbours are being considered. If the color difference is smaller than a defined deviation, they will be put into the same union. To prevent grouping too greedily, after the first union, subsequent union would require stricter thresholds. To output segmentation result, each set would be rendered by the average color of all its constituting clusters.

After this stage, there would still be unwanted patches in the output. The output would further be re-clustered by Stage 1 to 3 described before, and an aggregation pass be applied afterwards. It would prune away smaller patches by merging into the closest (in terms of color) cluster if deviation would allow.

Finally, to output result, each aggregate would be rendered by its original cluster color.

Left: original; Mid: initial segmentation; Right: after aggregation

The Implementation

The described algorithms are implemented as reusable components under VisionMagic. Together with a processing pipeline, they are designed to support real-time and interactive applications. Users can easily organize processing stages and add additional processing passes as needed.

The Processor trait is the interface that future additions of algorithms will conform to. It also allows us to support video streaming applications in the future.

Impressionism

The above methodology is designed to imitate visual perception in humans. It is thus no surprising that resulting images exhibit characteristics similar to paintings drawn by Artists. As the name suggests, it is heavy inspired by a painting technique that we called ‘Impressionism’ for the meaning it convey during a brief period of art history.

Simplification Parameters: Shape Details  = 27983, Fidelity = 51256, Color Levels = 16
Photoshop ‘Oil Paint’ filter: Stylization = 0.1, Cleanliness = 10.0, Scale = 0.1, Bristle Detail = 6.5, Light = -60, Shine = 1.0

Left: Artwork generated by Impression; Right: Water Lilies Painting by Claude Monet

Photo Credits

Building photo by Hernan Lucio on Unsplash

Dog photo by Elijah Ekdahl on Unsplash

Cityscape photo by Mark Denton on Unsplash

Landscape photo by Luca Bravo on Unsplash

Parrot photo by Sanved Bangale on Unsplash

Water lily photo by Ravi Sharma on Unsplash