Implementing Efficient Graph Based Image Segmentation with C++
This is a summary and my implementation of the research paper Efficient Graph-Based Image Segmentation. To follow the implementation, it is recommended to know C++ (pointers, vectors, structs, linked lists, pass by reference and return by reference).
What is Image Segmentation?
Image Segmentation is the process of dividing an image into semantic regions, where each region represents a separate object. Quoting Wikipedia:
More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
Look at the following examples:
Now this might be a very trivial task for the human brain, but it is not so easy for a computer.
There have been a good number of methods in the past to solve this problem, but they were either not convincing or not efficient enough for wider adoption and real-world applications. In 2004, Pedro F. Felzenszwalb (MIT) and Daniel P. Huttenlocher (Cornell) published this method as a more effective and efficient alternative to previous approaches. We are going to discuss it in this article.
What does this particular method aim to achieve?
Extract global aspects of an image – this means grouping and dividing all regions which are perceptually important. This is achieved with a predicate that measures the evidence of a boundary. As the paper states,
Intuitively, the intensity differences across the boundary of two regions are perceptually important if they are large relative to the intensity differences inside at least one of the regions.
Be super fast. To be of practical use, the segmentation needs to run about as quickly as edge detection techniques, at several frames per second (important for any video applications).
Defining the Graph
\[G = (V, E)\]
The graph G is an undirected weighted graph with vertices $v_i \in V$ and edges $(v_i, v_j) \in E$ corresponding to pairs of adjacent vertices. In this context, the vertices represent the pixels of the image.
Each region, also called a Component, will be denoted by $C$, i.e., $C_1, C_2, \dots, C_k$ are the components from Component 1 to Component k.
The edges of the graph can be weighted using the intensity differences of two pixels. If it is a grayscale image, the weight of an edge between vertices $v_i$ and $v_j$ can be calculated using:
\[w(e_{ij}) = |I_i - I_j|\]
where $I_i$ and $I_j$ are intensities of vertex $v_i$ and $v_j$ respectively.
For an RGB image, the weight can be calculated as the Euclidean distance between the two pixels in color space:

\[w(e_{ij}) = \sqrt{(r_{v_i} - r_{v_j})^2 + (g_{v_i} - g_{v_j})^2 + (b_{v_i} - b_{v_j})^2}\]

where $r_p$, $b_p$ and $g_p$ are the intensities of the red, blue and green channels of the vertex $p$, respectively.
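These two weight functions can be sketched in C++ as follows. The names grayPixelDifference and rgbPixelDifference match the helpers mentioned later in the implementation, but the signatures here are illustrative:

```cpp
#include <cmath>
#include <cstdlib>

// Grayscale weight: absolute difference of the two pixel intensities.
int grayPixelDifference(int intensityI, int intensityJ) {
    return std::abs(intensityI - intensityJ);
}

// RGB weight: Euclidean distance between the two pixels' color vectors.
double rgbPixelDifference(int r1, int g1, int b1, int r2, int g2, int b2) {
    return std::sqrt(double((r1 - r2) * (r1 - r2) +
                            (g1 - g2) * (g1 - g2) +
                            (b1 - b2) * (b1 - b2)));
}
```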
Defining the metrics to measure a boundary between regions
The algorithm merges any two components (regions) if there is no evidence for a boundary between them. And how is this evidence measured? Well, there are four simple equations to understand:
First we need to know the differences within a region:
\[Int(C) = \max_{e \in MST(C, E)} w(e)\]
Int(C), the internal difference of a component C, is defined as the maximum edge weight of the Minimum Spanning Tree (MST) of the component C. The MST is calculated using Kruskal’s algorithm. Those who want a refresher on MSTs can find a simple and intuitive explanation here.
Then we need to find differences between the regions:
The difference between two components is measured by the minimum weight edge connecting the two components:

\[Dif(C_1, C_2) = \min_{v_i \in C_1,\, v_j \in C_2,\, (v_i, v_j) \in E} w((v_i, v_j))\]
We also need a thresholding function that controls how much lower the internal difference of the two regions must be than the difference between the regions before we conclude that there is evidence for a boundary.
\[\Gamma(C) = k/|C|\]
Here k is a constant which controls the degree of difference required between internal and external differences to find a boundary. $|C|$ is the size of the component, which is equal to the number of pixels in the component.
In my opinion, we need this thresholding to confidently divide or merge two regions and also be more robust against high-variability regions.
Finally, the algorithm evaluates the evidence of a boundary by checking whether the difference between the components is larger than the minimum internal difference of the two components:

\[MInt(C_1, C_2) = \min(Int(C_1) + \Gamma(C_1),\ Int(C_2) + \Gamma(C_2))\]

That is, there is a boundary between two components if $Dif(C_1, C_2)$ is greater than $MInt(C_1, C_2)$.
Of course, if the second component is large, the value of $\Gamma$ would be comparatively low, which increases the possibility of merging the two components. That also emphasizes the point that a larger k prefers larger components.
To conclude, the above equation means that there is evidence of a boundary if the external difference between two components is greater than the minimum of the internal differences of the two components, each adjusted by the size-dependent threshold ($k/|C|$).
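The boundary predicate described above can be sketched directly in C++. This is an illustrative sketch, not the repository's code; it assumes Int(C) and |C| are tracked per component, and k is the constant from the threshold function:

```cpp
#include <algorithm>

// Gamma(C) = k / |C|
double thresholdFunction(double k, int componentSize) {
    return k / componentSize;
}

// MInt(C1, C2) = min(Int(C1) + Gamma(C1), Int(C2) + Gamma(C2))
double minimumInternalDifference(double int1, int size1,
                                 double int2, int size2, double k) {
    return std::min(int1 + thresholdFunction(k, size1),
                    int2 + thresholdFunction(k, size2));
}

// D(C1, C2): true when there is evidence of a boundary, i.e. the
// difference between the components exceeds MInt(C1, C2).
bool boundaryEvidence(double dif, double int1, int size1,
                      double int2, int size2, double k) {
    return dif > minimumInternalDifference(int1, size1, int2, size2, k);
}
```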
The Algorithm
We start with the initial setup where each pixel is its own component and $Int(C) = 0$, since $|C| = 1$ (no edges).
Sort the edges in non-decreasing order of weight
Loop through each edge
For the $q$-th iteration, $v_i$ and $v_j$ are the vertices connected by the $q$-th edge in the ordering, i.e., $e_q = (v_i, v_j)$, belonging to components $C_i$ and $C_j$ respectively. If $v_i$ and $v_j$ are in different components ($C_i \neq C_j$) and $MInt(C_i, C_j) > Dif(C_i, C_j)$ (no evidence of a boundary), then merge the components; otherwise do nothing.
That’s it. That’s the algorithm. Pretty simple eh?
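The whole loop can be condensed into a compact, self-contained sketch. This is my own simplified union-find with per-root bookkeeping, not the article's DisjointForest classes; the Edge struct and field names are illustrative:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int a, b; double w; };

struct Segmentation {
    std::vector<int> parent, size;
    std::vector<double> internalDiff;  // Int(C), stored at each root
    explicit Segmentation(int n)
        : parent(n), size(n, 1), internalDiff(n, 0.0) {
        std::iota(parent.begin(), parent.end(), 0);  // each pixel on its own
    }
    int find(int x) {                  // root lookup with path halving
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
};

void segment(std::vector<Edge>& edges, Segmentation& s, double k) {
    // Step 1: sort edges in non-decreasing order of weight.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& x, const Edge& y) { return x.w < y.w; });
    // Step 2: visit each edge; merge when there is no boundary evidence.
    for (const Edge& e : edges) {
        int ra = s.find(e.a), rb = s.find(e.b);
        if (ra == rb) continue;
        double mint = std::min(s.internalDiff[ra] + k / s.size[ra],
                               s.internalDiff[rb] + k / s.size[rb]);
        if (e.w <= mint) {             // no evidence of a boundary: merge
            s.parent[ra] = rb;
            s.size[rb] += s.size[ra];
            s.internalDiff[rb] = e.w;  // e.w is the new MST maximum edge
        }
    }
}
```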
Intuition
A larger k causes a preference for larger components (components with more pixels).
Consider k = 700 and an intensity range of [0, 255]. In this case, the resulting components will be at least 3 pixels in size. This is because if the range is [0, 255], then the maximum weight (difference between two pixels) is also 255. Let S be a component of size 3; then:
\[\Gamma(S) = 700/3\]
\[\Gamma(S) \approx 233\]
This means that for the component S to have a boundary with another component P (let $|P| = 2$ and $Int(P) > Int(S)$), the minimum edge connecting the two components must exceed S’s internal difference by at least 233 units. If component S were of size 10 instead, then $\Gamma = 70$, and a comparatively smaller gap between the external and internal differences would suffice to establish a boundary. So for smaller components a higher difference (stronger evidence) is required to prove the existence of a boundary, which means a small component is more likely to be merged, and merging two components results in a larger component.
You can take any other measures as weight.
In place of intensity differences, you can use other measures for the weight, such as the locations of pixels, HSV color space values, motion, or other attributes – anything that works for your use case.
The threshold function can also be modified to identify particular shapes
You can also change the threshold function so that it prefers particular shapes such as circles, squares or pretty much anything. In this case, not all segments will be of that particular shape, but one of any two neighboring regions will be.
Implementation
You can access my entire Github Repo for the most recent updates.
I have used a Disjoint Forest to represent our segmentation, as suggested by the paper. To summarize the Disjoint Forest Data Structure:
It is a faster implementation of the Disjoint Set Data Structure. A disjoint set is a collection of sets which do not have any elements in common. A disjoint set uses linked lists to represent sets, whereas a disjoint forest uses trees. In our case, each set (tree) represents a component and each element of a component represents a unique pixel.
Each set has a representative, an identity that identifies a unique set. In our case, the representative of a component is the first pixel of the component (the root node of the tree).
Merging two sets results in a single set whose size is equal to the sum of the sizes of the two merged sets. In our case, we merge two sets when we find no evidence for a boundary between them (when $D(C_i, C_j)$ is false).
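As a minimal illustration of the data structure itself, a disjoint forest with path compression and union by rank might look like the sketch below. This is my own simplified version; the repository's DisjointForest.h stores richer per-component data:

```cpp
#include <numeric>
#include <utility>
#include <vector>

struct DisjointForest {
    std::vector<int> parent, rank_;
    explicit DisjointForest(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // each element is its own root
    }
    // Returns the representative (root) of x's set, compressing the path.
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    // Union by rank: the shallower tree is attached under the deeper one,
    // keeping lookup paths short.
    void merge(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
    }
};
```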
DisjointForest.h
Explanation
Each ComponentStruct represents a component as a node in a linked list. It points to the previous ComponentStruct and the next ComponentStruct in the list. Each time two components are merged, one ComponentStruct is deleted and its neighboring ComponentStructs in the list are pointed to one another.
A Component has a vector of pixels and a representative. The representative is initialized as the first pixel of the component. The component also points back to the ComponentStruct in the linked list that points to it. The rank represents the height of the component (the component is a tree structure). When two components are merged, the component with the lower rank is merged into the component with the higher rank.
Each Pixel represents an individual pixel of the image that needs to be segmented. It holds its intensity, its location (row, column) and the component it belongs to.
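Based on the explanation above, the three structs might be laid out roughly as follows. The actual definitions live in DisjointForest.h in the repository, so the field names here are assumptions reconstructed from the description:

```cpp
#include <vector>

struct Component;        // forward declaration

struct Pixel {
    int intensity;       // grayscale value (per-channel values for RGB)
    int row, column;     // location in the image
    Pixel* parent;       // parent in the disjoint-forest tree
    Component* component;// component the pixel currently belongs to
};

struct ComponentStruct;  // node of the doubly linked list of components

struct Component {
    std::vector<Pixel*> pixels;       // all pixels in this component
    Pixel* representative;            // first pixel / root of the tree
    ComponentStruct* componentStruct; // back-pointer to the list node
    int rank;                         // tree height, used when merging
    double MSTMaxEdge;                // Int(C): max edge of the MST
};

struct ComponentStruct {
    Component* component;
    ComponentStruct* previousComponentStruct;
    ComponentStruct* nextComponentStruct;
};
```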
DisjointForest.cpp
Explanation:
makeComponent creates a component with a single element (pixel). As you will see in utils.cpp, each pixel is initially part of its own separate component.
createEdge creates an Edge object from two Pixel object pointers and sets the weight depending on the given colorSpace. If we are comparing RGB differences, then edgeDifferenceFunction is rgbPixelDifference (the Euclidean difference of the RGB values); otherwise it is grayPixelDifference (the absolute difference between the pixel intensities).
mergeComponents performs several operations. If Component A is merged into Component B (rank(A) < rank(B)), it:
Sets A’s representative’s parent to B’s representative.
Adds all the pixels of A to the pixels of component B.
Deletes the componentStruct of A and updates the previousComponentStruct and nextComponentStruct pointers accordingly.
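The three merge steps can be illustrated with a stripped-down sketch. The stand-in types below (Node, Comp) are mine, not the repository's, and pixels are reduced to plain ids for brevity:

```cpp
#include <vector>

struct Node {                        // stand-in for ComponentStruct
    Node *prev = nullptr, *next = nullptr;
};

struct Comp {                        // stand-in for Component
    int representativeParent = -1;   // id of the parent representative
    std::vector<int> pixels;         // pixel ids in this component
    Node listNode;                   // this component's linked-list node
};

// Merge A into B, assuming rank(A) < rank(B).
void mergeComponents(Comp& a, Comp& b, int bRepresentative) {
    a.representativeParent = bRepresentative;       // step 1: re-parent A
    b.pixels.insert(b.pixels.end(),                 // step 2: move pixels
                    a.pixels.begin(), a.pixels.end());
    a.pixels.clear();
    Node& n = a.listNode;                           // step 3: unlink A's node
    if (n.prev) n.prev->next = n.next;
    if (n.next) n.next->prev = n.prev;
}
```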
utils.h
utils.cpp
Explanation
constructImageGraph constructs the linked list while creating components and pixels. This function initializes the graph by creating a component for each pixel and adding each component to the linked list. Note that the pixels vector is one-dimensional, so getSingleIndex is used to get the corresponding index where a pixel needs to be saved/accessed based on its location (row and column) in the image.
addColorToSegmentation adds color to each componentStruct (remember that a ComponentStruct points to a single Component). All pixels of a component are assigned a single random color.
setEdges creates edges as displayed by the image below:
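As an illustration of grid edge construction, here is a sketch assuming the simpler 4-connected case, where each pixel is linked to its right and bottom neighbours so that no edge is created twice; whether the repository actually uses 4- or 8-connectivity is shown in the image, not assumed here:

```cpp
#include <vector>

struct GridEdge { int from, to; };   // illustrative edge by pixel index

// Flatten a (row, column) location into the one-dimensional pixels vector.
int getSingleIndex(int row, int column, int columns) {
    return row * columns + column;
}

std::vector<GridEdge> setEdges(int rows, int columns) {
    std::vector<GridEdge> edges;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < columns; ++c) {
            int idx = getSingleIndex(r, c, columns);
            if (c + 1 < columns)   // right neighbour
                edges.push_back({idx, getSingleIndex(r, c + 1, columns)});
            if (r + 1 < rows)      // bottom neighbour
                edges.push_back({idx, getSingleIndex(r + 1, c, columns)});
        }
    }
    return edges;
}
```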
segmentation.h
segmentation.cpp
segmentation.cpp implements the segmentation algorithm of the paper. Note that the post-processing part of the code ensures that each component is at least minimumComponentSize pixels; if a component is smaller, it is merged with a neighboring component.
Note that MSTMaxEdge represents the maximum edge of the Minimum Spanning Tree, which is exactly the internal difference of the component. But where is Kruskal’s algorithm? Well, the edge causing a merge is precisely the maximum edge of the MST that Kruskal’s algorithm would select:
We sort the edges and iterate through them in a non-descending order.
According to the algorithm, the difference between the components is the weight of the minimum connecting edge, $e_k$, and Int(C) is the maximum of the edge weights already inside each component ($e_i$, $e_j$). Remember that $e_i$ and $e_j$ have already been iterated through: we iterate in non-descending order, and $e_i$ and $e_j$ are already part of components (initially each component is a single pixel with no edges), which means $e_k \geq e_i$ and $e_k \geq e_j$.
If the components are to be merged, a new component is created and therefore a new MST maximum edge is needed. Creating the new MST is easy, as we only need to connect the two MSTs (the MSTs of the two components) with the minimum connecting edge (as in Kruskal’s algorithm). This minimum edge is $e_k$, as established in the previous point.
Since $e_k$ is greater than or equal to the maximum MST edges of the two components, $e_k$ becomes the new MSTMaxEdge.
main.cpp
Explanation:
Operations of main.cpp in order:
Image is read into img, which is then smoothed with a Gaussian filter
The pixels and edges vectors are initialized
From the image, the Pixel objects are created, initialized and placed into the pixels vector according to their locations
Edges are initialized and computed between neighbors as shown in the image above, and placed into the edges vector
The edges are sorted based on their weight
The segmentation algorithm is applied to the edges
The image is colored based on the segmentation results
Building and execution
You can run the following commands in any directory, but it is recommended to execute them in the same directory where the source is located. Make sure cmake is installed.
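The build follows the standard CMake out-of-source workflow. This is a generic sketch; the actual target name and run syntax are determined by the repository's CMakeLists.txt:

```shell
# Standard out-of-source CMake build, run from the source directory.
mkdir -p build && cd build
cmake ..
make
# The executable name and its arguments depend on the repository's
# CMakeLists.txt and main.cpp; see the Github repo for the exact syntax.
```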
Execute the program using the following syntax from the build directory:
Results
There is still some scope for improvement, but this is a decent start.
Phew! We’re done. Don’t hesitate to contact me if you have any feedback or questions.