The boundary effect in digital pathology is a phenomenon in which the tissue shapes of biopsy samples are distorted during sampling, greatly affecting the morphological pattern of the epithelial layer. In theory, a shape deformation model can normalise such distortions, but it requires a 2D image, while curvature theory has not yet been tested on digital pathology images. This work therefore proposes a curvature-based detection method that reduces boundary effects and estimates the epithelial layer. The boundary effect on the tissue surface is normalised using the frequency with which a curve deviates from a straight line. The depth of the epithelial layer is estimated from the tissue edges and the connected nuclei only, and the textural and spatial features along the estimated layer are then used to detect dysplastic tissue. The proposed method outperformed analysis of whole tissue regions in detecting dysplastic tissue: the kappa score leapt from fair to substantial agreement with the expert's ground-truth classification. The improved results demonstrate that curvatures are effective in reducing boundary effects on the epithelial layer of tissue, so quantifying and classifying the morphological patterns of dysplasia can be automated. The textural and spatial features on the detected epithelial layer capture the changes in the tissue.
Digital and computational pathology are increasingly in demand for clinical diagnosis and education, and a recent study in oral pathology [
Tissue changes, known as morphological transitions, are continuous and vary over time, rendering diagnosis more difficult at higher stages due to their complexity [
Dysplasia in BO is characterised by morphological transitions in the oesophageal lining, where mucus-secreting goblet cells and glands start to form in the epithelium and lamina propria. The condition may then progress into low-grade dysplasia, where the size, shape and other cytological features of the cells and nuclei start to change. These micro-changes gradually become visible in the oesophageal lining as the smooth surface of the squamous epithelial layer begins to form a finger-like cellular pattern known as villiform. The pattern becomes more complicated with back-to-back gland formations and more prominent nuclei, and other abnormalities invade the lamina propria or even the muscularis mucosa, a sign of invasive carcinoma.
A trained pathologist can usually detect tissue changes under a light microscope, but the inter-observer agreement for dysplasia is generally less than moderate. On top of this, sampling error at the time of endoscopy may contribute to the diagnosis of BO [
Thus, implementing machine learning to help identify and measure morphological transition patterns may help reduce these variations. Recent attempts at applying machine learning to histopathology images of the oesophagus have used textural and spatial features [
Oesophageal tissue usually presents a smooth, line-like profile, but during sampling it is pinched off or scraped from the oesophageal lining. This process yields curved tissue shapes in the biopsy, a phenomenon known as the 'boundary effect'. Tissue sampled from surface areas is most affected, such as the epithelial layer in mouth, oesophagus, colon and cervical biopsies. In characterising dysplasia, examining the epithelium layer is essential [
Meanwhile, in the machine vision field, it is agreed that the shapes and curvatures of biopsy sample tissues have a significant impact on tissue architecture analysis [
The main objective of this paper is to identify dysplastic tissue from the epithelial layer. However, the epithelial layer is greatly affected by boundary effects during biopsy sampling. Thus, a significant contribution of this paper is a curvature-based method for detecting and extracting the correct epithelial tissue texture from the digital slide; to our knowledge, curvature theory has not previously been applied to the tissue boundary or the squamous epithelial layer. The proposed method enables the analysis of textural features along the curved tissue surface and solves the boundary effect problem.
Then, rigorous experiments to find the correct thickness of the epithelial layer in each particular zoom level for these regions were carried out. The second contribution is the new texture features called cluster co-occurrence matrix (CCM), which preserves and correlates the textural and spatial features between different magnifications. Considering spatial information between clusters of image texture has enabled the algorithm to capture different primitives on tissue images, without the need to identify them. The CCM feature extractions are carried out in two levels of magnification for each selected region and used in the machine learning approach for tissue classification.
This paper is organised as follows: Section 2 describes the methodology, including the datasets and the processes for tissue detection, segmentation into regions and tissue types, and feature extraction and classification from the curved and smooth-surfaced virtual slides. The theory that underpins the shape complexity measurement is also presented, as is the parameter and feature selection for epithelial layer classification. Section 3 then describes the results of the experiments carried out to normalise the boundary effects on tissue shape and to estimate the epithelial thickness, including the boundary complexity quantifiers after normalisation and the performance of the CCM features in classifying dysplastic tissue. Section 4 discusses the findings from all experiments to show the significance of the machine learning and machine vision modelling and its relationship to the domain understanding.
Tissue structure, as defined collectively by [
Stage 3 is to detect the epithelial layer in regions using the boundary lines detected at the previous stage. The layer thickness is estimated with a few parameters, which will be elaborated in Section 2.4. The analysis of textural features will be carried out on the detected epithelial layer at
The dataset used in this paper was obtained from an Aperio Server at St James’s Hospital, scanned with an Aperio Scanner at
Two regions were identified based on the general characteristics of the tissue: the epithelial tissue and the lamina propria. The architectural and cytological features in each region differentiate the two types, and both had different patterns in the morphological sequences as well. As dysplastic changes occurred in the epithelial layer, segregating both tissue types was crucial. Therefore, additional measurements were taken in different magnification levels, as shown in
Based on the morphological pattern described above, the smoothness of the tissue boundary can be used to gauge the tissue condition. As tissue changes progress and invade other areas, regions with smooth, curvy or complex surfaces need to be segregated. In this work, feature extraction for epithelial tissue was applied at different magnification levels for each virtual slide in our training dataset. The first process was to detect the tissue from the virtual slides. At this stage, the whole virtual slide was viewed as a thumbnail-sized image, with tissue labelled as foreground using binary classification from grey-level thresholding. The image background comprised the white areas outside the tissue together with any objects smaller than 200 pixels (usually smears or torn tissue). The background pixels were eliminated, leaving only tissue.
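The tissue detection step above can be sketched as follows. The grey-level threshold value and the helper name are assumptions for illustration; only the 200-pixel minimum object size comes from the text.

```python
import numpy as np
from scipy import ndimage

def detect_tissue(grey, threshold=200, min_size=200):
    """Label tissue as foreground: pixels darker than the near-white
    background, with objects under min_size pixels removed.
    The threshold of 200 grey levels is an assumption, not from the paper."""
    mask = grey < threshold                       # tissue is darker than background
    labels, n = ndimage.label(mask)               # connected foreground components
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep = np.isin(labels, np.nonzero(sizes >= min_size)[0] + 1)
    return keep

# toy example: one large tissue blob and one small smear
img = np.full((64, 64), 255, dtype=np.uint8)
img[10:40, 10:40] = 100   # 900-pixel tissue blob, kept
img[50:53, 50:53] = 100   # 9-pixel smear, removed as background
mask = detect_tissue(img)
```

Small, isolated objects such as smears or torn fragments are dropped in the size filter, leaving only the tissue foreground.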
Knowing that normal oesophagus is covered with stratified squamous epithelium cells, different patterns of cells’ existence may show some abnormality. When dysplasia becomes more severe, the smooth lining eventually changes to form a villiform structure, similar to colon tissue. So, the detected tissues were further investigated in a binary format at
The high curvatures points (
where
Images of the ROIs created were accessed in
Filters were then applied to improve the boundary detection accuracy and to retain unique coordinates. The start and end coordinates of each detected line were compared, and lines were linked into one if the coordinates matched. Each line was measured by the straight-line distance between its start and end points and by its number of connected pixels. These measurements served as a third filter to verify the severity of surface complexity, since a detected line might belong to torn tissue or the muscularis mucosa. Finally, the line with the longest run of connected pixels was prioritised as the candidate boundary.
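The endpoint-matching filter and longest-line selection described above might look like this minimal sketch; the function names and coordinate representation are assumptions, not taken from the paper.

```python
import math

def link_lines(lines):
    """Merge detected lines whenever the end point of one equals the
    start point of another (coordinates as (row, col) tuples)."""
    lines = [list(l) for l in lines]
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(lines):
            for j, b in enumerate(lines):
                if i != j and a[-1] == b[0]:
                    lines[i] = a + b[1:]   # join, dropping the shared point
                    del lines[j]
                    merged = True
                    break
            if merged:
                break
    return lines

def line_measurements(line):
    """Straight-line distance between endpoints, and connected pixel count."""
    (r0, c0), (r1, c1) = line[0], line[-1]
    return math.hypot(r1 - r0, c1 - c0), len(line)

# two fragments sharing an endpoint are linked into one candidate
fragments = [[(0, 0), (0, 1), (0, 2)], [(0, 2), (0, 3)], [(5, 5), (5, 6)]]
linked = link_lines(fragments)
candidate = max(linked, key=len)   # longest connected line wins
```

The distance/pixel-count pair then serves as the complexity filter, and the longest linked line becomes the candidate boundary.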
The first method tested used an optical density matrix for colour deconvolution of Haematoxylin and Eosin stained tissue, as suggested by Ruifrok et al.: the cytoplasm appears pinkish and the nuclei purplish. Applying this to an image produced a clearer image with enhanced nuclei or cytoplasm, making each component easier to analyse further. The hypothesis was that the epithelium could be extracted from either the nuclei lining or the cytoplasm in the ROI, as both components arrange themselves differently as dysplasia progresses. The nuclei component was deconvolved from each ROI, and repeated dilation and erosion were then applied to the deconvolved ROI to obtain the thinnest and smoothest line possible along the ROI boundary without affecting the important crypts and peaks in the images. The line constructed was initialised as the candidate boundary:
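A rough sketch of this first approach, assuming the standard Ruifrok and Johnston stain matrix (the exact optical density matrix used in the paper is not given):

```python
import numpy as np

# Stain optical-density (OD) vectors per Ruifrok & Johnston; an assumption,
# since the paper does not reproduce its exact matrix.
STAINS = np.array([[0.65, 0.70, 0.29],    # haematoxylin (nuclei)
                   [0.07, 0.99, 0.11],    # eosin (cytoplasm)
                   [0.27, 0.57, 0.78]])   # residual channel

def deconvolve(rgb):
    """Unmix an H&E image into per-stain concentration channels by
    projecting optical densities onto the inverse stain matrix."""
    od = -np.log((rgb.astype(float) + 1.0) / 256.0)   # RGB -> optical density
    flat = od.reshape(-1, 3) @ np.linalg.inv(STAINS)
    return flat.reshape(rgb.shape)                    # [..., 0] = haematoxylin

# a purplish (haematoxylin-like) pixel scores higher in channel 0 than a pink one
purple = np.array([[[110, 60, 160]]], dtype=np.uint8)
pink = np.array([[[230, 150, 180]]], dtype=np.uint8)
h_purple = deconvolve(purple)[0, 0, 0]
h_pink = deconvolve(pink)[0, 0, 0]
```

The deconvolved nuclei channel can then be thresholded, with repeated dilation and erosion smoothing the resulting mask into a thin candidate boundary.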
The second approach extracts the epithelial layer using eight-connected components. This method requires a binary image of the ROI obtained with threshold grey-level values. Starting from coordinate [0, 0], the coordinate of each pixel connected in any of the eight directions and adjacent to a zero-valued (background) pixel was recorded as the boundary. In each annotated region this yields a set of lines that end at the bounding box but may start anywhere in the region. Five pixels on each image side were excluded to ensure that only relevant tissue was selected for boundary detection.
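A minimal sketch of this second approach; the naive neighbourhood scan below marks boundary pixels only and omits the five-pixel border exclusion.

```python
import numpy as np

def boundary_pixels(binary):
    """Mark foreground pixels that touch a zero-valued (background)
    pixel in any of the eight directions -- the tissue boundary."""
    padded = np.pad(binary.astype(bool), 1, constant_values=False)
    out = np.zeros(binary.shape, dtype=bool)
    h, w = binary.shape
    for r in range(h):
        for c in range(w):
            if not padded[r + 1, c + 1]:
                continue                          # background pixel, skip
            window = padded[r:r + 3, c:c + 3]     # pixel plus 8 neighbours
            if not window.all():                  # some neighbour is background
                out[r, c] = True
    return out

# a solid square: only its outline is marked as boundary
img = np.zeros((6, 6), dtype=np.uint8)
img[1:5, 1:5] = 1
b = boundary_pixels(img)
```

Tracing connected runs of these boundary pixels then yields the candidate boundary lines described above.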
Next, the location and thickness of the epithelial tissue were detected using the two boundary coordinates [
Annotated regions were analysed at
Next, the patches were transformed into greyscale images for feature extraction at the pixel level. GLCM features, namely contrast, correlation, energy, and homogeneity, were extracted in four directions from each patch to represent the patch texture. Then,
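The patch-level GLCM extraction can be sketched as below for a single direction; the quantisation to eight grey levels is an assumption, and the correlation feature is omitted for brevity.

```python
import numpy as np

def glcm(patch, offset=(0, 1), levels=8):
    """Grey-level co-occurrence matrix for one direction; the paper
    extracts features in four directions (0, 45, 90, 135 degrees)."""
    q = patch.astype(int) * levels // 256        # quantise to 'levels' bins
    m = np.zeros((levels, levels))
    dr, dc = offset
    h, w = q.shape
    for r in range(max(0, -dr), min(h, h - dr)):
        for c in range(max(0, -dc), min(w, w - dc)):
            m[q[r, c], q[r + dr, c + dc]] += 1
    return m / m.sum()                           # normalise to probabilities

def glcm_features(m):
    """Contrast, energy and homogeneity from a normalised GLCM."""
    i, j = np.indices(m.shape)
    contrast = (m * (i - j) ** 2).sum()
    energy = (m ** 2).sum()
    homogeneity = (m / (1.0 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity

patch = np.tile(np.array([0, 255], dtype=np.uint8), (8, 4))  # alternating stripes
c, e, h = glcm_features(glcm(patch))
```

The striped toy patch gives a high contrast and a low homogeneity in the horizontal direction, as every pixel pair spans the full grey-level range.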
The GLCM features were used to investigate the effect of zoom-level and neighbouring distance [
Then, texture features at patch level were investigated. Three texture features from the rotated and unrotated patches along the epithelial layer from both
However, as the patches were aligned along the tissue boundary, the properties of the clusters in CCM (as in GLCM) were projected in one direction within two neighbouring patches. Therefore, eight features were extracted and used to differentiate between normal and dysplastic epithelial tissues in oesophagus virtual slides. The texture similarity between these patches was observed using ten-fold cross-validation on k-means clustering.
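One plausible reading of the CCM construction, assuming patches are ordered along the boundary and labelled by k-means clustering (the precise definition is the paper's own):

```python
import numpy as np

def cluster_cooccurrence(labels, k):
    """Co-occurrence matrix of k-means cluster labels between
    neighbouring patches ordered along the epithelial boundary
    (one direction only, since patches are aligned with the surface)."""
    m = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):
        m[a, b] += 1
    return m / max(m.sum(), 1)   # normalise to joint probabilities

# hypothetical cluster labels of consecutive boundary patches, k = 3
labels = [0, 0, 1, 1, 1, 2, 0, 0]
ccm = cluster_cooccurrence(labels, 3)
```

GLCM-style properties (contrast, energy, and so on) can then be read off this matrix, which is how the eight CCM features could correlate texture clusters across neighbouring patches.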
After feature extraction, the classifier model for classifying the annotated regions into dysplasia or non-dysplasia was trained. Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM) classifier models were used on a 70/30 train-test split to ensure the reliability and robustness of the results. The SVM was linear; the DT used the C4.5 algorithm with a confidence factor of 0.25 and a minimum of two objects per leaf; and the RF built up to 100 trees with a maximum depth of 10.
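The classifier setup can be approximated with scikit-learn as below; sklearn's CART tree stands in for C4.5 (the confidence-factor pruning has no direct sklearn equivalent), and the feature data here is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the extracted CCM feature vectors (8 features)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)   # 0 = non-dysplasia, 1 = dysplasia
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

models = {
    "SVM": SVC(kernel="linear"),                       # linear SVM, as in the paper
    # CART approximating the C4.5 tree; two objects per leaf as stated
    "DT": DecisionTreeClassifier(min_samples_leaf=2, random_state=0),
    # up to 100 trees with a maximum depth of 10
    "RF": RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

On this well-separated toy data all three models score near-perfect accuracy; on the real CCM features the relative performance is what Table-level results below report.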
Although deep learning is now a trend in image classification and detection, it lacks explainable reasoning on how a decision is made [
This section presents the results of tissue boundary selection, region creation, boundary complexity identification, and the epithelial layer feature extraction and classification.
The first experiment was conducted to extract the correct
Zoom | Precision (%) | Recall (%)
---|---|---
4 | 20 | 93.83 | 90.61
10 | 40 | 30 | 81.40 | 72.92
Then, the
AP [%] | Train-test | 10-fold | 8-fold | 6-fold
---|---|---|---|---
 | 65.0 | 47.5 | 45.0 | 55.0
 | 83.5 | 82.5 | 82.0 | 80.0
Then, feature extraction processes were carried out for the epithelial layer tissue, using
Parameter | Tested value | Selected value
---|---|---
 | 3, 4, 5, 6, 7, 8 | 5
 | 50, 100, 150, 200 | 150
 | 50, 100, 150 | 100
 | 2, 4, 6, 8, 10 | 4
Zoom | |
patches | rotated, unrotated | unrotated
The effect of
 | | 10 | 20 | 10 | 20
---|---|---|---|---|---
10 | AP | 75.0 | 71.5 | 85.7 | 67.9
 | MSE | 6.25 | 7.14 | 3.5 | 8.03
8 | AP | 78.6 | 78.6 | 82.1 | 67.9
 | MSE | 4.65 | 5.23 | 4.4 | 8.03
6 | AP | 78.6 | 78.6 | 60.7 | 85.7
 | MSE | 4.94 | 4.36 | 9.8 | 3.5
4 | AP | 78.6 | 64.3 | 57.1 |
 | MSE | 5.36 | 8.9 | 10.7 |
2 | AP | 78.6 | 71.4 | 71.4 | 67.9
 | MSE | 5.35 | 7.14 | 7.14 | 8.03
In this step, the window sizes for patch creation were projected based on the
Input features | SVM (%) | DT (%) | RF (%)
---|---|---|---
CCM rotated | 75.0 | 75.0 | 75.0
CCM unrotated | 82.1 | 75.0 | 75.5
Freq rotated | 55.0 | 60.0 | 62.5
Freq unrotated | 72.5 | 67.5 | 80.5
LBP | 55.0 | 55.0 | 60.0
 | Dysplasia | Non-dysplasia
---|---|---
Dysplasia | 18 | 3
Non-dysplasia | 4 | 15
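As a check, Cohen's kappa can be computed directly from the confusion matrix above (rows: expert ground truth, columns: predicted class):

```python
# Cohen's kappa from the 2x2 confusion matrix
cm = [[18, 3], [4, 15]]
n = sum(sum(row) for row in cm)
po = (cm[0][0] + cm[1][1]) / n                      # observed agreement
row = [sum(r) for r in cm]                          # ground-truth marginals
col = [cm[0][0] + cm[1][0], cm[0][1] + cm[1][1]]    # predicted marginals
pe = sum(row[i] * col[i] for i in range(2)) / n**2  # chance agreement
kappa = (po - pe) / (1 - pe)
```

With these counts the observed agreement is 0.825 and kappa is roughly 0.65, which falls in the 'substantial' band of the Landis and Koch scale, consistent with the reported leap from fair to substantial agreement.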
The selection of
Few conclusions can be made from
A statistical test was also conducted to support the hypothesis that the texture features in the epithelial layer contain enough information to classify dysplastic and non-dysplastic tissue. A Chi-Square test of homogeneity was used to test whether the results of classifying a tissue region as non-dysplastic or dysplastic using the epithelial layer are the same as those obtained using the lamina layer. Hence, the hypotheses for the homogeneity test were:
As the classification accuracy from the epithelial layer was 0.835 [refer
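The homogeneity test can be reproduced in outline with SciPy; the correct/incorrect counts below are assumptions for illustration (derived from the reported accuracies on a 40-region test set), not the paper's actual contingency table.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table of correct vs incorrect classifications
# for the two layers, assumed from the reported 83.5% and 65.0% accuracies.
table = [[33, 7],    # epithelial layer: correct, incorrect
         [26, 14]]   # lamina layer:     correct, incorrect
chi2, p, dof, expected = chi2_contingency(table)
```

Rejecting the null hypothesis of homogeneity at the chosen significance level would indicate that the two layers yield different classification outcomes.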
Manual detection and characterisation of dysplastic regions in pathology images are labour-intensive and subjective. Thus, we proposed an automated region detection method. The significance of detecting the tissue boundary, which in turn allows the epithelial layer of a tissue to be estimated, has been demonstrated. However, the accuracy of the dysplastic tissue classification still needs improvement, so further experiments on machine learning optimisation will be carried out.
This paper has presented a new approach to assist dysplastic tissue classification, summarised as follows. Two region selection criteria for tissue types were first established to characterise different architectural and cytological features. Feature extraction for the epithelial tissue was then performed at different magnification levels, and the corresponding ROIs were captured. Within the ROIs, the boundaries of the tissue shapes were detected using two approaches: an optical density matrix for colour deconvolution, and a boundary detection technique based on eight-connected components. The boundary detected from the deconvolved cytoplasm of the ROI proved superior and was selected for further classification validation. The co-occurrence matrix and cluster frequencies based on rotated and unrotated patches along the boundaries were then evaluated with three classification methods (SVM, DT and RF), and the AP was measured. The results showed that classification was generally improved.
Further statistical tests supported the research hypothesis: changes in tissue texture along the tissue boundary can be used to detect dysplasia. This demonstrates a solution to the boundary effect issue when analysing tissue changes, a finding that contributes to both the pathology domain and the image processing field.
The authors would like to acknowledge the expert opinions of AP Dr. Darren Treanor and Dr. Andy Bulpitt of the University of Leeds.