The exponential growth of advanced technologies has enabled the exploration of ocean spaces. It has paved the way for new opportunities to address questions about the diversity, uniqueness, and complexity of marine life. Underwater Wireless Sensor Networks (UWSNs), which comprise sets of vehicles and sensors that monitor environmental conditions, are widely used to leverage such opportunities. In this scenario, it is compelling to design an automated fish detection technique based on underwater videos and computer vision so as to estimate and monitor fish biomass in water bodies. Several models have been developed earlier for fish detection. However, they lack the robustness to accommodate considerable differences between scenes owing to poor luminosity, fish orientation, the structure of the seabed, the movement of aquatic plants in the background, and the distinctive shapes and textures of fishes from different genera. With this motivation, the current research article introduces an Intelligent Deep Learning-based Automated Fish Detection model for UWSN, named the IDLAFD-UWSN model. The presented IDLAFD-UWSN model aims at the automatic detection of fishes from underwater videos, particularly in blurred and crowded environments. The IDLAFD-UWSN model makes use of Mask Region Convolutional Neural Network (Mask RCNN) with Capsule Network as a baseline model for fish detection. Besides, in order to train the Mask RCNN, a background subtraction process based on a Gaussian Mixture Model (GMM) is applied. This process exploits the motion of fishes in the video and integrates the outcome with the actual image to generate fish-dependent candidate regions. Finally, the Wavelet Kernel Extreme Learning Machine (WKELM) model is utilized as the classifier.
The performance of the proposed IDLAFD-UWSN model was tested on a benchmark underwater video dataset. The experimental results achieved by the IDLAFD-UWSN model were promising in comparison with other state-of-the-art methods under different aspects, with maximum accuracies of 98% and 97% on the applied blurred and crowded datasets respectively.
Water covers about 71% of the earth's surface in the form of different water bodies such as canals, oceans, rivers, and seas. Many valuable resources are present in these water bodies and should be investigated and explored further. Technological advancements made in recent years have enabled underwater exploration with the help of sensors at every level. Consequently, the Underwater Sensor Network (UWSN) is one such advanced technique that enables underwater exploration. Being a network of independent sensor nodes [
Underwater transmission is mostly performed by a group of nodes that transfer information to buoyant gateway nodes. These gateway nodes in turn transmit the information to nearby coastal monitor-and-control stations, otherwise known as remote stations [
In recent years, underwater detection and tracking have become an attractive research field [
In literature [
The current research article designs an Intelligent Deep Learning (DL)-based Automated Fish Detection model for UWSN, named the IDLAFD-UWSN model. In the background subtraction phase of the presented model, a Gaussian Mixture Model (GMM) is utilized. Besides, the presented IDLAFD-UWSN model makes use of Mask Region Convolutional Neural Network (Mask RCNN) with Capsule Network as a baseline model for fish detection. Finally, the Wavelet Kernel Extreme Learning Machine (WKELM) model is utilized as the classifier. The proposed IDLAFD-UWSN model was validated using a benchmark underwater video dataset and the simulation outcomes were inspected along distinct dimensions.
The remaining sections of the paper are organized as follows. Section 2 explains the processes involved in automated fish detection and tracking. Then, Section 3 reviews the existing fish detection methods whereas the proposed IDLAFD-UWSN model is discussed under Section 4. The experimental validation process is detailed in Section 5 while the conclusion is drawn in Section 6.
In order to ensure effective marine monitoring, it is mandatory to estimate fish biomass and abundance through population sampling in water bodies such as rivers, oceans, and lakes. Such sampling monitors the behavior of distinct fish species under changing environmental conditions. This task gains significance particularly in those regions where specific fish species are on the verge of extinction or are threatened by industrial pollution, habitat loss and alteration, commercial overfishing, deforestation, and climate change [
Generally, automated fish sampling is conducted through three main processes: (1) fish detection, which distinguishes fish from non-fish objects in underwater videos; non-fish objects include aquatic plants, coral reefs, sessile invertebrates, seagrass beds, and the general background. (2) The second process is the classification of fish species, in which the species of every identified fish is recognized and classified from a predefined pool of distinct species [
The current section reviews state-of-the-art automated fish detection techniques. Hsiao et al. [
Qin et al. [
Sun et al. [
The overall system architecture of the presented IDLAFD-UWSN model is shown in
The presented model was tested using the Fish4Knowledge with Complex Scenes (FCS) database. It is mainly derived from a huge fish dataset known as Fish4Knowledge. With more than 700,000 underwater videos recorded in unconstrained conditions, the Fish4Knowledge database is the result of about 5 years of data collection intended to monitor the marine ecosystem of coral reefs in Taiwan [ The FCS database covers seven conditions of variability. Blurred includes three poor-contrast, blurred videos. Complex background includes three videos with a rich seabed providing a maximum degree of backdrop confusion. Crowded contains a set of three videos with a maximum density of fish movement in all video frames; this poses particular challenges for detecting fishes in the presence of occluding objects. Dynamic background provides two videos with richly textured coral reef backdrops and moving plants. Luminosity variation includes two videos with abrupt luminosity variations caused by surface wave action, which generate false positives during the identification process owing to the movement of light beams. In Camouflage foreground, two videos are selected that show the camouflage issue of fish detection against textured and colorful backdrops. Hybrid offers a pair of videos chosen to demonstrate the combination of the previously defined conditions of variability.
This database was primarily developed for fish-related tasks such as detection, classification, etc. Hence, ground truth images exist for every moving fish on a frame-by-frame basis in every video. A set of 1,328 fish annotations is presented in the FCS database as illustrated in
GMM is one of the common methods used for modeling the foreground and background conditions of a pixel. It has the capacity to approximate general densities, since a mixture with a sufficient number of components can fit any density function. Here,
where
where
To simplify the estimation, covariance matrix is always considered as diagonal.
where
This is an optional phase in which the model employs the EM (Expectation-Maximization) technique on a portion of the video; alternatively, it can initialize an individual model for each pixel (of weight 1), beginning from the initial frame.
Every Gaussian mode is categorized as background or foreground. This crucial assignment is attained from a basic rule
This step classifies the pixels. In all the techniques, a pixel is allocated to the class of the nearest mode center within a limit.
where
An update function is given herewith.
When a mode
where
Otherwise, the least probable mode is replaced by a new Gaussian mode.
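The per-pixel update scheme described above can be sketched compactly. The following is a minimal single-channel NumPy implementation of a Stauffer-Grimson style pixel mixture; the learning rate alpha, the 2.5-sigma matching threshold, the background weight fraction T, and the initial variances are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gmm_background(frames, K=3, alpha=0.05, T=0.7, match_sigma=2.5):
    """Per-pixel Gaussian mixture background subtraction (grayscale frames).

    frames: array of shape (N, H, W) with values in [0, 255].
    Returns a list of boolean foreground masks, one per frame.
    """
    H, W = frames[0].shape
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]
    mu = np.tile(frames[0][None].astype(float), (K, 1, 1))  # mode means
    var = np.full((K, H, W), 15.0 ** 2)                     # mode variances
    w = np.zeros((K, H, W)); w[0] = 1.0                     # mode weights
    masks = []
    for x in frames.astype(float):
        match = (x[None] - mu) ** 2 < (match_sigma ** 2) * var
        matched = match.any(axis=0)            # pixel matched by some mode?
        idx = (match.argmax(axis=0), rows, cols)  # first matching mode
        # weight update: w <- (1 - alpha) * w + alpha * M_k
        w *= (1 - alpha)
        w[idx] += alpha * matched
        # matched mode: update mean and variance with learning rate rho
        rho = alpha
        mu[idx] = np.where(matched, (1 - rho) * mu[idx] + rho * x, mu[idx])
        var[idx] = np.where(matched,
                            (1 - rho) * var[idx] + rho * (x - mu[idx]) ** 2,
                            var[idx])
        # unmatched pixel: replace the weakest mode with a fresh Gaussian
        widx = (w.argmin(axis=0), rows, cols)
        mu[widx] = np.where(matched, mu[widx], x)
        var[widx] = np.where(matched, var[widx], 30.0 ** 2)
        w[widx] = np.where(matched, w[widx], alpha)
        w /= w.sum(axis=0, keepdims=True)
        # background modes: highest w/sigma ratio covering a fraction T of weight
        order = np.argsort(-w / np.sqrt(var), axis=0)
        w_sorted = np.take_along_axis(w, order, axis=0)
        is_bg = np.cumsum(w_sorted, axis=0) - w_sorted < T
        match_sorted = np.take_along_axis(match, order, axis=0)
        masks.append(~(match_sorted & is_bg).any(axis=0))   # foreground mask
    return masks
```

On a static scene, the mixture quickly locks onto the background, so a fish (or any bright moving object) entering a frame fails the matching test against every background mode and is flagged as foreground.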
The Mask R-CNN model is popular in several object detection tasks. It includes three components, namely a CNN-based feature extractor, a Region Proposal Network (RPN), and a parallel prediction network. First, the CNN model extracts features from the input images. Second, the RPN slides anchors of various scales and aspect ratios over the feature maps to generate region proposals. Third, the parallel prediction network contains three branches: two FC layers handle bounding box classification and regression, while an FCN predicts the object masks. In principle, the baseline network can be any major Deep Neural Network (DNN) such as CapsNet, GoogLeNet, or ResNet. In this study, the Mask RCNN with CapsNet model is used, where CapsNet serves as the backbone network for feature extraction. This results in an effective reduction of gradient vanishing and reduced training time with no increase in model parameters.
The CapsNet method is one of the latest developments in this research domain. The key element of CapsNet is the capsule, which comprises a group of organized neurons. The length of a capsule's output vector encodes the probability that an entity is present (invariance), whereas the features needed to reconstruct the entity are encoded in the vector's measurement of equivariance. The orientation of the vector denotes its variables,
While a standard NN requires extra layers to increase accuracy and detail, in CapsNet an individual layer can nest with other layers. The capsules efficiently denote distinct kinds of visual information, known as instantiation parameters, such as size, orientation, and position.
where
An activation function named 'squashing' shrinks the output vector towards 0 when it is small, whereas a large vector is shrunk to just below unit length, thereby generating the capsule length. The activity vector
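The squashing non-linearity can be written down directly. Below is a small NumPy sketch; the epsilon term is an implementation detail added here for numerical stability, not part of the original formulation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing non-linearity: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    Short vectors are shrunk towards zero; long vectors approach (but never
    reach) unit length, so the capsule length behaves like a probability.
    """
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)
```

For example, an input of norm 10 is squashed to norm 100/101 ≈ 0.99, while an input of norm 0.1 is squashed to roughly 0.01.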
The primary network extracts low-level features such as edges, whereas the upper network extracts top-level features that denote the target class. In order to use the features at every stage effectively, the Mask RCNN model extends the baseline network with a Feature Pyramid Network (FPN). This network exploits both the intrinsic layers and the multi-scale characteristics of a CNN to derive meaningful features for object detection. The aim of the RPN lies in the efficient prediction of a set of region proposals [
Here, the detection outcome designates the predicted box and the ground truth specifies the ground truth box. The RPN fine-tunes the region proposals based on the attained regression details and discards the region proposals that overlap the image boundaries. Finally, based on Non-Maximum Suppression (NMS), around 2,000 proposal regions are kept for every image.
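The greedy NMS step used for pruning proposals can be sketched in a few lines of NumPy; the IoU threshold of 0.5 below is illustrative, not the paper's setting.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    Returns the indices of the boxes kept, highest-scoring first.
    """
    order = scores.argsort()[::-1]          # process boxes by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection rectangle between the top box and the remaining ones
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```

Two proposals covering the same fish therefore collapse to the single highest-scoring box, while spatially separate detections survive.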
The region proposals produced by the RPN require RoIAlign to adjust their dimensions to suit the multi-branch prediction network. RoIAlign utilizes bilinear interpolation rather than the rounding function of RoIPool in Faster R-CNN, so as to extract the respective features of all region proposals from the feature map. When training the model, the loss function for the Mask RCNN model is determined for all proposals as given below.
where
where
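As a rough per-RoI sketch of the multi-task objective L = L_cls + L_box + L_mask, the following NumPy version combines softmax cross-entropy (classification), smooth L1 (box regression), and per-pixel binary cross-entropy (mask); equal branch weights are assumed here for illustration.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for the classification branch."""
    z = logits - logits.max()                      # stabilize the softmax
    return -(z[label] - np.log(np.exp(z).sum()))

def smooth_l1(pred, target):
    """Smooth L1 loss for the bounding-box regression branch."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def binary_ce(prob, mask, eps=1e-7):
    """Average binary cross-entropy for the per-pixel mask branch."""
    p = np.clip(prob, eps, 1 - eps)
    return -(mask * np.log(p) + (1 - mask) * np.log(1 - p)).mean()

def mask_rcnn_loss(cls_logits, label, box_pred, box_gt, mask_prob, mask_gt):
    """Total per-RoI loss: L = L_cls + L_box + L_mask."""
    return (cross_entropy(cls_logits, label)
            + smooth_l1(box_pred, box_gt)
            + binary_ce(mask_prob, mask_gt))
```

A proposal with the correct class, a tight box, and a confident mask yields a loss near zero, while any branch being wrong inflates the total.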
At this stage, the WKELM model is applied to categorize the objects as fish or non-fish entities. The WKELM model combines the benefits of distinct kernel functions by integrating wavelet analysis with the kernel extreme learning machine. The weighted ELM method is designed to manage instances with imbalanced probability distributions, on which it performs excellently. Besides, the weighted WKELM technique introduces the weights into the cost function so as to obtain the same result as weighted ELM [
In KELM method, the output is written as follows
where
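A compact sketch of the KELM decision rule with a wavelet kernel is given below. The kernel form K(x, y) = Π_i cos(1.75 d_i / a) exp(-d_i² / (2a²)), with d = x - y, follows the common Morlet-style wavelet-kernel construction; the regularization parameter C, the dilation a, and the toy data are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def wavelet_kernel(X, Y, a=1.0):
    """Morlet-style wavelet kernel between the rows of X and Y."""
    D = X[:, None, :] - Y[None, :, :]
    return np.prod(np.cos(1.75 * D / a) * np.exp(-D ** 2 / (2 * a ** 2)),
                   axis=-1)

class WKELM:
    """Kernel ELM classifier with a wavelet kernel and one-hot targets."""
    def __init__(self, C=10.0, a=2.0):
        self.C, self.a = C, a

    def fit(self, X, y):
        self.X = X
        T = np.eye(int(y.max()) + 1)[y]        # one-hot encode the labels
        K = wavelet_kernel(X, X, self.a)
        # Standard KELM output weights: beta = (I/C + K)^{-1} T
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self

    def predict(self, Xt):
        # Decision: argmax over the kernel expansion K(x, X) beta
        return wavelet_kernel(Xt, self.X, self.a).dot(self.beta).argmax(axis=1)
```

In the presented pipeline, the rows of X would be features of candidate regions and the two classes would be fish versus non-fish; here a two-blob toy problem stands in for that data.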
The experimental validation of the presented IDLAFD-UWSN model was performed with two testbeds from the FCS dataset, namely Blurred and Crowded. Both testbeds comprised a set of 5,756 frames with a duration of 3.83 minutes.
Besides, on test frame 565, the proposed IDLAFD-UWSN model achieved an accuracy of 0.99 for each of the targets targ_1, targ_2, and targ_3. Likewise, on test frame 1009, the IDLAFD-UWSN model detected the targets targ_1, targ_2, and targ_3, each with an accuracy of 0.99.
Frame Number | Target_1 | Target_2 | Target_3 |
---|---|---|---|
043 | 0.99 | – | – |
113 | 0.99 | 0.99 | – |
134 | 0.96 | 0.99 | 0.98 |
136 | 0.99 | – | – |
160 | 0.99 | 0.99 | 0.99 |
163 | 0.99 | 0.99 | – |
166 | 0.99 | 0.99 | – |
173 | 0.98 | 0.99 | – |
181 | 0.96 | 0.99 | – |
188 | 0.99 | 0.99 | – |
193 | 0.99 | 0.99 | 0.99 |
196 | 0.99 | 0.99 | – |
197 | 0.99 | 0.99 | – |
203 | 0.98 | 0.99 | 0.99 |
217 | 0.99 | 0.99 | – |
243 | 0.90 | 0.99 | – |
250 | 0.99 | 0.99 | – |
565 | 0.99 | 0.99 | 0.99 |
778 | 0.95 | 0.99 | – |
1009 | 0.99 | 0.99 | 0.99 |
Meanwhile, on test frame 221, the proposed IDLAFD-UWSN model detected targ_1, targ_2, targ_3, and targ_4 with accuracy values of 0.99, 0.95, 0.99, and 0.99 respectively. Afterwards, on test frame 435, the IDLAFD-UWSN model achieved accuracies of 0.99, 0.78, and 0.97 for the targets targ_1, targ_2, and targ_3 correspondingly. Then, on test frame 1217, the IDLAFD-UWSN model detected the targets targ_1 through targ_6 with accuracy values of 0.99, 0.93, 0.96, 0.99, 0.99, and 0.99 correspondingly. Similarly, on test frame 1506, the IDLAFD-UWSN model detected the targets targ_1 through targ_5 with accuracies of 0.97, 0.99, 0.99, 0.99, and 0.99 respectively.
Frame | Target_1 | Target_2 | Target_3 | Target_4 | Target_5 | Target_6 | Target_7 | Target_8 |
---|---|---|---|---|---|---|---|---|
010 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | – |
019 | 0.98 | 0.98 | 0.98 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 |
024 | 0.95 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | – |
036 | 0.98 | 0.87 | 0.99 | 0.96 | 0.99 | – | – | – |
054 | 0.99 | 0.82 | 0.99 | 0.99 | – | – | – | – |
136 | 0.93 | 0.99 | 0.96 | 0.87 | 0.98 | 0.99 | – | – |
160 | 0.96 | 0.96 | 0.99 | 0.93 | 0.99 | – | – | – |
175 | 0.99 | 0.95 | 0.96 | 0.96 | 0.99 | – | – | – |
188 | 0.99 | 0.99 | 0.93 | 0.99 | 0.99 | 0.99 | – | – |
221 | 0.99 | 0.95 | 0.99 | 0.99 | – | – | – | – |
259 | 0.98 | 0.99 | – | – | – | – | – | – |
286 | 0.99 | 0.98 | 0.99 | – | – | – | – | – |
312 | 0.99 | 0.99 | – | – | – | – | – | – |
435 | 0.99 | 0.78 | 0.97 | – | – | – | – | – |
541 | 0.94 | 0.98 | 0.94 | – | – | – | – | – |
1202 | 0.89 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | – | – |
1217 | 0.99 | 0.93 | 0.96 | 0.99 | 0.99 | 0.99 | – | – |
1226 | 0.99 | 0.99 | 0.99 | 0.93 | 0.99 | 0.98 | 0.99 | – |
1410 | 0.99 | 0.99 | 0.95 | 0.99 | 0.95 | 0.99 | – | 0.99 |
1506 | 0.97 | 0.99 | 0.99 | 0.99 | 0.99 | – | – | – |
Methods | Accuracy (Blurred) | Accuracy (Crowded) | F-score (Blurred) | F-score (Crowded) |
---|---|---|---|---|
IDLAFD-UWSN | 98.00 | 97.00 | 96.00 | 97.00 |
KDE | 91.73 | 84.83 | 92.56 | 82.46 |
ML-BKG | 72.94 | 80.13 | 70.26 | 79.81 |
EIGEN | 82.89 | 75.82 | 81.71 | 73.87 |
VIBE | 86.35 | 85.37 | 85.13 | 84.64 |
TKDE | 93.78 | 85.90 | 93.25 | 84.19 |
Hybrid system | 86.76 | 84.27 | 86.76 | 84.27 |
FLDA-TM | 88.00 | 89.00 | 87.32 | 88.76 |
FLDA | 86.00 | 80.00 | 85.78 | 80.12 |
SCEA | 71.00 | 70.00 | 72.65 | 69.63 |
Finally, when assessing the detection performance of the proposed IDLAFD-UWSN model in terms of F-score on the crowded video testbed, the results show that the SCEA and EIGEN models achieved ineffectual outcomes, with F-score values of 69.63% and 73.87% respectively. The ML-BKG model attained somewhat better results with an F-score of 79.81%, whereas the FLDA, KDE, and TKDE approaches demonstrated moderately close F-score values of 80.12%, 82.46%, and 84.19% respectively. At the same time, the Hybrid system model exhibited a manageable performance with an F-score of 84.27%. The VIBE and FLDA-TM models showed competitive outcomes, with F-score values of 84.64% and 88.76%. The proposed IDLAFD-UWSN model outperformed all the existing models and produced the highest F-score of 97%.
From the above-discussed tables and figures, it is evident that the presented IDLAFD-UWSN model accomplished promising results under both blurred and crowded environments. The improved performance is due to the inclusion of GMM-based background subtraction, Mask RCNN with CapsNet-based fish detection, and WKELM-based fish classification. Therefore, it can be employed as an effective fish detection tool in marine environments.
The current research article presented a novel IDLAFD-UWSN model for automated fish detection and classification in underwater environments. The presented IDLAFD-UWSN model aims at the automatic detection of fishes from underwater videos, particularly in blurred and crowded environments. The model operates in three stages, namely GMM-based background subtraction, Mask RCNN with CapsNet-based fish detection, and WKELM-based fish classification. The Mask RCNN with CapsNet model distinguishes fish from non-fish objects among the candidate regions in a video frame. Lastly, fish and non-fish objects are classified with the help of the WKELM model. An extensive experimental analysis was conducted on a benchmark dataset, and the results achieved by the IDLAFD-UWSN model were promising, with maximum accuracies of 98% and 97% on the applied blurred and crowded datasets respectively. As a future extension, the presented IDLAFD-UWSN model can be deployed in real-time UWSNs to automatically monitor the behavior of fishes and other aquatic creatures.