A total of 1072 Asian seabass or barramundi (Lates calcarifer) were harvested at two different locations in Queensland, Australia. Each fish was digitally photographed and weighed. A subsample of 200 images (100 from each location) was manually segmented to extract the fish-body area (S in cm^2), excluding all fins. After scaling the segmented images to 1 mm per pixel, the fish mass values (M in grams) were fitted by a single-factor model (M = aS^1.5, a = 0.1695), achieving a coefficient of determination (R^2) of 0.9819 and a Mean Absolute Relative Error (MARE) of 5.1%. A segmentation Convolutional Neural Network (CNN) was trained on the 200 hand-segmented images and then applied to the rest of the available images. The CNN-predicted fish-body areas were used to fit the mass-area estimation models: the single-factor model, M = aS^1.5, a = 0.170, R^2 = 0.9819, MARE = 5.1%; and the two-factor model, M = aS^b, a = 0.124, b = 1.55, R^2 = 0.9834, MARE = 4.5%.
In aquaculture, the economic value of a particular fish species is primarily determined by its mass (M). However, weight measurement usually involves manual handling, whilst length can easily be estimated from digital images by identifying the nose and tail of the fish. Therefore, mathematical models were developed to estimate fish mass from its length (L). For example, the length-mass power model,
M = aL^b, (1)
was commonly used, where a and b were empirically-fitted species-dependent parameters [
With the advances in image processing and the widespread availability of low-cost high-definition digital cameras, not only the length, but also other fish shape features could be collected automatically and used to estimate the mass. In particular, it was found that the fish image area (S) could be used to estimate the fish mass (M) via the linear model,
M = a + bS, (2)
for grey mullet (Mugil cephalus), St. Peter’s fish (Sarotherodon galilaeus) and common carp (Cyprinus carpio) [
M = aS^b, (3)
does not exhibit the applicability limitations of Equation (2) and achieved a fit of R^2 = 0.99 for Alaskan Pollock (Theragra chalcogramma) [
M = aS^1.5, (4)
from M ∝ LWH ∝ S^1.5. For Atlantic salmon (Salmo salar), a similar area-mass power model was fitted as S ∝ M^0.61 (or M ∝ S^1.64) with R^2 = 0.97 by [
Based on the preceding discussion, the first goal of this work was to establish the area-mass power model for the industrial scale harvesting of Asian seabass or barramundi (Lates calcarifer) in Queensland, Australia. The goal was successfully accomplished by fitting Equations (3) and (4), as displayed in
Two datasets were used in this study. The first was the Barra-Ruler-445 (BR445) dataset used in [
The fins of the fish can contribute significantly to the total fish image area, see typical examples in
Segmentation of 200 images (100 from each dataset) into fish-body and background was performed manually using the GIMP open-source software program. The resulting fish-body binary masks were individually scaled to a common scale of 1 mm per pixel. All custom computer programs in this study were written in the Python programming language, which was also used to calculate the fish-body pixel areas. The obtained fish areas and the corresponding measured mass values were fitted via Equation (4), and the results are displayed in
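The single-factor fit of Equation (4) reduces to a one-parameter least-squares problem with a closed-form solution. The sketch below illustrates the procedure on made-up example areas and masses (not the study's data); the noise values and the 0.17 coefficient are assumptions chosen only to mimic the reported scale of the fit.

```python
import numpy as np

# Hypothetical illustration of fitting the single-factor model M = a * S**1.5
# by least squares. S (fish-body areas) and M (masses in grams) below are
# made-up example values, not the study's measurements.
S = np.array([150.0, 200.0, 260.0, 330.0, 410.0])          # areas
M = 0.17 * S**1.5 + np.array([5.0, -8.0, 6.0, -4.0, 3.0])  # masses with noise

x = S**1.5
a = np.sum(M * x) / np.sum(x * x)  # closed-form least-squares slope through origin

M_pred = a * x
r2 = 1.0 - np.sum((M - M_pred) ** 2) / np.sum((M - M.mean()) ** 2)
mare = np.mean(np.abs(M - M_pred) / M)  # Mean Absolute Relative Error

print(f"a = {a:.4f}, R^2 = {r2:.4f}, MARE = {100 * mare:.1f}%")
```

The same R^2 and MARE definitions apply throughout the paper, so this template carries over directly to the CNN-predicted areas.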
The recently developed semantic-segmentation Convolutional Neural Networks (CNN) [
The FCN-8s model was implemented [
with the machine-learning Python package TensorFlow [
The FAS model was loaded with the relevant VGG16 weights facilitating the knowledge transfer [
The described 200 images, together with the corresponding hand-segmented body masks, were used to train the FAS. The 200 image-mask pairs were randomly split 80%-20%, where 80% of the pairs were used as the actual training set and the remaining 20% were used as the validation set to assess the training process. Since the training set had such a small number of images, the encoding VGG16 layers in the FAS were fixed and excluded from training. The remaining trainable weights (excluding biases) were regularized by a weight decay set to 1 × 10^-4. The training and validation images, as well as the masks, were rescaled to 1 mm per pixel. Then each image-mask pair was extensively augmented for each epoch of training, i.e. one pass through all available training and validation images. Specifically, the opencv-python package was used to perform the augmentations, where each image and, if applicable, the corresponding binary mask were:
・ randomly rotated in the range of [−180, +180] degrees;
・ randomly scaled vertically in the range of [0.8, 1] and independently horizontally within the same range;
・ randomly cropped to retain 480 × 480 pixels;
・ each color channel was randomly shifted within a ±12.5 range;
・ randomly flipped horizontally and vertically;
・ ImageNet color mean values were subtracted as required when working with the VGG16 model.
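A simplified sketch of part of this augmentation pipeline is given below, covering the flips, the random 480 × 480 crop, and the ±12.5 per-channel shift. It uses plain NumPy indexing rather than the study's opencv-python calls (which also handle the rotation and scaling steps), and the toy image/mask arrays are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified NumPy sketch of some augmentations (the study used opencv-python,
# which also performs the random rotation and scaling). `image` is H x W x 3,
# `mask` is H x W; both are toy arrays here, not the study's data.
def augment(image, mask, crop=480):
    # random horizontal / vertical flips, applied to image and mask alike
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]
    # random crop retaining crop x crop pixels
    h, w = mask.shape
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    image, mask = image[y:y + crop, x:x + crop], mask[y:y + crop, x:x + crop]
    # per-channel intensity shift in the +/-12.5 range (image only, mask untouched)
    shift = rng.uniform(-12.5, 12.5, size=3)
    image = np.clip(image.astype(np.float32) + shift, 0, 255)
    return image, mask

img = rng.uniform(0, 255, size=(600, 640, 3))
msk = (rng.random((600, 640)) > 0.5).astype(np.float32)
aug_img, aug_msk = augment(img, msk)
print(aug_img.shape, aug_msk.shape)  # (480, 480, 3) (480, 480)
```

Note that the geometric transforms are applied identically to the image and its mask, while the color shift touches the image only, which is what keeps the ground-truth labels valid after augmentation.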
To assist better segmentation, the following loss function was adopted,
loss(Y_gt, Y_pred) = 1 − dice(Y_gt, Y_pred) + bc(Y_gt, Y_pred), (5)
where: Y_pred and Y_gt were the predicted and ground-truth (i.e. segmented-by-hand) 480 × 480 masks; bc(Y_gt, Y_pred) was the standard binary cross-entropy; and dice(Y_gt, Y_pred) was the Dice coefficient [
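Equation (5) can be sketched in plain NumPy as follows (the study computed it inside TensorFlow/Keras). The smoothing constant `eps` is an assumption added for numerical stability and is not specified in the text.

```python
import numpy as np

# NumPy sketch of the loss in Equation (5): 1 - Dice + binary cross-entropy.
# `eps` is an assumed smoothing/clipping constant for numerical stability.
def dice(y_gt, y_pred, eps=1e-7):
    inter = np.sum(y_gt * y_pred)
    return (2.0 * inter + eps) / (np.sum(y_gt) + np.sum(y_pred) + eps)

def bc(y_gt, y_pred, eps=1e-7):
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_gt * np.log(p) + (1.0 - y_gt) * np.log(1.0 - p))

def loss(y_gt, y_pred):
    return 1.0 - dice(y_gt, y_pred) + bc(y_gt, y_pred)

# Toy 2x2 masks: a perfect prediction gives a near-zero loss,
# while an inverted prediction gives a large one.
y_gt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(loss(y_gt, y_gt.copy()))  # close to 0
print(loss(y_gt, 1.0 - y_gt))   # large
```

Combining the two terms this way penalizes both poor region overlap (via Dice) and poorly calibrated per-pixel probabilities (via cross-entropy).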
The Keras implementation of Adam [
Multiple training sessions with different random train/validation splits produced very similar results. The FAS model and its training procedure exhibited negligible over-fitting, as demonstrated by the comparable final training and validation loss values (mean of Equation (5)) of 0.063 ± 0.001 and 0.072 ± 0.003, respectively. The training and validation per-pixel accuracies were 0.9945 ± 0.0005 and 0.9935 ± 0.0005, respectively. The trained FAS model was applied to all available images (scaled to 1 mm per pixel), including the 200 images used for training. By its design, the FAS could be applied to images of any size. In practice, however, it was significantly faster to pad the available images with zero values to fill a fixed 640 × 640 shape and then feed them into the FAS for prediction, where the 640 × 640 square was large enough to fit all available scaled images. For each image, the prediction heat-map of [0, 1]-range pixel values was further processed by setting values above 0.51 to ones (i.e. predicted as body pixels) and the rest to zeros (i.e. background pixels). The largest connected non-zero region in each image was accepted as the final fish-body segmentation, and its area in pixels^2 (i.e. mm^2) was calculated. Overlapping fish and/or multiple fish per image were outside the scope of this work.
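The post-processing step (threshold at 0.51, keep the largest connected region, measure its area) can be sketched self-containedly as below. The breadth-first-search labelling and 4-connectivity are implementation assumptions; a production version would typically use a library routine such as OpenCV's connected-components function instead.

```python
import numpy as np
from collections import deque

# Sketch of the heat-map post-processing: threshold the [0, 1] prediction at
# 0.51, keep only the largest 4-connected non-zero region, and return its
# area in pixels (i.e. mm^2 at the 1 mm/pixel scale). BFS labelling is an
# illustrative choice, not the study's stated implementation.
def largest_region_area(heatmap, threshold=0.51):
    binary = heatmap > threshold
    seen = np.zeros_like(binary, dtype=bool)
    h, w = binary.shape
    best = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                area, queue = 0, deque([(sy, sx)])  # flood-fill one region
                seen[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                best = max(best, area)
    return best

# Toy heat-map: a 3x4 "fish" blob plus a 1-pixel false positive.
hm = np.zeros((10, 10))
hm[2:5, 3:7] = 0.9  # main region, area 12
hm[8, 8] = 0.8      # spurious region, area 1
print(largest_region_area(hm))  # 12
```

Keeping only the largest region discards small spurious blobs from the heat-map, which is why the single-fish-per-image assumption matters here.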
It took 2-3 hours to train the FAS on an Nvidia GTX 1080Ti GPU. However, once trained, the FAS model was fast enough to process 640 × 640 images at a rate of 30 images per second on the same GPU, and therefore it could even be deployed in aquaculture production to process a video feed in real time. All predicted areas were plotted against the measured weights in
The difference in the Equations (3) and (4) fitting results (
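The two-factor model of Equation (3) is conveniently fitted as a straight line in log-log space, log M = log a + b log S, which makes the comparison with the fixed-exponent Equation (4) straightforward. The sketch below uses made-up example data (the 0.17 coefficient and multiplicative noise are assumptions), not the study's measurements.

```python
import numpy as np

# Hypothetical sketch of fitting the two-factor model M = a * S**b by linear
# regression in log-log space. S and M are made-up example values.
S = np.array([150.0, 200.0, 260.0, 330.0, 410.0])
M = 0.17 * S**1.5 * np.array([1.02, 0.98, 1.01, 0.99, 1.00])  # multiplicative noise

b, log_a = np.polyfit(np.log(S), np.log(M), 1)  # slope = b, intercept = log a
a = np.exp(log_a)

M_pred = a * S**b
mare = np.mean(np.abs(M - M_pred) / M)
print(f"a = {a:.3f}, b = {b:.3f}, MARE = {100 * mare:.1f}%")
```

Because b is free here, the two-factor fit can only match or improve on the MARE of the single-factor model, at the cost of one extra empirical parameter.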
The segmentation Convolutional Neural Network, trained on 200 images, was used to automatically segment the fish body from the background in all of this study's 1072 digital images of Asian seabass (barramundi, Lates calcarifer). The automatically extracted fish-body areas and the corresponding manually measured weights were fitted to yield highly accurate single- and two-factor mass-from-area estimation models, see
The authors declare no conflicts of interest regarding the publication of this paper.
Konovalov, D.A., Saleh, A., Domingos, J.A., White, R.D. and Jerry, D.R. (2018) Estimating Mass of Harvested Asian Seabass Lates calcarifer from Images. World Journal of Engineering and Technology, 6, 15-23. https://doi.org/10.4236/wjet.2018.63B003