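Special Topic: Agricultural Remote Sensing and Phenotypic Information Acquisition and Analysis
1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China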
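To improve the performance and adaptability of existing apple detection models on hardware with limited resources, this study aimed to reduce the computational load, the detection time, and the computing and storage footprint of the model while maintaining high detection accuracy. An improved lightweight MobileNetV3 network was combined with CenterNet, an object detection network based on keypoint prediction, to build a lightweight anchor-free deep learning model for apple detection (M-CenterNet). Its detection accuracy, model capacity, and running speed were compared with those of CenterNet and the Single Shot Multibox Detector (SSD). Test results showed that the average precision (AP), false detection rate, and miss rate of the proposed model were 88.9%, 10.9%, and 5.8%, respectively, and that its model volume and frame rate were 14.2 MB and 8.1 fps. The model detected fruit well and adapted to different illumination directions, shooting distances, degrees of occlusion, and numbers of fruit. With comparable detection accuracy, the proposed model was only 1/4 the size of CenterNet. Compared with SSD, its AP was 3.9% higher and its model volume was 84.3% smaller. In a CPU environment, the proposed model ran nearly twice as fast as CenterNet and SSD. The results offer a new approach to lightweight fruit detection models for orchard operation platforms in unstructured environments.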
Intelligent production and robotic operation are an efficient and sustainable agronomic route to cutting economic and environmental costs and boosting orchard productivity. In a real orchard, a high-performance visual perception system is the prerequisite for accurate and reliable operation of an automated cultivation platform. Most existing apple detection models, however, have too many parameters and too large a model volume to be used on platforms with limited computing power and storage capacity. To improve the performance and adaptability of existing apple detection models under limited hardware resources, that is, to reduce computation, the computing and storage footprint, and detection time while maintaining detection accuracy, this study improved the lightweight MobileNetV3 and combined it with a keypoint-based object detection network (CenterNet) to build a lightweight anchor-free model (M-CenterNet) for apple detection. The proposed model used a heatmap to search for the center point (keypoint) of each object by predicting whether each pixel was the center point of an apple. The local offset of the keypoint and the object size of the apple were then estimated from the extracted center point, without the need for grouping or Non-Maximum Suppression (NMS). In view of its advantages in model volume and speed, an improved MobileNetV3 equipped with transposed convolutional layers, which provide better semantic and location information, was used as the backbone of the network. The detection accuracy, model capacity, and running speed of the proposed model were compared with those of CenterNet and SSD (Single Shot Multibox Detector).
The results showed that the average precision, false detection rate, and miss rate of the proposed model were 88.9%, 10.9%, and 5.8%, respectively, and that its model volume and frame rate were 14.2 MB and 8.1 fps. The proposed model adapted well to the environment and detected fruit reliably under varying illumination, occlusion, fruit distance, and fruit number. With detection accuracy comparable to that of CenterNet, the proposed model was only 1/4 of its size. Compared with the SSD model, the average precision of the proposed model increased by 3.9% and the model volume decreased by 84.3%. On a CPU, the proposed model ran almost twice as fast as the CenterNet and SSD models. This study provides a new approach for research on lightweight fruit detection models for mobile orchard platforms in unstructured environments.
Keywords: machine vision; deep learning; lightweight network; anchor-free; apple detection
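Although existing anchor-free object detection methods have improved algorithmic stability, they have many parameters and demand substantial computing resources, and their large model volume makes them unsuitable for operation platforms with relatively limited hardware. Given the limited hardware resources of orchard operation platforms and their sensitivity to computational load and model size, this study combined a lightweight deep convolutional neural network with an anchor-free object detection network to build a lightweight anchor-free model for detecting apples on trees. The model reduces computation and resource usage while maintaining high detection accuracy, and thus meets the need of orchard operation platforms for a lightweight object detection model.
The test images were collected in an apple orchard in Xingcheng City, Liaoning Province, China. A handheld digital camera was used, and images were taken between 8:00 and 17:00. A total of 1455 apple images were collected under sunny and cloudy conditions. During acquisition, the camera lens was kept parallel to the tree row at about 50 cm from the trees, a distance that helps an orchard operation platform find a suitable target search area and complete its task efficiently. The captured images had a resolution of 5472×3648 pixels and were resized to 750×500 pixels to reduce the computational burden. All apples in all images were then labeled one by one with a self-developed annotation tool, and the coordinates of each bounding box were recorded, namely the x and y coordinates of its top-left and bottom-right corners.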
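The resizing step and the corner-coordinate annotation format can be illustrated with a minimal sketch. It assumes, hypothetically, that a box was drawn on the full-resolution image and must follow the image down to 750×500; the function names and the sample box are illustrative and are not taken from the authors' annotation tool.

```python
def scale_box(box, src_size, dst_size):
    """Rescale an (x1, y1, x2, y2) box from src_size to dst_size, both (width, height)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def center_and_size(box):
    """Convert corner coordinates to the (center, size) form a keypoint detector regresses."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0), (x2 - x1, y2 - y1)


# Hypothetical annotation on a full-resolution 5472x3648 image (not real data).
box_full = (1094.4, 729.6, 1824.0, 1459.2)
box_small = scale_box(box_full, (5472, 3648), (750, 500))
center, size = center_and_size(box_small)
print(box_small, center, size)
```

Both axes shrink by the same factor (5472/750 = 3648/500 = 7.296), so the resize preserves the aspect ratio of every labeled apple.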
Fig. 1 Network structure diagram of CenterNet
Fig. 2 Three types of prediction output of the CenterNet network
(a) Keypoint heatmap (b) Local offset (c) Object size
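To keep orchard operation platforms mobile and light, they usually use compact industrial control hardware. Such hardware tends to have limited computing and storage resources, so it is sensitive to the computational load and size of the algorithm model. A deep learning model with many parameters, heavy computation, and a large volume will inevitably reduce the speed and efficiency of the platform. The backbone originally used by CenterNet is an encoder-decoder fully convolutional network designed for semantic segmentation. Although it achieves high detection accuracy, the resulting model still has many parameters and a large volume, which makes it hard to use on orchard operation platforms with relatively limited hardware.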
Fig. 3 Structure diagram of depthwise separable convolution (DSC)
Fig. 4 Structure diagram of the SE-Block in MobileNetV3
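MobileNetV3 makes extensive use of depthwise separable convolutions and inverted residual structures with linear bottlenecks, and replaces some of the 3×3 depthwise convolutions in the network with 5×5 ones. This design greatly reduces the number of parameters and the computational cost, and the SE-Block is introduced to improve accuracy, so the network balances model volume, speed, and accuracy. Given the advantages of MobileNetV3 in model volume and speed, it was used as the backbone of the proposed model to extract effective features from apple images. MobileNetV3 comes in two versions, MobileNetV3-Large and MobileNetV3-Small. This study used MobileNetV3-Large, which is deeper and therefore more accurate.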
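The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored), and the layer shape of 64 input channels, 128 output channels, and a 3×3 kernel is an illustrative assumption rather than a layer from the paper.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def dsc_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution.

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution that mixes c_in channels into c_out.
    """
    return kernel * kernel * c_in + c_in * c_out


# Illustrative layer shape (an assumption, not a layer reported in the paper).
standard = conv_params(3, 64, 128)
separable = dsc_params(3, 64, 128)
ratio = separable / standard
print(standard, separable, round(ratio, 4))
```

The ratio equals 1/c_out + 1/kernel², so for a 3×3 kernel the separable layer needs roughly one ninth of the weights once the channel count is large.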
Fig. 5 Network structure of M-CenterNet
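The overall loss function of the network combines the object center point loss (Lk), the object center offset loss (Loff), and the object size loss (Lsize).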
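As a hedged sketch of how these three terms could be combined, the original CenterNet formulation uses a weighted sum. The weights \lambda_{size} and \lambda_{off} are symbols from that formulation (set there to 0.1 and 1); whether this study used the same form and values is an assumption.

```latex
L_{det} = L_{k} + \lambda_{size} L_{size} + \lambda_{off} L_{off}
```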
For the keypoint loss, when the keypoint network is trained, each ground-truth keypoint is spread onto the heatmap with the Gaussian kernel of Eq. (3).
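A minimal sketch of this splatting step follows, assuming the standard CenterNet kernel exp(-((x - cx)² + (y - cy)²) / (2σ²)) evaluated at heatmap resolution. The function name and the fixed `sigma` argument are hypothetical; CenterNet adapts the standard deviation to the object size, which is omitted here.

```python
import math


def render_keypoints(height, width, centers, sigma):
    """Build a single-class heatmap with one Gaussian bump per ground-truth center.

    centers: list of (cx, cy) keypoints in heatmap coordinates.
    Overlapping bumps are merged with an element-wise maximum.
    """
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                heat[y][x] = max(heat[y][x], value)
    return heat


# Two hypothetical apple centers on a small 6x8 heatmap.
heat = render_keypoints(6, 8, [(2, 2), (5, 3)], sigma=1.0)
print(heat[2][2], round(heat[2][3], 4))
```

Each center keeps the value 1 and the response decays smoothly around it, which gives the network a softer target than a single positive pixel.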
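For the center offset loss, the network downsamples the input image, so mapping the resulting feature map back to the original image inevitably introduces a discretization error. An additional local offset (Loff) is therefore predicted for each valid center point to compensate. The center points of all classes c share the same offset prediction. The offset loss is computed with the L1 loss.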
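The compensation can be sketched as follows, assuming the CenterNet convention in which a center p in input pixels maps to the low-resolution keypoint floor(p / R) and the regression target is the fractional remainder. The output stride `stride` (R) and the function names are assumptions for illustration; the paper's value of R is not stated in this text.

```python
def offset_target(center, stride):
    """Return the low-resolution integer keypoint and the sub-pixel offset lost by flooring."""
    px, py = center[0] / stride, center[1] / stride
    ix, iy = int(px), int(py)  # floor, valid for non-negative coordinates
    return (ix, iy), (px - ix, py - iy)


def l1_offset_loss(pred_offsets, centers, stride):
    """Mean L1 distance between predicted and target offsets over all object centers."""
    if not centers:
        return 0.0
    total = 0.0
    for pred, center in zip(pred_offsets, centers):
        _, (ox, oy) = offset_target(center, stride)
        total += abs(pred[0] - ox) + abs(pred[1] - oy)
    return total / len(centers)


# Hypothetical center at (101, 50) input pixels with an assumed stride of 4.
keypoint, target = offset_target((101, 50), 4)
loss = l1_offset_loss([(0.0, 0.0)], [(101, 50)], 4)
print(keypoint, target, loss)
```

With a stride of 4, ignoring the offset would misplace this center by up to three input pixels along each axis, which matters for small or distant fruit.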
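The experiments used a deep learning framework for model training and testing, so a graphics workstation was chosen as the hardware platform. It had an Intel Core i7-7700 CPU, 32 GB of memory, and an NVIDIA TITAN Xp GPU (16 GB), and ran Linux Ubuntu 16.04 with CUDA 10.0 as the parallel computing framework and cuDNN 7.5 as the deep neural network acceleration library. The model was built, trained, and validated in Python with the PyTorch 1.0 deep learning framework.
The model was trained on hardware with a GPU to speed up convergence. Mini-batch stochastic gradient descent (SGD) with momentum was used. The batch size was 16, the momentum was fixed at 0.9, and the weight decay was 5×10⁻⁴. Because weight initialization affects how fast training converges, the weights of every layer were randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.001. The biases of all convolutional and transposed convolutional layers were initialized to 0. All layers used the same learning rate, with an initial value of 1.25×10⁻⁴. During training, whenever the detection accuracy on the validation set stopped increasing, the learning rate was reduced to 10% of its current value by cosine annealing, until adjusting the learning rate no longer improved the validation accuracy. Online data augmentation, consisting of photometric distortion and random sampling, was also applied.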
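The update rule behind these hyperparameters can be written out in a few lines. This is a plain-Python sketch of one SGD step with momentum and L2 weight decay, following the common convention in which the decay term is added to the gradient before the velocity update; it is an illustration, not the authors' training code, and the sample weights and gradients are made up.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=1.25e-4, momentum=0.9, weight_decay=5e-4):
    """Apply one SGD-with-momentum update and return the new weights and velocity."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w      # L2 weight decay folded into the gradient
        v = momentum * v + g          # accumulate the velocity
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# Hypothetical two-parameter example using the hyperparameters reported above.
w, v = sgd_momentum_step([0.001, -0.002], [0.1, -0.2], [0.0, 0.0])
print(w, v)

# Easier-to-read check with lr=0.1 and no weight decay.
w1, v1 = sgd_momentum_step([1.0], [0.5], [0.0], lr=0.1, momentum=0.9, weight_decay=0.0)
w2, v2 = sgd_momentum_step(w1, [0.5], v1, lr=0.1, momentum=0.9, weight_decay=0.0)
```

Reducing the learning rate to 10% of its current value, as described above, would amount to multiplying `lr` by 0.1 whenever the validation accuracy plateaus.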
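Fig. 6 Curve of the loss value versus the number of iterations
In practice, the movement of an orchard operation platform changes the image acquisition conditions. Test-set images with different illumination directions, shooting distances, degrees of occlusion, and numbers of fruit were therefore fed to the trained model. The apples on the trees were detected automatically and the results were recorded to evaluate the detection ability of the model under different conditions.
Fig. 7 Detection results under different illumination conditions
The deep convolutional neural network in the model has strong feature extraction ability. It learns features suited to different apple images on its own, which overcomes the poor detection of apples that appear too dark or too bright because of illumination changes.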
Fig. 8 Fruit detection results at different distances
Fig. 9 Detection results of partly occluded apples (the exposed area of the apple is greater than 1/2 of its total area)
Fig. 10 Detection results of apples severely occluded by branches and leaves (the exposed area of the apple is less than 1/3 of its total area)
Fig. 11 Detection results of apples severely occluded by other fruit (the exposed area of the apple is less than 1/3 of its total area)
Fig. 12 Detection results for different numbers of fruit
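The detection accuracy of the model determines whether an orchard cultivation platform can successfully carry out horticultural operations on the fruit. In general, the higher the detection accuracy, the higher the success rate of the operation. A smaller model capacity means a smaller computing and storage footprint and easier porting to orchard operation platforms with limited hardware, and a faster model can process more data in less time and better meets the platform's need for efficient horticultural operation. The overall performance of the model was therefore evaluated quantitatively along three dimensions: detection accuracy, model capacity, and running speed. The proposed model was also compared with the CenterNet and SSD models.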
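The accuracy and speed measures can be illustrated with a short sketch. The definitions below are assumptions, since this text does not spell out its formulas: the false detection rate is taken as the share of detections that match no apple, the miss rate as the share of labeled apples left undetected, and the frame rate as images processed per second. The counts and timings are hypothetical.

```python
def detection_rates(tp, fp, fn):
    """Compute rate metrics from true positives, false positives, and false negatives."""
    detections = tp + fp      # everything the model reported
    ground_truth = tp + fn    # everything that was labeled
    return {
        "false_detection_rate": fp / detections if detections else 0.0,
        "miss_rate": fn / ground_truth if ground_truth else 0.0,
    }


def frames_per_second(num_images, total_seconds):
    """Average throughput over a timed inference run."""
    return num_images / total_seconds


# Hypothetical counts and timings, not results from the paper.
rates = detection_rates(tp=90, fp=10, fn=5)
fps = frames_per_second(num_images=100, total_seconds=12.5)
print(rates, fps)
```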
Table 1 Comparison of the detection accuracies and volumes of different network models
Columns: average precision (%); model volume (MB); number of parameters
Table 2 Running speed comparison of different network models
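The proposed model runs fast thanks to its anchor-free detection strategy and its lightweight depthwise separable convolution modules. On a CPU, depthwise separable convolution is faster to compute than standard convolution, and the anchor-free strategy eliminates operations such as NMS, which greatly shortens the inference time of the whole network and raises the detection speed.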
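The reason NMS can be skipped is that a keypoint detector reads objects directly off the heatmap as local maxima. The sketch below follows the CenterNet idea of keeping a pixel only if it is at least as large as its 8 neighbors; the score threshold, the `top_k` limit, and the sample heatmap are illustrative assumptions.

```python
def extract_peaks(heatmap, threshold=0.3, top_k=100):
    """Return up to top_k (score, x, y) local maxima of a 2D heatmap, highest score first."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # 3x3 neighborhood clipped at the border (includes the pixel itself).
            neighborhood = [
                heatmap[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            if score >= max(neighborhood):
                peaks.append((score, x, y))
    peaks.sort(reverse=True)
    return peaks[:top_k]


# Hypothetical 4x4 heatmap with two apple centers.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = extract_peaks(heatmap)
print(peaks)
```

In a tensor framework the same test is usually a single 3×3 max-pooling pass compared against the heatmap, which is far cheaper than sorting and suppressing overlapping anchor boxes.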
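For the visual detection of apples in the natural orchard environment, a lightweight anchor-free model for detecting apples on trees was proposed. The feature extraction network of CenterNet was replaced with the more lightweight MobileNetV3, and transposed convolution was introduced to obtain more effective feature maps, which improved the overall performance of the model in detection accuracy, model capacity, and running speed. The test results showed that the M-CenterNet model distinguished fruit from background well, with a false detection rate of 10.9% and a miss rate of 5.8%, both better than those of the SSD and CenterNet models. Its average precision and model volume were 88.9% and 14.2 MB, which were 3.9% higher and 84.3% lower, respectively, than those of the SSD model. With comparable detection accuracy, the M-CenterNet model was only 1/4 the size of the CenterNet model. Its frame rate was 8.1 fps, and it ran nearly twice as fast as the CenterNet and SSD models in a CPU environment.
While maintaining high detection accuracy, the M-CenterNet model uses fewer computing and storage resources, is more lightweight, and detects faster on resource-limited hardware, which makes it suitable for deployment on mobile outdoor orchard platforms. Future work will continue to improve the model, enlarge the training set, explore further ways to improve apple detection, and test the model on embedded devices.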