[1] Hou Chunping, Zhang Qiannan, Wang Baoliang, et al. Image Classification Approach of Bag of Visual Words Model Based on Hadoop[J]. Journal of Tianjin University, 2017, (06): 643-648. [doi:10.11784/tdxbz201604045]

Image Classification Approach of Bag of Visual Words Model Based on Hadoop

Journal of Tianjin University (Science and Technology) [ISSN: 0493-2137 / CN: 12-1127/N]

Volume:
Issue:
2017, No. 06
Pages:
643-648
Section:
Electrical Automation and Information Engineering
Publication date:
2017-06-19

Article Info

Title:
Image Classification Approach of Bag of Visual Words Model Based on Hadoop
Article ID:
0493-2137(2017)06-0643-06
Author(s):
Hou Chunping1 Zhang Qiannan1 Wang Baoliang2 Chang Peng2 Sun Shaowei2
1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
2. Information and Network Center, Tianjin University, Tianjin 300072, China
Keywords:
Hadoop; image classification; bag of visual words; random forest; soft assignment
CLC number:
TP391
DOI:
10.11784/tdxbz201604045
Document code:
A
Abstract:
With the growth of the Internet and rapid advances in digital image acquisition, conventional image classification methods face two problems when handling massive image collections: excessive running time, and file systems and processing architectures that have not kept pace. To address these problems, an image classification approach is proposed on Apache Hadoop, the mainstream open-source distributed computing platform. First, the bag of visual words (BoVW) model is used to represent images. Second, the histogram-construction step of the model is improved with an adaptive soft assignment method for features. Finally, the random forest algorithm, which is easy to parallelize, is adopted as the classifier to make full use of Hadoop's distributed computing capability. Experiments show that, on large-scale datasets, the Hadoop-based method effectively reduces time consumption compared with a single-machine environment while achieving good classification results.
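The histogram step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the adaptive assignment works, so the code below assumes a plain Gaussian-kernel soft assignment with a fixed bandwidth `sigma`. The function names, the `sigma` parameter, and the toy data are hypothetical and are not taken from the paper. Hard assignment is included for comparison.

```python
import math


def _squared_distances(descriptor, codebook):
    # Squared Euclidean distance from one descriptor to every visual word.
    return [sum((a - b) ** 2 for a, b in zip(descriptor, word)) for word in codebook]


def hard_assign_histogram(descriptors, codebook):
    """Classic BoVW: each descriptor votes only for its nearest visual word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        sq = _squared_distances(d, codebook)
        hist[sq.index(min(sq))] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


def soft_assign_histogram(descriptors, codebook, sigma=1.0):
    """Soft assignment (assumed form): each descriptor spreads one unit of
    vote over all visual words, weighted by a Gaussian kernel on distance.
    The fixed sigma is an assumption; the paper's adaptive rule is not
    given in the abstract."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        sq = _squared_distances(d, codebook)
        # Shift by the minimum distance for numerical stability; the shift
        # cancels out when the weights are normalized below.
        m = min(sq)
        weights = [math.exp(-(s - m) / (2.0 * sigma ** 2)) for s in sq]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


if __name__ == "__main__":
    # Toy 2-D codebook with two visual words and three descriptors.
    codebook = [[0.0, 0.0], [2.0, 0.0]]
    descriptors = [[0.1, 0.0], [1.0, 0.0], [1.9, 0.0]]
    print(hard_assign_histogram(descriptors, codebook))
    print(soft_assign_histogram(descriptors, codebook, sigma=0.5))
```

Under these assumptions, a descriptor lying midway between two visual words contributes equally to both bins instead of being forced into one. The resulting per-image histograms would then serve as feature vectors for a classifier such as random forest.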


Memo

Received: 2016-04-18; Revised: 2016-11-01.
About the author: Hou Chunping (born 1957), female, professor, hcp@tju.edu.cn.
Corresponding author: Wang Baoliang, wbl@tju.edu.cn.
Funding: Supported by the National Natural Science Foundation of China (No. 61571325).
Last Update: 2017-06-10