
[1] Ji Zhong, Guo Weichen. Zero Shot Action Recognition Based on Local Preserving Canonical Correlation Analysis[J]. Journal of Tianjin University, 2017, (09): 975-983. [doi:10.11784/tdxbz201607010]

Zero Shot Action Recognition Based on Local Preserving Canonical Correlation Analysis

Journal of Tianjin University (Science and Technology) [ISSN: 0493-2137 / CN: 12-1127/N]

Volume:
Issue:
2017, No. 09
Pages:
975-983
Section:
Electrical Automation and Information Engineering
Publication date:
2017-09-22

Article Info

Title:
Zero Shot Action Recognition Based on Local Preserving Canonical Correlation Analysis
Article ID:
0493-2137(2017)09-0975-09
Author(s):
Ji Zhong, Guo Weichen
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Keywords:
zero shot learning (ZSL); action recognition; canonical correlation analysis (CCA); local preserving
Classification number:
TP391
DOI:
10.11784/tdxbz201607010
Document code:
A
Abstract:
The number of action categories to be recognized is growing rapidly, which makes it increasingly hard to label enough training data to learn classification models for all of them. Zero shot learning (ZSL) is an approach proposed to address the growing difficulty of collecting and exhaustively labeling data. This paper proposes a ZSL-based action recognition method built on local preserving canonical correlation analysis (LPCCA). Specifically, a manifold-regularized CCA maps visual features and side information (auxiliary features) into a common feature space while preserving the local structure of both. The adverse impact of domain shift is also taken into consideration, and self-training and hubness correction are applied to improve the robustness of the method. Extensive experiments on the popular human action datasets HMDB51 and UCF101 show that the proposed method performs favorably against state-of-the-art methods with a simple and efficient pipeline.
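The common-space idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's LPCCA: it uses plain regularized linear CCA, omits the manifold-regularized term, self-training, and hubness correction, and runs on synthetic data. All names (`fit_cca`, `predict_unseen`, `demo`) and all parameter values are assumptions made for illustration. It fits CCA on seen-class pairs of visual features and class semantic vectors, then labels unseen-class samples by the nearest projected class prototype.

```python
import numpy as np


def inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals ** -0.5) @ vecs.T


def fit_cca(x, y, dim, reg=1e-3):
    """Fit regularized linear CCA on paired rows of x (visual) and y (semantic).

    Returns the two means and the two projection matrices.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    # Regularized covariance and cross-covariance estimates.
    cxx = xc.T @ xc / n + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / n + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / n
    # Whiten both views; the SVD of the whitened cross-covariance
    # gives paired canonical directions.
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, _, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    return mx, my, kx @ u[:, :dim], ky @ vt.T[:, :dim]


def predict_unseen(x, prototypes, model):
    """Assign each row of x to the nearest prototype in the common space."""
    mx, my, wx, wy = model
    zx = (x - mx) @ wx
    zp = (prototypes - my) @ wy
    dist = ((zx[:, None, :] - zp[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def demo(seed=0):
    """Synthetic check: visual features are a noisy linear map of class vectors."""
    rng = np.random.default_rng(seed)
    n_seen, n_unseen, per_class, d_sem, d_vis = 20, 3, 30, 5, 10
    protos = rng.normal(size=(n_seen + n_unseen, d_sem))  # stand-in for word vectors
    a = rng.normal(size=(d_sem, d_vis))                   # hypothetical semantic-to-visual map
    labels = np.repeat(np.arange(n_seen + n_unseen), per_class)
    vis = protos[labels] @ a + 0.05 * rng.normal(size=(labels.size, d_vis))
    seen = labels < n_seen
    # Train on seen classes only; unseen classes contribute no training pairs.
    model = fit_cca(vis[seen], protos[labels[seen]], dim=d_sem)
    pred = predict_unseen(vis[~seen], protos[n_seen:], model)
    return float((pred == labels[~seen] - n_seen).mean())


if __name__ == "__main__":
    print(f"unseen-class accuracy: {demo():.2f}")
```

The synthetic data has no domain shift between seen and unseen classes, so this sketch says nothing about the corrections the abstract mentions; it only shows how a shared CCA space lets unseen classes be recognized from their semantic vectors alone.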



Memo

Received: 2016-07-04; Revised: 2016-09-29.
About the author: Ji Zhong (born 1979), male, associate professor.
Corresponding author: Ji Zhong, jizhong@tju.edu.cn.
Funding: Supported by the National Natural Science Foundation of China (No. 61271325 and No. 61472273) and the Elite Scholar Program of Tianjin University (No. 2015XRG-0014).
Last Update: 2017-09-10