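The semi-supervised learning method Laplacian support vector machine (LapSVM) is applied to predict protein structural classes. First, seven physicochemical property parameters of amino acids serve as a substitution model that converts protein sequences into numeric sequences. The auto covariance (AC) transform then describes the interactions between amino acid residues a given distance apart and converts each numeric sequence into a vector of uniform length, forming the feature space of the samples. Next, 20, 50, 80, 110, 140 and 170 samples are randomly selected from the dataset as unlabeled samples to build the training sets, and the one-against-all decomposition strategy and the leave-one-out method are used to evaluate the predictive ability of the LapSVM model. The classifier predicts the structural classes of the protein samples with an accuracy of 94.12%, which is clearly competitive with the 90.69% accuracy of the standard support vector machine (SVM). The experimental results confirm that the distribution information of the unlabeled samples, acting as a weak rule, can effectively improve the predictive performance of the classifier. The study also offers a new idea: applying a semi-supervised method to a fully supervised learning problem, with a smaller optimization scale and better predictive ability.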
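The AC step can be illustrated with a minimal NumPy sketch, assuming the definition commonly used for protein sequences: AC(d, j) = (1/(n − d)) Σ_{i=1}^{n−d} (P_{i,j} − mean_j)(P_{i+d,j} − mean_j), where P_{i,j} is property j of residue i and mean_j is that property's average over the sequence. The seven property values, any normalization, and the maximum lag used by the authors are not stated here, so `props` is a caller-supplied table and `auto_covariance` is a hypothetical helper name; the two-property table below is a made-up toy, not real physicochemical data.

```python
import numpy as np


def auto_covariance(seq, props, max_lag):
    """Fixed-length AC descriptor of a protein sequence (illustrative sketch).

    seq: string of one-letter amino acid codes.
    props: dict mapping each residue letter to a list of m property values
           (hypothetical input; the study uses seven properties).
    Returns max_lag * m values ordered lag-major: lag 1 (props 1..m), lag 2, ...
    """
    P = np.array([props[aa] for aa in seq], dtype=float)  # n x m numeric sequence
    n, m = P.shape
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    C = P - P.mean(axis=0)  # center each property over this sequence
    feats = np.empty((max_lag, m))
    for d in range(1, max_lag + 1):
        # average product of centered values for residue pairs d positions apart
        feats[d - 1] = (C[:n - d] * C[d:]).sum(axis=0) / (n - d)
    return feats.ravel()  # length depends only on max_lag and m, not on n


# Toy two-property table (made-up values, for illustration only)
toy_props = {"A": [1.0, 0.0], "C": [-1.0, 2.0]}
features = auto_covariance("ACAC", toy_props, max_lag=2)  # values -1, -1, 1, 1
```

Because the output length is max_lag × m regardless of sequence length, proteins of different lengths map into one common feature space, which is what makes the vectors usable as classifier inputs.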
Abstract: The purpose of this study is to predict protein structural classes using the Laplacian support vector machine (LapSVM), a novel semi-supervised learning method. First, seven amino acid physicochemical properties taken from the literature were used to transform protein sequences into numeric sequences, and auto covariance (AC) was used to transform these numeric sequences into feature vectors of the same length, which are suitable for training models. AC captures neighboring effects and the interactions between residues a certain distance apart in the protein sequence. Second, 20, 50, 80, 110, 140 and 170 samples were randomly selected as unlabeled samples to construct the training sets, and the "one-against-all" strategy and the leave-one-out method were employed to estimate performance. LapSVM achieved a prediction accuracy of 94.12%, which compares favorably with the 90.69% accuracy of the standard support vector machine (SVM). The experimental results show that unlabeled samples, acting as weak rules, can improve prediction performance. The study also suggests a new approach: using a semi-supervised method to solve a supervised learning problem, with a smaller optimization scale and higher prediction accuracy.
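The LapSVM training and one-against-all, leave-one-out evaluation summarized above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code. It follows the dual solution of Belkin, Niyogi and Sindhwani (2006): alpha = (2 gamma_A I + 2 gamma_I/(l+u)^2 L K)^-1 J^T Y beta, where beta maximizes sum(beta) − ½ beta^T Q beta over 0 ≤ beta_i ≤ 1/l. It additionally assumes no bias term (which removes the equality constraint sum(beta_i y_i) = 0), a Gaussian RBF kernel, an unnormalized 0/1 k-nearest-neighbour graph Laplacian, and a plain projected-gradient solver. All function names and parameter values (gamma_A, gamma_I, k, gamma) are hypothetical, and the leave-one-out loop assumes the unlabeled pool stays fixed while each labeled sample is held out in turn.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def knn_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph with 0/1 weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0  # position 0 is the point itself
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W


def lapsvm_fit(X_lab, y, X_unl, gamma_A=1e-2, gamma_I=1.0, k=5, gamma=0.1, iters=2000):
    """Binary LapSVM (labels +/-1) via the dual, assuming no bias term (sketch)."""
    X = np.vstack([X_lab, X_unl])  # labeled rows come first
    l, n = len(y), X.shape[0]
    K = rbf_kernel(X, X, gamma)
    L = knn_laplacian(X, k)
    M = 2.0 * gamma_A * np.eye(n) + (2.0 * gamma_I / n ** 2) * L @ K
    JtY = np.zeros((n, l))
    JtY[np.arange(l), np.arange(l)] = y  # J^T Y
    B = np.linalg.solve(M, JtY)  # M^-1 J^T Y, shape (n, l)
    Q = y[:, None] * (K[:l] @ B)  # Q = Y J K M^-1 J^T Y
    Q = 0.5 * (Q + Q.T)  # remove round-off asymmetry
    step = 1.0 / max(np.linalg.eigvalsh(Q).max(), 1e-12)
    beta = np.zeros(l)
    for _ in range(iters):  # projected gradient ascent on the box-constrained dual
        beta = np.clip(beta + step * (1.0 - Q @ beta), 0.0, 1.0 / l)
    return X, B @ beta, gamma  # expansion points, alpha, kernel width


def lapsvm_decision(model, X_new):
    X, alpha, gamma = model
    return rbf_kernel(X_new, X, gamma) @ alpha  # f(x) = sum_i alpha_i K(x_i, x)


def one_vs_all_loo(X_lab, labels, X_unl, **kw):
    """Leave-one-out accuracy of one-against-all LapSVM; unlabeled pool held fixed."""
    classes = np.unique(labels)
    hits = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i
        scores = [
            lapsvm_decision(
                lapsvm_fit(X_lab[keep], np.where(labels[keep] == c, 1.0, -1.0), X_unl, **kw),
                X_lab[i:i + 1],
            )[0]
            for c in classes
        ]
        hits += classes[int(np.argmax(scores))] == labels[i]  # largest decision value wins
    return hits / len(labels)
```

For example, `one_vs_all_loo(X_lab, labels, X_unl)` trains one binary LapSVM per class on all but one labeled sample, assigns the held-out sample to the class with the largest decision value, and returns the fraction recovered. Dropping the bias keeps the dual a box-constrained quadratic program that needs no external optimizer, at the cost of departing from the original formulation; a faithful implementation would reinstate the bias and its equality constraint.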
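Basic information:
CLC number: S51; TP181
Citation:
Wu Jiang (吴疆), Dong Ting (董婷), Jiang Ping (蒋平). Application of the semi-supervised learning algorithm Laplacian support vector machine to protein structural class prediction[J]. Microcomputer Applications (微型电脑应用), 2020, 36(08): 5-8.
Funding:
National Natural Science Foundation of China (51864046); Shaanxi Provincial Department of Science and Technology project (2019NY-182)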
2020-08-20