HRL2,1-NMF-DF
Abstract
Proteins can perform their biological functions correctly and stably only in specific subcellular compartments. Abnormal protein localization leads to dysfunction of the organism and is involved in the pathogenesis of many human diseases. In this paper, the MULocDeep dataset is first used as the benchmark dataset, and its sequences are encoded numerically using the BLOSUM62 scoring matrix, the position-specific scoring matrix, amino acid physicochemical properties, and word embeddings. Second, features are extracted with convolutional neural networks that use skip connections in the manner of residual networks, combined with a GRU network model. Multi-head self-attention and cross-attention mechanisms are applied to make the fullest possible use of long-range sequence information. The models are trained and evaluated with eight-fold cross-validation, and the average of the eight model outputs is taken as the final prediction. The experimental results show that the prediction accuracy of the models based on the bidirectional GRU network and the cross-attention mechanism reaches 94% or higher, and that accuracy is high for each subcellular location, which demonstrates the feasibility of the models in comparison with existing algorithms.
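The eight-fold procedure described above, in which one model is trained per fold and the eight outputs are averaged, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `k_fold_indices`, `train_fold_models`, `ensemble_predict`, and `train_fn` are hypothetical, and each "model" is assumed to be any callable that maps a sample to a list of per-class probabilities.

```python
import random


def k_fold_indices(n_samples, k=8, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_fold_models(samples, labels, train_fn, k=8, seed=0):
    """Train one model per fold, each on the data outside that fold.

    train_fn is a hypothetical callable (assumption): it takes training
    samples and labels and returns a model, i.e. a callable that maps a
    sample to a list of per-class probabilities.
    """
    models = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        models.append(
            train_fn([samples[i] for i in train_idx],
                     [labels[i] for i in train_idx])
        )
    return models


def ensemble_predict(models, sample):
    """Average the per-class probabilities produced by all fold models."""
    probs = [model(sample) for model in models]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
```

In this sketch the predicted location would be the class with the highest averaged probability. Averaging over fold models tends to reduce the variance of any single model, at the cost of running all eight at prediction time.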