LNCRI: Long Non-Coding RNA Identifier in Multiple Species

The pervasive nature of long non-coding RNA (lncRNA) transcription in mammalian genomes has changed our protein-centric view of genomes. Identifying lncRNAs is an important step toward discovering their functional roles across species. The rapid development of next-generation sequenci...

Bibliographic Details
Main author: Saleh Musleh (15279190) (author)
Other authors: Mohammad Tariqul Islam (7854059) (author), Tanvir Alam (638619) (author)
Published: 2021
_version_ 1864513506031697920
author Saleh Musleh (15279190)
author2 Mohammad Tariqul Islam (7854059)
Tanvir Alam (638619)
author2_role author
author
author_role author
dc.date.none.fl_str_mv 2021-11-30T09:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2021.3131846
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/LNCRI_Long_Non-Coding_RNA_Identifier_in_Multiple_Species/26975530
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biological sciences
Bioinformatics and computational biology
Proteins
Mice
RNA
Encoding
Tools
Genomics
Task analysis
Long non-coding RNA
lncRNA
mRNA
machine learning
sequence analysis
dc.title.none.fl_str_mv LNCRI: Long Non-Coding RNA Identifier in Multiple Species
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>The pervasive nature of long non-coding RNA (lncRNA) transcription in mammalian genomes has changed our protein-centric view of genomes. Identifying lncRNAs is an important step toward discovering their functional roles across species. The rapid development of next-generation sequencing technology has created the opportunity to discover many lncRNA transcripts. However, the cost and time-consuming nature of transcriptomics verification techniques have kept the research community from focusing on lncRNA identification. To overcome these challenges, we developed LNCRI (Long Non-Coding RNA Identifier), a novel machine learning (ML)-based tool for the identification of lncRNA transcripts. We used weighted k-mer, pseudo nucleotide composition, hexamer usage bias, Fickett score, open reading frame information, UTR regions, and HMMER score as the feature set to develop LNCRI. LNCRI outperformed existing models at distinguishing lncRNA transcripts from protein-coding mRNA transcripts with high accuracy in human and mouse. LNCRI also outperformed existing tools for cross-species prediction on chimpanzee, monkey, gorilla, orangutan, cow, pig, frog, and zebrafish. We applied the SHAP algorithm to demonstrate the importance of the most dominant features used in the model. We believe our tool will help the research community identify lncRNA transcripts with high accuracy. The benchmark datasets and source code are available on GitHub: http://github.com/smusleh/LNCRI.</p> <h2>Other Information</h2> <p>Published in: IEEE Access<br> License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br> See article on publisher's website: <a href="https://doi.org/10.1109/access.2021.3131846" target="_blank"><u>https://doi.org/10.1109/access.2021.3131846</u></a></p>
eu_rights_str_mv openAccess
id Manara2_d882089e4ed83aae287831481bbb7cae
identifier_str_mv 10.1109/access.2021.3131846
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26975530
publishDate 2021
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title LNCRI: Long Non-Coding RNA Identifier in Multiple Species