LNCRI: Long Non-Coding RNA Identifier in Multiple Species
The pervasive transcription of long non-coding RNAs (lncRNAs) in mammalian genomes has changed our protein-centric view of the genome. Identifying lncRNAs is an important step toward discovering their functional roles across species. The rapid development of next-generation sequenci...
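| Main author: | Saleh Musleh |
|---|---|
| Other authors: | Mohammad Tariqul Islam, Tanvir Alam |
| Published: | 2021 |
| Subjects: | Biological sciences; Bioinformatics and computational biology; Proteins; Mice; RNA; Encoding; Tools; Genomics; Task analysis; Long non-coding RNA; lncRNA; mRNA; machine learning; sequence analysis |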
| _version_ | 1864513506031697920 |
|---|---|
| author | Saleh Musleh (15279190) |
| author2 | Mohammad Tariqul Islam (7854059); Tanvir Alam (638619) |
| author2_role | author; author |
| author_facet | Saleh Musleh (15279190); Mohammad Tariqul Islam (7854059); Tanvir Alam (638619) |
| author_role | author |
| dc.creator.none.fl_str_mv | Saleh Musleh (15279190); Mohammad Tariqul Islam (7854059); Tanvir Alam (638619) |
| dc.date.none.fl_str_mv | 2021-11-30T09:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2021.3131846 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/LNCRI_Long_Non-Coding_RNA_Identifier_in_Multiple_Species/26975530 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biological sciences; Bioinformatics and computational biology; Proteins; Mice; RNA; Encoding; Tools; Genomics; Task analysis; Long non-coding RNA; lncRNA; mRNA; machine learning; sequence analysis |
| dc.title.none.fl_str_mv | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | <p>The pervasive transcription of long non-coding RNAs (lncRNAs) in mammalian genomes has changed our protein-centric view of the genome. Identifying lncRNAs is an important step toward discovering their functional roles across species. The rapid development of next-generation sequencing technology has made it possible to discover many lncRNA transcripts. However, the cost and time-consuming nature of transcriptomic verification techniques have kept the research community from focusing on lncRNA identification. To overcome these challenges, we developed LNCRI (Long Non-Coding RNA Identifier), a novel machine learning (ML)-based tool for identifying lncRNA transcripts. LNCRI uses weighted k-mers, pseudo nucleotide composition, hexamer usage bias, the Fickett score, open reading frame information, UTR regions, and the HMMER score as its feature set. LNCRI outperformed existing models at distinguishing lncRNA transcripts from protein-coding mRNA transcripts, with high accuracy in human and mouse. LNCRI also outperformed existing tools in cross-species prediction on chimpanzee, monkey, gorilla, orangutan, cow, pig, frog, and zebrafish. We applied the SHAP algorithm to show the importance of the most dominant features used by the model. We believe our tool will help the research community identify lncRNA transcripts with high accuracy. The benchmark datasets and source code are available on GitHub: http://github.com/smusleh/LNCRI.</p> <h2>Other Information</h2> <p>Published in: IEEE Access<br> License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br> See article on publisher's website: <a href="https://doi.org/10.1109/access.2021.3131846" target="_blank"><u>https://doi.org/10.1109/access.2021.3131846</u></a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_d882089e4ed83aae287831481bbb7cae |
| identifier_str_mv | 10.1109/access.2021.3131846 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/26975530 |
| publishDate | 2021 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | LNCRI: Long Non-Coding RNA Identifier in Multiple Species; Saleh Musleh (15279190); Biological sciences; Bioinformatics and computational biology; Proteins; Mice; RNA; Encoding; Tools; Genomics; Task analysis; Long non-coding RNA; lncRNA; mRNA; machine learning; sequence analysis |
| status_str | publishedVersion |
| title | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| title_full | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| title_fullStr | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| title_full_unstemmed | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| title_short | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| title_sort | LNCRI: Long Non-Coding RNA Identifier in Multiple Species |
| topic | Biological sciences; Bioinformatics and computational biology; Proteins; Mice; RNA; Encoding; Tools; Genomics; Task analysis; Long non-coding RNA; lncRNA; mRNA; machine learning; sequence analysis |
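
The description field above names k-mer composition and open reading frame (ORF) information among the features LNCRI uses. As a rough illustration of what such sequence-derived features can look like, the sketch below computes plain (unweighted) k-mer frequencies and the length of the longest forward-strand ORF. It is not taken from the LNCRI source code: the function names, the choice of `k`, the forward-strand-only ORF scan, and the ATG/stop-codon definition are all assumptions made for this example, and the record does not describe the weighting scheme LNCRI applies to k-mers.

```python
from collections import Counter
from itertools import product

# Standard stop codons in DNA notation (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every k-mer over A, C, G, T (unweighted).

    Hypothetical helper for illustration; k-mers containing other
    symbols (e.g. N) are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG-to-stop ORF.

    Hypothetical helper: scans the three forward reading frames only
    and counts the stop codon as part of the ORF.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATGATT"  # toy sequence, not real data
    print(longest_orf_length(transcript))
    print(kmer_frequencies(transcript, k=1))
```

In a classifier of this kind, values such as these would typically be concatenated into one feature vector per transcript; coding transcripts tend to have longer ORFs than lncRNAs, which is why ORF length is a common discriminating feature.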