Summary of datasets.
Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity. A major challenge, however, lies in the prevalence of non-biological zeros, false measurements caused by technical limitations that mask a cell’s true transcriptome. This fundamental iss...
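| Main Author: | Siyi Huang |
|---|---|
| Other Authors: | Linfeng Jiang, Ming Yi, Yuan Zhu |
| Published: | 2025 |
| Subjects: | Genetics, Molecular Biology, Plant Biology, Biological Sciences not elsewhere classified, Mathematical Sciences not elsewhere classified, Information Systems not elsewhere classified |
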
| _version_ | 1852014361775702016 |
|---|---|
| author | Siyi Huang (8562174) |
| author2 | Linfeng Jiang (2416375) Ming Yi (15051) Yuan Zhu (148570) |
| author2_role | author author author |
| author_facet | Siyi Huang (8562174) Linfeng Jiang (2416375) Ming Yi (15051) Yuan Zhu (148570) |
| author_role | author |
| dc.creator.none.fl_str_mv | Siyi Huang (8562174) Linfeng Jiang (2416375) Ming Yi (15051) Yuan Zhu (148570) |
| dc.date.none.fl_str_mv | 2025-12-01T18:50:59Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pcbi.1013744.t002 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Summary_of_datasets_/30756103 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Genetics Molecular Biology Plant Biology Biological Sciences not elsewhere classified Mathematical Sciences not elsewhere classified Information Systems not elsewhere classified single uses bulk rna recovers expression values providing clear guidelines including cell clustering false measurements caused extensive practical evaluation essential downstream analyses differential expression detection cell rna sequencing cell three key innovations guided imputation engine distort biological signals aware normalization step d3impute demonstrates consistent accurately identify non true biological zeros seq data analysis biological zeros true transcriptome key hurdle guide imputation biological reference aware modeling aware discrimination seq data inflated data data recovery trajectory inference technical limitations specific characteristics significant improvements oriented solution optimal application network discriminator major challenge introduce d3impute handling zero genuinely absent generalizable framework fundamental issue computational methods comprehensive benchmarking cellular heterogeneity biologically informed also offers 12 state |
| dc.title.none.fl_str_mv | Summary of datasets. |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| description | Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity. A major challenge, however, lies in the prevalence of non-biological zeros, false measurements caused by technical limitations that mask a cell’s true transcriptome. Distinguishing these artifacts from true biological zeros, where a gene is genuinely absent, remains a key hurdle for computational methods, because misclassification can distort biological signals during data recovery. To overcome this, we introduce D3Impute, a discriminative imputation framework built on three key innovations: (1) a distribution-aware normalization step that adapts to dataset-specific characteristics while preserving meaningful biological variation; (2) a dual-network discriminator that uses bulk RNA-seq data as a biological reference to accurately identify non-biological zeros while retaining the true biological zeros; and (3) a density-guided imputation engine that recovers expression values while maintaining local cellular neighborhood structures. Through comprehensive benchmarking against 12 state-of-the-art methods across six diverse datasets, D3Impute demonstrates consistent and significant improvements in essential downstream analyses, including cell clustering, trajectory inference, and differential expression detection. Furthermore, we provide an extensive practical evaluation of D3Impute, demonstrating its robustness across varying data qualities and providing clear guidelines for optimal application. By offering a robust, biologically informed, and user-oriented solution, D3Impute not only enhances scRNA-seq data analysis but also offers a generalizable framework for handling zero-inflated data in computational biology. |
| eu_rights_str_mv | openAccess |
| id | Manara_0cc0803ffcfc007238552695a67eb97a |
| identifier_str_mv | 10.1371/journal.pcbi.1013744.t002 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/30756103 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Summary of datasets. |