Summary of datasets.

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity. A major challenge, however, lies in the prevalence of non-biological zeros: false measurements caused by technical limitations that mask a cell’s true transcriptome. This fundamental iss...

Bibliographic details
Main author: Siyi Huang (8562174) (author)
Other authors: Linfeng Jiang (2416375) (author), Ming Yi (15051) (author), Yuan Zhu (148570) (author)
Published: 2025
_version_ 1852014361775702016
author Siyi Huang (8562174)
author2 Linfeng Jiang (2416375)
Ming Yi (15051)
Yuan Zhu (148570)
author2_role author
author
author
author_facet Siyi Huang (8562174)
Linfeng Jiang (2416375)
Ming Yi (15051)
Yuan Zhu (148570)
author_role author
dc.creator.none.fl_str_mv Siyi Huang (8562174)
Linfeng Jiang (2416375)
Ming Yi (15051)
Yuan Zhu (148570)
dc.date.none.fl_str_mv 2025-12-01T18:50:59Z
dc.identifier.none.fl_str_mv 10.1371/journal.pcbi.1013744.t002
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Summary_of_datasets_/30756103
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Genetics
Molecular Biology
Plant Biology
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
Information Systems not elsewhere classified
uses bulk rna
recovers expression values
providing clear guidelines
including cell clustering
false measurements caused
extensive practical evaluation
essential downstream analyses
differential expression detection
cell rna sequencing
three key innovations
guided imputation engine
distort biological signals
aware normalization step
d3impute demonstrates consistent
accurately identify non
true biological zeros
seq data analysis
biological zeros
true transcriptome
key hurdle
guide imputation
biological reference
aware modeling
aware discrimination
seq data
inflated data
data recovery
trajectory inference
technical limitations
specific characteristics
significant improvements
oriented solution
optimal application
network discriminator
major challenge
introduce d3impute
handling zero
genuinely absent
generalizable framework
fundamental issue
computational methods
comprehensive benchmarking
cellular heterogeneity
biologically informed
also offers
12 state
dc.title.none.fl_str_mv Summary of datasets.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity. A major challenge, however, lies in the prevalence of non-biological zeros: false measurements caused by technical limitations that mask a cell’s true transcriptome. Distinguishing these artifacts from true biological zeros, where a gene is genuinely absent, remains a key hurdle for computational methods, because misclassification can distort biological signals during data recovery. To overcome this, we introduce D3Impute, a discriminative imputation framework built on three key innovations: (1) a distribution-aware normalization step that adapts to dataset-specific characteristics while preserving meaningful biological variation; (2) a dual-network discriminator that uses bulk RNA-seq data as a biological reference to accurately identify non-biological zeros while retaining the true biological zeros; and (3) a density-guided imputation engine that recovers expression values while maintaining local cellular neighborhood structures. In comprehensive benchmarking against 12 state-of-the-art methods across six diverse datasets, D3Impute demonstrates consistent and significant improvements in essential downstream analyses, including cell clustering, trajectory inference, and differential expression detection. Furthermore, we provide an extensive practical evaluation of D3Impute, demonstrating its robustness across varying data qualities and providing clear guidelines for optimal application. By offering a robust, biologically informed, and user-oriented solution, D3Impute not only enhances scRNA-seq data analysis but also offers a generalizable framework for handling zero-inflated data in computational biology.
eu_rights_str_mv openAccess
id Manara_0cc0803ffcfc007238552695a67eb97a
identifier_str_mv 10.1371/journal.pcbi.1013744.t002
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30756103
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Summary of datasets.
title_full Summary of datasets.
title_fullStr Summary of datasets.
title_full_unstemmed Summary of datasets.
title_short Summary of datasets.
title_sort Summary of datasets.
topic Genetics
Molecular Biology
Plant Biology
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
Information Systems not elsewhere classified
uses bulk rna
recovers expression values
providing clear guidelines
including cell clustering
false measurements caused
extensive practical evaluation
essential downstream analyses
differential expression detection
cell rna sequencing
three key innovations
guided imputation engine
distort biological signals
aware normalization step
d3impute demonstrates consistent
accurately identify non
true biological zeros
seq data analysis
biological zeros
true transcriptome
key hurdle
guide imputation
biological reference
aware modeling
aware discrimination
seq data
inflated data
data recovery
trajectory inference
technical limitations
specific characteristics
significant improvements
oriented solution
optimal application
network discriminator
major challenge
introduce d3impute
handling zero
genuinely absent
generalizable framework
fundamental issue
computational methods
comprehensive benchmarking
cellular heterogeneity
biologically informed
also offers
12 state