Cellwise Outlier Detection in Heterogeneous Populations
Real-world applications may be affected by outlying values. In the model-based clustering literature, several methodologies have been proposed to detect units that deviate from the majority of the data (rowwise outliers) and trim them from the parameter estimates. However, the discarded obs...
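
| Main author: | Giorgia Zaccaria |
|---|---|
| Other authors: | Luis A. García-Escudero, Francesca Greselin, Agustín Mayo-Íscar |
| Published: | 2025 |
| Subjects: | Biotechnology; Ecology; Cancer; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; Cellwise contamination; EM algorithm; Imputation; Missing data; Model-based clustering; Robustness |
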
| _version_ | 1852018897395384320 |
|---|---|
| author | Giorgia Zaccaria (21245577) |
| author2 | Luis A. García-Escudero (21245580); Francesca Greselin (14203066); Agustín Mayo-Íscar (9571212) |
| author2_role | author; author; author |
| author_facet | Giorgia Zaccaria (21245577); Luis A. García-Escudero (21245580); Francesca Greselin (14203066); Agustín Mayo-Íscar (9571212) |
| author_role | author |
| dc.creator.none.fl_str_mv | Giorgia Zaccaria (21245577); Luis A. García-Escudero (21245580); Francesca Greselin (14203066); Agustín Mayo-Íscar (9571212) |
| dc.date.none.fl_str_mv | 2025-06-30T13:40:07Z |
| dc.identifier.none.fl_str_mv | 10.6084/m9.figshare.28931076.v2 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Cellwise_outlier_detection_in_heterogeneous_populations/28931076 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biotechnology; Ecology; Cancer; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; Cellwise contamination; EM algorithm; Imputation; Missing data; Model-based clustering; Robustness |
| dc.title.none.fl_str_mv | Cellwise Outlier Detection in Heterogeneous Populations |
| dc.type.none.fl_str_mv | Dataset; info:eu-repo/semantics/publishedVersion; dataset |
| description | Real-world applications may be affected by outlying values. In the model-based clustering literature, several methodologies have been proposed to detect units that deviate from the majority of the data (rowwise outliers) and trim them from the parameter estimates. However, the discarded observations can still carry valuable information in some of their observed features. Following the more recent cellwise contamination paradigm, we introduce a Gaussian mixture model for cellwise outlier detection. The model is estimated via an Expectation-Maximization (EM) algorithm with an additional step that flags the contaminated *cells* of a data matrix and then imputes them, rather than discarding them, before the parameter estimation. This procedure adheres to the spirit of the EM algorithm by treating the contaminated cells as missing values. We analyze the performance of the proposed model in comparison with other existing methodologies through a simulation study with different scenarios and illustrate its potential use for clustering, outlier detection, and imputation on three real datasets. Additional applications include socio-economic studies, environmental analysis, healthcare, and any domain where the aim is to cluster data affected by missing information and outlying values within features. |
| eu_rights_str_mv | openAccess |
| id | Manara_fcc03421f03fe792f5be2a973b8580c8 |
| identifier_str_mv | 10.6084/m9.figshare.28931076.v2 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/28931076 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | Cellwise Outlier Detection in Heterogeneous Populations; Giorgia Zaccaria (21245577); Biotechnology; Ecology; Cancer; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; Cellwise contamination; EM algorithm; Imputation; Missing data; Model-based clustering; Robustness |
| status_str | publishedVersion |
| title | Cellwise Outlier Detection in Heterogeneous Populations |
| title_full | Cellwise Outlier Detection in Heterogeneous Populations |
| title_fullStr | Cellwise Outlier Detection in Heterogeneous Populations |
| title_full_unstemmed | Cellwise Outlier Detection in Heterogeneous Populations |
| title_short | Cellwise Outlier Detection in Heterogeneous Populations |
| title_sort | Cellwise Outlier Detection in Heterogeneous Populations |
| topic | Biotechnology; Ecology; Cancer; Mathematical Sciences not elsewhere classified; Information Systems not elsewhere classified; Cellwise contamination; EM algorithm; Imputation; Missing data; Model-based clustering; Robustness |
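
The description in this record says that flagged cells are treated as missing values and imputed rather than discarded. As a rough illustration of that idea only, the sketch below imputes flagged cells of a single observation with the standard Gaussian conditional mean given the unflagged cells. It is not the authors' implementation: it assumes a single Gaussian component with known mean and covariance, it takes the set of flagged cells as given instead of detecting them, and the function name `impute_flagged_cells` and the toy numbers are hypothetical.

```python
import numpy as np


def impute_flagged_cells(x, flagged, mu, sigma):
    """Replace flagged cells of x by their Gaussian conditional mean.

    Hypothetical helper for illustration: assumes one Gaussian component
    with known mean `mu` and covariance `sigma`, and a boolean mask
    `flagged` marking the cells to treat as missing.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)
    observed = ~flagged

    x_imputed = x.copy()
    if not flagged.any():
        return x_imputed  # nothing to impute
    if not observed.any():
        x_imputed[:] = mu  # no reliable cells: fall back to the mean
        return x_imputed

    # E[x_m | x_o] = mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o)
    sigma_oo = sigma[np.ix_(observed, observed)]
    sigma_mo = sigma[np.ix_(flagged, observed)]
    residual = x[observed] - mu[observed]
    x_imputed[flagged] = mu[flagged] + sigma_mo @ np.linalg.solve(sigma_oo, residual)
    return x_imputed


# Toy example: the second cell is an obvious outlier and is flagged by hand.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x = np.array([2.0, 50.0])
flagged = np.array([False, True])

x_imputed = impute_flagged_cells(x, flagged, mu, sigma)
print(x_imputed)  # → [2. 1.]
```

In a mixture setting the imputation would presumably be computed per component and combined using the posterior component probabilities; that step, and the rule for deciding which cells to flag, are left out here because the record does not specify them.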