A three-stage unsupervised learning pipeline for data analysis.

Bibliographic Details
Main Author: Alberto García-Rodríguez (18019079) (author)
Other Authors: Matias Núñez (20868478) (author), Miguel Robles Pérez (20868481) (author), Tzipe Govezensky (267401) (author), Rafael A Barrio (20868484) (author), Carlos Gershenson (215221) (author), Kimmo K Kaski (20868487) (author), Julia Tagüeña (20868490) (author)
Published: 2025
Description
Summary: The first stage is data preprocessing: the dataset is standardized, normalized, and cleaned to handle missing values and noise. The second stage reduces dimensionality in two steps: PCA first identifies the principal components that capture the global structure of the data, and t-SNE is then applied to preserve local relationships in the reduced space. The final stage applies a clustering algorithm (DBSCAN) to identify distinct groups in the processed data. The countries in each cluster are then mapped, and the mean trajectories towards ideal scores are calculated. Arrows between the stages indicate the sequential flow of data through the pipeline: the output of each stage is the input to the next.
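
The flow of data through the stages can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: mean imputation and z-scoring stand in for the preprocessing, PCA is done by power iteration, the t-SNE step is left out (in practice it would normally come from a library and would sit between the PCA and DBSCAN calls), and the toy data, `eps`, and `min_samples` values are hypothetical.

```python
import math
import random


def standardize(rows):
    """Stage 1 (assumed form): mean-impute None values, then z-score each column."""
    out_cols = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present)
        filled = [mean if v is None else v for v in col]
        std = math.sqrt(sum((v - mean) ** 2 for v in filled) / len(filled)) or 1.0
        out_cols.append([(v - mean) / std for v in filled])
    return [list(r) for r in zip(*out_cols)]


def pca(rows, n_components=2, iters=200, seed=0):
    """Stage 2a: project zero-mean rows onto the top principal components.

    Uses power iteration with deflation on the covariance matrix.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eig = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: remove the variance explained by the component just found.
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[i] * c[i] for i in range(d)) for c in components] for r in rows]


def dbscan(points, eps, min_samples):
    """Stage 3: return one cluster id per point, or -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_samples:  # core point: keep expanding the cluster
                queue.extend(nb)
    return labels


# Hypothetical toy data: two groups of four records, with two missing values.
raw = [
    [1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, None, 0.6], [1.0, 1.9, 0.5],
    [8.0, 9.0, 7.5], [8.1, 9.2, 7.4], [7.9, 8.9, 7.6], [8.0, 9.1, None],
]

# Each stage consumes the output of the previous one.
clean = standardize(raw)
embedded = pca(clean, n_components=2)  # a t-SNE step would follow here
labels = dbscan(embedded, eps=1.5, min_samples=3)
print(labels)
```

On this toy input the two groups should come out as two separate clusters with no noise points. On real data, `eps` and `min_samples` would need tuning, and the choice of imputation method can noticeably shift where records with missing values land.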