Comprehensive evaluation of data preprocessing and visualization techniques for enhanced classification and sampling

dc.authorid0000-0002-4353-1261
dc.authorid0000-0002-4049-0716
dc.authorid0000-0002-2073-8956
dc.authorid0000-0003-1848-427X
dc.contributor.authorDagal, Idriss
dc.contributor.authorHarrison, Ambe
dc.contributor.authorIbrahim, AL-Wesabi
dc.contributor.authorMbasso, Wulfran Fendzi
dc.date.accessioned2026-01-31T15:08:11Z
dc.date.available2026-01-31T15:08:11Z
dc.date.issued2025
dc.departmentİstanbul Beykent Üniversitesi
dc.description.abstractEffective representation and visualization of data are critical components of data analysis, particularly in classification tasks. This paper presents a comprehensive evaluation of various functions employed in data preprocessing and visualization, emphasizing their roles in enhancing data representation, facilitating classification, and optimizing sampling techniques. We explore the jitter function, which mitigates overplotting in visualizations by introducing small random variations to data points, thereby improving clarity in the depiction of class distributions. The hexagonal binning function aggregates data into hexagonal grids, enabling the identification of density patterns and enhancing the understanding of class separability in two-dimensional space. The center function is examined for its utility in computing centroids of data clusters, aiding in visualizing class distributions and enhancing clustering algorithms. Additionally, we investigate the swarm function, which serves dual purposes as an optimization technique in particle swarm optimization for feature selection and as a visualization tool to illustrate data point distributions without overlap. The random function is discussed for its role in generating synthetic datasets and initializing parameters, crucial for achieving balanced and representative training samples. Lastly, the square function is evaluated for its application in distance calculations and error metrics, essential for assessing model performance in classification tasks. The experimental results reveal that the random function consistently shows the highest means and variability across most distributions, while the center function, despite exhibiting lower means, demonstrates a higher coefficient of variation (CV) and entropy, indicating greater uncertainty. Conversely, the jitter function displays lower means and variances, typically exhibiting more predictability and less uncertainty.
This comprehensive evaluation highlights the importance of these functions in preprocessing and visualizing data, ultimately contributing to improved classification outcomes and enhanced interpretability of data-driven insights.
dc.description.sponsorshipBeykent University
dc.description.sponsorshipThe authors declared that this work did not receive any funding.
dc.identifier.doi10.1007/s10586-025-05512-9
dc.identifier.issn1386-7857
dc.identifier.issn1573-7543
dc.identifier.issue7
dc.identifier.scopus2-s2.0-105013185247
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org./10.1007/s10586-025-05512-9
dc.identifier.urihttps://hdl.handle.net/20.500.12662/10614
dc.identifier.volume28
dc.identifier.wosWOS:001548407400002
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofCluster Computing-The Journal of Networks Software Tools And Applications
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WoS_20260128
dc.subjectData representation
dc.subjectData visualization
dc.subjectClassification
dc.subjectError metrics
dc.subjectOverplotting
dc.titleComprehensive evaluation of data preprocessing and visualization techniques for enhanced classification and sampling
dc.typeArticle
