Assessing the Discriminatory Power of Leaf Morphometric Traits in Capsicum annuum by means of a Linear Discriminant Analysis

E.M.B.M. Karunathilake; Jinhyun Ahn; Piya Kittipadakul; Supachai Vuttipongchaikij; Hyun Jo

doi:10.7235/HORT.20260011

Research Article

Horticultural Science and Technology. 30 June 2026. 304-318

E.M.B.M. Karunathilake¹,†

Jinhyun Ahn²,†

Piya Kittipadakul³

Supachai Vuttipongchaikij⁴

Hyun Jo⁵,*

¹Department of Plant Resources and Environment, Jeju National University, Jeju 63243, Korea

²Department of Management Information Systems, Jeju National University, Jeju 63243, Korea

³Department of Agronomy, Faculty of Agriculture, Kasetsart University, Bangkok 10900, Thailand

⁴Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand

⁵Department of Applied Biosciences, Kyungpook National University, Daegu 41566, Korea

*Corresponding author

†These authors contributed equally to this work.

License (open-access, https://creativecommons.org/licenses/by-nc/4.0/):

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Leaf morphometric traits are widely utilized to identify plants, yet their effectiveness when used to discriminate large Capsicum collections remains unclear. This study assessed whether eight leaf size and shape traits could reliably classify 371 and 357 C. annuum accessions collected in 2021 and 2022, respectively. Comprehensive diagnostic testing revealed major violations of the assumptions required for linear discriminant analysis (LDA), including non-normal trait distributions, unequal covariance matrices, and severe multicollinearity among predictors. Shape descriptors further exhibited nonlinear dependencies incompatible with LDA. As a result, LDA models produced unstable discriminant functions and extremely low agreement between predicted and true accession labels. Given these limitations, unsupervised methods were employed. K-means clustering identified eight phenotypic groups in 2021 and three in 2022, and a principal component analysis (PCA) supported these patterns, with the first two components capturing the primary axes of the leaf size and shape variation. However, strong overlap among accessions in the PCA space indicated weak morphological differentiation, likely reflecting strong environmental influence and limited genetic divergence with regard to the leaf form. These findings indicate that basic leaf morphometric traits alone are insufficient for accession-level classification in C. annuum. Robust discrimination will require integrative phenotyping frameworks that combine geometric, physiological, biochemical, and genomic traits, supported by measurement technologies.

Keywords

chili pepper

genetic resources

k-means clustering

leaf imagery

leaf morphology

principal component analysis

Introduction

Global agriculture faces increasing pressure to enhance food security while adapting to rapid climate change (Wen et al. 2021). This challenge demands germplasm characterization methods that are rapid, cost-effective, and non-destructive to support the development of resilient crop varieties (Mansoor et al. 2024; Baloch et al. 2025). Leaf morphometric traits are an appealing phenotyping resource because they are stable, easy to measure, and widely applied in the fields of plant biology, agronomy, paleobotany, and taxonomy (De Luna-Bonilla et al. 2024; Park et al. 2025). These traits capture variations in leaf size, shape, and structural complexity, providing essential insights into photosynthetic efficiency, transpiration, stress tolerance, and developmental patterns (El‐Hendawy et al. 2007; Dong et al. 2020; Thai et al. 2025; Toprak and Coşkun 2025). They are also influenced by environmental factors, such as climate, geology, altitude, and soil conditions, making them sensitive indicators of ecological variation (Cornelissen et al. 2003; Marron et al. 2007; Royer 2012). Leaf morphometry has been successfully employed for cultivar identification in crops such as grapevines and for reconstructing paleoenvironments from fossil leaves (Royer 2012; Chitwood et al. 2014).

Despite its broad utility, the potential of leaf morphometric descriptors for discriminating Capsicum accessions remains underexplored (Nishani et al. 2025). Chili leaves generally exhibit a consistent ovate-lanceolate shape with smooth margins and an acute apex, suggesting that subtle quantitative variations rather than gross morphological differences must be leveraged for phenotypic discrimination (Idrees et al. 2020). To evaluate such subtle variations, statistical and machine learning tools, including clustering, principal component analysis (PCA), random forest (RF), and discriminant analysis, are commonly used for morphometric classification (Al Hiary et al. 2011; Graf et al. 2024). Among these approaches, linear discriminant analysis (LDA) is one of the most widely used supervised techniques for dimensionality reduction and group discrimination. LDA maximizes the separation between predefined groups and identifies the traits contributing most strongly to the class structure (Gardner-Lubbe 2021; Jameson 2024).

In this study, we evaluated eight quantitative leaf morphometric descriptors to examine their ability to discriminate Capsicum annuum accessions effectively. We hypothesized that leaf morphometric traits alone are insufficient to discriminate reliably among a large number of C. annuum accessions using LDA. Our dataset includes over 350 accessions, each intended to be a unique class/group for discrimination. In particular, the suitability of LDA as a classification method for chili germplasm based solely on leaf morphometric traits was assessed to gain insight into the development of a prediction method. Our objectives were to determine whether LDA can reliably separate accessions using these traits and to evaluate its potential as a rapid, nondestructive phenotyping approach for chili germplasm characterization.

Materials and Methods

Leaf sampling

In 2021 and 2022, leaf samples were collected from 371 and 357 Capsicum annuum accessions, respectively, from mid-June through the fruiting period for image-based phenotyping. All accessions were cultivated under the standardized management practices recommended by the National Seed Resources (https://www.seed.go.kr/sites/seed/index.do), with cultivation taking place at the Rural Development Administration (RDA), Deokjin-gu, Jeonju-si, Jeonbuk-do, Republic of Korea (35°49'51.6"N 127°03'46.0"E) following uniform schedules for irrigation, weeding, and general field management to ensure consistent growing conditions across genotypes.

For each accession, six fully expanded leaves free from visible pathogen symptoms, herbivore damage, or substantial epiphyll coverage were collected. The petiole was included in the leaf boundary for all subsequent image analysis and trait extraction steps. All samples were obtained from plants exposed to full sunlight to minimize leaf morphology variations associated with shading. Leaf sampling was conducted during daytime under natural field conditions. Immediately after collection, leaves were placed in plastic bags without additional moisture and transported indoors to minimize moisture loss and prevent deformation prior to imaging.

Leaf image acquisition

Leaf imaging was performed in an indoor studio (800 × 800 × 800 mm) equipped with an 18 W white LED light source (5600 K; CN-T96, Plastic, Republic of Korea) that provided stable and uniform illumination, thereby minimizing shadows and optical distortion. Images were captured using a Canon EOS 200D II digital camera (Canon, Japan) fitted with an EF-S 18-55 mm lens and a 24.1-megapixel CMOS sensor. The camera settings included an exposure time of 1/25 s. Because convex lenses are prone to central thickening that can introduce geometric distortion, the camera’s built-in distortion correction function was activated to reduce measurement errors. Leaves were placed on a custom-made white background plate to ensure strong contrast with the leaf surface. All samples were arranged flat, with no overlapping and with the adaxial surface facing upward. Imaging was conducted under controlled lighting conditions, and the camera was positioned at a fixed distance to maintain a consistent scale across all accessions.

Image processing and trait extraction

Leaf image processing was conducted using ImageJ software (National Institutes of Health, USA). All images were initially inspected for clarity and proper leaf placement. Subsequently, the contrast of each image was adjusted when necessary to ensure clear separation between the leaf surface and the background. A uniform thresholding procedure was applied to segment the leaf from the background, followed by binary conversion. Any minor artifacts or noise were removed using the “Outside,” “Fill Holes” and “Despeckle” functions to ensure accurate shape reconstruction. For each leaf, the outline was extracted using the “Analyze Particles” tool with standardized size and shape criteria to avoid any accidental selection of non-leaf objects. The scale for each image was calibrated using the pixel resolution prior to measurement. All trait extractions (Table 1) were performed using ImageJ’s built-in measurement suite. Eight quantitative leaf morphometric descriptors—leaf area, perimeter, circularity, leaf length, leaf width, aspect ratio, roundness, and solidity—were used to evaluate their ability to discriminate effectively among Capsicum annuum accessions (Table 1 and Fig. 1).
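The segmentation and calibration steps described above can be illustrated with a minimal sketch. It does not reproduce ImageJ's thresholding or particle analysis; the threshold value (128), the pixel size (0.01 cm), and the function names are hypothetical and chosen only for illustration. The sketch thresholds a toy grayscale image into a binary mask and converts the leaf pixel count into an area using the pixel-size formula listed in Table 1.

```python
def segment_leaf(gray, threshold):
    """Return a binary mask that is True where a pixel is darker than the threshold.

    Assumes a dark leaf on a white background, as in the imaging setup described.
    """
    return [[value < threshold for value in row] for row in gray]


def leaf_area_cm2(mask, pixel_size_cm):
    """Projected area: number of leaf pixels times the area of one pixel."""
    n_pixels = sum(sum(row) for row in mask)
    return n_pixels * pixel_size_cm ** 2


# Toy 4 x 5 grayscale image: white background (255) and dark leaf pixels (~60).
gray = [
    [255, 255, 255, 255, 255],
    [255,  60,  62, 255, 255],
    [255,  58,  61,  59, 255],
    [255, 255, 255, 255, 255],
]

mask = segment_leaf(gray, threshold=128)        # hypothetical threshold
area = leaf_area_cm2(mask, pixel_size_cm=0.01)  # hypothetical calibration
print(f"leaf pixels: {sum(sum(row) for row in mask)}, area: {area:.4f} cm^2")
```

In practice the pixel size would come from the scale calibration of each image rather than from a fixed constant.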

Table 1.

Morphometric traits obtained from chili (Capsicum annuum) accessions

Trait	Description	Unit	Formula	References
Leaf length	Maximum distance along the leaf’s major axis	cm	L = Major axis length	Park et al. 2025
Leaf width	Maximum distance perpendicular to the major axis	cm	W = Minor axis length	Park et al. 2025
Leaf Area	Total projected area of the leaf surface	cm²	A = N_pixels × (pixel size)² ; N ‒ Number of Pixels	Bankhead 2025
Perimeter	Length of the leaf boundary	cm	P = N_{boundary pixels}× pixel size ; N ‒ Number of Pixels	Bankhead 2025
Aspect Ratio	Ratio of major to minor axis lengths, representing leaf elongation	No unit	AR = L/W	Fanourakis et al. 2021
Circularity	Parameter that describes how closely a leaf’s shape resembles a perfect circle	No unit	Circularity = 4π×A/P²	Fanourakis et al. 2021
Solidity	Computed as the ratio of leaf area to its convex hull area, indicating leaf margin smoothness or the degree of lobing	No unit	Solidity = A/A_{convex hull}	Fanourakis et al. 2021
Roundness	Defined as the ratio of the minor axis to the major axis, describing the overall compactness of the leaf	No unit	Roundness = W/L	Fanourakis et al. 2021
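The four dimensionless descriptors in Table 1 can be computed directly from the size measurements. The following sketch applies the formulas exactly as printed in the table; the function name and the example values are hypothetical, and ImageJ's internal definitions may differ in detail from these simplified forms.

```python
import math


def shape_descriptors(area, perimeter, length, width, convex_hull_area):
    """Dimensionless shape descriptors, following the formulas in Table 1."""
    return {
        "aspect_ratio": length / width,                      # AR = L / W
        "circularity": 4 * math.pi * area / perimeter ** 2,  # 4*pi*A / P^2
        "roundness": width / length,                         # W / L
        "solidity": area / convex_hull_area,                 # A / A_convex_hull
    }


# Sanity check: a circle of radius 1 scores 1.0 on every descriptor.
circle = shape_descriptors(
    area=math.pi, perimeter=2 * math.pi, length=2.0, width=2.0,
    convex_hull_area=math.pi,
)

# Hypothetical elongated leaf (values invented for illustration, in cm and cm^2).
leaf = shape_descriptors(
    area=1.5, perimeter=6.0, length=2.0, width=1.0, convex_hull_area=1.6,
)
print(circle)
print(leaf)
```

Because the aspect ratio and roundness are defined as reciprocals of each other, their product is always 1, which is one source of the redundancy among predictors.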

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F1.jpg

Fig. 1.

Chili leaf morphometric parameters related to size and shape.

Statistical analysis

All analyses were performed in R v4.5.1 using RStudio (integrated development environment for R, RStudio Inc., USA). The dataset was initially examined for missing entries, duplicated records, and apparent data-entry errors. Outliers were identified using the interquartile range (IQR) criterion and were removed to enhance the robustness of the subsequent analyses. Descriptive statistics (mean, median, and range) were calculated to summarize the dataset. Data distributions were evaluated using the Shapiro-Wilk test for normality, alongside skewness and kurtosis, and Levene’s test for homogeneity of variance. The performance of LDA depends on several key assumptions: normally distributed predictors, homogeneous variance-covariance matrices across groups, linear class boundaries, and low multicollinearity among features (Jameson 2024). Because the chili leaf morphometric variables deviated significantly from a Gaussian distribution and exhibited substantial covariance heterogeneity (violating key LDA assumptions), and because classifying over 350 individual accessions with only eight highly correlated predictors is analytically intractable, a three-step strategy was adopted: (1) standardization and PCA, (2) cluster definition (to provide the grouping factor), and (3) discriminant analysis.
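As an illustration of the outlier-screening step, the sketch below applies an IQR filter to a single trait. The analyses in this study were run in R, and the text specifies only "the IQR criterion"; the 1.5 × IQR fences, the quartile method, the function name, and the toy values here are assumptions made for the example.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 (Tukey's fences) is an assumed default, not a value from the study.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical leaf areas (cm^2) with one extreme value.
areas = [0.40, 0.45, 0.50, 0.53, 0.55, 0.60, 0.71, 3.07]
kept = iqr_filter(areas)
print(kept)
```

Different quartile conventions shift the fences slightly, so the exact set of removed observations can depend on the software used.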

First, the leaf morphometric data were standardized (z-scored), and the Pearson correlation matrix, which equals the covariance matrix of the standardized data, was computed. PCA was then performed on the standardized data to capture major phenotypic variations and address multicollinearity. Second, k-means clustering was applied to the principal component (PC) scores to define broad phenotypic groups instead of using individual accessions as classes. The optimal number of clusters (k) was determined using the silhouette method, which yielded k = 8 clusters for the 2021 data and k = 3 clusters for the 2022 data. These clusters served as the predefined groups for the subsequent supervised analysis. Finally, Fisher’s linear discriminant analysis (LDA) was conducted using the PCs as predictors and the derived k-means clusters as the classification factor, an approach more robust when data are non-normal, highly correlated, or exhibit unequal covariance structures.
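The silhouette criterion used to choose k can be sketched as follows. This is a minimal illustration, not the R routine used in the study: it scores a given cluster assignment by the mean silhouette width with Euclidean distances, and the k-means step itself is omitted. The function name and the toy PC scores are hypothetical. In a full workflow the score would be computed for each candidate k, and the k with the highest mean width would be retained.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width of a labelled point set (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


# Hypothetical PC scores forming two compact groups.
pc_scores = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
good = silhouette_score(pc_scores, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(pc_scores, [0, 1, 0, 1, 0, 1])
print(f"well-separated labelling: {good:.3f}, mixed labelling: {bad:.3f}")
```

Values near 1 indicate compact, well-separated clusters, whereas values near 0 or below indicate overlapping groups.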

Results

Descriptive statistics of leaf morphological traits revealed clear differences between the 2021 and 2022 seasons (Table 2). Leaves were generally larger in 2021, with a higher mean leaf area (0.589 cm²) compared to 2022 (0.510 cm²). The maximum leaf area also differed substantially between the years, reaching 3.066 cm² in 2021, whereas it was 2.185 cm² in 2022. Leaf width followed this pattern, whereas leaf length was slightly greater in 2022, indicating that leaves in 2021 were broader while those in 2022 tended to be longer and narrower. Additionally, leaf perimeter values were greater in 2022 than in 2021, suggesting that although leaves were narrower in 2022, they had more elongated or complex margins.

Table 2.

Summary statistics of chili leaf morphometric data

	2021								2022
	Area (cm²)	Perimeter (cm)	Circularity	Length (cm)	Width (cm)	Aspect ratio	Roundness	Solidity	Area (cm²)	Perimeter (cm)	Circularity	Length (cm)	Width (cm)	Aspect ratio	Roundness	Solidity
Minimum	0.042	0.002	0.200	0.676	0.147	1.200	0.300	0.600	0.043	1.514	0.147	0.679	0.128	1.498	0.215	0.430
1st quartile	0.402	0.004	0.400	1.415	0.538	1.900	0.500	0.800	0.346	3.986	0.238	1.690	0.480	2.260	0.341	0.659
Median	0.529	0.005	0.500	1.585	0.633	2.000	0.500	0.900	0.457	4.660	0.268	1.976	0.567	2.550	0.392	0.694
Mean	0.589	0.005	0.457	1.642	0.651	2.068	0.493	0.861	0.510	4.755	0.274	2.004	0.585	2.616	0.396	0.694
3rd quartile	0.711	0.005	0.500	1.831	0.748	2.200	0.500	0.900	0.633	5.464	0.303	2.310	0.672	2.929	0.443	0.729
Maximum	3.066	0.006	0.600	3.045	1.707	4.000	0.800	1.000	2.185	7.638	0.574	3.162	1.441	4.650	0.668	0.936

Shape descriptors (circularity, aspect ratio, roundness, and solidity) also supported these differences. Circularity and roundness values were higher in 2021 than in 2022 (Table 2), indicating more circular and broader leaf shapes in 2021. In contrast, leaves in 2022 were more elongated and less circular. The mean aspect ratio increased from 2.07 in 2021 to 2.62 in 2022, confirming that the 2022 foliage was more elongated. Mean solidity was also lower in 2022 (0.694 vs. 0.861 in 2021), suggesting more irregular or serrated leaf margins.

Diagnostic tests showed that both datasets failed to satisfy the key assumptions of LDA (Table 3, Figs. 2 and 3). All leaf morphometric traits deviated strongly from normality (Shapiro–Wilk, p < 0.001) with evident skewness and leptokurtosis, and variances were significantly heterogeneous across the groups (Levene’s test, p < 0.001). The correlation analysis in Fig. 4 revealed pronounced multicollinearity in both years, with the area, perimeter, length, and width exhibiting very strong positive associations (r ≥ 0.9 in 2021; r > 0.8 in 2022). Additionally, circularity and solidity were highly correlated in both years (r > 0.7), and circularity showed a strong positive correlation with roundness in 2022 (r = 0.81). Conversely, the aspect ratio displayed strong negative correlations with roundness (r < ‒0.8), reflecting inherent geometric redundancy. The overall covariance structure was nearly identical between the seasons, indicating that both datasets contain tightly coupled shape and size descriptors that violate the independence assumptions required for LDA.
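The geometric redundancy between the aspect ratio and roundness follows directly from the definitions in Table 1, as the short derivation below shows (L is leaf length and W is leaf width, as in the table).

```latex
\[
\mathrm{AR} = \frac{L}{W}, \qquad
\mathrm{Roundness} = \frac{W}{L}
\quad\Longrightarrow\quad
\mathrm{Roundness} = \frac{1}{\mathrm{AR}}, \qquad
\frac{d\,\mathrm{Roundness}}{d\,\mathrm{AR}} = -\frac{1}{\mathrm{AR}^{2}} < 0 .
\]
```

Roundness is therefore a strictly decreasing function of the aspect ratio, so a strong negative correlation between the two is expected by construction. As a rough check against Table 2, the 2021 mean aspect ratio of 2.068 gives 1/2.068 ≈ 0.484, close to the reported mean roundness of 0.493; the two need not match exactly because the mean of a reciprocal is not the reciprocal of the mean.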

Table 3.

Summary of statistical tests assessing normality, skewness, kurtosis, and homogeneity of covariance among chili leaf traits

		Shapiro-Wilk test		Skewness	Kurtosis	Levene’s test
		W	p-value	Skewness	Kurtosis	Degrees of freedom	F-value	p-value
2021	Area (cm²)	0.9083	***	1.4800	7.3406	370	4.3210	***
	Perimeter (cm)	0.9654	***	0.7727	4.0441	370	3.1598	***
	Circularity	0.7811	***	‒0.7534	3.8866	370	3.1068	***
	Length (cm)	0.9635	***	0.7943	4.0629	370	3.1061	***
	Width (cm)	0.9795	***	0.6100	4.3488	370	2.8956	***
	Aspect ratio	0.9122	***	1.4182	7.6798	370	2.3150	***
	Roundness	0.8275	***	0.0024	3.6812	370	3.4629	***
	Solidity	0.6609	***	‒1.0629	3.2116	370	3.6115	***
2022	Area (cm²)	0.8900	***	1.7637	9.5541	356	2.2052	***
	Perimeter (cm)	0.9925	***	0.2376	2.7700	356	1.9271	***
	Circularity	0.9587	***	0.9445	5.1034	356	1.7892	***
	Length (cm)	0.9946	***	0.1344	2.6651	356	1.7488	***
	Width (cm)	0.9554	***	0.9851	5.7592	356	1.7727	***
	Aspect ratio	0.9745	***	0.6631	3.5735	356	1.6034	***
	Roundness	0.9887	***	0.4178	3.1396	356	1.6066	***
	Solidity	0.9932	***	‒0.0301	3.9542	356	1.6122	***

***significant at p-value < 0.001.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F2.jpg

Fig. 2.

Histograms and QQ plots showing the distribution and normality of chili leaf morphometric parameters for the 2021 dataset.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F3.jpg

Fig. 3.

Histograms and QQ plots showing the distribution and normality of chili leaf morphometric parameters for the 2022 dataset.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F4.jpg

Fig. 4.

Correlation analysis of chili leaf morphometric parameters in 2021 (A) and 2022 (B).

Because both datasets violated the LDA assumptions of normality, equal covariance, and low multicollinearity, and because the very large number of accessions (371 in 2021; 357 in 2022) overlapped substantially, LDA cannot realistically discriminate more than 350 classes using only eight correlated quantitative predictors. To address these constraints, Fisher’s linear discriminant analysis was applied using principal components (PCs) derived from standardized data as predictors, with k-means clusters serving as the grouping factor (Fig. 5). This approach is more robust when data are non-normal, highly correlated, or exhibit unequal covariance structures (Qu and Pei 2024). The PCA indicated that the first four PCs captured more than 97% of the total variance in 2021, while the first three PCs accounted for over 97% of the variance in 2022 (Table 4, Fig. 6C and 6D). For both years, PC1 loaded positively on all traits except the aspect ratio, which loaded negatively, whereas PC2 reflected opposing contributions, with the area, perimeter, length, width, and aspect ratio loading in one direction and the circularity, solidity, and roundness loading in the opposite direction, highlighting a clear separation between size- and shape-related descriptors (Table 4, Fig. 6A and 6B).

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F5.jpg

Fig. 5.

K-means clustering plots with the optimal number of clusters for the chili leaf data from 2021 (A) and 2022 (B).

Table 4.

Principal component analysis with chili leaf traits across two years

	2021								2022
	PC1	PC2	PC3	PC4	PC5	PC6	PC7	PC8	PC1	PC2	PC3	PC4	PC5	PC6	PC7	PC8
Area	0.4406	‒0.1926	‒0.1381	0.0539	0.1778	0.6455	0.5435	‒0.0544	0.4517	‒0.1959	‒0.1900	‒0.4366	‒0.4147	0.3374	0.0802	0.4885
Perimeter	0.4053	‒0.3437	‒0.0648	‒0.0194	‒0.0565	‒0.3242	0.0164	0.7775	0.3436	‒0.4018	‒0.0534	0.2737	0.0249	‒0.1365	0.7449	‒0.2619
Circularity	0.2510	0.5028	‒0.2602	0.7496	0.1340	‒0.1897	0.0175	0.0187	0.2718	0.4495	‒0.2249	‒0.4048	0.6703	0.0834	0.2264	‒0.0461
Leaf length	0.3802	‒0.3888	‒0.1123	‒0.0163	‒0.0765	‒0.5402	0.1323	‒0.6134	0.3035	‒0.4303	‒0.1864	0.3345	0.4918	‒0.1488	‒0.4244	0.3632
Leaf width	0.4652	‒0.0853	0.0137	0.0160	‒0.0211	0.3324	‒0.8059	‒0.1245	0.4786	‒0.1355	0.0549	‒0.2354	‒0.1192	‒0.0322	‒0.4484	‒0.6913
Aspect ratio	‒0.3212	‒0.3790	‒0.4076	0.0838	0.7378	‒0.0339	‒0.1856	0.0114	‒0.3578	‒0.3243	‒0.4378	‒0.4933	‒0.0210	‒0.5741	0.0230	‒0.0355
Roundness	0.3089	0.3453	0.4978	‒0.3235	0.6277	‒0.1903	0.0503	‒0.0098	0.3644	0.3321	0.4007	‒0.0310	‒0.1563	‒0.7039	0.0237	0.2740
Solidity	0.1381	0.4132	‒0.6945	‒0.5679	‒0.0514	‒0.0496	‒0.0084	0.0143	0.1483	0.4233	‒0.7214	0.3987	‒0.3117	‒0.1112	‒0.0720	‒0.0677
Importance of components:
Standard deviation	2.1201	1.4553	1.0078	0.4401	0.3364	0.2329	0.0926	0.0455	2.0148	1.7704	0.7969	0.2945	0.2100	0.1544	0.0998	0.0815
Proportion of variance	0.5618	0.2647	0.1270	0.0242	0.0141	0.0068	0.0011	0.0003	0.5074	0.3918	0.0794	0.0108	0.0055	0.0030	0.0012	0.0008
Cumulative proportion	0.5618	0.8266	0.9535	0.9777	0.9919	0.9987	0.9997	1.0000	0.5074	0.8992	0.9786	0.9894	0.9950	0.9979	0.9992	1.0000
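The "Proportion of variance" and "Cumulative proportion" rows in Table 4 can be recovered from the reported standard deviations, since each component's variance is the square of its standard deviation. The sketch below performs this check in Python rather than R; the function and variable names are hypothetical, and the input values are copied from Table 4.

```python
def variance_explained(std_devs):
    """Return (proportions, cumulative proportions) from component standard deviations."""
    eigenvalues = [sd ** 2 for sd in std_devs]  # component variances
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Standard deviations of PC1 to PC8 as reported in Table 4.
sd_2021 = [2.1201, 1.4553, 1.0078, 0.4401, 0.3364, 0.2329, 0.0926, 0.0455]
sd_2022 = [2.0148, 1.7704, 0.7969, 0.2945, 0.2100, 0.1544, 0.0998, 0.0815]

prop_2021, cum_2021 = variance_explained(sd_2021)
prop_2022, cum_2022 = variance_explained(sd_2022)

for year, cum in (("2021", cum_2021), ("2022", cum_2022)):
    print(year, [round(c, 4) for c in cum])
```

The squared standard deviations sum to approximately 8 in both years, as expected for a PCA on eight standardized traits.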

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F6.jpg

Fig. 6.

Principal component analysis (PCA) and cluster visualization of chili leaf data. PCA biplots showing the contributions of leaf traits for 2021 (A) and 2022 (B). Scree plots illustrating the variance explained by principal components in 2021 (C) and 2022 (D).

The LDA revealed that most discriminatory power was concentrated in the first linear discriminant (LD1) for both years, explaining 63.26% of the variation in 2021 and 66.71% in 2022, followed by LD2 (23.59% in 2021; 23.19% in 2022) and LD3, which accounted for only minor proportions (13.15% in 2021 and 10.10% in 2022) (Table 5). Therefore, the first two LDs are sufficient for visualizing the separation of clusters (Fig. 7). In 2021, LD1 was strongly influenced by PC1 (loading = ‒1.26) and moderately by PC3 (‒0.51), indicating that the primary separation among clusters was driven by size-related traits. LD2 was dominated by PC2 (1.23), while LD3 was mainly defined by PC3 (1.45), reflecting secondary contributions from shape descriptors. In contrast, the 2022 dataset showed a reversed direction for PC1 on LD1 (1.47), suggesting a shift in how size traits contributed to group separation, while PC2 loaded negatively and strongly on LD2 (‒1.03). LD3 in 2022 was heavily influenced by PC3 (‒1.78), indicating that fine-scale shape variations contributed most to the third discriminant axis.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F7.jpg

Fig. 7.

Linear discriminant analysis (LDA) plots of clustered data. The plots visualize the separation of the eight clusters identified in 2021 (A) and the three clusters identified in 2022 (B), based on their projection onto the first two linear discriminant functions (LD1 and LD2).

The LDA models showed very low classification performance for both years (Table 5). The overall accuracy rates from the hold-out test were only 7.8% in 2021 and 8.8% in 2022, far below the level expected by chance for the number of clusters in each year. Cross-validation confirmed this pattern: leave-one-out cross-validation (LOOCV) produced accuracy rates of 8.9% in 2021 and 11.4% in 2022, while 10-fold cross-validation yielded similarly low rates (8.4% in 2021 and 9.5% in 2022). The corresponding kappa values (0.078 in 2021 and 0.093 in 2022) were close to zero, indicating almost no agreement between predicted and true groups beyond random chance.
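For readers less familiar with the kappa statistic, the sketch below computes Cohen's kappa as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the marginal class frequencies. It is an illustration only: the labels are invented toy data, the function name is hypothetical, and the code does not reproduce the cross-validation procedure behind Table 5.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings beyond chance."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal proportions.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined when only one class occurs
    return (observed - expected) / (1 - expected)


# Toy example with three balanced groups (invented labels).
y_true = ["c1", "c1", "c2", "c2", "c3", "c3"]
y_pred = ["c1", "c2", "c2", "c3", "c3", "c1"]

kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = 0.50, kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, so an accuracy figure is best interpreted alongside the chance level implied by the class frequencies.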

Table 5.

Discriminant function coefficients, proportions of trace and performance from the linear discriminant analysis (LDA)

	2021			2022
	LD1	LD2	LD3	LD1	LD2	LD3
PC1	‒1.2632	‒0.1371	‒0.1166	1.4680	‒0.2950	0.0199
PC2	‒0.2406	1.2314	‒0.1906	‒0.5266	‒1.0256	0.1940
PC3	‒0.5149	0.2884	1.4515	‒0.1787	‒0.5743	‒1.7758
Proportion of trace	0.6326	0.2359	0.1315	0.6671	0.2319	0.1010
Overall accuracy (hold-out split)	0.0777			0.0879
Leave-one-out cross-validation (LOOCV) accuracy	0.0893			0.1137
10-fold cross-validation (CV) accuracy	0.0836			0.0951
10-fold cross-validation (CV) kappa	0.0776			0.0929

PC, principal component; LD, linear discriminant.

Discussion

This study evaluated whether basic leaf morphometric traits can reliably support the classification of a large collection of C. annuum accessions and whether LDA is an appropriate predictive tool for such datasets. The results strongly support the hypothesis that they do not, as both statistical diagnostics and classification performance outcomes indicated that these traits lack the discriminatory power required for accession-level separation. Despite evaluating over 350 accessions, the models failed to achieve meaningful classification accuracy, indicating substantial overlap in the morphometric trait space among accessions.

The poor performance of LDA can be attributed to strong violations of its underlying assumptions, specifically non-normal trait distributions, covariance heterogeneity, and high multicollinearity among descriptors. Moreover, the large number of classes combined with limited samples per accession contributed to unstable discriminant functions and weak predictive performance (Jensen 2018; Lapanowski and Gaynanova 2020; Gardner-Lubbe 2021; Ali et al. 2022; Qu and Pei 2024). These conditions led to extensive overlap among groups, confirming that the selected traits do not provide a sufficient signal for reliable classification under a linear framework.

From a biological perspective, the limited discriminatory capacity of these traits reflects the inherently conserved and environmentally plastic nature of leaf morphology in C. annuum. Traits such as leaf area, length, width, and perimeter primarily describe vegetative growth and resource acquisition strategies, which are strongly influenced by environmental conditions rather than genotype alone (Marron et al. 2007; Nishani et al. 2025). For instance, larger and broader leaves, as observed in 2021, are typically associated with favorable growing conditions that promote photosynthetic capacity, whereas narrower and more elongated leaves, as observed in 2022, may represent adaptive responses to environmental stress, such as higher temperatures (Fig. 8) or reduced water availability. These environmentally driven responses are consistent with previous findings in Capsicum and other plant species, where the leaf morphology exhibits substantial phenotypic plasticity (Royer 2012; Romero-Higareda et al. 2022).

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-03/N020260011/images/HST_20260011_F8.jpg

Fig. 8.

Daily maximum and minimum air temperatures (°C) recorded in Jeonju, South Korea (study location) during the chili (Capsicum annuum) growing period from March to June in (A) 2021 and (B) 2022. These data represent environmental conditions during the vegetative and early fruiting stages and provide context for the observed variations in leaf morphological traits (Source: https://www.timeanddate.com/weather/south-korea/jeonju).

The structure revealed by PCA and k-means clustering further supports this interpretation. Variations were primarily organized along axes representing size-related and shape-related traits, reflecting coordinated developmental processes rather than accession-specific differences. The strong multicollinearity observed among size-related traits (area, perimeter, length, and width) indicates a tightly integrated growth module, while shape descriptors such as circularity, roundness, and solidity represent a secondary module associated with leaf geometry. Although the aspect ratio contributed independently to shape variations, the overall morphospace remained highly overlapped among accessions. This suggests that the leaf morphology in C. annuum is governed more by general developmental and environmental gradients than by distinct genetic signatures (Baumgartner et al. 2020; Yang et al. 2025). Moreover, the relatively simple leaf architecture of chili, typically ovate to lanceolate and lacking pronounced lobes or serrations, further constrains the discriminatory potential of basic geometric descriptors (Royer 2012; Idrees et al. 2020; Fanourakis et al. 2021).

From a practical and horticultural perspective, these findings highlight an important limitation for germplasm characterization and breeding applications. Although leaf morphometric traits are easy to measure and well suited for high-throughput phenotyping, they do not adequately capture variations in economically important traits such as fruit morphology, yield, pungency, or resistance to pests and diseases (Poljak et al. 2024; Zhao et al. 2026). These agriculturally valuable characteristics are typically governed by more complex genetic, physiological, and biochemical processes and therefore require integrative phenotyping approaches. Previous studies have consistently demonstrated that fruit traits, biochemical profiles, and genomic markers provide stronger discriminatory power at the cultivar and accession level in Capsicum (Jones et al. 2011; Tripodi and Greco 2018; Alvares Bianchi et al. 2020; Hong et al. 2020; Lozada et al. 2023).

To improve classification performance, future studies should incorporate higher-resolution phenotypic descriptors. Advanced morphometric approaches such as elliptic Fourier analysis (EFA) can capture detailed leaf boundary complexities and have been shown to outperform simple geometric traits with regard to cultivar discrimination (Viáfara-Vega et al. 2025). In addition, traits related to the venation architecture, stomatal characteristics, leaf texture, and spectral or colorimetric properties may provide more robust and genotype-specific signals (Chitwood et al. 2014; Thai et al. 2025). These features are increasingly accessible through modern high-throughput phenotyping platforms, including imaging systems and low-cost smartphone-based tools, enabling rapid and nondestructive data acquisition even in large germplasm collections (Tomaszewski and Kołakowski 2023; Park et al. 2025; Thai et al. 2025).

The limitations observed in this study do not invalidate LDA as a classification method but rather emphasize its dependence on an appropriate data structure. LDA remains effective when trait distributions approximate normality, covariance structures are homogeneous, and class boundaries are linearly separable. However, in datasets characterized by high class numbers, limited replication, non-linear trait relationships, and substantial overlap among groups, more flexible classifiers such as random forests, support vector machines, or other non-parametric approaches may yield improved predictive performance (Breiman 2001; Graf et al. 2024; Uniyal 2024). In this study, LDA was intentionally selected as a baseline model to evaluate the intrinsic discriminatory capacity of plant traits under a linear framework.

This study demonstrates that basic leaf morphometric traits are insufficient for reliable accession-level discrimination in C. annuum, thereby supporting the initial hypothesis. While these traits effectively describe general patterns of vegetative growth and environmental responses, they lack the specificity required for germplasm classification. Consequently, reliance on simple leaf descriptors alone in high-throughput phenotyping pipelines may lead to limited or misleading classification outcomes. Future research should integrate detailed morphological, physiological, and genomic data to enhance discriminatory power and improve the effectiveness of phenotyping approaches for better breeding and germplasm management.

Acknowledgements

This research was supported by the Kyungpook National University Research Fund, 2023.

References

Al Hiary H, Bani Ahmad S, Reyalat M, Braik M, ALRahamneh Z (2011) Fast and accurate detection and classification of plant diseases. Int J Comput Appl 17:31-38. https://doi.org/10.5120/2183-2754

Ali S, Hassan M, Kim JY, Farid MI, Sanaullah M, Mufti H (2022) FF-PCA-LDA: Intelligent feature fusion based PCA-LDA classification system for plant leaf diseases. Appl Sci 12:3514. https://doi.org/10.3390/app12073514

Alvares Bianchi P, Renata Almeida Da Silva L, André Da Silva Alencar A, Henrique Araújo Diniz Santos P, Pimenta S, Pombo Sudré C, Erpen-Dalla Corte L, Simões Azeredo Gonçalves L, Rodrigues R (2020) Biomorphological characterization of Brazilian Capsicum chinense Jacq. germplasm. Agronomy 10:447. https://doi.org/10.3390/agronomy10030447

Baloch FS, Lee SM, Mansoor S, Morales A, Karunathilake EMBM, Nadeem MA, Cavagnaro PF, Chung YS (2025) GBS-derived SNP and SilicoDArT markers reveals the genetic variation and population structure of Korean buckwheat (Fagopyrum esculentum) an underutilised crop. BMC Plant Biol 25:1479. https://doi.org/10.1186/s12870-025-07633-0

Bankhead P (2025) Pixel size & dimensions. Introd. Bioimage Anal. URL https://bioimagebook.github.io/chapters/1-concepts/5-pixel_size/pixel_size.html (accessed 11.30.25).

Baumgartner A, Donahoo M, Chitwood DH, Peppe DJ (2020) The influences of environmental change and development on leaf shape in Vitis. Am J Bot 107:676-688. https://doi.org/10.1002/ajb2.1460

Breiman L (2001) Random Forests. Mach Learn 45:5-32. https://doi.org/10.1023/A:1010933404324

Chitwood DH, Ranjan A, Martinez CC, Headland LR, Thiem T, Kumar R, Covington MF, Hatcher T, Naylor DT, et al. (2014) A Modern Ampelography: A genetic basis for leaf shape and venation patterning in grape. Plant Physiol 164:259-272. https://doi.org/10.1104/pp.113.229708

Cornelissen JHC, Lavorel S, Garnier E, Díaz S, Buchmann N, Gurvich DE, Reich PB, Steege HT, Morgan HD, et al. (2003) A handbook of protocols for standardised and easy measurement of plant functional traits worldwide. Aust J Bot 51:335-380. https://doi.org/10.1071/BT02124

De Luna-Bonilla OÁ, Valencia-Á S, Ibarra-Manríquez G, Morales-Saldaña S, Tovar-Sánchez E, González-Rodríguez A (2024) Leaf morphometric analysis and potential distribution modelling contribute to taxonomic differentiation in the Quercus microphylla complex. J Plant Res 137:3-19. https://doi.org/10.1007/s10265-023-01495-z

Dong N, Prentice IC, Wright IJ, Evans BJ, Togashi HF, Caddy‐Retalic S, McInerney FA, Sparrow B, Leitch E, et al. (2020) Components of leaf‐trait variation along environmental gradients. New Phytol 228:82-94. https://doi.org/10.1111/nph.16558

El‐Hendawy SE, Hu Y, Schmidhalter U (2007) Assessing the Suitability of Various Physiological Traits to Screen Wheat Genotypes for Salt Tolerance. J Integr Plant Biol 49:1352-1360. https://doi.org/10.1111/j.1744-7909.2007.00533.x

Fanourakis D, Kazakos F, Nektarios PA (2021) Allometric Individual Leaf Area Estimation in Chrysanthemum. Agronomy 11:795. https://doi.org/10.3390/agronomy11040795

Gardner-Lubbe S (2021) Linear discriminant analysis for multiple functional data analysis. J Appl Stat 48:1917-1933. https://doi.org/10.1080/02664763.2020.1780569

Graf R, Zeldovich M, Friedrich S (2024) Comparing linear discriminant analysis and supervised learning algorithms for binary classification—A method comparison study. Biom J 66:2200098. https://doi.org/10.1002/bimj.202200098

Hong JP, Ro N, Lee HY, Kim GW, Kwon JK, Yamamoto E, Kang BC (2020) Genomic Selection for Prediction of Fruit-Related Traits in Pepper (Capsicum spp.). Front Plant Sci 11:570871. https://doi.org/10.3389/fpls.2020.570871

Idrees S, Hanif MA, Ayub MA, Hanif A, Ansari TM (2020) Chili Pepper, in: Medicinal Plants of South Asia. Elsevier, pp 113-124. https://doi.org/10.1016/B978-0-08-102659-5.00009-4

Jameson J (2024) What is Linear Discriminant Analysis (LDA)? Jacob C Jameson. URL https://jacobjameson.com/posts/2024-02-15-LDA/2024-02-15-LDA.html#:~:text=Assumptions%20of%20LDA,-LDA%20makes%20a&text=Normal%20Distribution:%20It%20assumes%20that,move%20away%20from%20the%20center. (accessed 10.20.25).

Jensen G (2018) AI: Weak AI vs Strong AI. Gravitron. URL https://www.gavinjensen.com/blog/2018/ai-weak-strong

Jones AMP, Ragone D, Aiona K, Lane WA, Murch SJ (2011) Nutritional and morphological diversity of breadfruit (Artocarpus, Moraceae): Identification of elite cultivars for food security. J Food Compos Anal 24:1091-1102. https://doi.org/10.1016/j.jfca.2011.04.002

Lapanowski AF, Gaynanova I (2020) Compressing Large Sample Data for Discriminant Analysis. https://doi.org/10.1109/BigData52589.2021.9671676

Lozada DN, Sandhu KS, Bhatta M (2023) Ridge regression and deep learning models for genome-wide selection of complex traits in New Mexican Chile peppers. BMC Genomic Data 24:80. https://doi.org/10.1186/s12863-023-01179-6

Mansoor S, Karunathilake EMBM, Tuan TT, Chung YS (2024) Genomics, phenomics, and machine learning in transforming plant research: Advancements and challenges. Hortic Plant J S2468014124000098. https://doi.org/10.1016/j.hpj.2023.09.005

Marron N, Dillen SY, Ceulemans R (2007) Evaluation of leaf traits for indirect selection of high yielding poplar hybrids. Environ Exp Bot 61:103-116. https://doi.org/10.1016/j.envexpbot.2007.04.002

Nishani YAR, Shyamalee HAPA, Ranawake AL (2025) Characterization and diversity analysis of underutilized Capsicum chinense L. accessions in Sri Lanka. Trop Agric Res Ext 28:73-91. https://doi.org/10.4038/tare.v28i2.5754

Park JE, Mansoor S, Ku K, Le AT, Tuan TT, Ko HC, Min OSS, Baloch FS, Chung YS (2025) Cost effective image-based phenotyping in Capsicum annuum germplasm for rapid assessment of traits. Plant Biotechnol Rep 19:181-196. https://doi.org/10.1007/s11816-025-00968-y

Poljak I, Vidaković A, Benić L, Tumpa K, Idžojtić M, Šatović Z (2024) Patterns of leaf and fruit morphological variation in marginal populations of Acer tataricum L. subsp. tataricum. Plants 13:320. https://doi.org/10.3390/plants13020320

Qu L, Pei Y (2024) A comprehensive review on discriminant analysis for addressing challenges of class-level limitations, small sample size, and robustness. Processes 12:1382. https://doi.org/10.3390/pr12071382

Romero-Higareda CE, Hernández-Verdugo S, Pacheco-Olvera A, Núñez-Farfán J, Retes-Manjarrez E, López-Orona C, Osuna-Enciso T (2022) Adaptive phenotypic plasticity of wild Capsicum annuum (Solanaceae) to variable environments of water-light availability. Acta Oecol 114:103807. https://doi.org/10.1016/j.actao.2021.103807

Royer DL (2012) Climate reconstruction from leaf size and shape: new developments and challenges. Paleontol Soc Pap 18:195-212. https://doi.org/10.1017/S1089332600002618

Thai TT, Mansoor S, Van HT, Vu VG, Karunathilake EMBM, Le AT, Baloch FS, Chung YS, Kim J (2025) Advanced phenotyping features utilizing deep learning techniques for automated analysis of stomatal guard cell orientation. Sci Rep 15:38578. https://doi.org/10.1038/s41598-025-22412-5

Tomaszewski L, Kołakowski R (2023) Mobile services for smart agriculture and forestry, biodiversity monitoring, and water management: challenges for 5G/6G networks. Telecom 4:67-99. https://doi.org/10.3390/telecom4010006

Toprak S, Coşkun ÖF (2025) Characterization of pepper (Capsicum annuum L.) genotypes for zinc stress tolerance based on morphological traits. Harran Tarım Ve Gıda Bilim Derg 29:477-486. https://doi.org/10.29050/harranziraat.1700013

Tripodi P, Greco B (2018) Large scale phenotyping provides insight into the diversity of vegetative and reproductive organs in a wide collection of wild and domesticated peppers (Capsicum spp.). Plants 7:103. https://doi.org/10.3390/plants7040103

Uniyal M (2024) Linear Discriminant Analysis in Machine Learning. Appl. Roots. URL https://www.appliedaicourse.com/blog/linear-discriminant-analysis-in-machine-learning/#:~:text=Key%20Assumptions:&text=Equal%20Covariance%20Matrices:%20It%20assumes,be%20independent%20of%20each%20other. (accessed 10.20.25)

Viáfara-Vega RA, Granada-Agudelo M, Cárdenas-Henao H (2025) Characterization and discrimination of Colombian Capsicum accessions using elliptic Fourier analysis. Genet Resour Crop Evol 72:8973-8984. https://doi.org/10.1007/s10722-025-02482-0

Wen W, Li Z, Shao J, Tang Y, Zhao Z, Yang J, Ding M, Zhu X, Zhou M (2021) The distribution and sustainable utilization of buckwheat resources under climate change in China. Plants 10:2081. https://doi.org/10.3390/plants10102081

Yang Z, Dai G, Qin K, Wu J, Wang Z, Wang C (2025) Comprehensive evaluation of germplasm resources in various goji cultivars based on leaf anatomical traits. Forests 16:187. https://doi.org/10.3390/f16010187

Zhao W, Wang L, Song Y, Jiang H, Guo X (2026) Leaf-fruit trait decoupling along environmental gradients in tropical Cryptocaryeae (Lauraceae). Plants 15:126. https://doi.org/10.3390/plants15010126

Horticultural Science and Technology 원예과학기술지 ISSN:1226-8763(Print) 2465-8588(Online)

