Research Article

Horticultural Science and Technology. 2026.
https://doi.org/10.7235/HORT.20260011

Introduction

Global agriculture faces increasing pressure to enhance food security while adapting to rapid climate change (Wen et al. 2021). This challenge demands germplasm characterization methods that are rapid, cost effective, and non-destructive to support the development of resilient crop varieties (Mansoor et al. 2024; Baloch et al. 2025). Leaf morphometric traits are an appealing phenotyping resource because they are stable, easy to measure, and widely applied in the fields of plant biology, agronomy, paleobotany, and taxonomy (De Luna-Bonilla et al. 2024; Park et al. 2025). These traits capture variations in the leaf size, shape and structural complexity, providing essential insights into the photosynthetic efficiency, transpiration, stress tolerance and development patterns (El‐Hendawy et al. 2007; Dong et al. 2020; Thai et al. 2025; Toprak and Coşkun 2025). They are also influenced by environmental factors, such as the climate, geology, altitude and soil conditions, making them sensitive indicators of ecological variations (Cornelissen et al. 2003; Marron et al. 2007; Royer 2012). Leaf morphometry has been successfully employed for cultivar identification in crops such as grapevines and for reconstructing paleoenvironments from fossil leaves (Royer 2012; Chitwood et al. 2014).

Despite its broad utility, the potential of leaf morphometric descriptors for discriminating Capsicum accessions remains underexplored (Nishani et al. 2025). Chili leaves generally exhibit a consistent ovate-lanceolate shape with smooth margins and an acute apex, suggesting that subtle quantitative variations rather than gross morphological differences must be leveraged for phenotypic discrimination (Idrees et al. 2020). To evaluate such subtle variations, statistical and machine learning tools, including clustering, principal component analysis (PCA), random forest (RF), and discriminant analysis, are commonly used for morphometric classification (Al Hiary et al. 2011; Graf et al. 2024). Among these approaches, linear discriminant analysis (LDA) is one of the most widely used supervised techniques for dimensionality reduction and group discrimination. LDA maximizes the separation between predefined groups and identifies the traits contributing most strongly to the class structure (Gardner-Lubbe 2021; Jameson 2024).

In this study, we evaluated eight quantitative leaf morphometric descriptors to examine their ability to discriminate Capsicum annuum accessions effectively. We hypothesized that leaf morphometric traits alone are insufficient to discriminate reliably among a large number of C. annuum accessions using LDA. Our dataset includes over 350 accessions, each intended to be a unique class/group for discrimination. In particular, the suitability of LDA as a classification method for chili germplasm based solely on leaf morphometric traits was assessed to gain insight into the development of a prediction method. Our objectives were to determine whether LDA can reliably separate accessions using these traits and to evaluate its potential as a rapid, nondestructive phenotyping approach for chili germplasm characterization.

Materials and Methods

Leaf sampling

In 2021 and 2022, leaf samples were collected from 371 and 357 Capsicum annuum accessions, respectively, from mid-June through the fruiting period for image-based phenotyping. All accessions were cultivated under the standardized management practices recommended by the National Seed Resources (https://www.seed.go.kr/sites/seed/index.do), with cultivation taking place at the Rural Development Administration (RDA), Deokjin-gu, Jeonju-si, Jeonbuk-do, Republic of Korea (35°49'51.6"N 127°03'46.0"E) following uniform schedules for irrigation, weeding, and general field management to ensure consistent growing conditions across genotypes.

For each accession, six fully expanded leaves free from visible pathogen symptoms, herbivore damage, or substantial epiphyll coverage were collected. The petiole was included in the leaf boundary for all subsequent image analysis and trait extraction steps. All samples were obtained from plants exposed to full sunlight to minimize leaf morphology variations associated with shading. Leaf sampling was conducted during daytime under natural field conditions. Immediately after collection, leaves were placed in plastic bags without additional moisture and transported indoors to minimize moisture loss and prevent deformation prior to imaging.

Leaf image acquisition

Leaf imaging was performed in an indoor studio (800 × 800 × 800 mm) equipped with an 18 W white LED light source (5600 K; CN-T96, Plastic, Republic of Korea) that provided stable and uniform illumination, thereby minimizing shadows and optical distortion. Images were captured using a Canon EOS D200II digital camera (Canon, Japan) fitted with an EF-S 18-55 mm lens and a 24.1-megapixel CMOS sensor. The camera settings included an exposure time of 1/25 s. Because convex lenses are prone to central thickening that can introduce geometric distortion, the camera’s built-in distortion correction function was activated to reduce measurement errors. Leaves were placed on a custom-made white background plate to ensure strong contrast with the leaf surface. All samples were arranged flat, with no overlapping and with the adaxial surface facing upward. Imaging was conducted under controlled lighting conditions, and the camera was positioned at a fixed distance to maintain a consistent scale across all accessions.

Image processing and trait extraction

Leaf image processing was conducted using ImageJ software (National Institutes of Health, USA). All images were initially inspected for clarity and proper leaf placement. Subsequently, the contrast of each image was adjusted when necessary to ensure clear separation between the leaf surface and the background. A uniform thresholding procedure was applied to segment the leaf from the background, followed by binary conversion. Any minor artifacts or noise were removed using the “Outside,” “Fill Holes” and “Despeckle” functions to ensure accurate shape reconstruction. For each leaf, the outline was extracted using the “Analyze Particles” tool with standardized size and shape criteria to avoid any accidental selection of non-leaf objects. The scale for each image was calibrated using the pixel resolution prior to measurement. All trait extractions (Table 1) were performed using ImageJ’s built-in measurement suite. Eight quantitative leaf morphometric descriptors—leaf area, perimeter, circularity, leaf length, leaf width, aspect ratio, roundness, and solidity—were used to evaluate their ability to discriminate effectively among Capsicum annuum accessions (Table 1 and Fig. 1).

Table 1.

Morphometric traits obtained from chili (Capsicum annuum) accessions

Trait | Description | Unit | Formula | Reference
Leaf length | Maximum distance along the leaf’s major axis | cm | L = Major axis length | Park et al. 2025
Leaf width | Maximum distance perpendicular to the major axis | cm | W = Minor axis length | Park et al. 2025
Leaf area | Total projected area of the leaf surface | cm2 | A = N_pixels × (pixel size)^2; N = number of pixels | Bankhead 2025
Perimeter | Length of the leaf boundary | cm | P = N_boundary pixels × pixel size; N = number of pixels | Bankhead 2025
Aspect ratio | Ratio of major to minor axis lengths, representing leaf elongation | No unit | AR = L/W | Fanourakis et al. 2021
Circularity | Parameter that describes how closely a leaf’s shape resembles a perfect circle | No unit | Circularity = 4π × A/P^2 | Fanourakis et al. 2021
Solidity | Ratio of leaf area to its convex hull area, indicating leaf margin smoothness or the degree of lobing | No unit | Solidity = A/A_convex hull | Fanourakis et al. 2021
Roundness | Ratio of the minor axis to the major axis, describing the overall compactness of the leaf | No unit | Roundness = W/L | Fanourakis et al. 2021
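
The four unitless descriptors in Table 1 follow directly from the size measurements. The sketch below is not the ImageJ implementation; it only applies the Table 1 formulas to already-measured values. The function name `shape_descriptors` and the example numbers are hypothetical and are not taken from the dataset.

```python
import math


def shape_descriptors(area, perimeter, length, width, convex_hull_area):
    """Apply the Table 1 formulas to one leaf (lengths in cm, areas in cm2)."""
    return {
        "aspect_ratio": length / width,                         # AR = L/W
        "circularity": 4.0 * math.pi * area / perimeter ** 2,   # 4*pi*A/P^2
        "solidity": area / convex_hull_area,                    # A/A_convex hull
        "roundness": width / length,                            # W/L
    }


# Hypothetical leaf measurements, for illustration only.
leaf = shape_descriptors(area=6.0, perimeter=12.0, length=4.0, width=2.0,
                         convex_hull_area=6.5)

# Sanity check: a circle of radius 1 should have circularity 1.
circle = shape_descriptors(area=math.pi, perimeter=2.0 * math.pi,
                           length=2.0, width=2.0, convex_hull_area=math.pi)
```

Because roundness as defined here is the reciprocal of the aspect ratio, the two descriptors carry the same information, which is worth keeping in mind when they are used together as predictors.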

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F1.jpg
Fig. 1.

Chili leaf morphometric parameters related to size and shape.

Statistical analysis

All analyses were performed in R v4.5.1 using RStudio (integrated development environment for R, RStudio Inc., USA). The dataset was initially examined for missing entries, duplicated records, and apparent data-entry errors. Outliers were identified using the interquartile range (IQR) criterion and were removed to enhance the robustness of the subsequent analyses. Descriptive statistics (mean, median, and range) were calculated to summarize the dataset. Data distributions were evaluated using the Shapiro-Wilk test for normality, alongside skewness and kurtosis, and Levene’s test for homogeneity of variance. The performance of LDA depends on several key assumptions: normally distributed predictors, homogeneity of variance-covariance matrices across groups, linear class boundaries, and low multicollinearity among features (Jameson 2024). Because the chili leaf morphometric variables deviated significantly from a Gaussian distribution and exhibited substantial covariance heterogeneity (violating key LDA assumptions), and because classifying more than 350 individual accessions with only eight highly correlated predictors was impractical, a three-step strategy was adopted: standardization and PCA, cluster definition (grouping factor), and discriminant analysis.
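
The IQR criterion mentioned above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code. The fence multiplier of 1.5 and the quartile interpolation method are assumptions (the text does not state either), and the example values are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k=1.5 is the common default, assumed here."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical leaf areas (cm2); the last value is deliberately extreme.
areas = [0.40, 0.45, 0.50, 0.53, 0.55, 0.60, 0.71, 3.07]
kept = remove_outliers(areas)
```

In a multi-trait dataset the same filter would typically be applied per trait, and the choice between dropping the whole leaf record or only the flagged value is a design decision the text does not specify.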

First, leaf morphometric data were standardized (z-scored) and Pearson correlation was used to construct the covariance matrix. PCA was then performed on the standardized data to capture major phenotypic variations and address multicollinearity. Second, k-means clustering was applied to the principal component (PC) scores to define broad phenotypic groups instead of using individual accessions as classes. The optimal number of clusters (k) was determined using the silhouette method, which yielded k = 8 clusters for the 2021 data and k = 3 clusters for the 2022 data. These clusters served as the predefined groups for the subsequent supervised analysis. Finally, Fisher’s linear discriminant analysis (LDA) was conducted using the PCs as predictors and the derived k-means clusters as the classification factor, an approach that is more robust when data are non-normal, highly correlated, or exhibit unequal covariance structures.
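
Two pieces of this workflow, z-score standardization and the silhouette criterion used to choose k, can be illustrated with a small standard-library sketch. It is not the authors' R pipeline: it scores a given labelling with the mean silhouette width using Euclidean distance (an assumption) and works on standardized traits directly rather than on PC scores. The function names and data are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def mean_silhouette(points, labels):
    """Mean silhouette width; requires at least two distinct labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [math.dist(p, q)
               for j, (q, other) in enumerate(zip(points, labels))
               if other == lab and j != i]
        if not own:                 # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = statistics.fmean(own)   # mean distance within the own cluster
        b = min(                    # mean distance to the nearest other cluster
            statistics.fmean([math.dist(p, q)
                              for q, other in zip(points, labels) if other == c])
            for c in set(labels) if c != lab
        )
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


# Hypothetical two-trait data with two well-separated groups.
rows = [[1.0, 10.0], [1.2, 11.0], [0.9, 10.5],
        [5.0, 30.0], [5.2, 31.0], [4.9, 30.5]]
z = zscore_columns(rows)
good = mean_silhouette(z, [0, 0, 0, 1, 1, 1])   # labels match the groups
bad = mean_silhouette(z, [0, 1, 0, 1, 0, 1])    # labels ignore the groups
```

In the silhouette method, k-means is run for several candidate k values and the k with the highest mean silhouette width is retained; a labelling that matches the real structure (`good`) scores higher than one that does not (`bad`).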

Results

Descriptive statistics of leaf morphological traits revealed clear differences between the 2021 and 2022 seasons (Table 2). Leaves were generally larger in 2021, with a higher mean leaf area (0.589 cm2) compared to 2022 (0.510 cm2). The maximum leaf area also differed substantially between the years, reaching 3.066 cm2 in 2021, whereas it was 2.185 cm2 in 2022. Leaf width followed this pattern, whereas leaf length was slightly greater in 2022, indicating that leaves in 2021 were broader while those in 2022 tended to be longer and narrower. Additionally, leaf perimeter values were greater in 2022 than in 2021, suggesting that although leaves were narrower in 2022, they had more elongated or complex margins.

Table 2.

Summary statistics of chili leaf morphometric data

Year | Statistic | Area (cm2) | Perimeter (cm) | Circularity | Length (cm) | Width (cm) | Aspect ratio | Roundness | Solidity
2021 | Minimum | 0.042 | 0.002 | 0.200 | 0.676 | 0.147 | 1.200 | 0.300 | 0.600
2021 | 1st quartile | 0.402 | 0.004 | 0.400 | 1.415 | 0.538 | 1.900 | 0.500 | 0.800
2021 | Median | 0.529 | 0.005 | 0.500 | 1.585 | 0.633 | 2.000 | 0.500 | 0.900
2021 | Mean | 0.589 | 0.005 | 0.457 | 1.642 | 0.651 | 2.068 | 0.493 | 0.861
2021 | 3rd quartile | 0.711 | 0.005 | 0.500 | 1.831 | 0.748 | 2.200 | 0.500 | 0.900
2021 | Maximum | 3.066 | 0.006 | 0.600 | 3.045 | 1.707 | 4.000 | 0.800 | 1.000
2022 | Minimum | 0.043 | 1.514 | 0.147 | 0.679 | 0.128 | 1.498 | 0.215 | 0.430
2022 | 1st quartile | 0.346 | 3.986 | 0.238 | 1.690 | 0.480 | 2.260 | 0.341 | 0.659
2022 | Median | 0.457 | 4.660 | 0.268 | 1.976 | 0.567 | 2.550 | 0.392 | 0.694
2022 | Mean | 0.510 | 4.755 | 0.274 | 2.004 | 0.585 | 2.616 | 0.396 | 0.694
2022 | 3rd quartile | 0.633 | 5.464 | 0.303 | 2.310 | 0.672 | 2.929 | 0.443 | 0.729
2022 | Maximum | 2.185 | 7.638 | 0.574 | 3.162 | 1.441 | 4.650 | 0.668 | 0.936

Shape descriptors (circularity, aspect ratio, roundness, and solidity) also supported these differences. Circularity and roundness values were higher in 2021 than in 2022 (Table 2), indicating more circular and broader leaf shapes in 2021. In contrast, leaves in 2022 were more elongated and less circular. The aspect ratio increased from 2.07 in 2021 to 2.62 in 2022, confirming that the 2022 foliage was more elongated. Solidity was slightly lower in 2022, suggesting more irregular or serrated leaf margins.

These results showed that both datasets fail to satisfy the key assumptions of LDA (Table 3, Figs. 2 and 3). All leaf morphometric traits deviated strongly from normality (Shapiro–Wilk, p < 0.001) with evident skewness and leptokurtosis, and variances were significantly heterogeneous across the groups (Levene’s test, p < 0.001). The results of the correlation analysis in Fig. 4 reveal pronounced multicollinearity in both years, with the area, perimeter, length, and width exhibiting very strong positive associations (r ≥ 0.9 in 2021; r > 0.8 in 2022). Additionally, circularity and solidity were highly correlated in both years (r > 0.7), and circularity showed a strong positive correlation with roundness in 2022 (r = 0.81). Conversely, the aspect ratio displayed strong negative correlations with roundness (r < ‒0.8), reflecting inherent geometric redundancy. The overall covariance structure was nearly identical between the seasons, indicating that both datasets contain tightly coupled shape and size descriptors that violate the independence assumptions required for LDA.
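
The geometric redundancy between the aspect ratio and roundness follows from their definitions in Table 1 (AR = L/W and Roundness = W/L), so one is the reciprocal of the other. The sketch below computes a Pearson correlation for hypothetical length and width values to illustrate why a strong negative association is expected; the numbers are not from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical leaf lengths and widths (cm), for illustration only.
lengths = [1.4, 1.6, 1.8, 2.0, 2.3, 3.0]
widths = [0.70, 0.60, 0.80, 0.60, 0.70, 0.65]

aspect_ratio = [l / w for l, w in zip(lengths, widths)]
roundness = [w / l for l, w in zip(lengths, widths)]

# Roundness is a monotonically decreasing function of the aspect ratio,
# so the correlation is strongly negative regardless of the sample.
r = pearson_r(aspect_ratio, roundness)
```

Because the relationship is reciprocal rather than linear, the coefficient does not reach exactly −1, but including both descriptors still adds almost no independent information.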

Table 3.

Summary of statistical tests assessing normality, skewness, kurtosis, and homogeneity of covariance among chili leaf traits

Year | Trait | Shapiro-Wilk W | Shapiro-Wilk p-value | Skewness | Kurtosis | Levene’s test: degrees of freedom | Levene’s test: F-value | Levene’s test: p-value
2021 | Area (cm2) | 0.9083 | *** | 1.4800 | 7.3406 | 370 | 4.3210 | ***
2021 | Perimeter (cm) | 0.9654 | *** | 0.7727 | 4.0441 | 370 | 3.1598 | ***
2021 | Circularity | 0.7811 | *** | ‒0.7534 | 3.8866 | 370 | 3.1068 | ***
2021 | Length (cm) | 0.9635 | *** | 0.7943 | 4.0629 | 370 | 3.1061 | ***
2021 | Width (cm) | 0.9795 | *** | 0.6100 | 4.3488 | 370 | 2.8956 | ***
2021 | Aspect ratio | 0.9122 | *** | 1.4182 | 7.6798 | 370 | 2.3150 | ***
2021 | Roundness | 0.8275 | *** | 0.0024 | 3.6812 | 370 | 3.4629 | ***
2021 | Solidity | 0.6609 | *** | ‒1.0629 | 3.2116 | 370 | 3.6115 | ***
2022 | Area (cm2) | 0.8900 | *** | 1.7637 | 9.5541 | 356 | 2.2052 | ***
2022 | Perimeter (cm) | 0.9925 | *** | 0.2376 | 2.7700 | 356 | 1.9271 | ***
2022 | Circularity | 0.9587 | *** | 0.9445 | 5.1034 | 356 | 1.7892 | ***
2022 | Length (cm) | 0.9946 | *** | 0.1344 | 2.6651 | 356 | 1.7488 | ***
2022 | Width (cm) | 0.9554 | *** | 0.9851 | 5.7592 | 356 | 1.7727 | ***
2022 | Aspect ratio | 0.9745 | *** | 0.6631 | 3.5735 | 356 | 1.6034 | ***
2022 | Roundness | 0.9887 | *** | 0.4178 | 3.1396 | 356 | 1.6066 | ***
2022 | Solidity | 0.9932 | *** | ‒0.0301 | 3.9542 | 356 | 1.6122 | ***

***significant at p-value < 0.001.
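
For readers interpreting the skewness and kurtosis columns of Table 3, the sketch below computes moment-based versions of both statistics. The exact estimator used by the authors is not stated. The code assumes the non-excess (Pearson) convention in which a normal distribution has kurtosis 3, which is consistent with values above 3 being described as leptokurtic. The example data are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) from population central moments.

    Kurtosis is the non-excess form: about 3 for a normal distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric sample: skewness is zero.
sym_skew, sym_kurt = skewness_and_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])

# Right-tailed sample (hypothetical): positive skewness.
right_skew, right_kurt = skewness_and_kurtosis([1.0, 1.0, 1.0, 2.0, 10.0])
```

Positive skewness, as reported for leaf area in both years, indicates a long right tail, that is, a small number of unusually large leaves.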

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F2.jpg
Fig. 2.

Histograms and QQ plots showing the distribution and normality of chili leaf morphometric parameters for the 2021 dataset.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F3.jpg
Fig. 3.

Histograms and QQ plots showing the distribution and normality of chili leaf morphometric parameters for the 2022 dataset.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F4.jpg
Fig. 4.

Correlation analysis of chili leaf morphometric parameters in 2021 (A) and 2022 (B).

Given these violations of the normality, equal-covariance, and low-multicollinearity assumptions of LDA, and considering the very large number of accessions (371 accessions in 2021; 357 accessions in 2022) combined with substantial overlap among individuals, it is unrealistic to expect LDA to discriminate more than 350 classes using only eight correlated quantitative predictors. To address these constraints, Fisher’s linear discriminant analysis was applied using principal components (PCs) derived from standardized data as predictors, with k-means clusters serving as the grouping factor (Fig. 5). This approach is more robust when data are non-normal, highly correlated, or exhibit unequal covariance structures (Qu and Pei 2024). The PCA indicated that the first four PCs captured more than 97% of the total variance in 2021, while the first three PCs accounted for over 97% of the variance in 2022 (Table 4, Fig. 6C and 6D). For both years, PC1 was strongly influenced by nearly all traits except for the aspect ratio, whereas PC2 reflected opposing contributions, with the area, perimeter, length, width, and aspect ratio loading in one direction and the circularity, solidity, and roundness loading in the opposite direction, highlighting a clear separation between size- and shape-related descriptors (Table 4, Fig. 6A and 6B).

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F5.jpg
Fig. 5.

K-means clustering plots with optimal number of clusters for chili leaf data of 2021 (A) and 2022 (B)

Table 4.

Principal component analysis with chili leaf traits across two years

Year | Trait | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8
2021 | Area | 0.4406 | ‒0.1926 | ‒0.1381 | 0.0539 | 0.1778 | 0.6455 | 0.5435 | ‒0.0544
2021 | Perimeter | 0.4053 | ‒0.3437 | ‒0.0648 | ‒0.0194 | ‒0.0565 | ‒0.3242 | 0.0164 | 0.7775
2021 | Circularity | 0.2510 | 0.5028 | ‒0.2602 | 0.7496 | 0.1340 | ‒0.1897 | 0.0175 | 0.0187
2021 | Leaf length | 0.3802 | ‒0.3888 | ‒0.1123 | ‒0.0163 | ‒0.0765 | ‒0.5402 | 0.1323 | ‒0.6134
2021 | Leaf width | 0.4652 | ‒0.0853 | 0.0137 | 0.0160 | ‒0.0211 | 0.3324 | ‒0.8059 | ‒0.1245
2021 | Aspect ratio | ‒0.3212 | ‒0.3790 | ‒0.4076 | 0.0838 | 0.7378 | ‒0.0339 | ‒0.1856 | 0.0114
2021 | Roundness | 0.3089 | 0.3453 | 0.4978 | ‒0.3235 | 0.6277 | ‒0.1903 | 0.0503 | ‒0.0098
2021 | Solidity | 0.1381 | 0.4132 | ‒0.6945 | ‒0.5679 | ‒0.0514 | ‒0.0496 | ‒0.0084 | 0.0143
2022 | Area | 0.4517 | ‒0.1959 | ‒0.1900 | ‒0.4366 | ‒0.4147 | 0.3374 | 0.0802 | 0.4885
2022 | Perimeter | 0.3436 | ‒0.4018 | ‒0.0534 | 0.2737 | 0.0249 | ‒0.1365 | 0.7449 | ‒0.2619
2022 | Circularity | 0.2718 | 0.4495 | ‒0.2249 | ‒0.4048 | 0.6703 | 0.0834 | 0.2264 | ‒0.0461
2022 | Leaf length | 0.3035 | ‒0.4303 | ‒0.1864 | 0.3345 | 0.4918 | ‒0.1488 | ‒0.4244 | 0.3632
2022 | Leaf width | 0.4786 | ‒0.1355 | 0.0549 | ‒0.2354 | ‒0.1192 | ‒0.0322 | ‒0.4484 | ‒0.6913
2022 | Aspect ratio | ‒0.3578 | ‒0.3243 | ‒0.4378 | ‒0.4933 | ‒0.0210 | ‒0.5741 | 0.0230 | ‒0.0355
2022 | Roundness | 0.3644 | 0.3321 | 0.4007 | ‒0.0310 | ‒0.1563 | ‒0.7039 | 0.0237 | 0.2740
2022 | Solidity | 0.1483 | 0.4233 | ‒0.7214 | 0.3987 | ‒0.3117 | ‒0.1112 | ‒0.0720 | ‒0.0677

Importance of components:
Year | Measure | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8
2021 | Standard deviation | 2.1201 | 1.4553 | 1.0078 | 0.4401 | 0.3364 | 0.2329 | 0.0926 | 0.0455
2021 | Proportion of variance | 0.5618 | 0.2647 | 0.1270 | 0.0242 | 0.0141 | 0.0068 | 0.0011 | 0.0003
2021 | Cumulative proportion | 0.5618 | 0.8266 | 0.9535 | 0.9777 | 0.9919 | 0.9987 | 0.9997 | 1.0000
2022 | Standard deviation | 2.0148 | 1.7704 | 0.7969 | 0.2945 | 0.2100 | 0.1544 | 0.0998 | 0.0815
2022 | Proportion of variance | 0.5074 | 0.3918 | 0.0794 | 0.0108 | 0.0055 | 0.0030 | 0.0012 | 0.0008
2022 | Cumulative proportion | 0.5074 | 0.8992 | 0.9786 | 0.9894 | 0.9950 | 0.9979 | 0.9992 | 1.0000
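
The proportion-of-variance rows in Table 4 can be reproduced from the standard deviations in the same table: each component's variance is the square of its standard deviation, divided by the total variance. The sketch below does this arithmetic and counts how many PCs are needed to pass a 97% threshold. The standard deviations are copied from Table 4; the function names are hypothetical, and small differences in the last decimal place are expected from rounding.

```python
def variance_explained(sdevs):
    """Return (proportions, cumulative proportions) from component SDs."""
    variances = [s ** 2 for s in sdevs]
    total = sum(variances)
    proportions = [v / total for v in variances]
    cumulative, running = [], 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


def components_needed(cumulative, threshold):
    """Smallest number of leading PCs whose cumulative share meets threshold."""
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k
    return len(cumulative)


# Standard deviations of PC1 to PC8 as reported in Table 4.
sd_2021 = [2.1201, 1.4553, 1.0078, 0.4401, 0.3364, 0.2329, 0.0926, 0.0455]
sd_2022 = [2.0148, 1.7704, 0.7969, 0.2945, 0.2100, 0.1544, 0.0998, 0.0815]

prop_2021, cum_2021 = variance_explained(sd_2021)
prop_2022, cum_2022 = variance_explained(sd_2022)

k_2021 = components_needed(cum_2021, 0.97)
k_2022 = components_needed(cum_2022, 0.97)
```

With eight standardized traits the total variance is close to 8, so the squared standard deviations can also be read as eigenvalues of the correlation matrix.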

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F6.jpg
Fig. 6.

Principal component analysis (PCA) and cluster visualization of chili leaf data. PCA biplots showing the contributions of leaf traits for 2021 (A) and 2022 (B). Scree plots illustrating the variance explained by principal components in 2021 (C) and 2022 (D).

The LDA revealed that most discriminatory power was concentrated in the first linear discriminant (LD1) for both years, explaining 63.26% of the variation in 2021 and 66.71% in 2022, followed by LD2 (23.59% in 2021; 23.19% in 2022) and LD3, which accounted for only minor proportions (13.15% in 2021 and 10.10% in 2022) (Table 5). Therefore, the first two LDs are sufficient for visualizing the separation of clusters (Fig. 7). In 2021, LD1 was strongly influenced by PC1 (loading = ‒1.26) and moderately by PC3 (‒0.51), indicating that the primary separation among clusters was driven by size-related traits. LD2 was dominated by PC2 (1.23), while LD3 was mainly defined by PC3 (1.45), reflecting secondary contributions from shape descriptors. In contrast, the 2022 dataset showed a reversed direction for PC1 on LD1 (1.47), suggesting a shift in how size traits contributed to group separation, while PC2 loaded negatively and strongly on LD2 (‒1.03). LD3 in 2022 was heavily influenced by PC3 (‒1.78), indicating that fine-scale shape variations contributed most to the third discriminant axis.

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F7.jpg
Fig. 7.

Linear discriminant analysis (LDA) plots of clustered data. LDA plots visualize the separation of the eight clusters identified in 2021 (A) and the three clusters identified in 2022 (B), based on their projection onto the first two linear discriminant functions (LD1 and LD2).

The LDA models showed very low classification performance for both years (Table 5). The overall accuracy rates from the hold-out test were only 7.8% in 2021 and 8.8% in 2022, far below what would be expected by chance for the number of clusters used. Cross-validation results confirmed this pattern. Leave-one-out cross-validation (LOOCV) produced accuracy rates of 8.9% in 2021 and 11.4% in 2022, while 10-fold cross-validation yielded similarly low rates (8.4% in 2021 and 9.5% in 2022). The corresponding Kappa values (0.078 in 2021 and 0.093 in 2022) were close to zero, indicating almost no agreement between predicted and true groups beyond random chance.
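
The two performance measures reported here, overall accuracy and Cohen's kappa, can be computed from paired true and predicted labels as sketched below. This is a generic illustration and not the authors' R code. The labels are hypothetical, and the function assumes that predictions are not all concentrated in a single class that also contains every true label (which would make the denominator zero).

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(true_labels)
    # Observed agreement: fraction of exact matches (the overall accuracy).
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Chance agreement: sum over classes of P(true = c) * P(predicted = c).
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical three-cluster example.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 2, 2, 0]
acc, kappa = accuracy_and_kappa(true, pred)

# A perfect classifier gives accuracy 1 and kappa 1.
perfect_acc, perfect_kappa = accuracy_and_kappa(true, true)
```

Kappa rescales accuracy by the agreement expected from the class frequencies alone, so values near zero, as in Table 5, mean the classifier performs about as well as random assignment with the same label frequencies.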

Table 5.

Discriminant function coefficients, proportions of trace and performance from the linear discriminant analysis (LDA)

Predictor | 2021 LD1 | 2021 LD2 | 2021 LD3 | 2022 LD1 | 2022 LD2 | 2022 LD3
PC1 | ‒1.2632 | ‒0.1371 | ‒0.1166 | 1.4680 | ‒0.2950 | 0.0199
PC2 | ‒0.2406 | 1.2314 | ‒0.1906 | ‒0.5266 | ‒1.0256 | 0.1940
PC3 | ‒0.5149 | 0.2884 | 1.4515 | ‒0.1787 | ‒0.5743 | ‒1.7758
Proportion of trace | 0.6326 | 0.2359 | 0.1315 | 0.6671 | 0.2319 | 0.1010

Performance measure | 2021 | 2022
Overall accuracy (hold-out split) | 0.0777 | 0.0879
Leave-one-out cross-validation (LOOCV) accuracy | 0.0893 | 0.1137
10-fold cross-validation (CV) accuracy | 0.0836 | 0.0951
10-fold cross-validation (CV) kappa | 0.0776 | 0.0929

PC, principal component; LD, linear discriminant.

Discussion

This study evaluated whether basic leaf morphometric traits can reliably support the classification of a large collection of C. annuum accessions and whether LDA is an appropriate predictive tool for such datasets. The results strongly support the hypothesis that they do not, as both statistical diagnostics and classification performance outcomes indicated that these traits lack the discriminatory power required for accession-level separation. Despite evaluating over 350 accessions, the models failed to achieve meaningful classification accuracy, indicating substantial overlap in the morphometric trait space among accessions.

The poor performance of LDA can be attributed to strong violations of its underlying assumptions, specifically non-normal trait distributions, covariance heterogeneity, and high multicollinearity among descriptors. Moreover, the large number of classes combined with limited samples per accession contributed to unstable discriminant functions and weak predictive performance (Jensen 2018; Lapanowski and Gaynanova 2020; Gardner-Lubbe 2021; Ali et al. 2022; Qu and Pei 2024). These conditions led to extensive overlap among groups, confirming that the selected traits do not provide a sufficient signal for reliable classification under a linear framework.

From a biological perspective, the limited discriminatory capacity of these traits reflects the inherently conserved and environmentally plastic nature of leaf morphology in C. annuum. Traits such as leaf area, length, width, and perimeter primarily describe vegetative growth and resource acquisition strategies, which are strongly influenced by environmental conditions rather than genotype alone (Marron et al. 2007; Nishani et al. 2025). For instance, larger and broader leaves, as observed in 2021, are typically associated with favorable growing conditions that promote photosynthetic capacity, whereas narrower and more elongated leaves, as observed in 2022, may represent adaptive responses to environmental stress, such as higher temperatures (Fig. 8) or reduced water availability. These environmentally driven responses are consistent with previous findings in Capsicum and other plant species, where the leaf morphology exhibits substantial phenotypic plasticity (Royer 2012; Romero-Higareda et al. 2022).

https://cdn.apub.kr/journalsite/sites/kshs/2026-044-00/N020260011/images/HST_20260011_F8.jpg
Fig. 8.

Daily maximum and minimum air temperatures (°C) recorded in Jeonju, South Korea (study location) during the chili (Capsicum annuum) growing period from March to June in (A) 2021 and (B) 2022. These data represent environmental conditions during the vegetative and early fruiting stages and provide context for the observed variations in leaf morphological traits (Source: https://www.timeanddate.com/weather/south-korea/jeonju).

The structure revealed by PCA and k-means clustering further supports this interpretation. Variations were primarily organized along axes representing size-related and shape-related traits, reflecting coordinated developmental processes rather than accession-specific differences. The strong multicollinearity observed among size-related traits (area, perimeter, length, and width) indicates a tightly integrated growth module, while shape descriptors such as circularity, roundness, and solidity represent a secondary module associated with leaf geometry. Although the aspect ratio contributed independently to shape variations, the overall morphospace remained highly overlapped among accessions. This suggests that the leaf morphology in C. annuum is governed more by general developmental and environmental gradients than by distinct genetic signatures (Baumgartner et al. 2020; Yang et al. 2025). Moreover, the relatively simple leaf architecture of chili, typically ovate to lanceolate and lacking pronounced lobes or serrations, further constrains the discriminatory potential of basic geometric descriptors (Royer 2012; Idrees et al. 2020; Fanourakis et al. 2021).

From a practical and horticultural perspective, these findings highlight an important limitation for germplasm characterization and breeding applications. Although leaf morphometric traits are easy to measure and well suited for high-throughput phenotyping, they do not adequately capture variations in economically important traits such as fruit morphology, yield, pungency, or resistance to pests and diseases (Poljak et al. 2024; Zhao et al. 2026). These agriculturally valuable characteristics are typically governed by more complex genetic, physiological, and biochemical processes and therefore require integrative phenotyping approaches. Previous studies have consistently demonstrated that fruit traits, biochemical profiles, and genomic markers provide stronger discriminatory power at the cultivar and accession level in Capsicum (Jones et al. 2011; Tripodi and Greco 2018; Alvares Bianchi et al. 2020; Hong et al. 2020; Lozada et al. 2023).

To improve classification performance capabilities, future studies should incorporate phenotypic descriptors with higher resolutions. Advanced morphometric approaches such as elliptic Fourier analysis (EFA) can capture detailed leaf boundary complexities and have been shown to outperform simple geometric traits with regard to cultivar discrimination (Viáfara-Vega et al. 2025). In addition, traits related to the venation architecture, stomatal characteristics, leaf texture, and spectral or colorimetric properties may provide more robust and genotype-specific signals (Chitwood et al. 2014; Thai et al. 2025). These features are increasingly accessible through modern high-throughput phenotyping platforms, including imaging systems and low-cost smartphone-based tools, enabling rapid and nondestructive data acquisition even in large germplasm collections (Tomaszewski and Kołakowski 2023; Park et al. 2025; Thai et al. 2025).

The limitations observed in this study do not invalidate LDA as a classification method but rather emphasize its dependence on an appropriate data structure. LDA remains effective when trait distributions approximate normality, covariance structures are homogeneous, and class boundaries are linearly separable. However, in datasets characterized by high class numbers, limited replication, non-linear trait relationships, and substantial overlap among groups, more flexible classifiers such as random forests, support vector machines, or other non-parametric approaches may yield improved predictive performance (Breiman 2001; Graf et al. 2024; Uniyal 2024). In this study, LDA was intentionally selected as a baseline model to evaluate the intrinsic discriminatory capacity of plant traits under a linear framework.

This study demonstrates that basic leaf morphometric traits are insufficient for reliable accession-level discrimination in C. annuum, thereby supporting the initial hypothesis. While these traits effectively describe general patterns of vegetative growth and environmental responses, they lack the specificity required for germplasm classification. Consequently, reliance on simple leaf descriptors alone in high-throughput phenotyping pipelines may lead to limited or misleading classification outcomes. Future research should integrate detailed morphological, physiological, and genomic data to enhance discriminatory power and improve the effectiveness of phenotyping approaches for better breeding and germplasm management.

Acknowledgements

This research was supported by Kyungpook National University Research Fund, 2023.

References

1

Al Hiary H, Bani Ahmad S, Reyalat M, Braik M, ALRahamneh Z (2011) Fast and accurate detection and classification of plant diseases. Int J Comput Appl 17:31-38. https://doi.org/10.5120/2183-2754

10.5120/2183-2754
2

Ali S, Hassan M, Kim JY, Farid MI, Sanaullah M, Mufti H (2022) FF-PCA-LDA: Intelligent feature fusion based PCA-LDA classification system for plant leaf diseases. Appl Sci 12:3514. https://doi.org/10.3390/app12073514

10.3390/app12073514
3

Alvares Bianchi P, Renata Almeida Da Silva L, André Da Silva Alencar A, Henrique Araújo Diniz Santos P, Pimenta S, Pombo Sudré C, Erpen-Dalla Corte L, Simões Azeredo Gonçalves L, Rodrigues R (2020) Biomorphological characterization of brazilian capsicum chinense Jacq. germplasm. Agronomy 10:447. https://doi.org/10.3390/agronomy10030447

10.3390/agronomy10030447
4

Baloch FS, Lee SM, Mansoor S, Morales A, Karunathilake EMBM, Nadeem MA, Cavagnaro PF, Chung YS (2025) GBS-derived SNP and SilicoDArT markers reveals the genetic variation and population structure of Korean buckwheat (Fagopyrum esculentum) an underutilised crop. BMC Plant Biol 25:1479. https://doi.org/10.1186/s12870-025-07633-0

10.1186/s12870-025-07633-041168792PMC12574297
5

Bankhead P (2025) Pixel size & dimensions. Introd. Bioimage Anal. URL https://bioimagebook.github.io/chapters/1-concepts/5-pixel_size/pixel_size.html (accessed 11.30.25).

6

Baumgartner A, Donahoo M, Chitwood DH, Peppe DJ (2020) The influences of environmental change and development on leaf shape in Vitis. Am J Bot 107:676-688. https://doi.org/10.1002/ajb2.1460

10.1002/ajb2.146032270876PMC7217169
7

Breiman L (2001) Random Forests. Mach Learn 45:5-32. https://doi.org/10.1023/A:1010933404324

10.1023/A:1010933404324
8

Chitwood DH, Ranjan A, Martinez CC, Headland LR, Thiem T, Kumar R, Covington MF, Hatcher T, Naylor DT, et al. (2014) A Modern Ampelography: A genetic basis for leaf shape and venation patterning in grape. PLANT Physiol 164:259-272. https://doi.org/10.1104/pp.113.229708

10.1104/pp.113.22970824285849PMC3875807
9

Cornelissen JHC, Lavorel S, Garnier E, Díaz S, Buchmann N, Gurvich DE, Reich PB, Steege HT, Morgan HD, et al. (2003) A handbook of protocols for standardised and easy measurement of plant functional traits worldwide. Aust J Bot 51:335-380. https://doi.org/10.1071/BT02124

10.1071/BT02124
10

De Luna-Bonilla OÁ, Valencia-Á S, Ibarra-Manríquez G, Morales-Saldaña S, Tovar-Sánchez E, González-Rodríguez A (2024) Leaf morphometric analysis and potential distribution modelling contribute to taxonomic differentiation in the Quercus microphylla complex. J Plant Res 137:3-19. https://doi.org/10.1007/s10265-023-01495-z

Dong N, Prentice IC, Wright IJ, Evans BJ, Togashi HF, Caddy‐Retalic S, McInerney FA, Sparrow B, Leitch E, et al. (2020) Components of leaf‐trait variation along environmental gradients. New Phytol 228:82-94. https://doi.org/10.1111/nph.16558

El‐Hendawy SE, Hu Y, Schmidhalter U (2007) Assessing the Suitability of Various Physiological Traits to Screen Wheat Genotypes for Salt Tolerance. J Integr Plant Biol 49:1352-1360. https://doi.org/10.1111/j.1744-7909.2007.00533.x

Fanourakis D, Kazakos F, Nektarios PA (2021) Allometric Individual Leaf Area Estimation in Chrysanthemum. Agronomy 11:795. https://doi.org/10.3390/agronomy11040795

Gardner-Lubbe S (2021) Linear discriminant analysis for multiple functional data analysis. J Appl Stat 48:1917-1933. https://doi.org/10.1080/02664763.2020.1780569

Graf R, Zeldovich M, Friedrich S (2024) Comparing linear discriminant analysis and supervised learning algorithms for binary classification—A method comparison study. Biom J 66:2200098. https://doi.org/10.1002/bimj.202200098

Hong JP, Ro N, Lee HY, Kim GW, Kwon JK, Yamamoto E, Kang BC (2020) Genomic Selection for Prediction of Fruit-Related Traits in Pepper (Capsicum spp.). Front Plant Sci 11:570871. https://doi.org/10.3389/fpls.2020.570871

Idrees S, Hanif MA, Ayub MA, Hanif A, Ansari TM (2020) Chili Pepper, in: Medicinal Plants of South Asia. Elsevier, pp 113-124. https://doi.org/10.1016/B978-0-08-102659-5.00009-4

Jensen G (2018) AI: Weak AI vs Strong AI. Gravitron. URL https://www.gavinjensen.com/blog/2018/ai-weak-strong

Jones AMP, Ragone D, Aiona K, Lane WA, Murch SJ (2011) Nutritional and morphological diversity of breadfruit (Artocarpus, Moraceae): Identification of elite cultivars for food security. J Food Compos Anal 24:1091-1102. https://doi.org/10.1016/j.jfca.2011.04.002

Lapanowski AF, Gaynanova I (2020) Compressing Large Sample Data for Discriminant Analysis. https://doi.org/10.1109/BigData52589.2021.9671676

Lozada DN, Sandhu KS, Bhatta M (2023) Ridge regression and deep learning models for genome-wide selection of complex traits in New Mexican Chile peppers. BMC Genomic Data 24:80. https://doi.org/10.1186/s12863-023-01179-6

Mansoor S, Karunathilake EMBM, Tuan TT, Chung YS (2024) Genomics, phenomics, and machine learning in transforming plant research: Advancements and challenges. Hortic Plant J S2468014124000098. https://doi.org/10.1016/j.hpj.2023.09.005

Marron N, Dillen SY, Ceulemans R (2007) Evaluation of leaf traits for indirect selection of high yielding poplar hybrids. Environ Exp Bot 61:103-116. https://doi.org/10.1016/j.envexpbot.2007.04.002

Nishani YAR, Shyamalee HAPA, Ranawake AL (2025) Characterization and diversity analysis of underutilized Capsicum chinense L. accessions in Sri Lanka. Trop Agric Res Ext 28:73-91. https://doi.org/10.4038/tare.v28i2.5754

Park JE, Mansoor S, Ku K, Le AT, Tuan TT, Ko HC, Min OSS, Baloch FS, Chung YS (2025) Cost effective image-based phenotyping in Capsicum annuum germplasm for rapid assessment of traits. Plant Biotechnol Rep 19:181-196. https://doi.org/10.1007/s11816-025-00968-y

Poljak I, Vidaković A, Benić L, Tumpa K, Idžojtić M, Šatović Z (2024) Patterns of leaf and fruit morphological variation in marginal populations of Acer tataricum L. subsp. tataricum. Plants 13:320. https://doi.org/10.3390/plants13020320

Qu L, Pei Y (2024) A comprehensive review on discriminant analysis for addressing challenges of class-level limitations, small sample size, and robustness. Processes 12:1382. https://doi.org/10.3390/pr12071382

Romero-Higareda CE, Hernández-Verdugo S, Pacheco-Olvera A, Núñez-Farfán J, Retes-Manjarrez E, López-Orona C, Osuna-Enciso T (2022) Adaptive phenotypic plasticity of wild Capsicum annuum (Solanaceae) to variable environments of water-light availability. Acta Oecol 114:103807. https://doi.org/10.1016/j.actao.2021.103807

Royer DL (2012) Climate reconstruction from leaf size and shape: new developments and challenges. Paleontol Soc Pap 18:195-212. https://doi.org/10.1017/S1089332600002618

Thai TT, Mansoor S, Van HT, Vu VG, Karunathilake EMBM, Le AT, Baloch FS, Chung YS, Kim J (2025) Advanced phenotyping features utilizing deep learning techniques for automated analysis of stomatal guard cell orientation. Sci Rep 15:38578. https://doi.org/10.1038/s41598-025-22412-5

Tomaszewski L, Kołakowski R (2023) Mobile services for smart agriculture and forestry, biodiversity monitoring, and water management: challenges for 5G/6G networks. Telecom 4:67-99. https://doi.org/10.3390/telecom4010006

Toprak S, Coşkun ÖF (2025) Characterization of pepper (Capsicum annuum L.) genotypes for zinc stress tolerance based on morphological traits. Harran Tarım Ve Gıda Bilim Derg 29:477-486. https://doi.org/10.29050/harranziraat.1700013

Tripodi P, Greco B (2018) Large scale phenotyping provides insight into the diversity of vegetative and reproductive organs in a wide collection of wild and domesticated peppers (Capsicum spp.). Plants 7:103. https://doi.org/10.3390/plants7040103

Viáfara-Vega RA, Granada-Agudelo M, Cárdenas-Henao H (2025) Characterization and discrimination of Colombian Capsicum accessions using elliptic Fourier analysis. Genet Resour Crop Evol 72:8973-8984. https://doi.org/10.1007/s10722-025-02482-0

Wen W, Li Z, Shao J, Tang Y, Zhao Z, Yang J, Ding M, Zhu X, Zhou M (2021) The distribution and sustainable utilization of buckwheat resources under climate change in China. Plants 10:2081. https://doi.org/10.3390/plants10102081

Yang Z, Dai G, Qin K, Wu J, Wang Z, Wang C (2025) Comprehensive evaluation of germplasm resources in various goji cultivars based on leaf anatomical traits. Forests 16:187. https://doi.org/10.3390/f16010187

Zhao W, Wang L, Song Y, Jiang H, Guo X (2026) Leaf-fruit trait decoupling along environmental gradients in tropical Cryptocaryeae (Lauraceae). Plants 15:126. https://doi.org/10.3390/plants15010126
