Development of SNP Markers for the Identification of Commercial Korean Watermelon Cultivars Using Fluidigm Genotyping Analysis

Jun-Young Park; Yoon Jeong Jang; Jin-Kee Jung; Eun-Jo Shim; Sung-Chur Sim; Sang-Min Chung; Gung Pyo Lee

doi:10.7235/HORT.20220008

Research Article

Horticultural Science and Technology. 28 February 2022. 75-84
https://doi.org/10.7235/HORT.20220008

Jun-Young Park¹,†

Yoon Jeong Jang¹,†

Jin-Kee Jung²

Eun-Jo Shim²

Sung-Chur Sim³

Sang-Min Chung⁴

Gung Pyo Lee¹,*

¹Department of Plant Science and Technology, Chung-Ang University, Anseong 17546, Korea

²Seed Testing and Research Center, Korea Seed & Variety Service, Gimcheon 39660, Korea

³Department of Bioresources Engineering, Sejong University, Seoul 05006, Korea

⁴Department of Life Sciences, Dongguk University, Seoul 04620, Korea

*Corresponding author

†These authors contributed equally to this work.

ABSTRACT

Molecular markers based on simple sequence repeats (SSRs) from expressed sequence tags (ESTs) have been used to identify the registered commercial cultivars in watermelon (Citrullus lanatus). To characterize diversity in watermelon cultivars, molecular markers based on genome-wide single nucleotide polymorphisms (SNPs) are needed. Here, we used Fluidigm genotyping for the development of a core set of SNPs to differentiate Korean watermelon commercial cultivars. A candidate subset of SNPs was discovered by genotyping-by-sequencing using 48 F₁ cultivars. The cultivars could be divided into three subpopulations with 97.5% similarity by unweighted pair group method with arithmetic mean (UPGMA) clustering based on 2,300 SNPs. After filtering loci with a high polymorphism information content (PIC), a subset of 238 SNPs was selected and analyzed in 92 F₁ cultivars on the Fluidigm genotyping system. Among these, 141 polymorphic, bi-allelic SNPs were obtained. To evaluate genetic diversity in the 92 cultivars, a principal component analysis (PCA) and hierarchical clustering analysis were performed. The first two axes of the PCA explained only 30.5% of the total variance; however, the 92 cultivars clustered into three subgroups with similar sizes. Approximately 90% of the subgroup assignments remained unchanged in the UPGMA clustering analysis. In addition, when the subset of 141 SNPs was filtered with a threshold PIC of 0.36, a core set of 96 SNP markers was obtained, with identical clustering results. The core set of SNP markers can be used for the identification of commercial Korean watermelon cultivars and for the protection of proprietary rights for new cultivar development.

Keywords

F₁ hybrids

genetic relationship

genotyping-by-sequencing

PCA

population structure

MAIN

Introduction
Materials and Methods
Plant Materials and DNA Extraction
GBS Library Construction and Illumina Sequencing
Variant Calling and Marker Development
Population Structure and Genetic Relationships
Results
GBS Analysis
Population Structure and Genetic Relationships
Fluidigm SNP Assay and Marker Development
Genetic Analysis and Core Marker Set Development for 92 F₁ Cultivars
Discussion

Introduction

Watermelon (Citrullus lanatus, 2n = 2x = 22) is an economically important horticultural crop in the Cucurbitaceae family. Watermelon belongs to a xerophytic genus and originated in northeastern Africa 5,000 years ago. Watermelon has been cultivated for over 4,000 years. Cultivated watermelons, so-called dessert watermelons, have been selected for desirable traits, such as sweet taste and nutritional value. As a result, genetic diversity tends to be low; however, variation among watermelon accessions has not been studied extensively (Guo et al., 2019). Watermelon breeders have identified variation in consumer preferences among countries. For example, in East Asia, there is a preference for small-sized fruits and thin rinds. In contrast, consumers in the United States tend to prefer large, oblong watermelons with a thick rind (Wu et al., 2019).

According to the FAO, global watermelon production was about 104 million tons. In South Korea, 535,000 tons of watermelon were produced in 2018 (FAO, 2018). Watermelon is one of the most popular horticultural crops and is cultivated in the central and southern regions of South Korea. The priorities for watermelon breeders in Korea are improving fruit shape and texture, sugar content, and biotic and abiotic stress resistance.

In an analysis of genetic polymorphisms in watermelon cultivars and Plant Introduction (PI) accessions using RAPD markers, cultivated watermelon showed high similarity at the genetic level (92–99%) (Levi et al., 2001). Another analysis of 49 watermelon varieties using a simple sequence repeat (SSR) marker approach (Kwon et al., 2010) revealed that SSR markers are not suitable for commercial watermelons owing to the limited marker number and low variation. However, they also used EST-SSR and genomic BAC library-based SSR markers to identify highly polymorphic markers in 49 cultivars.

Molecular markers have a number of advantages in plant breeding. They provide an easy way to identify the genetic basis of valuable traits and to predict breeding results quickly and precisely (Nasab et al., 2020). The development of fast and accurate next-generation sequencing (NGS) technology has made a significant contribution to plant breeding. Single nucleotide polymorphisms (SNPs) provide comprehensive genome-wide data, with an even distribution across the whole genome; they can be used for high-throughput genotyping assays as a tool in breeding programs (Ganal et al., 2009; Mammadov et al., 2012). Genotyping-by-sequencing (GBS) is a powerful, cost-effective NGS-based approach that uses restriction enzymes to sequence only the regions flanking restriction sites, reducing genomic complexity. Accordingly, the GBS approach can be used for SNP discovery, high-throughput genotyping, and linkage mapping in various plants (Poland and Rife, 2012; Sonah et al., 2013; Peterson et al., 2014).

In this study, we analyzed the genetic diversity in domestic commercial watermelon cultivars and determined a core set of SNPs to differentiate among the cultivars. In particular, we adopted GBS and Fluidigm genotyping to discover genome-wide SNPs. Robust core-SNP markers were selected based on analyses of molecular diversity and genetic relationships.

Materials and Methods

Plant Materials and DNA Extraction

A total of 100 F₁ watermelon cultivars were obtained from the Korea Seed and Variety Service (KSVS), Republic of Korea (Table 1). Among the F₁ cultivars, 48 and 92 cultivars were used for GBS and Fluidigm genotyping, respectively. Five seedlings of each accession were grown in a growth room at 24 ± 1°C. Young leaves of 3-week-old plants were sampled and ground in liquid nitrogen. Total genomic DNA was extracted using the DNeasy Plant Pro Kit (Qiagen, Germantown, MD, USA) according to the manufacturer’s protocol. The quality of the DNA was checked using a DeNovix DS-11 spectrophotometer (DeNovix, Wilmington, DE, USA), and the amount of double-stranded DNA was measured using the Quant-iT™ Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions.

Table 1.

Watermelon F₁ commercial cultivars used for GBS and a Fluidigm analysis

Sample	Cultivar	Company	Analysis type
WM01	Speedggul	Nongwoo Bio	GBS
WM02	Lycosweet_2ho	Partner Seeds	GBS
WM03	Smartggul	Asia Seeds	GBS
WM04	SeedlessPlusggul	Nongwoo Bio	GBS
WM05	SS_ggul_plus	Samsung Seeds	GBS
WM06	Blackbeta	I-Green	GBS
WM07	Sugarwon	Farm Hannong	GBS
WM08	Gangnamggul	Yiseo	GBS
WM09	Dalgonaggul	Syngenta Seeds	GBS, Fluidigm
WM10	Lycofresh_2ho	Partner Seeds	GBS, Fluidigm
WM11	Gamsooggul	Hyundae Seeds	GBS, Fluidigm
WM12	Hwansangggul	Nongwoo Bio	GBS, Fluidigm
WM13	Santaggul	Nongwoo Bio	GBS, Fluidigm
WM14	12weol	Koregon	GBS, Fluidigm
WM15	Speedplusggul	Nongwoo Bio	GBS, Fluidigm
WM16	Wonderfulggul	Nongwoo Bio	GBS, Fluidigm
WM17	Gangta	RDA, Korea	GBS, Fluidigm
WM18	Hwanhiggul	Dongbu Hannong	GBS, Fluidigm
WM19	Jinhansambokggul	Dongbufarm Hannong	GBS, Fluidigm
WM20	Heukdongja	Kwonnong Seeds	GBS, Fluidigm
WM21	Aceggul	Seminis Korea	GBS, Fluidigm
WM22	Jeoktoma	Dongbufarm Hannong	GBS, Fluidigm
WM23	Seolgang	Asia Seeds	GBS, Fluidigm
WM24	Sparkplus	Jangchoon Seeds	GBS, Fluidigm
WM25	HoneyQ_alpha	Asia Seeds	GBS, Fluidigm
WM26	Jijonggul	Dongbu Hitech	GBS, Fluidigm
WM27	Onsesang	Joongang Seeds	GBS, Fluidigm
WM28	Heukmi	Samsung Seeds	GBS, Fluidigm
WM29	Crimsonwave	Dongbufarm Hannong	GBS, Fluidigm
WM30	Dangdanghan	Dongbufarm Hannong	GBS, Fluidigm
WM31	Semiggul	Sakada Korea	GBS, Fluidigm
WM32	Hambakggul	Danong	GBS, Fluidigm
WM33	Hwangryongpo	Samsung Seeds	GBS, Fluidigm
WM34	KD617	Koregon	GBS, Fluidigm
WM35	Mamedeun	Dongbufarm Hannong	GBS, Fluidigm
WM36	Manbokggul	Jeiljongmyonongsan	GBS, Fluidigm
WM37	Miniblackcall	Dongbufarm Hannong	GBS, Fluidigm
WM38	Hero	Dongbufarm Hannong	GBS, Fluidigm
WM39	Sinus	Yiseo	GBS, Fluidigm
WM40	Gangryeoksambokggul	Seminis Korea	GBS, Fluidigm
WM41	Megaspeedggul	Nongwoo Bio	GBS, Fluidigm
WM42	Sinseolgang102	Asia Seeds	GBS, Fluidigm
WM43	Bakmagold	Koregon	GBS, Fluidigm
WM44	Nuneddineggul	Syngenta Seeds	GBS, Fluidigm
WM45	Uriggul	Nongwoo Bio	GBS, Fluidigm
WM46	Blackking	Nongwoo Bio	GBS, Fluidigm
WM47	Creamstar	Dongbufarm Hannong	GBS, Fluidigm
WM48	Seolwhaggul	Koregon	GBS, Fluidigm
WM49	Gigachanggul	Sangrok Seed Farm	Fluidigm
WM50	Sunnyjunior	Kyeongsin Seeds	Fluidigm
WM51	Apul	Syngenta Korea	Fluidigm
WM52	Aehong	Kwonnong Seeds	Fluidigm
WM53	Chamjoeun	Taesung Seeds	Fluidigm
WM54	Chosaengheukmi	Samsung Seeds	Fluidigm
WM55	Eliteggul	Sinnong	Fluidigm
WM56	Blackboss	Danong	Fluidigm
WM57	Dandanhan_blackyellow	Daeyeon breeding Inst.	Fluidigm
WM58	Renaplus	Dongbufarm Hannong	Fluidigm
WM59	Sijeogeun	Partner Seeds	Fluidigm
WM60	WM38	Chun Seeds	Fluidigm
WM61	Minigold	Friends & Biz	Fluidigm
WM62	Jalnanggul	Jenong	Fluidigm
WM63	Goodchoice	Farm Hannong	Fluidigm
WM64	Chookbok	Namnongwonye	Fluidigm
WM65	Palio	Hyundae Seeds	Fluidigm
WM66	Orangesugar	Partner Seeds	Fluidigm
WM67	Blackbeta	I-Green	Fluidigm
WM68	Blackinred	Sangrok Seed Farm	Fluidigm
WM69	Blackinyellow	Sangrok Seed Farm	Fluidigm
WM70	Dynamicggul	Sinnong	Fluidigm
WM71	Sweetbongbong	Kyeongwon Seeds	Fluidigm
WM72	Yeorumen	Asia Seeds	Fluidigm
WM73	Luckyplus	Daeilbio Seeds	Fluidigm
WM74	Seongboggul	Sangrok Seed Farm	Fluidigm
WM75	Superballgold	Friends & Biz	Fluidigm
WM76	Managgul	Koregon	Fluidigm
WM77	Dandanhan_blackred	Daeyeon breeding Inst.	Fluidigm
WM78	Babybox	Jinjong Bio	Fluidigm
WM79	Smallhoney	Daerim Seeds	Fluidigm
WM80	Nokboseok	Daeilbio Seeds	Fluidigm
WM81	Habokggul	Jangchoon Seeds	Fluidigm
WM82	Eomjichukggul	Sangrok Seed Farm	Fluidigm
WM83	Onlyyou	Friends & Biz	Fluidigm
WM84	Choonsangrokggul	Sangrok Seed Farm	Fluidigm
WM85	Dandanhan_miniplus	Daeyeon breeding Inst.	Fluidigm
WM86	Minimi	Bayer AG	Fluidigm
WM87	Royalblack	Jinjong Bio	Fluidigm
WM88	Bakangsggul	Sinnong	Fluidigm
WM89	Epyeonhanred	Dana Seeds	Fluidigm
WM90	Hiddencard	Yuan Seeds	Fluidigm
WM91	Jeilblack	Jeiljongmyonongsan	Fluidigm
WM92	Bellocheggul	Sinnong	Fluidigm
WM93	Heukok	NongHyeb	Fluidigm
WM94	Babylemon	Jinjong Bio	Fluidigm
WM95	Sangnongbok	Saengnong	Fluidigm
WM96	Chamhanbok	KMS Seeds	Fluidigm
WM97	Joeunheukggul	Jeiljongmyonongsan	Fluidigm
WM98	Goodchance	Farm Hannong	Fluidigm
WM99	AW1508	Asia Seeds	Fluidigm
WM100	Arongbok	Green Heart Bio	Fluidigm

GBS Library Construction and Illumina Sequencing

To construct the GBS library, each genomic DNA sample was digested with the ApeKI restriction enzyme (NEB, Ipswich, MA, USA) and ligated to barcoded and common adaptors with T4 DNA ligase. The ligated DNA fragments were pooled and cleaned using the QIAquick PCR purification kit (Qiagen). The cleaned DNA was amplified to prepare the GBS library, the quantity and quality of which were measured using an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The GBS library was sequenced with the Illumina HiSeq 2500 system (Illumina, San Diego, CA, USA) by Macrogen (South Korea).

Variant Calling and Marker Development

The raw reads from the GBS library were analyzed using the TASSEL-GBS pipeline (https://bitbucket.org/tasseladmin/tassel-5-source/wiki/Tassel5GBSv2Pipeline). Good barcoded read-tags were mapped to the watermelon reference sequence (97103v2) of the Cucurbit Genomes Database (CuGenDB; http://cucurbitgenomics.org/) using the Burrows-Wheeler Aligner (BWA) v0.6.1-r104 (Li and Durbin, 2010). The resulting SNPs were filtered under the conditions of minor allele frequency (MAF) > 5%, missing data < 10%, and minimum depth > 10 to produce a variant call format file. The functional effects of the SNPs were predicted using the snpEff toolbox (Cingolani et al., 2012). The filtered SNPs from 48 F₁ cultivars were selected based on the polymorphism information content (PIC) (Botstein et al., 1980) value to obtain a core marker set, which was applied to the Fluidigm assay for discriminating 92 watermelon cultivars. The PIC value was calculated using PowerMarker V3.25 (Liu and Muse, 2005). The Fluidigm assay was performed using the Fluidigm Juno system (Fluidigm, San Francisco, CA, USA). Allele-specific primers were used for PCR amplification and SNP genotyping using the Juno 96.96 Dynamic Array IFC (Integrated Fluidic Circuit), and genotypes were analyzed using Fluidigm SNP Genotyping Analysis v4.5.1.
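The site-level filtering described above can be illustrated with a minimal sketch. This is not the TASSEL-GBS implementation; it assumes a hypothetical encoding in which each genotype is the count of the alternate allele (0, 1, or 2) or None when missing, and it takes the three thresholds as arguments rather than fixing them.

```python
from typing import Optional, Sequence


def passes_filters(
    genotypes: Sequence[Optional[int]],
    mean_depth: float,
    min_maf: float,
    max_missing: float,
    min_depth: float,
) -> bool:
    """Return True if one bi-allelic SNP site passes all three thresholds.

    genotypes: per-sample alternate-allele counts (0, 1, 2), None if missing.
    mean_depth: mean read depth at the site (hypothetical summary value).
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False
    missing_rate = 1.0 - len(called) / n_total
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)  # frequency of the rarer allele
    return maf > min_maf and missing_rate < max_missing and mean_depth > min_depth


# Invented example site with ten samples and no missing calls.
# Thresholds: MAF > 5%, missing data < 10% (as reported in the Results), depth > 10.
example_site = [0, 1, 2, 1, 0, 1, 2, 0, 1, 1]
example_ok = passes_filters(example_site, 12.0, 0.05, 0.10, 10.0)
```

A site is kept only when all three conditions hold, so lowering any one threshold can only enlarge the retained set.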

Population Structure and Genetic Relationships

The population structure was evaluated based on a Bayesian model-based clustering method using STRUCTURE v2.3.4 (Pritchard et al., 2000). The K values were set from 1 to 10 with a burn-in period of 10,000 iterations. Markov chain Monte Carlo (MCMC) runs were carried out with 100,000 iterations per run. The number of replications for each K was set to 10. The best K value was predicted according to the Evanno method using STRUCTURE HARVESTER v0.6.94 (Earl et al., 2012) by calculating the ΔK. A principal component analysis (PCA) was performed using the prcomp function and results were visualized using the ggplot2 package (Wickham, 2009) in R. A hierarchical clustering analysis was performed using the poppr package (Kamvar et al., 2014) implemented in R. The poppr package was used to estimate Nei’s genetic distances (Nei, 1978) between F₁ cultivars, and a hierarchical clustering analysis was conducted by the unweighted pair group method with arithmetic mean (UPGMA). The UPGMA dendrogram was generated using the dendextend package (Galili, 2015).
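The ΔK statistic used to choose K can be sketched as follows. This is a simplified illustration, not the STRUCTURE HARVESTER code: it assumes ΔK is taken as the absolute second difference of the mean log-likelihood across successive K values, divided by the standard deviation of the replicate log-likelihoods at K. The function name and the log-likelihood values are invented for the example.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute a simplified Delta K for every K that has both neighbours.

    lnp_by_k maps each tested K to the log-likelihoods of its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result: Dict[int, float] = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the ends of the tested range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Invented replicate log-likelihoods for K = 1..5 (three replicates each).
example_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -792.0, -788.0],
    5: [-785.0, -786.0, -784.0],
}
example_delta = delta_k(example_runs)
best_k = max(example_delta, key=example_delta.get)
```

The statistic peaks where the gain in log-likelihood levels off, which is why it cannot be evaluated at the smallest or largest K tested.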

Results

GBS Analysis

A GBS analysis of 48 watermelon F₁ commercial cultivars, registered in the KSVS (Gimcheon, Republic of Korea), produced 204.1 million reads, which were analyzed in the TASSEL-GBS pipeline (Table 2). A total of 18,888 bi-allelic SNPs were mapped to SNP positions with the good barcoded reads. After filtering the SNPs with thresholds of MAF >5% and missing data <10%, we obtained 9,397 SNPs. These SNPs were distributed across the reference genome (Suppl. Fig. 1). Then, we obtained 2,300 SNPs with >10× depth coverage, corresponding to one SNP variant for every 178,595 bases of the watermelon reference genome (Table 3).

Table 2.

Summary of genotyping-by-sequencing of the 48 F₁ watermelon cultivars

Number of cultivars	48
Total number of reads	204,148,192
Total number of good barcoded reads	102,101,726
Size of all tags	257,680
Size of all SNP positions	18,888
Size of filtered SNPs	9,397

Table 3.

Number of variants and variant rate for each chromosome

Chromosome	Length (bp)	Variants (SNPs)	Variant rate (bp)
1	36,935,898	289	127,805
2	37,915,939	150	252,772
3	31,872,261	166	192,001
4	27,110,815	126	215,165
5	35,887,987	174	206,252
6	29,507,460	245	120,438
7	31,939,013	119	268,395
8	28,201,227	139	202,886
9	37,727,573	185	203,932
10	35,099,344	141	248,931
11	30,886,124	299	103,298
Total	363,083,641	2,033	178,595^z

^zMean value of genomic variant rates: one variant for every 178,595 bases in the watermelon genome (97103v2).
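The variant rates in Table 3 can be reproduced with a short calculation, shown here as a sketch. It assumes the rate is the chromosome length divided by the number of variants, truncated to a whole number, which matches every row as listed; the variable names are illustrative.

```python
# Chromosome -> (length in bp, number of SNPs), copied from Table 3.
TABLE3 = {
    1: (36_935_898, 289),
    2: (37_915_939, 150),
    3: (31_872_261, 166),
    4: (27_110_815, 126),
    5: (35_887_987, 174),
    6: (29_507_460, 245),
    7: (31_939_013, 119),
    8: (28_201_227, 139),
    9: (37_727_573, 185),
    10: (35_099_344, 141),
    11: (30_886_124, 299),
}


def variant_rate(length_bp: int, n_variants: int) -> int:
    """Bases per variant, truncated to a whole number as in Table 3."""
    return length_bp // n_variants


per_chromosome = {c: variant_rate(l, n) for c, (l, n) in TABLE3.items()}
total_length = sum(l for l, _ in TABLE3.values())
total_variants = sum(n for _, n in TABLE3.values())
genome_rate = variant_rate(total_length, total_variants)
```

With the values as listed, the per-chromosome lengths and counts add up to the totals in the last row of the table, and the genome-wide rate equals the footnoted mean.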

Population Structure and Genetic Relationships

To identify subgroups of the 48 F₁ commercial cultivars, a population structure analysis was performed using a Bayesian clustering approach as implemented in STRUCTURE (Fig. 1). The optimal number of subpopulations was 3, as determined by the calculation of ΔK (Fig. 1A). Because the parental populations are unknown, the subpopulations could not be unambiguously assigned to specific subspecies or varieties. To further understand the genetic relatedness and diversity of the cultivars, we built a tree based on the UPGMA algorithm with 2,300 SNPs (Suppl. Fig. 2). Most samples were assigned to group I (similarity ~97.5%), and only three cultivars clustered separately, in group II (‘WM41’ and ‘WM37’) and group III (‘WM42’). The three cultivars were also distinguished in the population structure analysis, showing no admixture at K = 2 and little admixture at K = 3 (Fig. 1B).

https://static.apub.kr/journalsite/sites/kshs/2022-040-01/N0130400108/images/HST_40_01_08_F1.jpg

Fig. 1.

Population structure analysis of 48 watermelon F₁ cultivars using STRUCTURE based on 2,300 SNPs. (A) Plot of ΔK values from K=1 to 10. (B) Admixture-based substructure model. Each color indicates a presumed ancestral population.

Fluidigm SNP Assay and Marker Development

To develop highly polymorphic SNP markers to discriminate cultivars, a subset of 238 SNPs was selected based on PIC > 0.3 and physical distance on each chromosome. These 238 SNPs were tested by genotyping 92 newly prepared F₁ cultivars (Table 1) using the Fluidigm Juno system. The success rates of amplification per F₁ cultivar ranged from 83.5% to 100%, with an average of 92.8%. The genotype calling results for two assays are shown in Fig. 2. A subset of 141 bi-allelic SNPs was obtained, and these SNPs were used for a genetic analysis.

https://static.apub.kr/journalsite/sites/kshs/2022-040-01/N0130400108/images/HST_40_01_08_F2.jpg

Fig. 2.

Scatterplots of the Fluidigm genotyping analysis. A total of 92 samples were analyzed based on biallelic SNPs: wm96_8S21 (left panel) and wm96_9S06 (right panel). Dots are clustered as homozygote allele 1 (red), homozygote allele 2 (green), and heterozygote (blue).

Genetic Analysis and Core Marker Set Development for 92 F₁ Cultivars

For an overview of the 141 SNPs, PCA was applied (Fig. 3). The first two principal components (PC1 and PC2) accounted for 30.5% of the total variance (18.6% and 11.9%, respectively). A cluster analysis using the silhouette test resulted in K = 3 clusters, as visualized by scatter plots (Fig. 3). Most of the cultivars were separated and clustered into three subgroups.

https://static.apub.kr/journalsite/sites/kshs/2022-040-01/N0130400108/images/HST_40_01_08_F3.jpg

Fig. 3.

Two-dimensional principal component analysis (PCA) of 92 watermelon F₁ cultivars. PC1 and PC2 are shown in the scatter plot and subgroups are indicated using colored dots. Each ellipse represents the 95% confidence interval of a subgroup.

Next, with the subset of 141 SNPs, hierarchical clustering was performed using the UPGMA algorithm based on Nei’s genetic distances (Fig. 4). The cultivars were separated into three clusters using the cutree function in R: subgroup 1 (n = 30), subgroup 2 (n = 37), and subgroup 3 (n = 23). The cluster number (K = 3) was the same; however, the subgroup membership differed slightly between the PCA and hierarchical clustering methods.
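The clustering step can be sketched in plain Python as average-linkage (UPGMA) agglomeration that stops once a requested number of clusters remains, which for this linkage gives the same partition as cutting the finished dendrogram into that many groups. This is an illustration only, not the poppr or cutree implementation; the sample names and distances are invented, and the real analysis used Nei's genetic distances.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma_clusters(
    names: List[str], dist: Dict[FrozenSet[str], float], k: int
) -> List[List[str]]:
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[n] for n in names]

    def avg_dist(a: List[str], b: List[str]) -> float:
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Invented pairwise distances for five hypothetical samples.
example_names = ["A", "B", "C", "D", "E"]
example_dist = {frozenset(p): 0.5 for p in combinations(example_names, 2)}
example_dist[frozenset(("A", "B"))] = 0.1
example_dist[frozenset(("C", "D"))] = 0.1
for other in "ABCD":
    example_dist[frozenset((other, "E"))] = 0.9

example_groups = upgma_clusters(example_names, example_dist, k=3)
```

Because group membership depends on where the tree is cut, samples lying between two clusters can change groups when a different method, such as PCA-based grouping, is applied to the same genotypes.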

https://static.apub.kr/journalsite/sites/kshs/2022-040-01/N0130400108/images/HST_40_01_08_F4.jpg

Fig. 4.

Dendrogram using the UPGMA algorithm based on Nei’s distances. The genetic distance was calculated using 141 SNPs and 92 watermelon cultivars. Subgroups 1, 2, and 3 are shown in red, blue, and green, respectively. Differently colored dots correspond to the subgroups in Fig. 3.

The subset of 141 SNPs was reduced to 96 SNPs for the analysis using the Fluidigm Juno 96.96 system. To prepare these 96 SNPs, a threshold PIC of > 0.36 and the physical position of SNPs were considered. With the core set of 96 SNP markers, heterozygosity per F₁ cultivar ranged from 9.2% to 82.3%, with an average of 48.9%. The average values of major allele frequency, gene diversity, heterozygosity, and PIC were 0.596, 0.473, 0.509, and 0.361, respectively (Suppl. Table 2). The UPGMA dendrogram based on the 96 SNPs did not show any differences in topology from that based on 141 SNPs.
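The PIC threshold can be made concrete with a small sketch of the PIC formula of Botstein et al. (1980), PIC = 1 − Σpᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ², where pᵢ are the allele frequencies at a locus. The function names and marker names are hypothetical, and the selection shown uses PIC alone, whereas the study also considered physical position.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pic(allele_freqs: Sequence[float]) -> float:
    """Polymorphism information content for one locus."""
    homozygosity = sum(p * p for p in allele_freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross


def select_by_pic(markers: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Keep the markers whose PIC exceeds the threshold."""
    return [name for name, freqs in markers.items() if pic(freqs) > threshold]


# Invented allele frequencies for three hypothetical bi-allelic markers.
example_markers = {
    "snp_a": (0.5, 0.5),
    "snp_b": (0.6, 0.4),
    "snp_c": (0.9, 0.1),
}
example_core = select_by_pic(example_markers, threshold=0.36)
```

For a bi-allelic SNP the PIC cannot exceed 0.375, which is reached when both alleles have a frequency of 0.5, so a cutoff of 0.36 retains only markers with nearly balanced allele frequencies.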

Discussion

In this study, the Fluidigm analysis platform was used as a cost-effective approach for the development of core SNP markers to differentiate commercial watermelon cultivars. First, we obtained variants including SNPs based on genome-wide complexity reduction by selectively sequencing the barcoded fragments digested with ApeKI restriction enzymes (Poland and Rife, 2012; He et al., 2014; Jung et al., 2020). With a filtering threshold of >10× depth coverage, we obtained a number of reasonably valid SNPs (2,300 SNPs) for downstream analyses. The SNPs were analyzed based on the annotated transcripts in the reference genome using snpEff (Cingolani et al., 2012). For example, 1,203 of 4,108 transcription-related factors (24.1%) were detected in exonic regions (Suppl. Table 1), consistent with soybean GBS results, in which 20.73% of SNPs resided in exonic regions when using a methylation-sensitive ApeKI restriction enzyme (Sonah et al., 2013).

Based on the SNPs, we analyzed the population structure and genetic relationships of 48 commercial watermelon cultivars to understand their molecular diversity. The cultivars were assigned to three subpopulations in the population structure analysis. However, a dendrogram based on genetic distances (Suppl. Fig. 2) revealed little variation in the genetic background. This is a well-known characteristic of watermelon cultivars, which have mainly been bred for a high sugar content and large fruits (Wu et al., 2019; Kim et al., 2021).

Next, we focused on the selection of polymorphic SNPs to establish a core set of markers able to discriminate among domestic watermelon cultivars registered in the KSVS. We determined a subset of 238 SNPs with an even distribution based on physical distance and PIC > 0.3 in a molecular diversity analysis. The SNPs were tested in 92 F₁ cultivars using the Fluidigm Juno system. After further filtering, 141 bi-allelic SNPs were identified. The average heterozygosity across the F₁ cultivars was 50.9%. The genotyping data showed moderate genetic distances, and the SNP markers could be used to test the purity of the registered F₁ cultivars (Kishor et al., 2020). Further, the genotyping data were analyzed by a PCA and hierarchical clustering analysis, in which the 92 cultivars were classified into three subgroups. For most cultivars, consistent results were obtained by the two approaches; however, nine cultivars showed differences in group assignments between the PCA and hierarchical clustering analysis. This inconsistency can be explained by the small number of tested SNPs and the narrow genetic background of the tested cultivars. The subset of 141 SNPs was reduced to a core subset of 96 SNPs for an analysis using the Fluidigm Juno 96.96 platform. The dendrogram based on 141 SNPs was identical to the dendrogram based on 96 SNPs (data not shown). With this core SNP set, we could accurately identify the 92 registered F₁ cultivars.

In this study, we reported a core set of 96 SNP markers for the identification of 92 domestically registered F₁ watermelon cultivars. Genetic analyses revealed that the domestic cultivars have low levels of genetic diversity. However, the newly developed SNP markers are useful for differentiating domestic watermelon cultivars on a high-throughput analysis platform, such as the Fluidigm genotyping system. In addition, the newly established markers could help protect, at the molecular level, proprietary rights for new cultivars developed by breeders.

Supplementary Material

Supplementary materials are available at the Horticultural Science and Technology website (https://www.hst-j.org).

Number of SNPs by the variant type and genomic region
HORT_20220008_Table_1s.pdf

Genotype summary of 96-plex watermelon SNP primers based on 92 watermelon F₁ commercial cultivars
HORT_20220008_Table_2s.pdf

SNP density of 48 watermelon cultivars.
HORT_20220008_Figure_1s.pdf

Dendrogram of hierarchical clustering results for the genetic distance analysis.
HORT_20220008_Figure_2s.pdf

Acknowledgements

This work was carried out with the support of the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET) through the Agri-Bioindustry Technology Development Program funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) (No. 317011-04-4-HD050) to G.P. Lee. This work was supported by the Golden Seed Project (213006055SBV20); the Ministry of Agriculture, Food, and Rural Affairs (MAFRA); the Ministry of Oceans and Fisheries (MOF); the Rural Development Administration (RDA); and the Korean Forest Service (KFS) of the Republic of Korea. This research was supported by the Chung-Ang University Graduate Research Scholarship in 2020.

References

Botstein D, White RL, Skolnick M, Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331

Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6:80-92. doi:10.4161/fly.19695

Food and Agriculture Organization of the United Nations (FAO) (2018) Statistical databases. Food and Agriculture Organization of the United Nations, http://www.fao.org/faostat/en/#data/QC. Accessed 21 October 2021

Galili T (2015) dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 31:3718-3720. doi:10.1093/bioinformatics/btv428

Ganal MW, Altmann T, Roder MS (2009) SNP identification in crop plants. Curr Opin Plant Biol 12:211-217. doi:10.1016/j.pbi.2008.12.009

Guo S, Zhao S, Sun H, Wang X, Wu S, Lin T, Ren Y, Gao L, Deng Y, et al (2019) Resequencing of 414 cultivated and wild watermelon accessions identifies selection for fruit quality traits. Nat Genet 51:1616-1623. doi:10.1038/s41588-019-0518-4

He J, Zhao X, Laroche A, Lu ZX, Liu H, Li Z (2014) Genotyping-by-sequencing (GBS), an ultimate marker-assisted selection (MAS) tool to accelerate plant breeding. Front Plant Sci 5:484. doi:10.3389/fpls.2014.00484

Jung J, Park G, Oh J, Jung JK, Shim EJ, Chung SM, Lee GP, Park Y (2020) Assessment of the current infraspecific classification scheme in melon (Cucumis melo L.) based on genome-wide single nucleotide polymorphisms. Hortic Environ Biotechnol 61:537-547. doi:10.1007/s13580-020-00230-0

Kamvar ZN, Tabima JF, Grunwald NJ (2014) Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281. doi:10.7717/peerj.281

Kim M, Jung JK, Shim EJ, Chung SM, Park Y, Lee GP, Sim SC (2021) Genome-wide SNP discovery and core marker sets for DNA barcoding and variety identification in commercial tomato cultivars. Sci Hortic 276:109734. doi:10.1016/j.scienta.2020.109734

Kishor D, Noh Y, Song WH, Lee GP, Jung JK, Shim EJ, Chung SM (2020) Identification and Purity Test of Melon Cultivars and F1 Hybrids Using Fluidigm-based SNP Markers. Hortic Sci Technol 38:686-694

Kwon YS, Oh YH, Yi SI, Kim HY, An JM, Yang SG, Ok SH, Shin JS (2010) Informative SSR markers for commercial variety discrimination in watermelon (Citrullus lanatus). Genes Genomics 32:115-122. doi:10.1007/s13258-008-0674-x

Levi A, Thomas CE, Wehner TC, Zhang XP (2001) Low genetic diversity indicates the need to broaden the genetic base of cultivated watermelon. HortScience 36:1096-1101. doi:10.21273/HORTSCI.36.6.1096

Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589-595. doi:10.1093/bioinformatics/btp698

Liu K, Muse SV (2005) PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 21:2128-2129. doi:10.1093/bioinformatics/bti282

Mammadov J, Aggarwal R, Buyyarapu R, Kumpatla S (2012) SNP markers and their impact on plant breeding. Int J Plant Genomics 2012. doi:10.1155/2012/728398

Nasab MA, Rahimi M, Karatas A, Ercisli S (2020) Sequential path analysis and relationships between fruit yield in watermelon. Pakistan J Agri Sci 57:1425-1430. doi:10.21162/Pakjas/20.200

Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583-590. doi:10.1093/genetics/89.3.583

Peterson G, Dong Y, Horbach C, Fu YB (2014) Genotyping-By-Sequencing for Plant Genetic Diversity Analysis: A Lab Guide for SNP Genotyping. Diversity 6:665-680. doi:10.3390/d6040665

Poland JA, Rife TW (2012) Genotyping-by-Sequencing for Plant Breeding and Genetics. Plant Genome 5:92-102. doi:10.3835/plantgenome2012.05.0005

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945-959. doi:10.1093/genetics/155.2.945

Sonah H, Bastien M, Iquira E, Tardivel A, Légaré G, Boyle B, Normandeau É, Laroche J, Larose S, et al (2013) An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping. PLoS ONE 8:e54603. doi:10.1371/journal.pone.0054603

Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer, NY, USA. doi:10.1007/978-0-387-98141-3

Wu S, Wang X, Reddy U, Sun HH, Bao K, Gao L, Mao LY, Patel T, Ortiz C, et al (2019) Genome of 'Charleston Gray', the principal American watermelon cultivar, and genetic characterization of 1,365 accessions in the US National Plant Germplasm System watermelon collection. Plant Biotechnol J 17:2246-2258. doi:10.1111/pbi.13136

Horticultural Science and Technology 원예과학기술지 ISSN:1226-8763(Print) 2465-8588(Online)
