Abstract
Soybean (Glycine max) is a major global source of protein and oil, but its narrow genetic base limits breeding progress. Pan-genome analysis captures the full gene repertoire of a species, including structural variants (SVs) absent from single reference genomes. Here, we constructed a soybean pan-genome by de novo assembly of 26 diverse accessions representing wild, landrace, and elite genotypes, using long-read sequencing and optical mapping. We identified 89,542 SVs, including presence-absence variants (PAVs), copy number variants (CNVs), and inversions, of which 34% were novel relative to the Williams 82 reference. Integration with transcriptomic data from developing seeds and pathogen-infected tissues revealed 1,247 SVs overlapping gene bodies, and a genome-wide association study (GWAS) across a panel of 421 lines identified 312 SVs significantly associated with seed protein and oil content. Notably, a 15-kb deletion in the promoter of a lipid transfer protein gene (GmLTP) was linked to increased oil content (P = 2.3 × 10⁻⁸), while a tandem duplication encompassing three NBS-LRR genes conferred resistance to soybean cyst nematode (SCN) in wild accessions. Our findings demonstrate that SVs are major drivers of agronomic trait variation and provide a valuable resource for marker-assisted breeding and genomic selection.