ABSTRACT
Objective
Human leukocyte antigen (HLA) molecules present foreign antigens to immune cells and trigger the immune response. They are therefore associated with diseases and play crucial roles in solid organ and hematopoietic stem cell transplantation. This study aimed to determine the HLA allele and three-locus haplotype frequencies of the Turkish population. This is the first study in Türkiye to include such a large population.
Methods
The study included 6039 bone marrow donors. Sequence-specific oligonucleotide probes and sequence-specific primers were employed for HLA typing. After excluding related individuals, 1222 A, 1229 B, 930 DRB1, 102 C, 227 DQB1, 163 DQA1 allele couples, and 336 ABDR haplotype couples were analyzed using Arlequin v3.5 and R packages. The results were compared with those from bordering countries.
Results
The most frequently observed HLA alleles were A*02, B*35, C*07, DRB1*11, DQA1*01, and DQB1*03. The most prevalent ABDRB1 haplotype was A*24 B*35 DRB1*11 in our population. For all loci, the population was in Hardy-Weinberg equilibrium. A linkage disequilibrium analysis revealed a strong link between the A-B, A-DRB1, and B-DRB1 loci. The results of the two methods were similar.
Conclusion
These findings are consistent with those reported for populations in bordering countries. Our findings will support numerous therapeutic applications, including drug discovery, disease therapy, and organ transplantation. Both Arlequin and R are suitable for these analyses.
INTRODUCTION
The human leukocyte antigen (HLA) gene complex on chromosome 6p21 encodes HLA molecules, which are cell surface proteins that are crucial for antigen presentation to T lymphocytes.1 HLA gene regions are the most polymorphic in the human genome, and HLA molecules expressed from these regions play critical roles in the immune system.2 HLA is divided into two classes: class I (HLA-A, HLA-B, and HLA-C) and class II (HLA-DR, HLA-DQ, and HLA-DP).3 Class I HLA molecules present antigenic peptides to CD8+ T cells, whereas class II HLA molecules present antigenic peptides to CD4+ T cells.4
HLA typing is commonly used in clinics for hematopoietic stem cell and solid organ transplantation. Furthermore, because of its roles in immune response, it is linked to a variety of diseases. Thus, community-based studies on HLA allele frequency may be valuable for detecting genetic variation and population origin, identifying vaccine-safe epitopes, and discovering disease associations.5 These studies also aid in the creation of HLA databases. The frequencies of alleles and haplotypes in various populations have been the subject of numerous studies. Research conducted both nationally and internationally has linked specific alleles to distinct clinical symptoms.6
Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) are essential concepts in population genetics that explain genetic diversity and inheritance patterns. HWE provides a baseline for predicting genotype frequencies in a non-evolving population, whereas LD measures the non-random association of alleles at different loci. Together, they provide insights into genetic composition and evolutionary mechanisms.7 Deviations from HWE may indicate genotyping errors or the presence of rare variants, especially in large genomic datasets.8 LD quantifies the extent to which alleles at distinct loci are co-inherited and is influenced by factors such as population structure and selection. Precise measurement of LD is essential for applications such as disease association studies, and novel approaches have improved LD estimation under HWE conditions. The relationship between HWE and LD is important because deviations from HWE may influence LD estimates and hence confound genetic studies.9
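The HWE expectation described above can be illustrated with a minimal Python sketch. Under HWE, a homozygous genotype for allele i is expected at frequency p_i², and a heterozygous genotype for alleles i and j at 2·p_i·p_j. The sketch compares observed and expected genotype counts at one locus with a chi-square statistic. It is not the procedure used in this study: the function name `hwe_chi_square` and the input format are assumptions made for the example, and dedicated population genetics software often uses exact tests instead, which are generally better suited to highly polymorphic loci with many rare genotypes.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for one locus.

    genotypes: list of 2-tuples, one per unrelated individual,
    e.g. ("A*02", "A*24"). The order of alleles within a tuple is ignored.
    """
    n = len(genotypes)
    # Allele frequencies by direct counting (two alleles per individual).
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed counts of unordered genotypes.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    alleles = sorted(freq)
    chi2 = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # HWE expectation: p^2 for homozygotes, 2pq for heterozygotes.
        p_exp = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
        expected = n * p_exp
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(alleles)
    # Genotype classes minus 1, minus (k - 1) estimated allele frequencies.
    df = k * (k - 1) // 2
    return chi2, df
```

A statistic close to zero means the observed genotype counts match the HWE expectation; converting the statistic to a p value requires the chi-square distribution with `df` degrees of freedom.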
In this study, we investigated the frequencies of alleles and haplotypes in bone marrow (BM) donors. We compared the most frequently observed alleles and haplotypes with those from neighboring nations. Furthermore, we evaluated two statistical tools (Arlequin and R) to analyze the data.
METHODS
Samples
Molecular HLA typing data from 6039 healthy Turkish people were retrospectively analyzed. These individuals were BM donors who applied to the tissue typing laboratory at the hospital between 2009 and 2018. Related individuals were excluded to ensure that genetic variation was assessed objectively and in a population-representative manner. In total, 1222 A, 1229 B, 930 DRB1, 102 C, 227 DQB1, 163 DQA1 allele couples, and 336 ABDR haplotype couples were analyzed. The results were compared with those from bordering countries.
Informed consent was provided by all participants. The İzmir Katip Çelebi University Non-interventional Clinical Research Ethics Committee approved the study in accordance with the Declaration of Helsinki (decision no: 0097, date: 23.03.2023).
Molecular Human Leukocyte Antigen Typing
Sequence-specific oligonucleotide probe (SSOP) and sequence-specific primer (SSP) methods were used for the molecular typing of HLA. Deoxyribonucleic acid (DNA) was isolated using an EZ1 DNA Blood Kit (Qiagen, Hilden, Germany) via an automatic system (Geno M-6, Qiagen). Commercial SSOP (Lifecodes, Rodermark, Germany) and SSP kits (Olerup, Stockholm, Sweden) were used for the procedures. The tests were performed according to the manufacturer’s instructions. Briefly, for the SSOP method, the target regions were amplified by polymerase chain reaction (PCR), and the amplicons were hybridized with specific probes. The results were analyzed using a Luminex fluoroanalyzer. For the SSP method, samples were amplified by PCR using SSPs, and the amplicons were evaluated on agarose gels using score software.
Statistical Analysis
All statistical analyses, including total allele frequencies, haplotype frequencies, LD, and HWE tests, were conducted using the R programming language (version 4.2) with the pegas and adegenet packages, and Arlequin v3.5.10-13
R is a robust and versatile programming language and software environment designed for statistical analysis, data science, and visualization. Arlequin, on the other hand, is a software package designed specifically for the analysis and interpretation of genetic data. It provides a range of statistical methods and tests for studying genetic population structure, genetic diversity, and demographic history. Agreement between the results of these two tools is important for their application, particularly in the analysis of HLA allele and haplotype frequencies. Comparing them allows the trade-offs between a versatile, general-purpose tool such as R and a specialized population genetics platform such as Arlequin to be assessed. This facilitated the identification of the most efficient, precise, and user-friendly method for evaluating HLA allele and haplotype frequencies in our study.
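For readers unfamiliar with how allele frequencies are obtained from typing data, the following Python sketch shows estimation by direct counting from allele couples. It is only an illustration and does not reproduce the R or Arlequin workflows used here; the function name `allele_frequencies` and the toy genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by direct counting.

    genotypes: iterable of 2-tuples, one per unrelated individual,
    e.g. ("A*02", "A*24"). A homozygote is written ("A*02", "A*02").
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())  # equals 2 * number of individuals
    # most_common() orders alleles from most to least frequent.
    return {allele: n / total for allele, n in counts.most_common()}


# Hypothetical toy data, not taken from the study.
example = [("A*02", "A*24"), ("A*02", "A*02"), ("A*03", "A*24")]
freqs = allele_frequencies(example)
for allele, af in freqs.items():
    print(f"{allele}\tAF: {af:.6f}")
```

Each individual contributes two alleles, so the denominator is twice the number of individuals typed at that locus.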
RESULTS
In total, 19 different HLA-A, 28 different HLA-B, 14 different HLA-C, 13 different HLA-DRB1, 6 different HLA-DQA1, and 5 different HLA-DQB1 alleles were identified. The most frequent HLA-A allele was HLA-A*02 [allele frequency (AF): 0.195990], followed by HLA-A*24, HLA-A*03, HLA-A*01, and HLA-A*11 (AF: 0.147709, 0.126432, 0.119885, and 0.087561, respectively). The five most frequent HLA-B alleles were HLA-B*35, HLA-B*51, HLA-B*44, HLA-B*18, and HLA-B*38 (AF: 0.174938, 0.128152, 0.080146, 0.064279, and 0.049227, respectively). The five most frequent HLA-C alleles were HLA-C*07, HLA-C*04, HLA-C*12, HLA-C*15, and HLA-C*06 (AF: 0.230392, 0.137254, 0.112745, 0.102941, and 0.083333, respectively).
Among the HLA-DRB1 alleles, HLA-DRB1*11 had the highest frequency (AF: 0.229569), followed by HLA-DRB1*04 (AF: 0.147311), HLA-DRB1*15 (AF: 0.110215), HLA-DRB1*03 (AF: 0.095698), and HLA-DRB1*13 (AF: 0.093010). The five most common HLA-DQA1 alleles were HLA-DQA1*01, HLA-DQA1*05, HLA-DQA1*03, HLA-DQA1*02, and HLA-DQA1*04 (AF: 0.365030, 0.325153, 0.193251, 0.098159, and 0.012269, respectively). The most frequently observed HLA-DQB1 allele was HLA-DQB1*03 (AF: 0.458149), followed by HLA-DQB1*05 (AF: 0.202643), -DQB1*06 (AF: 0.178414), -DQB1*02 (AF: 0.145374), and -DQB1*04 (AF: 0.015418). The R analysis yielded the same frequencies (Table 1).
We analyzed the frequencies of 336 three-locus haplotypes. According to both methods, our population was in HWE for all loci (Table 2). The haplotype frequencies were similar between the two methods. The most prevalent haplotype in our population was A*03 B*44 DRB1*04 (Table 3).
The LD analysis was performed using only the Arlequin software program. It showed a strong, statistically significant association between the A-B, A-DRB1, and B-DRB1 loci (p<0.001) (Table 4).
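To make the LD measures concrete, the Python sketch below computes the standard pairwise coefficients D, D', and r² for one allele at each of two loci, given the haplotype frequency and the two allele frequencies. It is an illustrative sketch only: it does not reproduce Arlequin's locus-level significance test, and the function name `pairwise_ld` and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for allele a at one locus and allele b at another.

    p_ab: frequency of the haplotype carrying both a and b
    p_a, p_b: frequencies of allele a and allele b
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b

    # D' scales D by its largest possible magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d ** 2 / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies, not taken from the study.
d, d_prime, r2 = pairwise_ld(p_ab=0.05, p_a=0.15, p_b=0.20)
print(f"D = {d:.4f}, D' = {d_prime:.4f}, r^2 = {r2:.4f}")
```

With this sign convention, D' is negative when the two alleles occur together less often than expected; some authors report its absolute value instead.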
DISCUSSION
In this study, we evaluated HLA allele and haplotype frequencies in the Aegean Region of the Turkish population. This is the first study conducted in the Aegean Region of Türkiye and the first to evaluate the HLA allele and haplotype frequencies of a large Turkish population. We also compared two statistical methods for HLA allele and haplotype analyses.
In our study population, the most common HLA-A allele was A*02, followed by A*24. The top two alleles were the same in various Turkish populations and in Greece, Bulgaria, Georgia, and Iran.14-21 However, only the top allele was the same in the Armenian population.22 The similarity between the population of the Aegean Region and those of surrounding countries may be attributed to past migration. We found that the two most frequently observed HLA-B alleles were B*35 and B*51. In the Iranian population, the HLA-B alleles with the highest frequencies (AF: 0.18) were B*05 and B*35.3
The results were comparable to those for other Turkish populations and for Armenia, Georgia, Iran, and Bulgaria.15-17,19-22 In the Greek and Turkish populations, the most common allele was HLA-B*51, followed by B*35.17, 18 This discrepancy may be attributable to geographic location. Although the most frequent allele may differ, the predominant B alleles are consistent between the Greek and Aegean populations. The migration of individuals from Greece and Bulgaria to Türkiye, particularly to the Aegean Region, has played a significant role in fostering diversity throughout history.
The most common HLA-C alleles in our study cohort were HLA-C*07 and HLA-C*04, in that order. However, the order was reversed in the Syrian and Greek populations.3, 18 The two most common alleles were HLA-C*12 and C*07 in the Iranian population.21 This difference in the most common allele may arise because Iran borders the Eastern Anatolia Region of Türkiye, whereas our study cohort was drawn from the Aegean Region. Although the two most prevalent C alleles were in reverse order in the Greek and Syrian populations, the alleles themselves were the same as those found in the countries from which the immigrants originated. Owing to the substantial recent migration from Syria to the Aegean Region, the allele frequencies in the population have become comparable.
HLA-DRB1*11 and DRB1*04 had the highest and second-highest frequencies in our study group, respectively. These results are in accordance with previous studies of Turkish populations and of the Syrian population.15-17,23 However, only the most frequent allele was shared with bordering countries: the two most frequent alleles were DRB1*11 and DRB1*15 in the Armenian and Iranian populations, DRB1*11 and DRB1*13 in the Georgian population, and DRB1*11 and DRB1*16 in the Greek population.18, 20-22 Thus, at least one DRB1 allele is identical to those found in nearby countries, which likely reflects the historical migration patterns of adjacent countries. For the HLA-DQA1 alleles, our results were similar to those for the Iranian population.21 However, we could not compare our results with other studies on border populations because the DQA1 locus was not investigated in those studies. The most frequently observed HLA-DQB1 alleles were similar to those in Iranian, Georgian, Greek, and other Turkish population studies.17, 18, 20, 21
We also evaluated the three-locus haplotype frequencies of ABDRB1 in our population. The most common haplotype differed from those of the Iranian, Armenian, and Georgian populations.20-22 Our most prevalent haplotype also differed from that of another Turkish population in Central Anatolia.24 Unsurprisingly, it was identical to that of the Greek population. These populations have coexisted for centuries and have mutually contributed to genetic diversity. HLA-A*02, B*35, C*07, DRB1*03, and DQB1*03 were the most common alleles in various Caucasian populations.25-28 Accordingly, this population may have an ancestral relationship with Caucasians. Among distant populations, the HLA allele frequencies were related to those of the Lebanese, Vietnamese, Argentinian, Kazakhstani Russian, Maltese, and Korean populations.29-34 The similarity of our HLA allele and haplotype frequencies to those of other populations can be attributed to historical migrations, genetic admixture, shared environmental stresses, and evolutionary mechanisms. Balancing selection and geographical proximity were the primary factors contributing to these similarities.
The relationship between HLA allele and haplotype frequencies and disease risk is intricate and diverse, as demonstrated by numerous studies. The HLA region is essential for the immune response, with certain alleles potentially enhancing disease susceptibility or providing protection. This is underscored by the many associations detected between HLA variants and various autoimmune and infectious diseases. HLA haplotypes exhibit pleiotropic associations with numerous diseases, suggesting that certain alleles can consistently affect the risk of different diseases.35 For example, HLA-B*08:01 and DRB1*03:01 are strongly associated with Graves’ disease, whereas other alleles such as HLA-B*07:02 are protective.36 HLA genes are associated with more than 100 disorders, including multiple sclerosis and type 1 diabetes, in which particular alleles may elevate or diminish disease risk. Recent research indicates that multiple sclerosis is linked to DRB1 and DQB1 alleles, which are prevalent in our country.37 A further study conducted in Russia revealed that the prevalent HLA-C, -DRB1, and -DQB1 alleles in the country were associated with rheumatoid arthritis, type 1 diabetes, and psoriasis.38
The small variations between the two methods for HWE analysis may be due to rounding during the calculation stage. However, these variations are minimal and have no impact on the significance of the p values. Both R and Arlequin are capable of performing HLA allele and haplotype frequency, HWE, and LD analyses, but they cater to different user needs and expertise levels. Arlequin is a specialized population genetics tool with a user-friendly interface. This makes it accessible to researchers with limited programming experience because analyses can be performed with minimal technical input. In contrast, R is a versatile statistical platform that requires additional packages (e.g., genetics, haplo.stats, or LDheatmap). Although this approach offers significant flexibility and allows for the customization of analysis pipelines, it requires advanced programming skills and familiarity with statistical methods. For researchers who are not proficient in R, the learning curve is steep, potentially limiting its immediate usability for LD analysis. Despite these challenges, R’s open-source nature and ability to integrate diverse statistical functions make it an attractive option for experienced users seeking a tailored analytical workflow. By contrast, Arlequin provides an out-of-the-box solution for researchers focusing on LD and other population genetics metrics, particularly when ease of use is a priority.
Study Limitations
Although our study includes a large cohort of BM donors, it may not fully represent the entire genetic diversity of the Turkish population, as the donors were primarily selected from a specific registry. While we analyzed multiple HLA loci, some additional loci (e.g., HLA-DP) were not included, which could provide further insights into genetic associations and disease susceptibility. Although we compared our results with populations from bordering countries, differences in sampling methodologies and HLA typing techniques could influence the comparability of the findings.
CONCLUSION
In conclusion, we found that, as expected, the frequencies of the HLA alleles were comparable to those of bordering countries. Despite the similarities in allele frequencies, the haplotype frequencies were dissimilar. Knowing a population’s allele and haplotype frequencies is crucial for understanding their relevance to transplantation, disease resistance, and therapeutic development. Our findings may assist clinicians in diagnosing disorders, planning transplantation, and developing new medications. Furthermore, numerous statistical software tools are used to determine allele and haplotype frequencies. Our results show that both Arlequin and R can be used to analyze HLA diversity.