Introduction
The history of cattle in Costa Rica dates back to the late 16th century (Cordero-Solórzano et al., 2015; Martínez et al., 2012). These first cattle introduced were mainly Iberian taurine animals, which later gave rise to local Creole cattle. The first imports of modern breeds occurred in the mid-19th century, namely Devonshire and Durham cattle from England, followed by subsequent imports of other dairy breeds, such as Holstein, Jersey, and Ayrshire, by the end of the same century (Cordero-Solórzano et al., 2015).
Bos indicus cattle were introduced into the Americas approximately 150 years ago (Utsunomiya et al., 2019) and arrived in Costa Rica in the early 20th century (Vásquez-Loaiza & Molina-Coto, 2020). The Brahman breed is now the most common in the country, along with other zebu breeds such as Nelore, Gyr, Guzerá, and Indubrasil. Due to this diverse origin the current cattle population is genetically admixed.
In recent decades, numerous Bos taurus and Bos indicus breeds have been introduced into the country due to the continuous importation of genetics from different regions worldwide (Vargas & Van Arendonk, 2004). Crossbreeding has also become a common practice as a strategy to improve adaptability for production under tropical conditions. Currently, the number of cattle in Costa Rica is nearly one third of the country’s human population, and these cattle are used for beef (63.2 %), dairy (16.2 %), dual-purpose (20.5 %), or labor (0.1 % corresponding to oxen) (Instituto Nacional de Estadística y Censos (INEC), 2021; United Nations, 2024).
Interest in characterizing livestock in Costa Rica has turned to molecular analysis, which uses microsatellite markers to distinguish genetically between breeds (Martínez et al., 2015) and locations (Cordero-Solórzano et al., 2015). Microsatellites are composed of 1 to 10 nucleotides and are a subcategory of tandem repeats that constitute repetitive genomic regions (Carneiro Vieira et al., 2016). Microsatellites are useful for a broad spectrum of genomic determinations, such as kinship relation or parenthood (Webster & Reichart, 2005), criminology (Wickenheiser, 2002), genetic mapping (Ma et al., 2022), genetic diversity (Martínez et al., 2015) and population clustering from genetic structure (Dufresnes et al., 2023), among others. In cattle, microsatellites have been widely used to estimate genetic distance among populations (Agung et al., 2019).
In 2015, a representative sample (n = 1412) of the Costa Rican bovine population was genotyped using microsatellite markers (Martínez et al., 2015). Animals were assigned a priori to 16 breed types based on pedigree, phenotypic traits, and production purpose. Based on these data, several genetic diversity indexes were obtained, genetic distances between assigned breed types were measured, and clusters were formed.
An alternative approach for the genetic analysis of these genotypes is the use of genetic admixture models. Genetic admixture models represent the genetic makeup of an individual as a combination of contributions from different ancestral populations (Skotte et al., 2013). This can be inferred solely from genotype data, or by making use of additional prior information that might be available on the populations under study. Hence, admixture models offer a suitable and adaptable method for the genetic analysis of populations that comprise a significant number of individuals with varied genetic compositions.
Understanding the history and evolution of the local cattle population can provide insight into the genetic similarities between so-called pure breeds and their crosses, as well as their association with different production systems and geographical contexts within the country. Therefore, the aim of this study was to model the genetic structure of the Costa Rican cattle population by means of supervised and unsupervised genetic admixture models.
Materials and methods
Sampling and grouping
The sampling strategy was designed to obtain a representative sample of the cattle population present in different regions within the country. This approach differs from most of the previous studies describing genetic structure of cattle populations (Agung et al., 2019; Brasil et al., 2013; Edea et al., 2015; Egito et al., 2007; Kumar et al., 2003; Martínez et al., 2012; Martinez et al., 2023; Ocampo et al., 2021), which have focused mainly on characterizing specific breeds. The sampling strategy was based on the framework provided by the national Sistema Integrado de Registro de Establecimientos Agropecuarios (Integrated Registry System for Agricultural and Livestock Establishments, SIREA), which is managed by the Servicio Nacional de Salud Animal (Costa Rican Animal Health National Service, SENASA).
A total of 1,412 purebred and crossbred animals from 744 herds across Costa Rica were randomly sampled for tail hair follicles by 25 field technicians. The samples were sent to the Laboratorio de Bioseguridad (Biosafety Laboratory, SENASA) for DNA extraction and genotyping of 18 microsatellite markers. The specific details of the sampling procedure can be found in Cordero-Solórzano et al. (2015). To keep inbreeding low, the number of herds to be sampled was maximized, and samples were restricted to two animals per herd, unless several breed types were present within the same productive system.
Because the animals were randomly selected, it was not always possible to accurately determine their specific breed composition. Therefore, an approximate classification of breed types was performed based on farm records (if available), phenotype (skin, coat color, hump, ears) and the production purpose of the animal on the farm (milk, beef or dual-purpose). The number and proportion of animals sampled according to predetermined breed types are shown in Table 1, as well as the nomenclature used for this study.
Table 1 A priori defined subpopulations with associated breed types, nomenclature, sample size, and proportion of sampled bovines. Costa Rica, 2015.
| Subpopulation | Breed type | Nomenclature | Sample size | % |
| 1 | B. indicus | B_indicus | 97 | 6.9 |
| 2 | Brahman | Brahman | 263 | 18.6 |
| 3 | B. taurus | B_taurus | 22 | 1.6 |
| 4 | B. taurus×B. indicus | B_tau×B_ind | 43 | 3.0 |
| 5 | Undefined beef | aUndef_Beef | 103 | 7.3 |
| 6 | Undefined dual-purpose | aUndef_Dual | 189 | 13.4 |
| 7 | Undefined milk | aUndef_Milk | 39 | 2.8 |
| 8 | Cross beef | bCross_Beef | 68 | 4.8 |
| 9 | Cross dual-purpose | bCross_Dual | 95 | 6.7 |
| 10 | Cross milk | bCross_Milk | 64 | 4.5 |
| 11 | Guernsey | Guernsey | 49 | 3.5 |
| 12 | Holstein | Holstein | 89 | 6.3 |
| 13 | Holstein×Jersey | Hol×Jer | 34 | 2.4 |
| 14 | Jersey | Jersey | 158 | 11.2 |
| 15 | Brown Swiss | Brown_Swiss | 49 | 3.5 |
| 16 | Simmental | Simmental | 50 | 3.5 |
| Total | 1412 | 100 |
aUndef: Undefined breeds or cross breeds grouped according to production purpose. bCross: Identified cross breeds present in minor quantities and grouped according to production purpose. / aUndef: Razas o cruces no definidos agrupados según el propósito de producción. bCross: Cruces identificados presentes en cantidades menores y agrupados según el propósito de producción.
Based on this classification, breeds and crossbreeds with significant representation were assigned to seven distinct subpopulations (Table 1, subpopulations 1, 11-16). The identified breeds or cross breeds that were present in minor quantities were grouped in six subpopulations (Table 1, subpopulations 1, 3, 4, 8, 9, 10) according to genus and/or production purpose. Finally, undefined breeds or cross breeds were classified into three subpopulations according to production purpose (Table 1, subpopulations 5, 6, 7). This classification enabled the use of STRUCTURE’s functionality to incorporate prior information in the clustering process.
Genetic data analysis
All animals in the sample were genotyped for the following 18 microsatellite markers: BM1818, BM1824, BM2113, CSRM60, CSSM66, ETH10, ETH225, ETH3, ILSTS006, INRA23, MGTG4B, RM067, SPS113, SPS115, TGLA122, TGLA126, TGLA227 and TGLA53. Fifteen of these markers are among those recommended by the Food and Agriculture Organization of the United Nations (FAO), (2011) for use in genetic molecular characterization of cattle. Data analysis was performed using STRUCTURE software, which is an analysis program for investigating population structure using multilocus genotype data allele frequencies to characterize population groups and assign individuals to these groups (Pritchard et al., 2000).
STRUCTURE can detect individuals that are genetically admixed, and the genetic proportions from each population, which makes it suitable for crossbred populations. The software uses a clustering method based on simulated models from a Markov Chain Monte Carlo (MCMC) approach to estimate the posterior distribution of every individual admixture coefficient. The mean of this distribution represents an estimate of the amount of an individual’s genome that is derived from one of the inferred populations (Kumar et al., 2003).
Two alternative scenarios, unsupervised and supervised, were also compared in this analysis. The unsupervised scenario strictly relied on information provided by the microsatellite data to generate the clusters. The supervised scenario made use of the option LOCPRIOR available within STRUCTURE software. This option can be used when there is additional sample-characteristic data available to the user, including linguistic, geographical, cultural, or phenotypic information (Porras-Hurtado et al., 2013).
In this study, options LOCPRIOR and LOCISPOP were combined, which made possible the use of previously defined breed types as prior information to assist the clustering procedure, when the signal of genetic structure was relatively weak. An additional Boolean parameter (POPFLAG) was used to indicate which of the subpopulations (breeds or crossbreeds) were identified with certainty (Table 1, subpopulations 2, 11, 12, 13, 14, 15, 16). If the data suggested that these prior breed types were informative, the LOCPRIOR models from STRUCTURE used that information as “learning samples” to perform the clustering, assuming that individuals from the same group came from the same population.
For both scenarios, alternative numbers of clusters (K) were explored, ranging from a minimum of 2 to a maximum of 16. Following the recommendations of Porras-Hurtado et al. (2013), twenty replicates were simulated for each K, with a burn-in period length of 100 000 and 200 000 MCMC repeats. Allele frequencies were assumed to be correlated, as this configuration is considered best in cases of subtle population structure (Falush et al., 2003). Other parameters, such as the relative admixture level between populations (alfa), were inferred from the data, and lambda, the parameter of the distribution of allelic frequencies, was set to one, as advised by Pritchard et al. (2000). The additional parameters implemented in STRUCTURE were set to default values.
Due to computational demands, STRUCTURE analyses were conducted using a high-performance computing cluster (14 nodes, each with Xeon E5-2650v3, 64 Gb RAM, 1TB HDD) from the Research Center on Science and Engineering of Materials (CICIMA) at the Universidad de Costa Rica.
The Evanno method (Evanno et al., 2005) was used to identify the population model that best fit the genetic data as implemented by Structure Harvester software (Earl & vonHoldt, 2012). This method estimates the most likely genetic structure of the population from all the evaluated approaches, based on two criteria: LnProb(K), which provides an estimate of the posterior log probability of the data for a given K, and Delta K, which is an ad hoc quantity related to the second order rate of change of the log probability of data with respect to the number of clusters (Earl & vonHoldt, 2012), proposed by the authors as a better predictor of the real number of clusters.
The results obtained from Structure Harvester for both scenarios were further processed with CLUMPP software (Jakobsson & Rosenberg, 2007). This software deals with label switching and multimodality problems arising in population genetic cluster analyses, permuting the cluster outputs obtained from independent runs (replicates), and thus providing optimal assignment of individuals to clusters by obtaining an aligned consensus cluster.
As the number of individuals was too large (n = 1412), permutational analysis was based on the population Q-matrix in both scenarios. This matrix gives the proportion of membership of each predefined population in each of the K generated clusters. Permutational analysis was performed by using the Large K Greedy algorithm (Jakobsson & Rosenberg, 2007), with 10000 repeats, each representing a randomly selected input order. Although breed types were not used as prior information in the unsupervised scenario, the population Q-matrix was also obtained as an output, for comparison against the supervised scenario.
Optimized consensus clusters were obtained from CLUMPP software and further processed with the ggplot2 package in R to produce stacking graphs for a better perspective of similarities/differences between scenarios. To further explore the associations between a priori breed types and clusters, as well as between breed types themselves, InfoStat software was used to construct a Minimum-Spanning Tree (MST) based on the consensus individual Q-Matrix obtained from CLUMPP for K = 3, under both scenarios.
Results
Hair samples were collected in the main cattle-raising regions of the country (Figure 1). Samples from purebred dairy cattle predominated in dairy farms located in the highlands of the central region, those from crossbred cattle mainly came from dual-purpose herds frequently found in the Northern and South Pacific regions, while those from purebred B. indicus breeds were more evenly distributed across the country, mainly coming from beef or dual-purpose herds. Based on its size and wide distribution, it is possible to assert that this sample represented the structure of the national livestock population at the time of the study.
Structure Harvester
Structure Harvester results revealed differences between the unsupervised and supervised models (Figure 2). For the supervised model, there was a consistent increase in the mean of Log (K) from K = 2 to K = 7 (Figure 2A). For the unsupervised model, a similar trend was observed up to K = 5. Beyond this point, mean LnP(K) values became more variable, with consistently higher standard deviations compared to the supervised model.

Figure 2 Estimates of Log (K) (mean ± SD) (A) and Delta K criterion (mean(|ln(K)|/sd(ln(K)) (B) obtained from the Structure Harvester procedure for unsupervised and supervised admixture genetic models. Costa Rica, 2015.
In both scenarios, when K > 6, the standard deviations were greater than those at lower K values, indicating reduced cluster stability. According to the Delta K criterion proposed by the Evanno method (Figure 2B) the optimal number of clusters was three, which showed the highest Delta K value in both models. Additional lower peaks of Delta K were observed for K = 5 and K = 10 in the unsupervised model, and for K = 7 in the supervised model. Based on previous results, subsequent analysis focused on the results obtained for K < 6.
Genetic structure and clustering
For K < 6, an exploration of the solutions provided by STRUCTURE for different replicates of the same K, revealed frequent problems of “label switching”; these are replicates with similar membership coefficient estimates but with different permutations of the cluster labels. To address this issue, CLUMPP software was used to produce a consensus cluster for each K level in both unsupervised and supervised scenarios.
The genetic structure of the population when K = 2 is depicted in Figure 3. The results from both approaches (unsupervised and supervised) were similar, generating a first cluster (CL1) that was mostly represented for B. indicus and Brahman breed types, as well as for crossbred or undefined breed types used for beef or dual-purpose. The second cluster (CL2) was mainly composed of dairy breeds, dairy cross breeds, and other minor B. taurus breeds, as well as undefined breed types for dairy purposes.

Figure 3 Genetic structure of the Costa Rican bovine population, associated with breed types for K = 2, according to unsupervised (A) and supervised (B) admixture genetic models, and sorted decreasingly by proportion in CL1. Costa Rica, 2015.
A clear distinction was observed between beef (CL1) and dairy (CL2) clusters; however, it is important to note that no breed type was completely excluded from either cluster, indicating that the genetic structure of all breed types is constituted by an admixture, to lesser or greater degree.
The optimal genetic structure of the population according to the Structure Harvester was obtained when K = 3 (Figure 4). For both scenarios, a similar clustering pattern was obtained. CL1 for K = 3 remained very similar to that for K = 2. The CL2 and CL3 in K = 3 seem to be a decomposition of the previous CL2 when K = 2, where CL2 in K = 3 is highly represented by Holstein and Guernsey breeds, dairy cross breeds, Simmental and to a lesser extent, Brown Swiss and other minor B. taurus breed types. CL3 was strongly associated with Jersey and Holstein×Jersey crosses. The combined contribution of the latter mentioned crosses might be related to crossbred animals with greater proportion of one or the other breeds.

Figure 4 Genetic structure of the Costa Rican bovine population, associated with breed types for K = 3, according to unsupervised (A) and supervised (B) admixture genetic model approaches, and sorted decreasingly by proportion in CL1. Costa Rica, 2015.
The genetic structure for the unsupervised and supervised models for K = 4 also maintained similar patterns of clustering (Figure 5). CL1 was mainly related to Brahman and other minor B. indicus breeds, beef crosses, undefined beef, and dual purpose cattle, either for crosses or undefined; CL2 was more associated with Jersey and Holstein×Jersey crosses; CL3 was more represented by Simmental and Brown Swiss breeds and in a leaser way by other minor B. taurus breeds and dairy crosses; and CL4 was represented by Guernsey and Holstein breeds.

Figure 5 Genetic structure of the Costa Rican bovine population, associated with breed types for K = 4, according to unsupervised (A) and supervised (B) admixture genetic model approaches, and sorted decreasingly by proportion in CL1. Costa Rica, 2015.
For K = 5, the clustering patterns obtained for the unsupervised and supervised approaches began to differ (Figure 6). An evident difference between clustering patterns in K = 5 was that, in the unsupervised scenario, the main representation for B. indicus, Brahman, beef cross and undefined breeds, as well as dual purpose (cross and undefined) were divided into two clusters (CL1 and CL2) with comparable proportions of individuals, while in the supervised model those classifications remained mostly represented in CL1, with CL2 not showing a large association with any of the breed types. CL3 was associated with Jersey and Holstein×Jersey breed types, CL4 was related to Simmental and Brown Swiss breeds, and CL5 was mostly composed of Holstein and Guernsey breeds.

Figure 6 Genetic structure of the Costa Rican bovine population, associated with breed types for K = 5, according to unsupervised (A) and supervised (B) admixture genetic model approaches, and sorted decreasingly by proportion in CL1. Costa Rica, 2015.
For K > 5, newly formed clusters lacked clear association with predefined breed types. Preidentified breeds such as Holstein, Jersey, Guernsey, and Simmental remained heavily associated with specific clusters in both scenarios, although with significant proportions from other clusters as well. Redundant clusters were also observed, as two clusters might have almost identical genetic structure regarding the relative contribution from the predefined breed types.
This pattern is also consistent with the decrease in Log(K) observed in Figure 2A. For the supervised scenario, another major difference was that the cluster conformed by Brahman and other minor B. indicus breeds (CL1 in Figures 2 to 6) remained almost unchanged from K = 1 to 16, while for the unsupervised scenario, this cluster was divided from K = 5 onward.
A minimum spanning tree was built to explore the relationships between predefined breeds and clusters in a two-dimensional plane (Figure 7). The results for both scenarios were practically identical. CL1 was located next to beef animals (B_ind, Brahman, Cross_Beef, Undef_Beef and Undef_Dual). CL2 was close to Holstein and Guernsey breeds, dairy cross breeds, Simmental, Brown Swiss and other minor B. taurus breeds, while CL3 was oriented in the same direction as Jersey, Holstein×Jersey and dairy cross breeds.

Figure 7 Minimum spanning tree based on breed types for the unsupervised (A) and supervised (B) models with K = 3. Costa Rica, 2015.
The horizontal axis mainly separates B. taurus breeds, located mostly on the left, from B. indicus breeds, located to the right. The intermediate position of the crossbred type of cattle is also evident. The vertical axis also provides further differentiation between Jersey and Holstein×Jersey cross, located below, compared to Holstein and other B. taurus and dairy cross breeds, located above. It is likely that the Holstein×Jersey group is composed of animals with a greater influence of Jersey breed, as this breed is more common within the country.
Discussion
The genetic characterization of cattle populations in Costa Rica provides essential information for guiding national breeding and conservation strategies. By identifying the genetic diversity, population structure, and unique genetic resources within and between breeds, this study supports informed decisions to enhance productivity while preserving valuable genetic variation. Such knowledge can help prioritize the conservation of rare or locally adapted breeds, optimize crossbreeding programs, and ensure the long-term resilience of the national herd in the face of environmental and market challenges.
Most studies describing the genetic structure of cattle have focused on specific breeds of interest (Agung et al., 2019; Brasil et al., 2013; Edea et al., 2015; Egito et al., 2007; Kumar et al., 2003; Martínez et al., 2012; Martinez et al., 2023; Ocampo et al., 2021), with only a few considering crossbred cattle (Gebrehiwot et al., 2020). In contrast, real cattle populations are often diverse, due to historical events of admixture or recurrent crossbreeding. In these circumstances, the term “breed” becomes abstract and uninformative when it is desired to describe the genetic structure of a real population.
In this study, a representative sample of cattle taken from many herds distributed throughout the country was analyzed. This sample included a large proportion (43 %) of crossbred or undefined breed cattle. In this context, the use of admixture genetic models to describe the genetic structure of a population is conceptually more appropriate, especially when compared to phylogenetic trees based on predefined populations, as performed in previous studies on the same population (Cordero-Solórzano et al., 2015; Martínez et al., 2015). This analysis yielded additional information about the genetic structure and relative composition of local cattle, revealing that there is substantial genetic admixture between the a priori defined breed types, and thus providing a more complete and in-depth analysis, which is also more consistent with the historical development of local cattle.
None of the breeds was exclusively assigned to a specific single cluster. This phenomenon provides evidence of an admixture population, regardless of the breed type, which is reasonable, due to the widespread origins of the cattle population analyzed. This pattern was observed in results obtained from both, the unsupervised and supervised models.
Although B. indicus and B. taurus were separately domesticated, a small number of European cattle breeds still display shared ancestry with indicine cattle (Upadhyay et al., 2019). In fact, the B. indicus population expanded in the Americas due to the extensive use of European B. taurus cows, particularly the Creole breeds, which descended from Iberian taurine animals brought to the continent after 1492, followed by repeated backcrossing to B. indicus bulls (Utsunomiya et al., 2019).
For low values of K the study revealed strong agreement between the genetic structure obtained from unsupervised and supervised scenarios. At this level, there was also consistency between a priori breed types and the actual determined genetic structure. This provides evidence to support that prior classification based on information such as phenotypes and production purpose was useful for differentiating, at a large scale, among genetic subpopulations present within the country.
The finding that K = 3 is the optimal clustering level, according to the Delta K criterion, might seem somewhat low. However, genetic clustering algorithms are parsimonious in the sense that they choose the smallest number of ancestral populations that can explain the most salient variation in the data (Alexander & Lange, 2011; Lawson et al., 2018). Parsimony is desirable because it leads to more easily interpretable and probably more realistic parameter estimates (Alexander & Lange, 2011). In fact, for real data the assumption that there is an optimal true value of K is always incorrect, rather, the question is whether the model is a good enough approximation to be practically useful (Lawson et al., 2018).
The assignment of clusters at low levels of K was readily biologically interpretable, as the divergence between indicine cattle and taurine cattle is estimated to have occurred approximately 250 000 years ago (Upadhyay et al., 2019). Most studies describing the genetic structure of cattle populations also reported clear distinctions between B. indicus and B. taurus breeds (Brasil et al., 2013; Edea et al., 2015; Egito et al., 2007; Gebrehiwot et al., 2020; Martínez et al., 2012; Martinez et al., 2023).
For K = 3, the partitioning of B. taurus breeds into two groups can be explained by the presence of several dairy breeds within the sample, which were declared “identified” (POPFLAG = 1) for the supervised scenario. These are genetically more homogeneous and therefore can be readily differentiated from other subpopulations. Interestingly, these breeds were separated almost identically by unsupervised models, which supports the efficiency of the algorithm in confirming their genetic identity.
B. indicus remained separate in both models (unsupervised and supervised), which might indicate less genetic variability within B. indicus than within B. taurus. A possible reason for this clustering behavior might be the more recent introduction of B. indicus into the country (Cordero-Solórzano et al., 2015).
As the number of groups (K) increased, some differences became evident between supervised and unsupervised scenarios, which are likely due to the presence of a large proportion of individuals with highly heterogeneous genetic compositions, mainly those identified in the present study as crossbred or undefined. At higher levels of clustering, within-group variance becomes large compared to between-group variance and putative subtle differences among groups become difficult to detect by clustering algorithms. The consistent reduction in Log(K) values for K > 6 (Figure 1) for both scenarios also suggested that the genetic differences among these newly conformed groups were not as obvious.
Sample size can also play a role in these results; specifically, groups that contain fewer samples or have undergone little population-specific drift of their own are likely to be fit by STRUCTURE as mixes of multiple drifted groups, rather than assigned to their own ancestral population (Lawson et al., 2018). In addition, when differences are less obvious, newly conformed clusters are based on an approximately equal distribution of unassigned samples, which gave rise to redundant clusters that were observed in this study.
The use of supervised clustering procedures in the current study contributed to more stable clustering patterns, especially for higher values of K. This is important because, under real circumstances, some subpopulations are known and several reference individuals from each population are available (Alexander & Lange, 2011). In these cases, ancestry estimates can be estimated more accurately because there is less uncertainty in allele frequencies (Alexander & Lange, 2011; Gebrehiwot et al., 2020). This can be clearly appreciated in Figure 1A, where the means of log (K) for the supervised model showed a more homogeneous pattern and consistently lower standard deviations.
Crossbred and undefined breeds had a weaker relation with clusters, compared to breed types defined as pure breeds. This becomes evident from their distant position from each cluster in the minimum spanning tree (Figure 6), placing crossbred and undefined breeds far away from the clusters. Despite the distance of those categories from clusters, orientations in the minimum spanning tree of undefined or crossbred cattle according to purpose (beef or milk) were consistent with clusters containing breeds typically used for those purposes.
Microsatellites are one of the best marker techniques for the analysis of genetic structure in a population (Abdul Muneer, 2014); however, there have been reports of troubles inferring admixture populations when individuals from the founding subpopulations were not included in the analysis and the microsatellite considered too few loci (eleven) (Vaughan et al., 2009). The admixture models implemented in STRUCTURE allow the detection of admixture at any time point (Anderson & Thompson, 2002), which makes this software suitable for the analysis of admixed populations. Many real population histories, however, are not neatly separable into divergence and admixture phases but the methods can be applied to any data set, producing ancestry bar plots (Lawson et al., 2018).
There are more recent genetic tools, including single nucleotide polymorphisms (SNP), which have shown similar patterns to microsatellite markers (Coates et al., 2009), but might be more precise in clustering analysis of populations (Zimmerman et al., 2020) and more accurate in representing the distribution of genetic diversity among individuals (Pérez-González et al., 2023). In addition to the aim of the study, the selected genetic tool will depend mainly on economic solvency, with a greater density of markers (i.e. SNPs) being more expensive than microsatellite markers (Puckett, 2017).
Conclusions
The use of admixture models provided additional insights on the genetic structure of cattle in Costa Rica, compared to previous studies on the same population. The models were able to describe the complex genetic structure of the local population. This approach constitutes a valuable resource in the study of genetically admixed populations, which is the prevalent case in commercial livestock populations in the tropics.
This methodology makes also possible to use prior information, such as phenotypes or production purpose, which in this study contributed to obtaining more stable clustering patterns. This reveals the importance of considering genetic-environmental interrelationships when describing genetic structure of cattle population.
All the a priori breed types showed some degree of genetically admixed composition, revealing that the so-called pure breeds are nothing but a consensus term of breed types showing a greater proportion of microsatellite markers that set them into a defined cluster with similar purposes and phenotypes, without entirely pertaining to that group, due to the nature of the mixed composition of their genome.















