
Genetic Evaluation: Difference between revisions

From BIF Guidelines Wiki
=EPD=
#REDIRECT [[:Category:Genetic Evaluation]]
==Utility==
Predicting genetic merit for breeding animals is one of the oldest practices that mankind has used to improve food and fiber production.  Identifying animals for [[Selection and Mating | selection and mating]] has evolved from visual appraisal to sophisticated analytical models for predicting [[Glossary#A | additive genetic]] merit of animals.  Additive genetic merit is the effect of genes that are passed from parent to offspring that can be used to make genetic progress through selection.
The estimation of breeding values, which reflect the value of an animal as a parent for the next generation, or [http://guidelines.thetasolutionsllc.com/index.php/Selection_and_Mating#Expected_Progeny_Differences| Expected Progeny Differences (EPD)], which are simply half of a breeding value, was a major advancement in the ability to select animals to fit production goals.  Prior to the development of EPDs, the primary method for genetic improvement was some form of subjective visual appraisal<ref name="milestone">Golden, BL, DJ Garrick, and LL Benyshek.  2009.  Milestones in beef cattle genetic evaluation.  J Anim Sci.  87(E. Suppl.):E3-E10.</ref>.  Since the development of methodology to implement Genetic Evaluation in the beef industry (launched in the 1970s)<ref name="milestone" />, EPDs have been the gold standard for genetic selection.  Regardless of their associated [http://guidelines.thetasolutionsllc.com/index.php/Accuracy accuracy] value, they are the best selection tool that producers have to improve genetic merit in a single trait, though [http://guidelines.thetasolutionsllc.com/index.php/Selection_and_Mating#Indices_.28Suggested_writer:_Mike_MacNeil.29| indices] incorporate EPD information and are the best tools for multi-trait selection.  Nevertheless, there is often confusion surrounding the best tools and information on which to make selection decisions.  
Phenotypes for quantitative traits are a combination of influences from both genetics (additive, dominance, epistatic) and the environment (permanent and temporary).  Alternatively, we can write this as an equation as follows:


P=μ+G+E
In North America, the standard for identifying genetic merit of breeding animals is [[Expected Progeny Difference | expected progeny differences (EPDs)]].
With very few ''ad hoc'' exceptions, EPDs are produced for North American beef cattle using models based on [[Best Linear Unbiased Prediction]].  Consequently, [[BIF recommends the use of EPD]] when available.


where P represents phenotype, μ represents the average phenotypic value for all animals in the population, G is the genotypic value of the individual for the trait and E represents the environmental effect on the animal’s performance<ref name="Bourdon">Bourdon, RM. 2000. Understanding Animal Breeding.  Second edition.  Prentice Hall, Upper Saddle River, NJ.</ref>.  If we expand the equation to define genetic and environmental effects on the phenotype, we can write the equation as follows:
While not all [[Economically Relevant Traits | economically relevant traits]] in all situations and in all North American breed registries have EPDs available, the number of [[Traits | traits and trait components]] that have EPDs has increased dramatically.
Nearly all the major North American beef cattle breed organizations have migrated to weekly genetic evaluations, eliminating the need for [[Expected Progeny Difference#Interim EPDs | interim EPDs]].


P=μ+A+D+I+E<sub>P</sub>+E<sub>T</sub>+GxE
Most of the improvements in the technologies used in genetic evaluation have been motivated by an opportunity to increase [[Accuracy | accuracy of prediction]] and reduce [[Prediction Bias | bias]]. For example, the advent of [[Genotyping | genomic information]] to enhance the [[Accuracy | accuracy]] of prediction has resulted in EPDs for most traits being produced using either [[Single-step Genomic BLUP]] or [[Single-step Hybrid Marker Effects Models]].  The BIF has developed an extensive set of recommendations for the inclusion of [[Genomic Evaluation Guidelines | genomic data in genetic evaluations]].


where P and μ are as previously defined, A represents additive genetic effects, D represents dominance, I represents epistasis, E<sub>P</sub> represents permanent environmental effects, E<sub>T</sub> represents temporary environmental effects, and GxE represents interactions between genotype and environment<ref name="Bourdon"/><ref>Pierce, BA. 2016.  Genetics Essentials.  Third edition.  MacMillan, New York, New York.</ref>.
In commercial cattle production, EPDs for [[Economically Relevant Traits | economically relevant traits]] should be combined with appropriate selection tools such as [[Selection Index | selection indices]] to make optimal genetic progress toward achieving [[Breeding Objectives | breeding objectives]]. It must be remembered that EPDs are just tools to make selection decisions to make genetic progress and manage certain genetic risks.


EPDs describe the additive genetic merit of an individual and reflect its value as a parent.  It is important to remember that environmental influences are not heritable, and the only genetic influence that is known to be stably inherited at this time is additive genetic variation, though dominance can be managed through crossbreeding systems.  EPDs and indices are the best tools for genetic selection and do reflect average progeny performance<ref>Thrift, FA and TA Thrift. 2006.  Review:  Expected versus realized progeny differences for various beef cattle traits.  Prof Anim Sci. 22:413-423.</ref><ref>Kuehn, LA and RM Thallman.  2017.  Across-breed EPD tables for the year 2017 adjusted to breed differences for birth year of 2015.  Proceedings of the Beef Improvement Federation Annual Meeting and Research Symposium. Pages 112-144.</ref>.  
In some special situations in seedstock production, breeders may need to make selection decisions using EPDs for traits that are not [[Economically Relevant Traits | economically relevant traits]] in commercial settings in order to enhance the marketability of their breed or breeding animals. For example, if a breed has a perceived defect that is limiting that breed organization's members from expanding their market for selling germplasm, then selection to improve that characteristic should be included in the seedstock breeder's [[Breeding Objectives | breeding objectives]].


The challenge with selection on measures of phenotype is that they include both genetic and environmental effects, even if weights are adjusted and/or ratios (which limit comparisons to within contemporary groups) are utilized.  When selection decisions are made on these metrics, selection emphasis is also placed on nongenetic factors, which reduces the efficacy of selection and reduces genetic progress.  Superiority of selection using EPDs (or breeding values) as compared to phenotypes has been demonstrated<ref>Gall, GAE and Y Bakar.  2002.  Application of mixed-model techniques to fish breed improvement:  analysis of breeding-value selection to increase 98-day body weight in tilapia.  Aquaculture.  212(1-4):93-113.</ref><ref>Kuhlers, DL and BW Kennedy.  1992.  Effect of culling on selection response using phenotypic selection or best linear unbiased prediction of breeding values in small, closed herds of swine.  J Anim Sci.  70(8):2338-2348.</ref><ref>Belonsky, GM and BW Kennedy.  1988.  Selection on individual phenotype and best linear unbiased predictor of breeding values in a closed swine herd.  J. Anim Sci.  66:1124-1131.</ref><ref>Hagger, C. 1991.  Effects of selecting on phenotype, on index, or on breeding values, on expected response, genetic relationships, and accuracy of breeding values in an experiment.  J Anim Breed Genet.  108:102-110.</ref>. 
Critical to genetic evaluation is having high-quality estimates of [[Variance Components | variance components]].  Knowing the heritabilities and correlations of the traits and performing [[Multiple Trait Evaluation | Multiple-Trait Evaluation]] enhances the accuracy of prediction and reduces [[Prediction Bias | bias]] from effects such as incomplete reporting. Equally critical is understanding the [[Connectedness | connectedness]] of the data in a particular data set. Disconnected data can lead to invalid comparisons.
EPDs also simplify selection decisions.  Selection using phenotypes can involve the individual’s own phenotype as well as phenotypes on relatives (for example, progeny, parents, and siblings).  With Genetic Evaluation, all of this information is combined and weighted appropriately in a single value, the EPD, which simplifies selection.  This same value is even more relevant in the genomics era, because genomic testing provides another source of information for selection.  The Beef Improvement Federation recommends using genomically-enhanced EPDs, as opposed to using marker scores and EPDs separately, as the best method for utilizing genomic data for selection<ref>Muir, WM.  2007.  Comparison of genomic and traditional BLUP-estimated breeding value accuracy and selection response under alternative trait and genomic parameters. J Anim Breed Genet.  124(6):342-355.</ref>.  Genetic Evaluation methodologies are always evolving and improving, but all of these methods incorporate all available data on an animal into EPD prediction, including genomic data, and weight it appropriately so that there is a single metric for genetic selection that represents the best estimate of that animal’s genetic merit using all available data. 
 
References: 
{{reflist}}
 
----
 
==Basic Models==
===BLUP===
===Single-step Genomic BLUP===
Single-step genomic BLUP (ssGBLUP) <ref name=Legarra> Legarra, A., I. Aguilar, and I. Misztal. 2009. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92:4656-4663. </ref><ref name=Aguilar> Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93: 743-752. </ref> is a method developed to enable the inclusion of marker genotypes into the well-known BLUP machinery. The idea of ssGBLUP came from the fact that only a small portion of the animals in the pedigree is genotyped. In this way, one approach to account for all animals (i.e., genotyped and non-genotyped) in the evaluation would be to combine pedigree and genomic relationships and use this as the covariance structure in the BLUP mixed model equations. Thus, ssGBLUP uses marker information to construct genomic relationships.
Legarra et al. (2009) <ref name=Legarra /> stated that genomic evaluations would be simpler if genomic relationships were available for all animals in the model. Their idea was to treat the pedigree relationship as an a priori relationship and the genomic relationship as the observed relationship. Based on that, they showed that the genomic information could be extended (i.e., imputed) to non-genotyped animals. This means that in ssGBLUP pedigree relationships for non-genotyped animals are enhanced by the genomic information of their relatives. The relationship matrix that combines information for genotyped and non-genotyped animals is represented by ''H'':
<center>
<math>
H=
\begin{bmatrix}
A_{11}+A_{12}A_{22}^{-1}(G-A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\
GA_{22}^{-1}A_{21}  & G
\end{bmatrix}
</math>
</center>
Where the subscripts 1 and 2 refer to non-genotyped and genotyped animals, respectively. ''A'' is the pedigree relationship matrix and ''G'' is the genomic relationship matrix computed based on markers. If ''M'' is a matrix of marker genotypes centered for allele frequency (''p'') and has the dimension of number of animals by number of SNP (N), ''G'' is computed as <ref name=VanRaden> VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414-4423. </ref>:
<center>
<math>
G=\frac{MM'}{2\sum_{i=1}^N p_i(1-p_i)}
</math>
</center>
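As a minimal sketch of the formula above, the following Python (NumPy) snippet builds ''G'' from a small, made-up genotype matrix. The function name <code>vanraden_g</code>, the 0/1/2 genotype coding, and the use of allele frequencies estimated from the sample itself are assumptions made for illustration; evaluations in practice may use other allele frequencies and usually adjust ''G'' before inverting it.

```python
import numpy as np

def vanraden_g(genotypes):
    """Genomic relationship matrix G = MM' / (2 * sum_i p_i (1 - p_i))."""
    Z = np.asarray(genotypes, dtype=float)   # animals x SNP, coded 0/1/2
    p = Z.mean(axis=0) / 2.0                 # allele frequencies (assumed: from this sample)
    M = Z - 2.0 * p                          # center every SNP column by 2p
    scale = 2.0 * np.sum(p * (1.0 - p))      # denominator of the formula
    return M @ M.T / scale

# Hypothetical genotypes: 4 animals, 4 SNP
toy = np.array([[0, 1, 2, 1],
                [1, 1, 0, 2],
                [2, 0, 1, 1],
                [1, 2, 1, 0]])
G = vanraden_g(toy)
print(np.round(G, 2))
```

Because the sample frequencies are used for centering, every row of this toy ''G'' sums to zero, so the matrix is singular. This is one reason a ''G'' computed this way is commonly adjusted (for example, blended with a small proportion of ''A''<sub>22</sub>) before it is inverted.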
Although ''H'' is very complicated, ''H''<sup>''-1''</sup> is quite simple <ref name=Aguilar />:
<center>
<math>
H^{-1}=A^{-1}+
\begin{bmatrix}
0 & 0 \\
0  & G^{-1}- A_{22}^{-1}
\end{bmatrix}
</math>
</center>
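The structure of ''H''<sup>''-1''</sup> can be illustrated with a short Python (NumPy) sketch. The four-animal pedigree, the values in <code>G</code>, and the helper name <code>h_inverse</code> are hypothetical and chosen only to keep the example small; dense inversion is used here for clarity and is not how large evaluations are computed.

```python
import numpy as np

def h_inverse(A_inv, A22, G, genotyped):
    """Return H^{-1} = A^{-1} + [[0, 0], [0, G^{-1} - A22^{-1}]]."""
    H_inv = np.array(A_inv, dtype=float)      # copy so A_inv is not modified
    block = np.ix_(genotyped, genotyped)      # rows/columns of genotyped animals
    H_inv[block] += np.linalg.inv(G) - np.linalg.inv(A22)
    return H_inv

# Toy pedigree (hypothetical): animals 0 and 1 are unrelated parents;
# animals 2 and 3 are their full-sib progeny and the only genotyped animals.
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
genotyped = [2, 3]
A_inv = np.linalg.inv(A)
A22 = A[np.ix_(genotyped, genotyped)]
G = np.array([[1.02, 0.55],
              [0.55, 0.98]])                  # made-up genomic relationships

H_inv = h_inverse(A_inv, A22, G, genotyped)
H = np.linalg.inv(H_inv)                      # only feasible for tiny examples
print(np.round(H, 3))
```

Inverting the result recovers ''H'' as defined earlier: the block for genotyped animals equals ''G'', and the remaining blocks show how the genomic information propagates to the non-genotyped parents.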
If we replace ''A''<sup>''-1''</sup> by ''H''<sup>''-1''</sup> in the BLUP mixed model equations, we have ssGBLUP<ref name=Aguilar />:
<center>
<math>
\begin{bmatrix}
{X'X} & {X'Z}  \\
{Z'X}  & {Z'Z}+ H^{-1}{\lambda}
\end{bmatrix}
\begin{bmatrix}
\hat{b}\  \\
\hat{u}\ 
\end{bmatrix}
=
\begin{bmatrix}
{X'y} \\
{Z'y} 
\end{bmatrix}
</math>
</center>
Where ''b'' and ''u'' are vectors of fixed effects and breeding values, respectively; ''X'' and ''Z'' are incidence matrices for the effects in ''b'' and ''u''; ''y'' is a vector of phenotypes, and &lambda; is the ratio of residual to additive genetic variance. 
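The mixed model equations can be assembled and solved directly for a very small example. The sketch below is in Python (NumPy); the function name <code>solve_mme</code>, the records in <code>y</code>, the heritability of 0.3, and the use of a pedigree-only inverse as a stand-in for ''H''<sup>''-1''</sup> are all assumptions for illustration. Real evaluations use iterative solvers rather than a dense solve.

```python
import numpy as np

def solve_mme(X, Z, y, K_inv, lam):
    """Solve the mixed model equations with relationship inverse K_inv."""
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + K_inv * lam]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return sol[:n_fixed], sol[n_fixed:]       # (b_hat, u_hat)

# Hypothetical data: 4 animals, records on animals 2, 3 and 1 (0-based)
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
K_inv = np.linalg.inv(A)                      # stand-in for H^{-1} (assumption)
X = np.ones((3, 1))                           # one fixed effect: overall mean
Z = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 0.0]])          # links records to animals
y = np.array([250.0, 265.0, 240.0])
h2 = 0.3                                      # assumed heritability
lam = (1.0 - h2) / h2                         # residual / additive variance

b_hat, u_hat = solve_mme(X, Z, y, K_inv, lam)
epd = u_hat / 2.0                             # EPD is half the breeding value
print(b_hat, np.round(epd, 3))
```

Animal 0 has no record of its own, yet it still receives a solution because the relationship inverse ties it to its progeny. Passing ''H''<sup>''-1''</sup> instead of the pedigree inverse is the only change needed to turn this sketch into the single-step form.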
As a combined relationship is used in ssGBLUP, the output for each animal is automatically a genomic EBV, and the mixed model equations above can be simplistically represented as:
<center>
[[File:figure1_ssGBLUP.jpg | 250px]]
</center>
 
The genomic EPD is then calculated as:
<center>
<math>
{genomic\,EPD}=\frac{genomic\,EBV}{2}
</math>
</center>
When the subject is genetic evaluation, one of the most common questions is “What is the main difference among ssGBLUP, BLUP, and genomic BLUP (GBLUP)?” In a nutshell, ssGBLUP uses phenotypes, pedigree, and genotypes for both genotyped and non-genotyped animals, whereas BLUP uses phenotypes and pedigree for all animals and GBLUP uses phenotypes and genotypes only for genotyped animals.
 
In the US, ssGBLUP has been used for genomic evaluation of beef and dairy cattle, pigs, chickens, and fish. For beef cattle, Angus Genetics Inc. runs ssGBLUP evaluations for American Angus and Charolais; Canadian Angus, Red Angus, and Charolais; and Maine Anjou. Moreover, Livestock Genetic Services (A Neogen Company) runs ssGBLUP evaluations for Santa Gertrudis. For more information about ssGBLUP for beef cattle evaluation, see Lourenco et al. (2015) <ref name=Lourenco1> Lourenco, D. A. L., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, J. K. Bertrand, T. Amen, L. Wang, D. W. Moser, and I. Misztal. 2015. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. Journal of Animal Science 93: 2653-2662. </ref>, Lourenco et al. (2017) <ref name=Lourenco2> Lourenco, D.A.L., J.K. Bertrand, H.L. Bradford, S. Miller, and I. Misztal. 2017. The promise of genomics for beef improvement. BIF Meeting (http://www.bifconference.com/bif2017/proceedings/01-lourenco.pdf) </ref>, and Misztal & Lourenco (2018) <ref name=Misztal1> Misztal, I. and D. Lourenco. 2018. Current research in unweighted and weighted ssGBLUP. In Proc. Beef Improvement Federation 11th genetic prediction workshop 11:6-13. </ref>.
 
 
<em>ssGBLUP for large genotyped populations</em> <p></p>
Running ssGBLUP evaluations for large genotyped populations can be a huge computational challenge. This is because the construction of ''H''<sup>''-1''</sup> requires the construction and inversion of ''G''. Matrix inversion has a computational cost that is cubic in the number of genotyped animals and also requires a large amount of memory. As an example, inverting ''G'' for 100,000 animals requires about 300 GB of memory and takes over 2 hours.
 
The algorithm for proven and young (APY) was proposed by Misztal et al. (2014) <ref name=Misztal2> Misztal, I., A. Legarra, and I. Aguilar. 2014. Using recursion to compute the inverse of the genomic relationship matrix. J. Dairy Sci. 97: 3943–3952. </ref> to overcome this computing limitation of ssGBLUP, and was based on Henderson’s algorithm to construct ''A''<sup>''-1''</sup> <ref name=Henderson> Henderson, C.R. 1976. A simple method for computing the inverse of a numerator relationship matrix used in the prediction of breeding values. Biometrics 32:69-83. </ref>. In APY, ''G''<sup>''-1''</sup> is constructed directly, avoiding the matrix inversion step. In this algorithm, genotyped animals are split into two groups: core (c) and non-core (n). Breeding values of non-core animals are then calculated as functions of breeding values of core animals and the genomic relationships between core and non-core. If the number of genotyped animals surpasses 100,000, using APY ''G''<sup>''-1''</sup> in ssGBLUP is highly recommended.  
Constructing APY ''G''<sup>''-1''</sup> is computationally efficient because it requires only the inversion of a block of ''G'' that contains relationships between core animals:
<center>
<math>
G_{APY}^{-1}=
\begin{bmatrix}
G_{cc}^{-1} & 0 \\
0  & 0
\end{bmatrix}
+
\begin{bmatrix}
-G_{cc}^{-1}G_{cn}  \\
I
\end{bmatrix}
M_{nn}^{-1}
\begin{bmatrix}
-G_{nc}G_{cc}^{-1}  & I
\end{bmatrix}
</math>
</center>
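Where ''M''<sub>nn</sub> is a diagonal matrix whose elements are the genomic Mendelian sampling (error) variances of the non-core animals, i.e., the part of each non-core animal's diagonal element of ''G'' that is not explained by its relationships with the core animals. Although this formula looks complicated, a simple graphic representation of APY ''G''<sup>''-1''</sup> is: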
<center>
[[File:figure2_APY.jpg | 250px]]
</center>
Looking at the above figure it is easy to see that relationships between non-core animals are ignored in APY. However, in practice this has little impact on breeding values. Several studies have reported correlations greater than 0.99 between genomic EPD from regular ssGBLUP and from ssGBLUP with APY ''G''<sup>''-1''</sup> <ref name=Lourenco1 /> <ref name=Fragomeni> Fragomeni, B. O., D. A. L. Lourenco, S. Tsuruta, Y. Masuda, I. Aguilar, A. Legarra, T. J. Lawlor, and I. Misztal. 2015. Hot topic: Use of genomic recursions in single-step genomic best linear unbiased predictor (BLUP) with a large number of genotypes. J. Dairy Sci. 98:4090-4094. </ref> <ref name=Masuda> Masuda Y., I. Misztal, S. Tsuruta, A. Legarra, I. Aguilar, D.A.L. Lourenco, B.O. Fragomeni, T.J Lawlor. 2016. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J. Dairy Sci. 99:1968-74. </ref>.
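The APY construction can be sketched in a few lines of Python (NumPy). The function name <code>apy_inverse</code>, the randomly generated ''G'', and the choice of core animals are hypothetical, and the diagonal of ''M''<sub>nn</sub> is computed as <math>m_{ii}=g_{ii}-g_{ic}G_{cc}^{-1}g_{ci}</math>, which is an assumption about the definition that the text above does not spell out. The returned matrix is ordered with core animals first.

```python
import numpy as np

def apy_inverse(G, core, noncore):
    """APY inverse of G; only the core block G_cc is inverted explicitly."""
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                               # G_cc^{-1} G_cn (core x non-core)
    # Diagonal of M_nn: m_ii = g_ii - g_ic G_cc^{-1} g_ci
    m = np.diag(G)[noncore] - np.sum(Gcn.T * P.T, axis=1)
    M_inv = np.diag(1.0 / m)
    nc, nn = len(core), len(noncore)
    out = np.zeros((nc + nn, nc + nn))
    out[:nc, :nc] = Gcc_inv + P @ M_inv @ P.T       # core-core block
    out[:nc, nc:] = -P @ M_inv                      # core-non-core block
    out[nc:, :nc] = -M_inv @ P.T                    # non-core-core block
    out[nc:, nc:] = M_inv                           # diagonal non-core block
    return out

# Hypothetical positive definite G for 5 genotyped animals
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 50))
G = W @ W.T / 50 + 0.01 * np.eye(5)

core, noncore = [0, 1, 2], [3, 4]
G_apy_inv = apy_inverse(G, core, noncore)
print(np.round(G_apy_inv, 2))
```

The non-core block of the result is diagonal, which is what the figure shows: the only dense inversion is for the core block, so the cost grows with the number of core animals rather than with the total number of genotyped animals.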
 
 
<em> Genomic EPD accuracy in ssGBLUP</em> <p></p>
When datasets are very large, using the inverse of the left-hand side of the ssGBLUP mixed model equations to calculate accuracy of genomic EPD is computationally prohibitive. Sampling techniques or approximations <ref name=Tsuruta> Tsuruta, S., D. Lourenco, Y. Masuda, D.W. Moser, and I. Misztal. 2016. Practical approximation of accuracy in genomic breeding values for a large number of genotyped animals. J. Anim. Sci. 94:162. </ref> <ref name=Edel> Edel, C., E.C.G. Pimentel, M. Erbe, R. Emmerling, and K.U. Gotz. 2019. Short Communication: Calculating analytical reliabilities for single-step predictions. J. Dairy Sci. 102:1-7.</ref> can help to overcome this limitation.  The approximated genomic EPD accuracy developed by the Animal Breeding and Genetics Group at University of Georgia combines contributions from phenotypes and pedigree <ref name=Misztal3> Misztal, I. and Wiggans, G.R. 1988. Approximation of prediction error variance in large-scale animal models. J. Dairy Sci. 71: 27-32.</ref> with the contribution from genomic relationships. To reduce computing time, only coefficients from the diagonal of ''G'' are used in the formula to compute the genomic contribution.
 
 
<em>Marker effects in ssGBLUP</em> <p></p>
Although the marker information in ssGBLUP is used to construct genomic relationships, it is possible to calculate SNP effects once we obtain genomic EBV (Wang et al., 2012<ref name=Wang> Wang, H., I. Misztal, I. Aguilar, A. Legarra, and W. M. Muir. 2012. Genome-wide association mapping including phenotypes from relatives without genotypes. Genet. Res. 94(2):73-83.</ref>, Lourenco et al., 2015<ref name=Lourenco1 />):
<center>
<math>
\hat{a}\  = kM' G^{-1} \hat{u}\ 
</math>
</center>
Where k is the ratio of marker variance to additive genetic variance. Marker effects can then be used to calculate predictions based only on marker genotypes for young genotyped animals that are not yet in, or will never make it into, an official evaluation. This type of prediction is called direct genomic value (DGV) or molecular breeding value (MBV):
<center>
<math>
\widehat{DGV} \  = {M}_{young} \hat{a}
</math>
</center>
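The two formulas above can be sketched together in Python (NumPy). Everything numeric here is made up: the genotypes, the genomic EBV in <code>u_hat</code>, and the allele frequencies of 0.5. The sketch also assumes that ''G'' is built exactly as ''MM'&thinsp;'' divided by <math>2\sum p_i(1-p_i)</math> with no blending or scaling, in which case k is the reciprocal of that denominator.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_snp = 6, 40

# Hypothetical genotypes (0/1/2) and assumed allele frequencies
p = np.full(n_snp, 0.5)
geno = rng.integers(0, 3, size=(n_animals, n_snp)).astype(float)
M = geno - 2 * p                               # centered marker matrix

two_pq = 2 * np.sum(p * (1 - p))
G = M @ M.T / two_pq                           # genomic relationship matrix
k = 1.0 / two_pq                               # marker / additive variance (assumption)

u_hat = rng.normal(size=n_animals)             # placeholder genomic EBV

# SNP effects: a_hat = k M' G^{-1} u_hat (solve instead of an explicit inverse)
a_hat = k * M.T @ np.linalg.solve(G, u_hat)

# Direct genomic values for young animals, centered with the same frequencies
geno_young = rng.integers(0, 3, size=(2, n_snp)).astype(float)
M_young = geno_young - 2 * p
dgv = M_young @ a_hat
print(np.round(dgv, 3))
```

Under these assumptions, applying the SNP effects back to the animals used to estimate them reproduces their genomic EBV, which is a convenient consistency check. For young animals the DGV uses marker information only, so it does not include pedigree or phenotype contributions that an official evaluation would add.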
 
 
<em>References</em> <p></p>
<references />
 
===[[Single-step Hybrid Marker Effects Models]] (Suggested writer: Bruce Golden)===
 
==Interim Calculations==
=Bias=
==(in)complete reporting / contemporary groups / preferential treatment (Suggested writer: Bob Weaber)==
=[[Accuracy]] (Suggested writer: Matt Spangler)=
==meaning of accuracy==
==what impacts accuracy==
==different definitions of accuracy (true, BIF, reliability)==
 
=Variance components (Suggested writer: Steve Kachman)=
==Impact on EPD, accuracy, genetic gain (Suggested writer: Steve Kachman)==
==[[Heterogeneous variance]]==
 
=Connectivity (Suggested writer: Ron Lewis)=
==Measures of (Suggested writer: Ron Lewis)==
==Impact on GE (Suggested writer: Ron Lewis)==
=Current GE=
==How each breed (organization) is modeling each trait (Suggested writers: Steve Miller, Lauren Hyde, AHA)==

Latest revision as of 17:19, 12 April 2021

Predicting genetic merit for breeding animals is one of the oldest practices that mankind has used to improve food and fiber production. Identifying animals for selection and mating has evolved from visual appraisal to sophisticated analytical models for predicting additive genetic merit of animals. Additive genetic merit is the effect of genes that are passed from parent to offspring that can be used to make genetic progress through selection.

In North America, the standard for identifying genetic merit of breeding animals is expected progeny differences (EPDs). With very few ad hoc exceptions, EPDs are produced for North American beef cattle using models based on Best Linear Unbiased Prediction. Consequently, BIF recommends the use of EPD when available.

While not all economically relevant traits in all situations and in all North American breed registries have EPDs available, the number of traits and trait components that have EPDs has increased dramatically. Nearly all the major North American beef cattle breed organizations have migrated to weekly genetic evaluations, eliminating the need for interim EPDs.

Most of the improvements in the technologies used in genetic evaluation have been motivated by an opportunity to increase accuracy of prediction and reduce bias. For example, the advent of genomic information to enhance the accuracy of prediction has resulted in EPDs for most traits being produced using either Single-step Genomic BLUP or Single-step Hybrid Marker Effects Models. The BIF has developed an extensive set of recommendations for the inclusion of genomic data in genetic evaluations.

In commercial cattle production, EPDs for economically relevant traits should be combined with appropriate selection tools such as selection indices to make optimal genetic progress toward achieving breeding objectives. It must be remembered that EPDs are just tools to make selection decisions to make genetic progress and manage certain genetic risks.

In some special situations in seedstock production, breeders may need to make selection decisions using EPDs for traits that are not economically relevant traits in commercial settings in order to enhance the marketability of their breed or breeding animals. For example, if a breed has a perceived defect that is limiting that breed organization's members from expanding their market for selling germplasm, then selection to improve that characteristic should be included in the seedstock breeder's breeding objectives.

Critical to genetic evaluation is having high-quality estimates of variance components. Knowing the heritabilities and correlations of the traits and performing Multiple-Trait Evaluation enhances the accuracy of prediction and reduces bias from effects such as incomplete reporting. Equally critical is understanding the connectedness of the data in a particular data set. Disconnected data can lead to invalid comparisons.