Variance Components
[[Category: Genetic Evaluation]]
Methods such as [[Best Linear Unbiased Prediction | BLUP]], [[Single-step Genomic BLUP]], and [[Single-step Hybrid Marker Effects Models]] used to predict [[Expected Progeny Difference]] (EPD) are based on models which include random effects. Associated with these random effects are parameters known as variance components. For example, a typical model to predict the EPD using BLUP would include random effects for the additive genetic merit and environmental effects. Each of these would have an associated variance component, in this case the additive genetic variance and the environmental variance, which quantify the amount of variability associated with the two random effects. As these variance components are unknown, they must be estimated.
For methods based on linear mixed models, such as BLUP and Single-step Genomic BLUP, variance components can be estimated using Residual Maximum Likelihood (REML) <ref>Harville, D. A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72(358):320-338. </ref> while for methods based on Bayesian models, such as Single-step Hybrid Marker Effects Models, variance components can be estimated by corresponding Bayesian methods <ref>Gianola, D., and R. L. Fernando. 1986. Bayesian methods in animal breeding theory. Journal of Animal Science 63:217-244. </ref>.
==REML==
Both REML and the Bayesian methods work with the likelihood of the data. A likelihood measures how likely a set of data is for different values of the parameters. REML uses the likelihood of the observed residuals, where the residual, <math>\hat r=y-\hat y,</math> is the difference between the observed data, <math>y</math>, and the fitted data, <math>\hat y</math>, given fixed effects such as contemporary group effects. The REML estimates are then the values of the variance components that maximize the likelihood of the observed residuals. In some cases it is not possible to maximize the residual likelihood directly, and a variety of iterative algorithms, such as Expectation Maximization, Fisher Scoring, and Average Information, have been used to find the estimates numerically. These algorithms share a common feature: they all use BLUP of the random effects obtained from the mixed-model equations.
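As a concrete sketch, the Expectation Maximization flavor of REML can be written out for a simple balanced one-way random effects model. The simulation below is purely illustrative (the group counts, true variances, and starting values are assumptions, not taken from any evaluation system); it repeatedly solves the mixed-model equations and applies the standard EM-REML updates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated balanced one-way model: y_ij = mu + u_i + e_ij
# (all sizes and true values below are illustrative assumptions)
q, m = 200, 10                 # number of groups, records per group
true_su2, true_se2 = 2.0, 1.0  # true variance components
u_true = rng.normal(0.0, np.sqrt(true_su2), q)
y = 5.0 + np.repeat(u_true, m) + rng.normal(0.0, np.sqrt(true_se2), q * m)

n = q * m
X = np.ones((n, 1))                      # fixed effect: overall mean
Z = np.kron(np.eye(q), np.ones((m, 1)))  # incidence matrix for group effects

su2, se2 = 1.0, 1.0  # starting values
for _ in range(200):
    lam = se2 / su2
    # Mixed-model equations (lambda form)
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    Cinv = np.linalg.inv(C)
    sol = Cinv @ rhs
    u_hat = sol[1:]                      # BLUP of the group effects
    # EM-REML updates based on the BLUP solutions
    se2_new = (y @ y - sol @ rhs) / (n - 1)
    su2_new = (u_hat @ u_hat + se2 * np.trace(Cinv[1:, 1:])) / q
    converged = abs(se2_new - se2) + abs(su2_new - su2) < 1e-8
    su2, se2 = su2_new, se2_new
    if converged:
        break

print(f"sigma_u^2 = {su2:.3f}, sigma_e^2 = {se2:.3f}")
```

Each round uses the current variance components to compute BLUP of the random effects, then uses those solutions to update the variance components, which is exactly the commonality among the iterative algorithms noted above.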
For a trait such as calving difficulty, genetic evaluation might use a [[Glossary#T| threshold model]] <ref>Gianola, D., and J. L. Foulley. 1983. Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15(2):201-224. </ref> instead of a linear mixed model. With a threshold model, it is no longer feasible to evaluate either the likelihood or the residual likelihood. For threshold models, where REML is therefore not an option, methods based on penalized quasi-likelihood <ref> Breslow, N. E., and D. G. Clayton. 1993. Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc. 88:9-25. </ref> can be used to obtain REML-like estimates of the variance components.
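The liability idea behind a threshold model can be illustrated in a few lines: an unobserved, normally distributed liability falls between fixed thresholds, and the interval it lands in determines the observed category. The thresholds and liability mean below are made-up values for illustration only:

```python
from statistics import NormalDist

N = NormalDist()           # standard normal on the liability scale

# Illustrative threshold model for 3 calving-ease scores:
# an unobserved liability l ~ N(mu, 1) falls below t1 (easy),
# between t1 and t2 (assisted), or above t2 (difficult).
t1, t2 = 0.0, 1.2          # assumed thresholds between categories
mu = -0.3                  # assumed liability mean for some animal

p_easy = N.cdf(t1 - mu)                  # P(liability < t1)
p_assist = N.cdf(t2 - mu) - N.cdf(t1 - mu)
p_hard = 1.0 - N.cdf(t2 - mu)
print(p_easy, p_assist, p_hard)          # the three probabilities sum to 1
```

Because the observed data are categories rather than the liabilities themselves, the residuals needed by REML are not available, which is why the penalized quasi-likelihood route is taken instead.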
==Bayesian methods==
Bayesian methods make use of a prior distribution on the variance components, in addition to the information coming from the likelihood, to form the posterior distribution. In most cases, the prior is selected with the aim that it has little impact on the estimates. The two predominant types of Bayes estimators are the posterior mode and the posterior mean. Posterior mode estimates are the values of the variance components that maximize the posterior density. Posterior mean estimates are the averages of the variance components sampled from the posterior distribution.
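The distinction between the two estimators can be seen in a toy conjugate example where the posterior of a variance is available in closed form; the prior parameters and sample size below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy conjugate example: data y_i ~ N(0, sigma^2) with an inverse-gamma
# prior on sigma^2, so the posterior is also inverse-gamma.
n = 50
y = rng.normal(0.0, np.sqrt(2.0), n)   # simulated with true sigma^2 = 2

a0, b0 = 2.0, 2.0                      # assumed weak inverse-gamma prior
a_post = a0 + n / 2                    # posterior shape
b_post = b0 + 0.5 * np.sum(y ** 2)     # posterior scale

post_mode = b_post / (a_post + 1)      # maximizes the posterior density
post_mean = b_post / (a_post - 1)      # average under the posterior
print(post_mode, post_mean)
```

For a right-skewed posterior such as this one, the mode is always smaller than the mean; for realistic models the posterior has no closed form and the mean is instead approximated by sampling, as described next.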
Markov chain Monte Carlo (MCMC) is a general method for generating random samples from a posterior distribution. Briefly, the method starts with an initial sample of the unknown parameters (e.g., variance components, fixed effects) and, using that sample and the posterior distribution, generates a second sample of the parameters; the second sample is then used to generate the third, and the cycle continues until the desired number of samples has been obtained. After the samples have been obtained, the estimate of a variance component is the average of the sampled values of that variance component. Many samplers exist for generating an MCMC sample, of which the Gibbs sampler is among the most commonly used. A Gibbs sampler generates a sample for each parameter sequentially, conditional on the current values of the other parameters.
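A minimal Gibbs sampler for the variance components of a balanced one-way model might look as follows; the model sizes, priors (flat on the mean, scale-invariant on the variances), and chain settings are all illustrative assumptions, not a production sampler:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated balanced one-way model: y_ij = mu + u_i + e_ij
# (all values below are illustrative assumptions)
q, m = 100, 10
true_su2, true_se2 = 2.0, 1.0
u_true = rng.normal(0.0, np.sqrt(true_su2), q)
y = 5.0 + u_true[:, None] + rng.normal(0.0, np.sqrt(true_se2), (q, m))
n = q * m

mu, su2, se2 = 0.0, 1.0, 1.0           # initial sample of the parameters
u = np.zeros(q)
keep_su2, keep_se2 = [], []
for it in range(3000):
    # mu | rest: normal around the mean of the data corrected for u
    mu = rng.normal((y - u[:, None]).mean(), np.sqrt(se2 / n))
    # u_i | rest: shrunken group means (normal full conditional)
    w = m * su2 / (m * su2 + se2)
    u = rng.normal(w * (y.mean(axis=1) - mu),
                   np.sqrt(su2 * se2 / (m * su2 + se2)))
    # variances | rest: scaled inverse chi-square full conditionals
    su2 = (u @ u) / rng.chisquare(q)
    e = y - mu - u[:, None]
    se2 = (e * e).sum() / rng.chisquare(n)
    if it >= 500:                      # discard burn-in samples
        keep_su2.append(su2)
        keep_se2.append(se2)

# Posterior-mean estimates of the two variance components
print(np.mean(keep_su2), np.mean(keep_se2))
```

Each pass through the loop visits the parameters one at a time, drawing each from its full conditional distribution given the current values of the others, which is exactly the sequential scheme described above.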
==References==
<references />
Latest revision as of 13:54, 11 April 2021