Int. J. Pure Appl. Sci. Technol., 16(1) (2013), pp. 7-19
International Journal of Pure and Applied Sciences and Technology, ISSN 2229-6107, available online at www.ijopaasat.in

Research Paper

Statistical Bayesian Analysis of Experimental Data

Labdaoui Ahlam1, * and Merabet Hayet1
1 Department of Mathematics, University Constantine 1, Route of Ain El Bey, 25000 Constantine,
* Corresponding author, e-mail: ([email protected])
Abstract: The Bayesian researcher should know the basic ideas underlying Bayesian methodology and the computational tools used in modern Bayesian econometrics. Some of the most important methods of posterior simulation are Monte Carlo integration, importance sampling, Gibbs sampling and the Metropolis-Hastings algorithm. The Bayesian should also be able to put the theory and the computational tools together in the context of substantive empirical problems. We focus primarily on recent developments in Bayesian computation, and then on particular models; inevitably, we combine theory and computation in the context of those models. Although we have tried to be reasonably complete in covering the basic ideas of Bayesian theory and the computational tools most commonly used, there is no way to cover all the classes of models used in econometrics. We apply these tools to the analysis of variance and to the linear regression model.

Keywords: Bayesian analysis, Markov chain Monte Carlo algorithms, regression models.

1. Introduction

Regression is by far the largest field of statistics, both theoretical and applied. It is the preferred method of econometrics, and in the practice of social science modeled on econometrics, "econometric model" has come to mean any regression model, even without reference to economic problems. The regression framework is defined by a variable to predict (the "dependent" variable, with dedicated notation y) and one (simple regression) or several (multiple regression) known predictor (or "independent") variables. Regression consists in constructing from the predictor variables a regressed variable ŷ that is as close as possible (in a sense to be specified) to the dependent variable.
Classical linear regression procedures, applicable to numeric variables, have recently been joined by logistic regression and its variants for categorical variables. The considerations of this module focus on linear regression and apply (mutatis mutandis) to the various forms of regression. For experimental data, regression can be considered a special case of the analysis of variance with numeric independent variables. For observational data, new problems arise, related to the fact that in general the predictor variables are not statistically independent; it is these problems that our recent work has focused on. In the Bayesian framework, there is no fundamental difference between an observation and a parameter of a statistical model: both are considered variable quantities. If we denote by x the data, with sampling density f(x|θ), and by θ the model parameters (plus possibly latent variables) with prior π(θ), formal inference requires updating to the conditional distribution of the parameter. Determining π(θ) and f(x|θ) gives f(x, θ) by

f(x, θ) = f(x|θ) π(θ).
After observing x, we can use Bayes' theorem to determine the distribution of θ conditional on the data (or the posterior) (see [2]).
π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ.
For the Bayesian approach, all the features of the posterior distribution are relevant for inference: moments, quantiles, etc. These quantities can often be expressed in terms of the conditional expectation of a function h of θ with respect to the posterior law:
E[h(θ)|x] = ∫ h(θ) f(x|θ) π(θ) dθ / ∫ f(x|θ) π(θ) dθ.
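This conditional expectation can be approximated by simple Monte Carlo integration: draw from the prior and weight by the likelihood (self-normalised importance sampling). A minimal sketch for a hypothetical Bernoulli experiment with a uniform prior (the data values are illustrative, not from the paper):

```python
import random

random.seed(0)

# Hypothetical data: s = 7 successes in n = 10 Bernoulli trials, uniform prior on theta.
n, s = 10, 7

def likelihood(theta):
    return theta**s * (1.0 - theta)**(n - s)

# Self-normalised Monte Carlo: draw theta from the prior, weight by the likelihood.
draws = [random.random() for _ in range(200_000)]
weights = [likelihood(t) for t in draws]
post_mean = sum(t * w for t, w in zip(draws, weights)) / sum(weights)
print(round(post_mean, 3))  # close to the exact posterior mean (s+1)/(n+2) = 2/3
```

Here the exact posterior is Beta(s+1, n−s+1), so the approximation can be checked against the closed form.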
We can calculate the posterior distribution directly in simple cases; otherwise the computation is carried out by MCMC simulation, where the integrals involved are too complex for direct calculation. In this work we first present the simple, multiple and logistic regression models; we then set out the conditions for the use of Markov chain Monte Carlo (MCMC) algorithms and introduce some of them, in particular the Metropolis-Hastings algorithm and the Gibbs sampling method. Finally, we present the numerical results and their interpretation. We used the software WinBUGS to estimate the parameters and interpret the results on actual data. WinBUGS (the MS Windows version of BUGS: Bayesian inference Using Gibbs Sampling) is a versatile package designed to carry out Markov chain Monte Carlo (MCMC) computations for a wide variety of Bayesian models (see [5]).

2. Methodology

2.1 Regression Models

2.1.1 Linear Regression Model: Regression addresses the type of problem where two continuous quantitative variables X and Y play asymmetric roles: the variable Y depends on the variable X. The connection between the dependent variable Y and the independent variable X can be modeled as Y = α + βX + ε (see [3]).
Y: dependent variable (explained); X: independent variable (predictor); α: intercept (value of Y for x = 0); β: slope (average variation of Y for a one-unit increase of X). The least-squares estimates of α and β can be calculated by:

β = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²,  α = ȳ − β x̄.
The correlation coefficient r is another important quantity and looks a lot like β:

r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² ).
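These least-squares and correlation formulas can be computed directly; a short sketch with hypothetical (x, y) data:

```python
import math

# Hypothetical (x, y) data to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

beta = sxy / sxx                  # slope
alpha = ybar - beta * xbar        # intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

print(round(alpha, 3), round(beta, 3), round(r, 4))
```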
r measures the strength of the association between the X- and Y-data: the stronger the association, the better X predicts Y.

2.1.2 Multiple Linear Models: The multiple regression model is a generalization of the simple regression model to a finite number of explanatory variables. The connection between the dependent variable Y and the independent variables X1 and X2 can be modeled as
Y= α + β1* X1 + β2* X2.
A linear regression model is defined by an equation of the form: Y(n×1) = X(n×p) β(p×1) + ε(n×1), where
Y is an n-dimensional random vector; X is a known n × p matrix called the design matrix; β is the p-dimensional vector of unknown model parameters; ε is the centered n-dimensional vector of errors.

2.1.3 Logistic Model: A standard model for qualitative regression is the logistic regression (or logit) model, in which the conditional distribution of y given the explanatory variables z ∈ Rᵖ is (see [2]):
P(y = 1) = 1 − P(y = 0) = exp(zγ) / (1 + exp(zγ)).
Consider the particular case where z = (1, x) and γ = (α, β): random variables yᵢ with values in {0, 1}, associated with explanatory variables xᵢ, are modeled by a conditional Bernoulli distribution

y|x ~ B( exp(α + βx) / (1 + exp(α + βx)) ).
Assume that our parameters follow the improper prior law π(α, β) = 1. The likelihood of our model for a sample (y₁, x₁), …, (y_n, x_n) is equal to
f(y₁, …, y_n | x₁, …, x_n, α, β) = ∏ᵢ₌₁ⁿ exp{(α + βxᵢ) yᵢ} / (1 + exp(α + βxᵢ)).
The posterior distribution of (α, β) is then deduced by formal application of Bayes' theorem (see [9]):

π(α, β | y, x) ∝ ∏ᵢ₌₁ⁿ exp{(α + βxᵢ) yᵢ} / (1 + exp(α + βxᵢ)) = exp{ Σᵢ₌₁ⁿ (α + βxᵢ) yᵢ } / ∏ᵢ₌₁ⁿ (1 + exp(α + βxᵢ)).
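Up to an additive constant, the logarithm of this posterior density is Σᵢ (α + βxᵢ)yᵢ − Σᵢ log(1 + exp(α + βxᵢ)); a minimal sketch evaluating it on hypothetical binary data:

```python
import math

# Hypothetical binary responses y with one covariate x; under the flat prior
# pi(alpha, beta) = 1 the log-posterior equals the log-likelihood up to a constant.
x = [0.5, 1.2, 2.3, 3.1, 4.0, 4.8]
y = [0,   0,   1,   0,   1,   1]

def log_posterior(alpha, beta):
    # sum_i (alpha + beta*x_i)*y_i - log(1 + exp(alpha + beta*x_i))
    return sum((alpha + beta * xi) * yi - math.log1p(math.exp(alpha + beta * xi))
               for xi, yi in zip(x, y))

print(round(log_posterior(0.0, 0.0), 4))  # at (0, 0): -n*log(2) = -6*log(2)
```

This is the quantity an MCMC sampler for the logit model repeatedly evaluates.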
2.2 MCMC Methods

Markov chain Monte Carlo (MCMC) methods are used when the law of interest cannot be simulated directly by the usual methods and/or when its density is known only up to a normalizing constant.

2.2.1 Metropolis-Hastings Algorithm: The Metropolis-Hastings algorithm is based on the use of a conditional density q(y|x) with respect to the dominating measure of the model. It can be put into practice if q(·|x) is quick to simulate and is available either analytically (up to a constant independent of x) or in symmetric form, that is to say such that q(y|x) = q(x|y). The Metropolis-Hastings algorithm (see [4]) associated with the objective law π and the conditional q produces a Markov chain x(t) based on the following transition:
Initialization: x₀. At each step k ≥ 0:
• simulate a value y_k ~ q(·|x_k);
• simulate a value u ~ U[0, 1];
• set

x_{k+1} = y_k if u ≤ ρ(x_k, y_k), and x_{k+1} = x_k otherwise,

where ρ(x, y) = min{ 1, [π(y) q(x|y)] / [π(x) q(y|x)] }.
The law q is called the instrumental or proposal law. This algorithm systematically accepts simulations y_t such that the ratio π(y_t)/q(y_t|x(t)) is greater than the previous value π(x(t))/q(x(t)|y_t). It is only in the symmetric case that acceptance is governed by the ratio π(y_t)/π(x(t)).
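As an illustration, a random-walk Metropolis-Hastings sampler for a standard normal target (the symmetric case, where the acceptance ratio reduces to π(y)/π(x)); the target and step size are illustrative choices:

```python
import random, math

random.seed(1)

def target(x):
    # Unnormalised standard normal density pi(x); MH only needs pi up to a constant.
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_iter, step=1.0):
    x = 0.0
    chain = []
    for _ in range(n_iter):
        y = x + random.gauss(0.0, step)  # symmetric proposal: q(y|x) = q(x|y)
        # acceptance probability rho(x, y) = min(1, pi(y)/pi(x))
        if random.random() <= min(1.0, target(y) / target(x)):
            x = y
        chain.append(x)  # on rejection the current value is repeated
    return chain

chain = metropolis_hastings(50_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(round(mean, 2), round(var, 2))  # close to the target's mean 0 and variance 1
```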
2.2.2 The Gibbs Sampling: The Gibbs sampling algorithm simulates a law π(x) such that:
• x admits a decomposition of the form x = (x₁, …, x_p);
• the conditional laws πᵢ(· | x₁, …, x_{i−1}, x_{i+1}, …, x_p) are easily simulated (see [8]).

Example: (X, Y) ~ N(0, Σ), with Σ = ( 1 ρ ; ρ 1 ).
Principle of the algorithm: updating "component by component":

X₁ ~ π₁(· | X₂, …, X_p)
…
Xᵢ ~ πᵢ(· | X₁, …, X_{i−1}, X_{i+1}, …, X_p)
…
X_p ~ π_p(· | X₁, …, X_{p−1})
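For the bivariate normal example above, the full conditionals are X|Y = y ~ N(ρy, 1 − ρ²) and Y|X = x ~ N(ρx, 1 − ρ²), so the Gibbs sampler can be sketched as follows (ρ = 0.8 is an illustrative choice):

```python
import random, math

random.seed(2)
rho = 0.8
s = math.sqrt(1.0 - rho * rho)  # standard deviation of each full conditional

x, y = 0.0, 0.0
xs, ys = [], []
for _ in range(100_000):
    # component-by-component updates from the full conditionals
    x = random.gauss(rho * y, s)  # X | Y=y ~ N(rho*y, 1 - rho^2)
    y = random.gauss(rho * x, s)  # Y | X=x ~ N(rho*x, 1 - rho^2)
    xs.append(x)
    ys.append(y)

# empirical covariance (means ~0, variances ~1, so this estimates the correlation)
n = len(xs)
corr = sum(a * b for a, b in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)
print(round(corr, 2))  # close to rho = 0.8
```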
3. Applications

3.1 Example with WinBUGS: Linear Model "Calculating α and β"

Table 1 gives the real data of a crossover study comparing a new laxative versus a standard laxative, bisacodyl. Days with stool are used as the primary endpoint. The table shows that the new drug is more efficacious than bisacodyl (see [10]).

Table 1: Example of a crossover trial comparing efficacy of a new laxative versus bisacodyl
*Model with software WinBUGS. Y-variable: new treatment (days with stool). X-variable: bisacodyl (days with stool). Yᵢ ~ N(muᵢ, tau)
muᵢ = α + β · Xᵢ. The model is:

model {
  for (i in 1 : 35) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha + beta * X[i]
  }
  alpha ~ dnorm(0, 1.0E-6)
  beta ~ dnorm(0, 1.0E-6)
  tau ~ dgamma(1.0E-3, 1.0E-3)
  sigma <- 1 / sqrt(tau)
}

We then proceed to the estimation, this time with two chains of 110,000 iterations each (1,000 would already suffice), keeping every 150th iteration. The parameters of the line are estimated as α = 8.669 with a standard deviation of 3.236 and β = 2.062 with a standard deviation of 0.2854. The WinBUGS outputs are as follows:
[WinBUGS summary table: mean, sd, MC error, 2.5%, median, 97.5%, start]
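Outside WinBUGS, the same normal linear model can be fitted with a hand-written Gibbs sampler; a sketch under (near-)flat priors on α and β and a Gamma(10⁻³, 10⁻³) prior on τ, using hypothetical data in place of Table 1 (whose values are not reproduced here):

```python
import random, math

random.seed(3)

# Hypothetical (x, y) data standing in for Table 1.
x = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [12.5, 15.1, 16.8, 19.2, 20.9, 23.4, 25.0]
n = len(x)

alpha, beta, tau = 0.0, 0.0, 1.0
draws_a, draws_b = [], []
for it in range(20_000):
    # Full conditionals under flat priors on alpha, beta:
    # alpha | beta, tau ~ N(mean(y - beta*x), 1/(n*tau))
    alpha = random.gauss(sum(yi - beta * xi for xi, yi in zip(x, y)) / n,
                         1.0 / math.sqrt(n * tau))
    # beta | alpha, tau ~ N(sum(x*(y - alpha))/sum(x^2), 1/(tau*sum(x^2)))
    sxx = sum(xi * xi for xi in x)
    beta = random.gauss(sum(xi * (yi - alpha) for xi, yi in zip(x, y)) / sxx,
                        1.0 / math.sqrt(tau * sxx))
    # tau | alpha, beta ~ Gamma(1e-3 + n/2, rate = 1e-3 + SSE/2)
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    tau = random.gammavariate(1e-3 + n / 2.0, 1.0 / (1e-3 + sse / 2.0))
    if it >= 5_000:  # discard burn-in
        draws_a.append(alpha)
        draws_b.append(beta)

# Posterior means are close to the least-squares fit of the hypothetical data.
print(round(sum(draws_a) / len(draws_a), 1), round(sum(draws_b) / len(draws_b), 2))
```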
We now present a graphical representation of the parameters alpha and beta: kernel density in Fig. 1, quantiles in Fig. 2 and the autocorrelation function in Fig. 3.
Figure 1: Kernel density. Figure 2: Quantiles.
Figure 3: Autocorrelation function

3.2 Example with WinBUGS: Multiple Linear Model "Calculating α, β1 and β2"

We may be interested in whether age is an independent contributor to the effect of the new laxative. For that purpose the simple regression equation has to be extended as follows: Y = α + β1·X1 + β2·X2; β1 and β2 are called partial regression coefficients. Just like simple linear regression, multiple linear regression can give us the best fit for the given data, although it is hard to display the correlations in a figure. Table 2 gives the data from Table 1 extended by the variable age (see [10]).
Table 2: Example of a crossover trial comparing efficacy of a new laxative versus bisacodyl
*Model with software WinBUGS. Y-variable: new treatment (days with stool). X1-variable: bisacodyl (days with stool). X2-variable: age (years). Yᵢ ~ N(muᵢ, tau), with muᵢ = α + β1·X1ᵢ + β2·X2ᵢ. The model is the same as before, with the regression line replaced by:

mu[i] <- alpha + beta1 * X1[i] + beta2 * X2[i]
We then proceed to the estimation, again with two chains of 110,000 iterations each (1,000 would already suffice), keeping every 150th iteration. The parameters are estimated as α = 2.332 with a standard deviation of 4.985, β1 = 1.876 with a standard deviation of 0.3003 and β2 = 0.282 with a standard deviation of 0.171. The WinBUGS outputs are as follows:
[WinBUGS summary table: mean, sd, MC error, 2.5%, median, 97.5%]
We now present a graphical representation of the parameters alpha, beta1 and beta2: kernel density in Fig. 4, quantiles in Fig. 5 and the autocorrelation function in Fig. 6.
Figure 4: Kernel density. Figure 5: Quantiles. Figure 6: Autocorrelation function
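Classically, the multiple-regression coefficients solve the normal equations, β̂ = (XᵀX)⁻¹XᵀY; a sketch with hypothetical rows standing in for Table 2:

```python
import numpy as np

# Hypothetical rows standing in for Table 2 (y = new treatment, x1 = bisacodyl, x2 = age).
X1 = np.array([8.0, 13.0, 15.0, 10.0, 9.0, 10.0, 8.0])
X2 = np.array([73.0, 65.0, 76.0, 70.0, 82.0, 69.0, 74.0])
y = np.array([24.0, 30.0, 25.0, 35.0, 39.0, 30.0, 27.0])

# Design matrix with an intercept column; lstsq solves the normal equations stably.
X = np.column_stack([np.ones_like(X1), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 3))  # [alpha, beta1, beta2]
```

At the solution the residuals are orthogonal to every column of X, which is exactly the normal-equations condition Xᵀ(Y − Xβ̂) = 0.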
3.3 Example with WinBUGS: Logistic Model
Our study is based on a comparison of an antiseptic cream and a placebo; the endpoint is cure of an infection. We seek to estimate the effect of the cream versus placebo. The following table gives the responses of the 8 centers we have considered (see [1]):

Table 3: Processed data
model {
  for (i in 1 : 8) {
    rp[i] ~ dbin(pp[i], np[i])
    rc[i] ~ dbin(pc[i], nc[i])
    logit(pp[i]) <- alpha - beta / 2 + u[i]
    logit(pc[i]) <- alpha + beta / 2 + u[i]
    u[i] ~ dnorm(0.0, tau)
  }
  alpha ~ dnorm(0.0, 1.0E-6)
  beta ~ dnorm(0.0, 1.0E-6)
  tau ~ dgamma(0.1, 0.1)
  sigma <- 1 / sqrt(tau)
  OR <- exp(beta)
}

We then proceed to the estimation, this time with three chains of 110,000 iterations each (1,000 would already suffice), keeping every 150th iteration. The (assumed homogeneous) effect of the cream, beta, is estimated at 0.757 with a standard deviation of 0.304. The WinBUGS outputs are as follows:

[WinBUGS summary table: mean, sd, MC error, 2.5%, median, 97.5%]
We now present a graphical representation of the parameters alpha and beta: kernel density in Fig. 7, quantiles in Fig. 8 and finally the autocorrelation function in Fig. 9.
Figure 7: Kernel density. Figure 8: Quantiles.
Figure 9: Autocorrelation function

4. Discussion

• In the linear regression model the fitted line is Y = 8.646 + 2.065·X. The intercept is 8.646 and the slope is 2.065; for x = 1 we obtain y ≈ 10.7, therefore the new treatment is better than the standard treatment.
• The fitted line in the multiple linear regression model is Y = 2.332 + 1.876·X1 + 0.282·X2. Whether or not we add the parameter age, the new treatment remains the better one.
• As the odds ratio OR is greater than 1, with a credible interval running from 1.191 (2.5%) to 3.88 (97.5%), we conclude that our antiseptic cream is effective.
5. Conclusion

One of the merits of our work is to have shown, using experimental data from clinical trials, that such data can be modeled in a natural way and appropriate inferences drawn, namely the estimation of parameters in regression models (the simple and multiple linear models and the logit model) using Markov chain Monte Carlo (MCMC) methods. Computer performance has made effective simulation processes feasible, and the availability of computer programs has facilitated the calculation of posterior probabilities that were previously of daunting complexity.

6. Acknowledgement
We sincerely thank Mr. Pierre Druilhet, Professor at the Université Blaise Pascal, Clermont-Ferrand, France, for his help and advice towards the successful completion of this work.

References

[1] A. Agresti, Categorical Data Analysis (Vol. 359 of Wiley Series in Probability and Statistics), Wiley, 2002.
[2] A. Altaleb and C.P. Robert, Analyse bayésienne du modèle logit: algorithme par tranches ou Metropolis-Hastings?, Revue de Statistique Appliquée, 49(4) (2001), 53-70.
[3] C.P. Robert and J.M. Marin, Bayesian Core: A Practical Approach to Computational Bayesian Statistics, Springer Texts in Statistics, 2007.
[4] C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
[5] D.J. Lunn, A. Thomas, N. Best and D. Spiegelhalter, WinBUGS – A Bayesian modelling framework: Concepts, structure and extensibility, Statistics and Computing, 10(2000), 325-337.
[6] É. Parent and J. Bernier, Le Raisonnement Bayésien, Springer-Verlag France, Paris, 2007.
[7] L.R. França, Statistique Bayésienne, INSERM U669, May 2009.
[8] C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, 1999.
[9] C.P. Robert, L'analyse Statistique Bayésienne, Economica, Paris, 1992.
[10] T.J. Cleophas, A.H. Zwinderman and T.F. Cleophas, Statistics Applied to Clinical Trials, Springer, Dordrecht, The Netherlands, 2006.