[R] 몬테카를로 실험 기반의 검정 (monte carlo test)

해당 게시글은 Yudi Pawitan의 In all likelihood 책 본문의 Exercise 4.14에 대한 풀이입니다.

Monte Carlo Test

귀무가설 하에서 특정 확률 분포를 따르는 데이터를 반복 추출하고, 추출된 데이터를 기반으로 가설 검정하는 방법이다.

귀무가설 하에서 얻어진 데이터를 통해 p-value가 계산 가능해지며, 이는 가설 검정이 가능함을 의미한다.

몬테카를로 실험은 통계적 특성을 조사하기 어려운 경우에도 사용할 수 있다는 강점이 있다.

이후 소개할 Exercise 4.14의 예제도 Overdispersion parameter 가 포함된 Likelihood의 계산이 명시적이지 않기 때문에, 몬테카를로 실험 기반의 가설 검정을 진행한다.

Process of Monter Carlo Experiment — Monte Carlo, 복원 추출된 m개의 데이터로 추정하고자 하는 파라미터를 계산한다.

Exercise 4.14

The following data shows the number of paint defects recorded in a sample of 20 cars using an experimental process:

0 10 1 1 1 2 1 4 11 0 6 2 6 2 0 2 0 1 3 0

The sample mean is 2.55 and the sample variance is 9.84, indicating overdispersion.

Using the Poisson-type model above $A(\theta) = e^{\theta}$, so $Var_{\theta}(X)=\phi E_{\theta}(X)$,

and the method-of-moments estimate of $\phi$ is 9.84/2.55 = 3.86.

Obviously, there is some variability in estimates of $\phi$. Is it significantly away from 1?

An exact likelihood analysis is difficult, but instead we can do a Monte Carlo experiment to test $\phi=1$.

$\text{H}_0 \text{: } \phi=1 \text{ vs H}_1 \text{: not H} _0$.

Generate $x_{1}^{*}, \ldots, x_{20}^{*}$, as an i.i.d sample from Poisson distribution with mean $\mu=2.55$.
Compute $\hat{\phi^{*}}=(\hat{s^*}^{2} / \hat{\bar{X^*}})$ from the data in part 1.
Repeat 1 and 2 a large number of times and consider the $\hat{\phi^{*}}$'s as a sample from the distribution of $\hat{\phi}$.
Compute P-value = the proportion of $\hat{\phi^{*}}$ > the observed $\hat{\phi}$.

P-value는 Type I error를 의미한다. 만약 귀무가설이 참일 때, 관찰된 $\phi$가 귀무가설을 기각할 확률, 즉 관찰된 값이 얼마나 극단적인지 설명하는 지표라고 생각할 수 있다.

책의 본문에서 소개한 몬테카를로 실험의 Step 1-4 은 다음의 의미를 갖는다.

귀무가설 (평균이 2.55이고, dispersion parameter = 1) 하에서 $n=20$의 데이터를 샘플링한다.
귀무가설 하에서 얻어진 데이터를 사용하여 Overdispersion parmeter $\phi$를 추정한다.
이 과정을 m번 반복한다.
우리가 the number of paint defects 데이터로부터 구한 $\hat{phi}$이 얼마나 극단적인 값인지, m개의 $\hat{\phi}^{m}$를 통해 확인한다.

이 과정을 R로 작성하면 다음과 같다.

set.seed(1)

obs <- c(0, 10, 1, 1, 1, 2, 1, 4, 11, 0, 5, 2, 
         5, 2, 0, 2, 0, 1, 3, 0)
obs.phi <- var(obs)/mean(obs)
mu=mean(obs)

n.vec=c(20, 100, 500)
m <- 500

p.val <- numeric(length(n.vec))
# * Large N
for ( i in 1:length(n.vec)) {
  # i=1
  phi.star <- numeric(m) # init.
  # 3. m번 반복
  for (j in 1:m) {
    # 1. 모집단의 분포가 Pois(2.55)를 따르는 ... x^* 생성
    x.star <- rpois(n=n.vec[i], lambda=mu) # if H0 is true
    # 2. 1의 자료로 phi 계산
    phi.star[j] <- var(x.star)/mean(x.star)
  }
  
  # 4. p-value 계산
  p.val[i] <- mean((phi.star > obs.phi)*1)
}

results <- data.frame("N" = n.vec, "p.value"=p.val)
results

results 표는 다음과 같이 나오며, p-value < 0.0001 보다 작다.

	N	p.value
1	20	0.000
2	100	0.000
3	500	0.000

저작자표시 비영리 변경금지

'R Programming > Analysis' 카테고리의 다른 글

[R] shapviz 패키지로 SHAP Value 구하고 해석하기 (0)	2024.08.22
[R] Auto Correlation 데이터 생성과 Durbin-Watson 검정 (1)	2023.12.28
[R] Cox 분석을 위한 생존 시간 데이터 생성 (시뮬레이션 코드) (2)	2023.12.22
[R] 부트스트랩 신뢰 구간 (Bootstrap Confidence Intervals) 계산 (2)	2023.11.27
[R] 랜덤포레스트 (randomForest)에 대한 모든 것 (1)	2023.11.16

[R] 몬테카를로 실험 기반의 검정 (monte carlo test)

Monte Carlo Test

Exercise 4.14

'R Programming > Analysis' 카테고리의 다른 글

관련글

티스토리툴바