R의 비선형 최적화에 대한 부트 스트랩 매개 변수 추정 : 일반 매개 변수 추정과 다른 이유는 무엇입니까?

여기 내 질문의 짧은 버전이 있습니다. 코드는 다음과 같습니다.R의 비선형 최적화에 대한 부트 스트랩 매개 변수 추정 : 일반 매개 변수 추정과 다른 이유는 무엇입니까?

optim()을 사용하여 R에서 비선형 폰 Bertalanffy 성장 방정식에 대한 매개 변수를 계산했는데 이제는 부트 스트랩으로 von B 성장 계수 K에 95 % 신뢰 구간을 추가하려고합니다. 내가 성장 계수 K의 부트 스트랩 출력을 요약 데이터의 년 중 적어도 하나를 들어, 부트 스트랩에서 평균과 중앙값 매개 변수 추정은 추정 파라미터에 비해 상당히 다르다 :

>summary(temp.store) # summary of bootstrap values 
    Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
0.002449 0.005777 0.010290 0.011700 0.016970 0.056720 

> est.K [1] 0.01655956 # point-estimate from the optimization

나는 불일치가 의심 무작위 추출의 부트 스트랩에는 결과에 편향을주는 오류가 있기 때문에 try()를 사용하여 오류를 유발하는 입력 값의 조합이있을 때 충돌이 최적화되지 않도록했습니다. 그래서 그 문제를 해결하기 위해 무엇을해야하는지 알고 싶습니다. 맞는 곡선이 올바르게 보이기 때문에 나는 올바르게 일을하고 있다고 생각합니다.

또 다른 해의 데이터에 대해이 코드를 실행했으며 적어도 다른 해에는 부트 랩 예상값과 정기적 추정값이 매우 비슷합니다.

장황한 버전 :

길이에 대한 폰 Bertalanffy 성장 곡선 (VBGC)이 주어진다 : L (t) = L.inf * 1 - EXP (-K * (t-T0))] (식 3.1.0.1, FAO)

여기서 L (t)는 물고기의 길이, L.inf는 점근선의 최대 길이, K는 성장 계수, t는 시간 간격이고 t0는 성장이 시작되었습니다. L (t)와 t는 관측 된 자료이다. 일반적으로 시간 또는 연령은 년 단위로 측정되지만, 여기서는 어린 물고기 데이터를보고 1 월 1 일부터 1 일까지 'doy'로 시작합니다.

최적화, 나는 VBGC 방정식의 선형화를 사용했습니다.

vbl.ssq <- function(theta, data){ 
    linf=theta[1]; k=theta[2]; t0=theta[3] 
    # name variables for ease of use 
    obs.length=data$len 
    age=data$doy 
    #von B equation 
    pred.length=linf*(1-exp(-k*(age-t0))) 
    #sums of squares 
    ssq=sum((obs.length-pred.length)^2) 
}

추정 매개 변수 폰 Bertalanffy 성장 방정식에 대한

#Get starting parameter values 
theta_init <- get.init(dframe=data06) 
# optimize VBGC by minimizing sums of square differences 
len.fit <- optim(par=theta_init, fn=vbl.ssq, method="BFGS", data=data06) 

est.linf <- len.fit$par[1] # vonB len-infinite 
est.K <- len.fit$par[2]  # vonB K 
est.t0 <- len.fit$par[3]  # vonB t0

부트 스트랩을 사각형의

get.init <-function(dframe){ # linearization of the von B 
    l.inf <- 80 # by eyeballing max juvenile fish 
    # make a response variable and store it in the data frame: 
    # Eqn. 3.3.3.1 in FAO document 
    dframe$vonb.y <- - log(1 - (dframe$len)/l.inf ) 
    lin.vonb <- lm(vonb.y ~ doy, data=dframe) 
    icept <- lin.vonb$coef[1] # 0.01534013 # intercept is a 
    slope <- k.lin <- lin.vonb$coef[2] # slope is the K param 
    t0 <- - icept/slope # get t0 from this relship: intercept= -K * t0 
    pars <- c(l.inf,as.numeric(slope),as.numeric(t0)) 
}

총액 :

doy <- c(156,205,228,276,319,380) 
len <- c(36,56,60,68,68,71) 
data06 <- data.frame(doy,len)

기능 최적화에 대한 매개 변수를 시작 얻을 수 오류의

#>summary(temp.store) 
#  Min. 1st Qu. Median  Mean 3rd Qu.  Max. 
# 0.002449 0.005777 0.010290 0.011700 0.016970 0.056720 
# 
# est.K [1] 0.01655956

예 :

Error in optim(par = init.par, fn = vbl.ssq, method = "BFGS", data = tmp.frame) : 
    non-finite finite-difference value [2]

나는 내가 믿지 않는 K 매개 변수 여기

k.ci <- quantile(temp.store,c(0.025,0.975)) 
#  2.5%  97.5% 
#0.004437702 0.019784178

에 대한

# set up for bootstrap loop 
tmp.frame <- data.frame() 
temp.store <- vector() 

# bootstrap to get 95% conf ints on growth coef K 
for (j in 1:1000){ 
    # choose indices at random, with replacement 
    indices <- sample(1:length(data06[,1]),replace=T) 
    # values from original data corresponding to those indices 
    new.len <- data06$len[indices] 
    new.doy <- data06$doy[indices] 
    tmp.frame <- data.frame(new.doy,new.len) 
    colnames(tmp.frame) <- c("doy","len") 

    init.par <- get.init(tmp.frame) 
    # now get the vonB params for the randomly selected samples 
    # using try() to keep optimizing errors from crashing the program 
    try( len.fit.bs <- optim(par=init.par, fn=vbl.ssq, method="BFGS", data=tmp.frame)) 
    tmp.k <- len.fit.bs$par[2] 
    temp.store[j] <- tmp.k 
}

95 % 신뢰 구간은 문제 야 최적화와 관련하여 오류 발생 합리적인 VBGC를 사용하십시오.여기 플롯은 다음과 같습니다

plot(x=data06$doy,y=data06$len,xlim=c(0,550),ylim=c(0,100)) 
legend(x="topleft",legend=paste("Length curve 2006"), bty="n") 
curve(est.linf*(1-exp(-est.K*(x-est.t0))), add=T,type="l") 

plot(x=2006,y=est.K, main="von B growth coefficient for length; 95% CIs", 
    ylim=c(0,0.025)) 
arrows(x0=2006,y0=k.ci[1],x1=2006,y1=k.ci[2], code=3, 
    angle=90,length=0.1)

출처

2016-11-02 ginko

너는 이것을 너무 어렵게 만들고있다. 이 자동 시작 모델에서는 nls를 사용하십시오. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/SSasympOff.html 그런 다음 nlstools 패키지에서 nlsBoot 함수를 사용할 수 있습니다. – Roland

고마워, 나는 그것을 시험해 볼 것이다! 다른 지점 추정치와 중앙값 또는 중앙값 부트 스트랩 추정치의 문제도 해결할 것이라고 생각하십니까? – ginko

나는 내 전화 때문에 코드를 자세히 연구 할 수 없다. x 값의 분포를 보존하기 때문에 수행해야하는 나머지를 부트 스트래핑하는 것처럼 보이지 않습니다. nlsBoot가 나머지를 부트 스트랩합니다. – Roland

우선, 당신은 부트 스트랩 방법을 신뢰할 가능성이 너무 적은 값의 매우 작은 수를 가지고있다. 리샘플링으로 인해 뚜렷한 x 값이 부족하기 때문에 고전적인 부트 스트랩에서는 적합 비율이 높지 않습니다.

여기에 nls과 함께 자체 시작 모델과 부팅 패키지를 사용한 구현이 나와 있습니다.

보는 바와 같이, 부트 스트랩 CI는 프로파일 CI보다 더 넓고 더 좁은 추정 CI에서 잔차 결과 부트 스트랩이다

#classic bootstrap 
library(boot) 
set.seed(42) 
boot1 <- boot(data06, function(DF, i) { 
    tryCatch(coef(nls(len ~ SSasympOff(doy, Asym, lrc, c0), data = DF[i,])), 
      error = function(e) c(Asym = NA, lrc = NA, c0 = NA)) 
}, R = 1e3) 

#proportion of unsuccessful fits 
mean(is.na(boot1$t[, 1])) 
#[1] 0.256 

#bootstrap CI 
boot1CI <- apply(boot1$t, 2, quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE) 
#   [,1]  [,2]  [,3] 
#2.5% 69.70360 -4.562608 67.60152 
#50% 71.56527 -4.100148 113.9287 
#97.5% 74.79921 -3.697461 151.03541 


#bootstrap of the residuals 
data06$res <- residuals(fit) 
data06$fit <- fitted(fit) 

set.seed(42) 
boot2 <- boot(data06, function(DF, i) { 
    DF$lenboot <- DF$fit + DF[i, "res"] 
    tryCatch(coef(nls(lenboot ~ SSasympOff(doy, Asym, lrc, c0), data = DF)), 
      error = function(e) c(Asym = NA, lrc = NA, c0 = NA)) 
}, R = 1e3) 

#proportion of unsuccessful fits 
mean(is.na(boot2$t[, 1])) 
#[1] 0 

#(residuals) bootstrap CI 
boot2CI <- apply(boot2$t, 2, quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE) 
#   [,1]  [,2]  [,3] 
#2.5% 70.19380 -4.255165 106.3136 
#50% 71.56527 -4.100148 113.9287 
#97.5% 73.37461 -3.969012 119.2380 
proCI[2,1] 

CIs_k <- data.frame(lwr = c(exp(proCI[2, 1]), 
          exp(boot1CI[1, 2]), 
          exp(boot2CI[1, 2])), 
        upr = c(exp(proCI[2, 2]), 
          exp(boot1CI[3, 2]), 
          exp(boot2CI[3, 2])), 
        med = c(NA, 
          exp(boot1CI[2, 2]), 
          exp(boot2CI[2, 2])), 
        estimate = exp(coef(fit)[2]), 
        method = c("profile", "boot", "boot res")) 

library(ggplot2) 
ggplot(CIs_k, aes(y = estimate, ymin = lwr, ymax = upr, x = method)) + 
    geom_errorbar() + 
    geom_point(aes(color = "estimate"), size = 5) + 
    geom_point(aes(y = med, color = "boot median"), size = 5) + 
    ylab("k") + xlab("") + 
    scale_color_brewer(name = "", type = "qual", palette = 2) + 
    theme_bw(base_size = 22)

doy <- c(156,205,228,276,319,380) 
len <- c(36,56,60,68,68,71) 
data06 <- data.frame(doy,len) 

plot(len ~ doy, data = data06) 

fit <- nls(len ~ SSasympOff(doy, Asym, lrc, c0), data = data06) 
summary(fit) 
#profiling CI 
proCI <- confint(fit) 
#   2.5%  97.5% 
#Asym 68.290477 75.922174 
#lrc -4.453895 -3.779994 
#c0 94.777335 126.112523 

curve(predict(fit, newdata = data.frame(doy = x)), add = TRUE)

. 모두 대칭입니다. 또한 중앙값은 예상 포인트에 가깝습니다.

코드에서 무엇이 잘못되었는지 조사하기위한 첫 단계로, 프로 시저에서 실패한 부분의 비율을 조사해야합니다.

출처

2016-11-03 07:36:26 Roland

정말 도움이됩니다. 나는 또한 내가 왜 그것을하고 있었는지 부트 스트랩 신뢰 구간이 포인트 추정치 (9 년간의 성장 데이터)를 중심으로 대칭 적이 지 않은지 궁금해했다. – ginko

한 번 더 질문드립니다. 때로는 서로 다른 방법, nls(), optim()을 사용하여 비선형 적합도를 최적화하여 제곱의 합과 음의 로그 가능성을 최소화 한 다음 AICc에 따라 최적으로 맞 춥니 다. 여러 방법을 사용하지 않고 nls()를 사용하여 제거 할 수 있습니까? 어쨌든 항상 AICc에 따라 최상의 매개 변수를 제공합니다 (항상 그런 것은 아닙니다). (편집자가 @Roland를 사용하게하지 않습니다.) – ginko

피팅 모델에 따라 다른 옵티 마이저를 시도하는 것이 좋습니다. optimx 패키지는이를 위해 아주 좋습니다. 그러나 SSE 만 확인하면됩니다. – Roland

R의 비선형 최적화에 대한 부트 스트랩 매개 변수 추정 : 일반 매개 변수 추정과 다른 이유는 무엇입니까?

답변

관련 문제