불균형 설계 출력과의 R- 다중 비교는 Anova와 다릅니다

실험을 위해 식물을 잘라내어 잎 말량과 같은 반응을 계절의 끝에 측정했습니다. 클리핑 강도와 클리핑 시간을 조작하고이 두 가지 처리를 교차 시켰습니다. 나는 또한 5 개의 다른 클리핑 처리 조합을 초래하는 제어 클리핑 처리를 포함했습니다. 치료 당 12 개의 식물을 가지고 나는 2 년 동안 총 60 개의 식물을 가지고 있습니다. 즉, 나는 1 년에 60 개 식물과 2 년째에 같은 식물에 대한 측정을 수집했다.불균형 설계 출력과의 R- 다중 비교는 Anova와 다릅니다

타이밍이 "절대적이지 않고"강도가 "제로"인 임의로 "제어"치료를 대신 한 나의 디자인이 여기에있다. 분산 분석 출력 사이에 나에게 의미있는 상호 작용을 준

m1<-lmer(log(plant.leaf.g)~timing*intensity*year+(1|id), data=cmv) 

Anova(m1, type="III", test="F")

Year Timing intensity treatments 
2015 early high  early-high 
2015 early low  early-low 
2015 late high  late-high 
2015 late low  late-low 
2015 never zero  control 
2014 early high  early-high 
2014 early low  early-low 
2014 late high  late-high 
2014 late low  late-low 
2014 never zero  control

나는 lme4를 실행 한 경고를 무시하고 나중에 모델 상에 F-테스트를 실행하는 벤 Bolker 하나의 제안 (R- analyzing repeated measures unbalanced design with lme4?)를 따라 타이밍과 강도 (p = 0.006), 및 I를 이용하여 다중 비교 시험으로 추적 : 여기

cmv$SHD<-interaction(cmv$timing, cmv$intensity) 
m2<-lmer(log(plant.leaf.g)~-1+SHD+(1|id),data=cmv, na.action=na.exclude) 
summary(glht(m2, linfct=mcp(SHD="Tukey")))

유일한 유의 한 쌍 내 출력 클립 여기서 p = 0.08 :

       Estimate Std. Error z value Pr(>|z|) 
late.2014 - early.2014 == 0 -0.6584  0.3448 -1.910 0.3844 
never.2014 - early.2014 == 0 0.1450  0.4102 0.354 0.9992 
early.2015 - early.2014 == 0 -0.4906  0.2786 -1.761 0.4788 
late.2015 - early.2014 == 0 -0.1687  0.3494 -0.483 0.9965 
never.2015 - early.2014 == 0 0.4201  0.4079 1.030 0.9032 
never.2014 - late.2014 == 0 0.8034  0.4119 1.951 0.3597 
early.2015 - late.2014 == 0 0.1678  0.3419 0.491 0.9963 
late.2015 - late.2014 == 0  0.4897  0.2724 1.797 0.4553 
never.2015 - late.2014 == 0 1.0785  0.4119 2.618 0.0885 . 
early.2015 - never.2014 == 0 -0.6356  0.4074 -1.560 0.6133

이유 Anova는 타이밍 * 강도가 매우 중요하다고 생각했으나 다중 비교 테스트에서는 의미가 없었습니다. 여러 비교를 수행해야하는 또 다른 방법이 있습니까?

다른 다중 비교 출력에서 1.00000만큼 높은 p 값을 얻습니다. 정상입니까?

data<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L, 
111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L, 122L, 
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 
134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 
146L, 147L, 148L, 149L, 150L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 
98L, 99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 
110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L, 
122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 
133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 
144L, 146L, 147L, 148L, 149L, 150L), quad = c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), year = c(2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L), timing = structure(c(1L, 
3L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L, 
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 2L, 
3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 3L, 
1L), .Label = c("early", "late", "never"), class = "factor"), 
intensity = structure(c(2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 
1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 
1L, 3L, 1L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 
2L, 1L, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 
1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 1L, 
1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 2L 
), .Label = c("high", "low", "zero"), class = "factor"), 
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 
2L, 3L, 2L, 4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L, 
4L, 1L, 4L, 2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 2L, 3L, 5L, 
3L, 2L, 1L, 3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 4L, 5L, 2L, 
2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 2L, 3L, 2L, 
4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L, 4L, 1L, 4L, 
2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 4L, 3L, 5L, 2L, 1L, 3L 
), .Label = c("control", "early-high", "early-low", "late-high", 
"late-low"), class = "factor"), plant.leaf.g = c(846.216, 
382.704, 2393.088, 61.832, 1315.86, 275.816, 3705.862, 3500.52, 
67.482, 432, 487.492, 1228.618, 776.16, 1575, 735.9, 2417.75, 
1342.92, 2359.046, 686.726, 1385.856, 343.684, 2277.312, 
465.528, 2314.584, 508.4, 1243.644, 1064.448, 1020.646, NA, 
494.832, 1318.248, 1516.4, 1271.218, 512.512, 157.878, 3753.992, 
586.032, 1042.176, 889.632, 651.052, 498.042, 625.872, 16.28, 
497.51, 593.75, 706.84, 2238.742, 232.584, 671.532, 90.72, 
1412.442, 902.728, 3077.184, 619.106, 0.576, 400.452, 684.522, 
849.852, 152.76, 1280.448, 274.47, 387.614, 98.496, 2304.504, 
644.952, 35.392, 250.56, 267.33, 2212.08, 2392.596, 751.944, 
629.418, 731.544, 1013.196, 1516.4, 130.536, 2910.6, 554.4, 
2163.35, 223.86, 2369.376, 551.976, 985.6, 1482.24, 815.386, 
1664.132, 596.376, 1581.432, 217.128, 1041.656, 951.168, 
256.172, 1587.148, 359.448, 546.48, 1226.544, 371.64, 293.504, 
177.726, 343.26, 691.24, 207.604, 588.924, 1405.258, 136.17, 
451.432, 576.18, 424.804, 884.534, 2466.45, 1524.432, 973.208, 
369.474, 410.048)), .Names = c("id", "quad", "year", "timing", 
"intensity", "treatment", "plant.leaf.g"), class = "data.frame", row.names = c(NA, 
-114L))

ps. 나는이 삶의 균형이 맞지 않아서 일할 수 없었다. 많은 국가 표준 협회 (NAs)가 산출물에보고됩니다.

출처

2016-08-27 Angela P

'lsmeans'이'NA's를보고하면 결과가 유일하게 추정 할 수있는 것이 아니라는 것을 요구하고있다. 그 결과 일반적으로 일부 요인 조합에서 데이터가 없기 때문입니다. 모델에 몇 가지 상호 작용을 남기지 않으면 더 잘할 수 있습니다.처음에는 three-way interaction을 생략 한'log (plant.leaf.g) ~ (timing + intensity + year)^2 + (1 | id) '모델을 피팅 해보자. – rvl

데이터 목록을 자세히 보면 '연도'와 '타이밍'의 각 조합에 대해 '강도'값만 표시됩니다. 즉,이 세 가지 요소간에 선형 의존성이 있음을 의미합니다. 그 중 하나를 모델에서 완전히 벗어나게해야합니다. 나는 당신이하는 비교의 주요 요인이 아니므로'year'를 생략하는 것이 좋습니다. – rvl

코드'log (plant.leaf.g) ~ (timing + intensity + year)^2 + (1 | id)'를 사용하면 "고정 효과 모델 행렬은 5 열/계수 (coefficient) "와"의미 (means) "는 단지 'NA'만 생성한다. 또한 내 디자인을 더 잘 표현하기 위해 데이터를 편집합니다. 이는 내가 공유 한 하위 집합이므로 내 치료의 균형을 나타내지 않기 때문입니다. 컨트롤이 아닌 동안 타이밍과 강도가 완전히 교차됩니다. @ rvl –

나는 신중하게 질문을 읽어 보지 않았기 때문에, 나는 당신을 실현하지 않았다 당신이 (어떤 이유로, 사용자가 제공 한 데이터 세트는 다른 이름을 가지고 timing*intensity 대신에 treatment을 사용하는 모델에 lsmeans을 시도하고하지 않았다 SHD 대신 treatment). 수정하면 정상적으로 작동합니다.

> m3<-lmer(log(plant.leaf.g) ~ treatment+year+(1|id), data=data) 
> library(lsmeans) 
> lsmeans(m3, "treatment", type = "response") 
Loading required namespace: pbkrtest 
treatment response  SE df lower.CL upper.CL 
control 1017.7290 289.1544 62.29 576.7671 1795.8244 
early-high 909.2335 260.3904 68.44 513.4725 1610.0288 
early-low 388.1875 116.3790 65.92 213.3433 706.3242 
late-high 626.5379 176.6823 56.61 356.1791 1102.1134 
late-low 393.3225 125.4142 51.60 207.4053 745.8947 

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> pairs(.Last.value, type = "response") 
contrast    response.ratio  SE df t.ratio p.value 
control - early-high  1.1193264 0.4457451 72.14 0.283 0.9986 
control - early-low   2.6217461 1.0685111 71.26 2.365 0.1371 
control - late-high   1.6243693 0.6501965 59.75 1.212 0.7445 
control - late-low   2.5875182 1.1050617 56.07 2.226 0.1853 
early-high - early-low  2.3422535 0.9581385 74.29 2.081 0.2394 
early-high - late-high  1.4512026 0.5762732 68.49 0.938 0.8811 
early-high - late-low  2.3116745 0.9907174 58.53 1.955 0.3006 
early-low - late-high  0.6195754 0.2549274 61.77 -1.163 0.7719 
early-low - late-low  0.9869446 0.4319594 57.88 -0.030 1.0000 
late-high - late-low  1.5929371 0.6780835 53.73 1.094 0.8090 

P value adjustment: tukey method for comparing a family of 5 estimates 
Tests are performed on the log scale

이제 원래 질문을 조금만 수정하십시오. 그래서 우리는 treatment ANOVA의 테스트가 약 0.05의 P 값을 갖는 것을

> library(car) 
> Anova(m3) 
Analysis of Deviance Table (Type II Wald chisquare tests) 

Response: log(plant.leaf.g) 
      Chisq Df Pr(>Chisq) 
treatment 9.5752 4 0.04823 
year  0.1147 1 0.73484

반면, 가장 작은 P의 값보다 약 0.13 인 것을 알 수있다. 거의 유의하지 않은 ANOVA $ F $ 테스트는 의 레벨 중 treatment의 대비가 Scheffe 임계 값을 기준으로 볼 때 매우 중요하다고 말하는 것과 같습니다. 그 대비가 쌍으로 된 비교이거나, 거의 하나 인 경우, 그 쌍의 비교가 중요 할 것입니다. 하지만 여기서는 그렇지 않습니다.상기 제 2 수단은 3과 5보다 훨씬 높은, 그리고 그 명암이 결과에 날 리드가 매우 중요하다 :

> contrast(lsmeans(m3, "treatment"), list(my.con = c(1, 1, -1, 0, -1))) 
contrast estimate  SE df t.ratio p.value 
my.con 1.801813 0.5910685 64.56 3.048 0.0033 

Results are given on the log (not the response) scale. 
Tests are performed on the log scale

다소 크게 높고 낮은의 비교가 나오는 또 다른 대비 강도 :

기억할 테스트는 쌍 비교가 나오는 방법에 대해 많은 것을 보장하지 않습니다.

출처

2016-08-29 02:11:37 rvl

감사합니다. 0, 1, -1이 코드'list (hi.lo = c (0, 1, -1, 1, -1)/2)'에서 나온 것과 왜 2로 나눈 것일까? –

제 다른 대답을 참조하십시오 ... – rvl

또 다른 발사. 당신은 영업 이익이 알고 있지만 단지 다른 사람에게 명확하게하기 위해, 여기에 이러한 요소가 관련이 있는지 살펴입니다 :

R> with(data, table(timing, intensity, year)) 
, , year = 2014 

     intensity 
timing high low zero 
    early 11 11 0 
    late 13 10 0 
    never 0 0 12 

, , year = 2015 

     intensity 
timing high low zero 
    early 12 11 0 
    late 12 10 0 
    never 0 0 12

공지 사항 intensity = "zero"와 timing = "never" 이러한 요소의 다른 수준의 특수 제어 조건 및입니다 조합으로 나타납니다. 그렇기 때문에 그것들을 별개의 요소로 취급하는 모델은 해석상 어려움을 낳습니다. 요인 treatment이 데이터 집합에 포함되어 실제로 발생하는 5 가지 조합을 레벨로 갖습니다. 일부 모델에서

봐는 (나는 잔차 그림을 보았고, 나는 제곱근 변환이 최고라고 생각) :

R> anova(m3, m4) 
refitting model(s) with ML (instead of REML) 
Data: data 
Models: 
m3: sqrt(plant.leaf.g) ~ treatment + year + (1 | id) 
m4: sqrt(plant.leaf.g) ~ treatment * year + (1 | id) 
    Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) 
m3 8 876.92 898.74 -430.46 860.92       
m4 12 872.14 904.87 -424.07 848.14 12.783  4 0.

가치가있을 것으로 보인다 :

R> library("lme4") 
R> m3 = lmer(sqrt(plant.leaf.g) ~ treatment + year + (1|id), data=data) 
R> m4 = lmer(sqrt(plant.leaf.g) ~ treatment * year + (1|id), data=data) 
Warning message: 
Some predictor variables are on very different scales: consider rescaling

이 두 모델을 비교 경고 메시지가 표시 되더라도 year과의 상호 작용을 포함합니다.

이 모델을 해석하려면 ANOVA 테이블이 제한적으로 사용되어야합니다. 모델이 예측하는 것을보고 적절한 비교를하는 것이 더 유익합니다. 이 예측은 "최소 제곱 평균"이라고합니다. 우리는 (때문에 year과의 상호 작용의) 매년 별도로 계산합니다 :

R> library("lsmeans") 
R> (m4.lsm = lsmeans(m4, ~ treatment | year, at = list(year = c(2014, 2015)), type = "response")) 
year = 2014: 
treatment response  SE df lower.CL upper.CL 
control 1082.1190 224.6040 91.46 681.9805 1574.2165 
early-high 1149.8090 236.6617 97.92 728.1153 1667.4202 
early-low 647.5407 180.8791 92.57 338.1453 1056.5696 
late-high 490.0813 148.4696 82.72 239.2544 829.8841 
late-low 485.0953 171.6529 73.61 203.3380 887.4499 

year = 2015: 
treatment response  SE df lower.CL upper.CL 
control 1393.5241 254.9350 91.35 933.1535 1945.8958 
early-high 831.3529 192.9245 97.47 492.5582 1258.3147 
early-low 746.0977 199.8967 96.30 402.0731 1195.6255 
late-high 1050.4552 225.8614 83.42 649.2805 1547.6725 
late-low 520.3968 177.7890 73.61 226.4119 934.9789 

Confidence level used: 0.95 
Intervals are back-transformed from the sqrt scale

마지막으로 우리는 그 (것)들의 사이에서 비교를 할 수 있습니다. 모든 pairwise 비교를 수행하는 것이 가능하지만, 아마도 더 유익한 것은 4d.f.의 해석 가능한 분석을 구성하는 대비를 구성하는 것입니다. treatment 님

R> trt.con = data.frame(
+  timing = c(0, 1, 1, -1, -1)/2, 
+  intensity = c(0, 1, -1, 1, -1)/2, 
+  tim.int = c(0, 1, -1, -1, 1)/2, 
+  ctl.vs.trt = c(4, -1, -1, -1, -1)/4, 
+  row.names = levels(data$treatment)) 
R> trt.con 
      timing intensity tim.int ctl.vs.trt 
control  0.0  0.0  0.0  1.00 
early-high 0.5  0.5  0.5  -0.25 
early-low  0.5  -0.5 -0.5  -0.25 
late-high -0.5  0.5 -0.5  -0.25 
late-low  -0.5  -0.5  0.5  -0.25

이들은 treatment 최소 제곱 수단에인가되는 계수들을 포함한다. 예를 들어,시기를 정하는 것은 두 개의 초기 타이밍의 평균을 두 개의 지연 타이밍의 평균과 비교합니다. 세 번째 열은 상호 작용이고 네 번째 열은 제어 조건을 다른 네 개의 평균과 비교합니다.

R> contrast(m4.lsm, trt.con) 
year = 2014: 
contrast  estimate  SE df t.ratio p.value 
timing  7.5964984 3.586509 86.29 2.118 0.0370 
intensity 4.2874559 3.569964 87.92 1.201 0.2330 
tim.int  4.1745568 3.508331 94.05 1.190 0.2371 
ctl.vs.trt 7.0159980 3.800634 95.97 1.846 0.0680 

year = 2015: 
contrast  estimate  SE df t.ratio p.value 
timing  0.4625233 3.606278 87.74 0.128 0.8982 
intensity 5.5584603 3.596573 88.42 1.545 0.1258 
tim.int -4.0400583 3.529456 94.91 -1.145 0.2552 
ctl.vs.trt 9.4872065 3.810418 95.75 2.490 0.0145

흥미로운 결과가 제어 조건이 모두 몇 년 동안 치료와 다른 몇 가지 증거가 있다고하고, timing 대비 만 2014

에 의미가 있다는 것입니다 (: 이제 우리는 이러한 대조를 테스트 할 수 있습니다 StackExchange보다 CrossValidated 스타일의 대답이되었다는 것을 알았지 만 궁극적으로는 프로그램을 실행하는 것보다 의미있는 일을하는 것이 중요합니다.)

출처

2016-08-29 17:16:29 rvl

이것은 데이터를 분석하는 정말 흥미로운 방법입니다. 아마도 내가 가진 디자인면에서 더 좋은 방법 일 것입니다. 그러나 다른 클리핑 시간을 컨트롤과 비교할 수 있고 다른 클리핑 강도도 비슷하게 비교할 수 있어야합니다. 추신 : 정말 멋진 모든 항목에 대해 감사드립니다. –

나는 이해한다. 마지막 콘트라스트를 다른 몇 가지, 예를 들어 c (1, -.5, -.5, 0, 0), c (1, 0, 0, -.5, -.5) 등으로 대체 할 수 있습니다. . – rvl

불균형 설계 출력과의 R- 다중 비교는 Anova와 다릅니다

답변

관련 문제