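I have a pair of DRBD servers that have stopped syncing, and I can't find a way to get them to sync again. Synchronization runs over a dedicated crossover cable (1 Gbps copper) between the two servers. The DRBD sync fails with a ProtocolError.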
This is what I see in the logs on r02:
Aug 9 16:09:44 r02 kernel: [12739.178449] block drbd0: receiver (re)started
Aug 9 16:09:44 r02 kernel: [12739.178454] block drbd0: conn(Unconnected -> WFConnection)
Aug 9 16:09:44 r02 kernel: [12739.912037] block drbd0: Handshake successful: Agreed network protocol version 91
Aug 9 16:09:44 r02 kernel: [12739.912048] block drbd0: conn(WFConnection -> WFReportParams)
Aug 9 16:09:44 r02 kernel: [12739.912074] block drbd0: Starting asender thread (from drbd0_receiver [3740])
Aug 9 16:09:44 r02 kernel: [12739.936681] block drbd0: data-integrity-alg: <not-used>
Aug 9 16:09:44 r02 kernel: [12739.936691] block drbd0: Considerable difference in lower level device sizes: 256503768s vs. 1344982880s
Aug 9 16:09:44 r02 kernel: [12739.942918] block drbd0: drbd_sync_handshake:
Aug 9 16:09:44 r02 kernel: [12739.942923] block drbd0: self E17D2EE7BC2C235E:0000000000000000:0000000000000000:0000000000000000 bits:32062701 flags:0
Aug 9 16:09:44 r02 kernel: [12739.942928] block drbd0: peer E21F17F92705CD4F:E17D2EE7BC2C235F:1074ED292C876258:548AFBCD7D5C2C3B bits:32062701 flags:0
Aug 9 16:09:44 r02 kernel: [12739.942933] block drbd0: uuid_compare()=-1 by rule 50
Aug 9 16:09:44 r02 kernel: [12739.942935] block drbd0: Becoming sync target due to disk states.
Aug 9 16:09:44 r02 kernel: [12739.942946] block drbd0: peer(Unknown -> Primary) conn(WFReportParams -> WFBitMapT) pdsk(DUnknown -> UpToDate)
Aug 9 16:09:44 r02 kernel: [12740.099597] block drbd0: conn(WFBitMapT -> WFSyncUUID)
Aug 9 16:09:44 r02 kernel: [12740.104324] block drbd0: updated sync uuid BF8D25FBE26085B0:0000000000000000:0000000000000000:0000000000000000
Aug 9 16:09:44 r02 kernel: [12740.104423] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Aug 9 16:09:44 r02 kernel: [12740.106582] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Aug 9 16:09:44 r02 kernel: [12740.106591] block drbd0: conn(WFSyncUUID -> SyncTarget)
Aug 9 16:09:44 r02 kernel: [12740.106599] block drbd0: Began resync as SyncTarget (will sync 128250804 KB [32062701 bits set]).
Aug 9 16:09:44 r02 kernel: [12740.140796] block drbd0: meta connection shut down by peer.
Aug 9 16:09:44 r02 kernel: [12740.141304] block drbd0: sock was shut down by peer
Aug 9 16:09:44 r02 kernel: [12740.141309] block drbd0: peer(Primary -> Unknown) conn(SyncTarget -> BrokenPipe) pdsk(UpToDate -> DUnknown)
Aug 9 16:09:44 r02 kernel: [12740.141316] block drbd0: short read expecting header on sock: r=0
Aug 9 16:09:44 r02 kernel: [12740.142235] block drbd0: asender terminated
Aug 9 16:09:44 r02 kernel: [12740.142238] block drbd0: Terminating drbd0_asender
Aug 9 16:09:44 r02 kernel: [12740.151561] block drbd0: bitmap WRITE of 979 pages took 2 jiffies
Aug 9 16:09:44 r02 kernel: [12740.151567] block drbd0: 122 GB (32062701 bits) marked out-of-sync by on disk bit-map.
Aug 9 16:09:44 r02 kernel: [12740.151580] block drbd0: Connection closed
Aug 9 16:09:44 r02 kernel: [12740.151586] block drbd0: conn(BrokenPipe -> Unconnected)
Aug 9 16:09:44 r02 kernel: [12740.151592] block drbd0: receiver terminated
And on r01:
Aug 9 16:09:44 r01 kernel: [3438273.766768] block drbd0: receiver (re)started
Aug 9 16:09:44 r01 kernel: [3438273.771898] block drbd0: conn(Unconnected -> WFConnection)
Aug 9 16:09:44 r01 kernel: [3438274.474411] block drbd0: Handshake successful: Agreed network protocol version 91
Aug 9 16:09:44 r01 kernel: [3438274.483299] block drbd0: conn(WFConnection -> WFReportParams)
Aug 9 16:09:44 r01 kernel: [3438274.490420] block drbd0: Starting asender thread (from drbd0_receiver [6366])
Aug 9 16:09:44 r01 kernel: [3438274.498900] block drbd0: data-integrity-alg: <not-used>
Aug 9 16:09:44 r01 kernel: [3438274.505166] block drbd0: Considerable difference in lower level device sizes: 1344982880s vs. 256503768s
Aug 9 16:09:44 r01 kernel: [3438274.516226] block drbd0: max_segment_size (= BIO size) = 65536
Aug 9 16:09:44 r01 kernel: [3438274.523385] block drbd0: drbd_sync_handshake:
Aug 9 16:09:44 r01 kernel: [3438274.528677] block drbd0: self E21F17F92705CD4F:E17D2EE7BC2C235F:1074ED292C876258:548AFBCD7D5C2C3B bits:32062701 flags:0
Aug 9 16:09:44 r01 kernel: [3438274.541195] block drbd0: peer E17D2EE7BC2C235E:0000000000000000:0000000000000000:0000000000000000 bits:32062701 flags:0
Aug 9 16:09:44 r01 kernel: [3438274.553710] block drbd0: uuid_compare()=1 by rule 70
Aug 9 16:09:44 r01 kernel: [3438274.559677] block drbd0: Becoming sync source due to disk states.
Aug 9 16:09:44 r01 kernel: [3438274.566897] block drbd0: peer(Unknown -> Secondary) conn(WFReportParams -> WFBitMapS)
Aug 9 16:09:44 r01 kernel: [3438274.666397] block drbd0: conn(WFBitMapS -> SyncSource)
Aug 9 16:09:44 r01 kernel: [3438274.672845] block drbd0: Began resync as SyncSource (will sync 128250804 KB [32062701 bits set]).
Aug 9 16:09:44 r01 kernel: [3438274.683196] block drbd0: /build/buildd-linux-2.6_2.6.32-48squeeze3-amd64-mcoLgp/linux-2.6-2.6.32/debian/build/source_amd64_none/drivers/block/drbd/drbd_receiver.c:1932: sector: 0s, size: 65536
Aug 9 16:09:45 r01 kernel: [3438274.702834] block drbd0: error receiving RSDataRequest, l: 24!
Aug 9 16:09:45 r01 kernel: [3438274.702837] block drbd0: peer(Secondary -> Unknown) conn(SyncSource -> ProtocolError)
Aug 9 16:09:45 r01 kernel: [3438274.703005] block drbd0: asender terminated
Aug 9 16:09:45 r01 kernel: [3438274.703009] block drbd0: Terminating drbd0_asender
Aug 9 16:09:45 r01 kernel: [3438274.711319] block drbd0: Connection closed
Aug 9 16:09:45 r01 kernel: [3438274.711323] block drbd0: conn(ProtocolError -> Unconnected)
Aug 9 16:09:45 r01 kernel: [3438274.711329] block drbd0: receiver terminated
This just repeats over and over.
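As a sanity check on the numbers in these logs, the sizes can be converted with plain shell arithmetic. This is only a sketch: it assumes the `s` suffix in the DRBD messages means 512-byte sectors and that one bitmap bit covers 4 KiB, and the helper names `sectors_to_gib` and `bits_to_kib` are made up for illustration.

```shell
# Assumption: "s" in the DRBD log lines means 512-byte sectors.
sectors_to_gib() { echo $(( $1 * 512 / 1024 / 1024 / 1024 )); }

# Assumption: each bit in the DRBD bitmap covers 4 KiB of storage.
bits_to_kib() { echo $(( $1 * 4 )); }

sectors_to_gib 256503768    # smaller backing device, in whole GiB
sectors_to_gib 1344982880   # larger backing device, in whole GiB
bits_to_kib 32062701        # compare with "will sync 128250804 KB"
```

Under those assumptions the bitmap covers the whole of the smaller device, which is consistent with a full resync being attempted each time.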
This is what the config looks like:
r01:~$ cat /etc/drbd.conf
global {
usage-count no;
}
resource drbd0 {
protocol C;
handlers { pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; exit 1"; }
startup {
degr-wfc-timeout 60; # 1 minute.
wfc-timeout 55;
}
disk {
on-io-error detach;
}
syncer {
rate 100M;
al-extents 257;
}
on r01.c07.mtsvc.net {
device /dev/drbd0;
disk /dev/cciss/c0d0p3;
address 10.0.255.253:7788;
meta-disk internal;
}
on r02.c07.mtsvc.net {
device /dev/drbd0;
disk /dev/cciss/c0d0p6;
address 10.0.255.254:7788;
meta-disk internal;
}
}
The config is identical on both servers, as it should be:
r01:~$ rsync --dry-run --verbose --checksum --itemize-changes 10.0.255.254:/etc/drbd.conf /etc/
sent 11 bytes received 51 bytes 124.00 bytes/sec
total size is 615 speedup is 9.92 (DRY RUN)
This is what the network config looks like on both sides:
r01:~$ sudo ifconfig -a | grep -B 2 -A 8 10.0.255
eth2 Link encap:Ethernet HWaddr 00:26:55:d6:f8:fc
inet addr:10.0.255.253 Bcast:10.0.255.255 Mask:255.255.255.0
inet6 addr: fe80::226:55ff:fed6:f8fc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4062510240 errors:0 dropped:0 overruns:0 frame:0
TX packets:5692251259 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5512604514975 (5.0 TiB) TX bytes:5820995499388 (5.2 TiB)
Interrupt:24 Memory:fbe80000-fbea0000
r02:~$ sudo ifconfig -a | grep -B 2 -A 8 10.0.255
eth2 Link encap:Ethernet HWaddr 00:1b:78:5c:a8:fd
inet addr:10.0.255.254 Bcast:10.0.255.255 Mask:255.255.255.252
inet6 addr: fe80::21b:78ff:fe5c:a8fd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:321977747 errors:0 dropped:0 overruns:0 frame:0
TX packets:264683964 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:332813827055 (309.9 GiB) TX bytes:328142295363 (305.6 GiB)
Interrupt:17 Memory:fdfa0000-fdfc0000
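Originally both r01 and r02 were running Debian Squeeze (drbd 8.3.7). I then rebuilt r02 with Debian Wheezy (drbd 8.3.13). Things ran smoothly for a few days, and then this problem started after a restart of drbd. I have several other drbd clusters that I have been upgrading in the same way. Some of them are fully upgraded to Wheezy, others are still half Squeeze, half Wheezy, and they are fine.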
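So far, here is what I have tried to resolve this issue:
- Wipe the DRBD volume on r02 and try to resync.
- Wipe, reinstall, and reconfigure r02.
- Replace r02 with different hardware and rebuild it from scratch.
- Replace the crossover cable (twice).
Over the next several days I will be replacing r01 with 100% different hardware. But even if that works, I'm still at a loss. I want to understand what caused this issue and the proper way to resolve it.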
Is the DRBD version the same on both nodes? The kernel module version (and git hash) should be shown in '/proc/drbd'. – Dok
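The check suggested in the comment above could be sketched roughly as follows. The function name `drbd_version_info` is hypothetical, and the sketch assumes that `/proc/drbd` starts with lines beginning `version:` and `GIT-hash:`, as the 8.3 series normally prints.

```shell
# Print only the module version and GIT hash lines from /proc/drbd.
# An alternative file can be passed as $1 (e.g. a copy fetched from the peer).
drbd_version_info() {
    grep -E '^(version|GIT-hash):' "${1:-/proc/drbd}"
}
```

Running `drbd_version_info` on both r01 and r02 and comparing the two outputs side by side would show whether the loaded kernel modules actually differ.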