2016-08-10 3 views

I have a pair of DRBD servers that have stopped syncing, and I can't find a way to get them to sync again. The sync runs over a dedicated crossover cable (1 Gbps copper) between the two servers. DRBD sync fails with ProtocolError.

Here is what I see in the logs on r02:

Aug 9 16:09:44 r02 kernel: [12739.178449] block drbd0: receiver (re)started 
Aug 9 16:09:44 r02 kernel: [12739.178454] block drbd0: conn(Unconnected -> WFConnection) 
Aug 9 16:09:44 r02 kernel: [12739.912037] block drbd0: Handshake successful: Agreed network protocol version 91 
Aug 9 16:09:44 r02 kernel: [12739.912048] block drbd0: conn(WFConnection -> WFReportParams) 
Aug 9 16:09:44 r02 kernel: [12739.912074] block drbd0: Starting asender thread (from drbd0_receiver [3740]) 
Aug 9 16:09:44 r02 kernel: [12739.936681] block drbd0: data-integrity-alg: <not-used> 
Aug 9 16:09:44 r02 kernel: [12739.936691] block drbd0: Considerable difference in lower level device sizes: 256503768s vs. 1344982880s 
Aug 9 16:09:44 r02 kernel: [12739.942918] block drbd0: drbd_sync_handshake: 
Aug 9 16:09:44 r02 kernel: [12739.942923] block drbd0: self E17D2EE7BC2C235E:0000000000000000:0000000000000000:0000000000000000 bits:32062701 flags:0 
Aug 9 16:09:44 r02 kernel: [12739.942928] block drbd0: peer E21F17F92705CD4F:E17D2EE7BC2C235F:1074ED292C876258:548AFBCD7D5C2C3B bits:32062701 flags:0 
Aug 9 16:09:44 r02 kernel: [12739.942933] block drbd0: uuid_compare()=-1 by rule 50 
Aug 9 16:09:44 r02 kernel: [12739.942935] block drbd0: Becoming sync target due to disk states. 
Aug 9 16:09:44 r02 kernel: [12739.942946] block drbd0: peer(Unknown -> Primary) conn(WFReportParams -> WFBitMapT) pdsk(DUnknown -> UpToDate) 
Aug 9 16:09:44 r02 kernel: [12740.099597] block drbd0: conn(WFBitMapT -> WFSyncUUID) 
Aug 9 16:09:44 r02 kernel: [12740.104324] block drbd0: updated sync uuid BF8D25FBE26085B0:0000000000000000:0000000000000000:0000000000000000 
Aug 9 16:09:44 r02 kernel: [12740.104423] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 
Aug 9 16:09:44 r02 kernel: [12740.106582] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) 
Aug 9 16:09:44 r02 kernel: [12740.106591] block drbd0: conn(WFSyncUUID -> SyncTarget) 
Aug 9 16:09:44 r02 kernel: [12740.106599] block drbd0: Began resync as SyncTarget (will sync 128250804 KB [32062701 bits set]). 
Aug 9 16:09:44 r02 kernel: [12740.140796] block drbd0: meta connection shut down by peer. 
Aug 9 16:09:44 r02 kernel: [12740.141304] block drbd0: sock was shut down by peer 
Aug 9 16:09:44 r02 kernel: [12740.141309] block drbd0: peer(Primary -> Unknown) conn(SyncTarget -> BrokenPipe) pdsk(UpToDate -> DUnknown) 
Aug 9 16:09:44 r02 kernel: [12740.141316] block drbd0: short read expecting header on sock: r=0 
Aug 9 16:09:44 r02 kernel: [12740.142235] block drbd0: asender terminated 
Aug 9 16:09:44 r02 kernel: [12740.142238] block drbd0: Terminating drbd0_asender 
Aug 9 16:09:44 r02 kernel: [12740.151561] block drbd0: bitmap WRITE of 979 pages took 2 jiffies 
Aug 9 16:09:44 r02 kernel: [12740.151567] block drbd0: 122 GB (32062701 bits) marked out-of-sync by on disk bit-map. 
Aug 9 16:09:44 r02 kernel: [12740.151580] block drbd0: Connection closed 
Aug 9 16:09:44 r02 kernel: [12740.151586] block drbd0: conn(BrokenPipe -> Unconnected) 
Aug 9 16:09:44 r02 kernel: [12740.151592] block drbd0: receiver terminated 

And on r01:

Aug 9 16:09:44 r01 kernel: [3438273.766768] block drbd0: receiver (re)started 
Aug 9 16:09:44 r01 kernel: [3438273.771898] block drbd0: conn(Unconnected -> WFConnection) 
Aug 9 16:09:44 r01 kernel: [3438274.474411] block drbd0: Handshake successful: Agreed network protocol version 91 
Aug 9 16:09:44 r01 kernel: [3438274.483299] block drbd0: conn(WFConnection -> WFReportParams) 
Aug 9 16:09:44 r01 kernel: [3438274.490420] block drbd0: Starting asender thread (from drbd0_receiver [6366]) 
Aug 9 16:09:44 r01 kernel: [3438274.498900] block drbd0: data-integrity-alg: <not-used> 
Aug 9 16:09:44 r01 kernel: [3438274.505166] block drbd0: Considerable difference in lower level device sizes: 1344982880s vs. 256503768s 
Aug 9 16:09:44 r01 kernel: [3438274.516226] block drbd0: max_segment_size (= BIO size) = 65536 
Aug 9 16:09:44 r01 kernel: [3438274.523385] block drbd0: drbd_sync_handshake: 
Aug 9 16:09:44 r01 kernel: [3438274.528677] block drbd0: self E21F17F92705CD4F:E17D2EE7BC2C235F:1074ED292C876258:548AFBCD7D5C2C3B bits:32062701 flags:0 
Aug 9 16:09:44 r01 kernel: [3438274.541195] block drbd0: peer E17D2EE7BC2C235E:0000000000000000:0000000000000000:0000000000000000 bits:32062701 flags:0 
Aug 9 16:09:44 r01 kernel: [3438274.553710] block drbd0: uuid_compare()=1 by rule 70 
Aug 9 16:09:44 r01 kernel: [3438274.559677] block drbd0: Becoming sync source due to disk states. 
Aug 9 16:09:44 r01 kernel: [3438274.566897] block drbd0: peer(Unknown -> Secondary) conn(WFReportParams -> WFBitMapS) 
Aug 9 16:09:44 r01 kernel: [3438274.666397] block drbd0: conn(WFBitMapS -> SyncSource) 
Aug 9 16:09:44 r01 kernel: [3438274.672845] block drbd0: Began resync as SyncSource (will sync 128250804 KB [32062701 bits set]). 
Aug 9 16:09:44 r01 kernel: [3438274.683196] block drbd0: /build/buildd-linux-2.6_2.6.32-48squeeze3-amd64-mcoLgp/linux-2.6-2.6.32/debian/build/source_amd64_none/drivers/block/drbd/drbd_receiver.c:1932: sector: 0s, size: 65536 
Aug 9 16:09:45 r01 kernel: [3438274.702834] block drbd0: error receiving RSDataRequest, l: 24! 
Aug 9 16:09:45 r01 kernel: [3438274.702837] block drbd0: peer(Secondary -> Unknown) conn(SyncSource -> ProtocolError) 
Aug 9 16:09:45 r01 kernel: [3438274.703005] block drbd0: asender terminated 
Aug 9 16:09:45 r01 kernel: [3438274.703009] block drbd0: Terminating drbd0_asender 
Aug 9 16:09:45 r01 kernel: [3438274.711319] block drbd0: Connection closed 
Aug 9 16:09:45 r01 kernel: [3438274.711323] block drbd0: conn(ProtocolError -> Unconnected) 
Aug 9 16:09:45 r01 kernel: [3438274.711329] block drbd0: receiver terminated 

This just repeats over and over.
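To put the numbers from the handshake lines in the logs into perspective, here is a small sketch of the arithmetic. It assumes one bitmap bit covers 4 KiB and that the "s" suffix means 512-byte sectors; the function names `bits_to_kib` and `sectors_to_gib` are made up for illustration.

```shell
#!/bin/sh
# Assumption: each DRBD bitmap bit tracks 4 KiB of the device.
bits_to_kib() {
    echo $(( $1 * 4 ))
}

# Assumption: sizes suffixed with "s" in the log are 512-byte sectors.
# Two sectors make one KiB; the result is rounded down to whole GiB.
sectors_to_gib() {
    echo $(( $1 / 2 / 1024 / 1024 ))
}

bits_to_kib 32062701       # KiB to resync, from "bits set" in the log
sectors_to_gib 256503768   # lower-level device on r02
sectors_to_gib 1344982880  # lower-level device on r01
```

The first value matches the 128250804 KB that both nodes report for the resync, so the bitmap accounting is at least consistent on both sides. The other two come out at roughly 122 GiB and 641 GiB, which is the size difference the "Considerable difference in lower level device sizes" line refers to.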

This is what the config looks like:
r01:~$ cat /etc/drbd.conf 
global { 
    usage-count no; 
} 

resource drbd0 { 
    protocol C; 
    handlers { pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; exit 1"; } 
    startup { 
    degr-wfc-timeout 60; # 1 minute. 
    wfc-timeout 55; 
    } 

    disk { 
    on-io-error detach; 
    } 

    syncer { 
    rate 100M; 
    al-extents 257; 
    } 

    on r01.c07.mtsvc.net { 
    device  /dev/drbd0; 
    disk  /dev/cciss/c0d0p3; 
    address 10.0.255.253:7788; 
    meta-disk internal; 
    } 

    on r02.c07.mtsvc.net { 
    device  /dev/drbd0; 
    disk  /dev/cciss/c0d0p6; 
    address 10.0.255.254:7788; 
    meta-disk internal; 
    } 
} 

The config is identical on both servers, as it should be:

r01:~$ rsync --dry-run --verbose --checksum --itemize-changes 10.0.255.254:/etc/drbd.conf /etc/ 

sent 11 bytes received 51 bytes 124.00 bytes/sec 
total size is 615 speedup is 9.92 (DRY RUN) 

This is what the network configuration looks like on both sides:

r01:~$ sudo ifconfig -a | grep -B 2 -A 8 10.0.255 

eth2  Link encap:Ethernet HWaddr 00:26:55:d6:f8:fc 
      inet addr:10.0.255.253 Bcast:10.0.255.255 Mask:255.255.255.0 
      inet6 addr: fe80::226:55ff:fed6:f8fc/64 Scope:Link 
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 
      RX packets:4062510240 errors:0 dropped:0 overruns:0 frame:0 
      TX packets:5692251259 errors:0 dropped:0 overruns:0 carrier:0 
      collisions:0 txqueuelen:1000 
      RX bytes:5512604514975 (5.0 TiB) TX bytes:5820995499388 (5.2 TiB) 
      Interrupt:24 Memory:fbe80000-fbea0000 

r02:~$ sudo ifconfig -a | grep -B 2 -A 8 10.0.255 

eth2  Link encap:Ethernet HWaddr 00:1b:78:5c:a8:fd 
      inet addr:10.0.255.254 Bcast:10.0.255.255 Mask:255.255.255.252 
      inet6 addr: fe80::21b:78ff:fe5c:a8fd/64 Scope:Link 
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 
      RX packets:321977747 errors:0 dropped:0 overruns:0 frame:0 
      TX packets:264683964 errors:0 dropped:0 overruns:0 carrier:0 
      collisions:0 txqueuelen:1000 
      RX bytes:332813827055 (309.9 GiB) TX bytes:328142295363 (305.6 GiB) 
      Interrupt:17 Memory:fdfa0000-fdfc0000 

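Originally both r01 and r02 were running Debian Squeeze (drbd 8.3.7). Then I rebuilt r02 with Debian Wheezy (drbd 8.3.13). Things ran smoothly for a few days, and then this problem started after a restart of drbd. I have several other drbd clusters that I have been upgrading the same way. Some of them are fully upgraded to Wheezy, the others are still half Squeeze, half Wheezy, and they are all fine.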

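Here is what I have tried so far to resolve this issue:

  • Wipe the DRBD volume on r02 and try to resync.
  • Wipe, reinstall and reconfigure r02.
  • Replace r02 with different hardware and rebuild it from scratch.
  • Replace the crossover cable (twice).

Over the next several days I will be replacing r01 with 100% different hardware. But even if that works, I am still at a loss. I want to understand what caused this problem and the proper way to resolve it.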


Are the DRBD versions the same on both nodes? The kernel module version (and git hash) should be shown in '/proc/drbd'. – Dok
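One way to make that comparison is sketched below. It assumes the first line of /proc/drbd has the form "version: 8.3.7 (api:88/proto:86-91)"; the helper name `drbd_version` is made up for illustration.

```shell
#!/bin/sh
# drbd_version: read /proc/drbd-style text on stdin and print only the
# module version number. Assumes a line starting with "version: X.Y.Z".
drbd_version() {
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Possible use on each node (not run here):
#   drbd_version < /proc/drbd
```

Running it on r01 and r02 and comparing the two results shows at a glance whether the loaded kernel modules differ, independent of which userland package version is installed.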

Answers


A lot changed in DRBD between 8.3.7 and 8.3.13, including major changes to how resyncs work: https://blogs.linbit.com/p/128/drbd-sync-rate-controller/

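You can try removing the non-essential settings from your resource configuration (so, the syncer {} section) and then adjusting DRBD:

# drbdadm adjust all

If it still fails to connect, you will need to upgrade the older node before the two can sync. http://www.drbd.org/download/drbd/8.3/drbd-8.3.13.tar.gz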
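As a rough sketch of the config change suggested above, the following filter drops the syncer block from a drbd.conf. It assumes "syncer {" opens on its own line, the closing "}" is on a later line by itself, and there are no nested braces inside the block, as in the config shown in the question. The name `strip_syncer` and the output path are made up for illustration.

```shell
#!/bin/sh
# strip_syncer: copy a drbd.conf from stdin to stdout, leaving out the
# syncer { ... } block. Only handles the simple layout described above.
strip_syncer() {
    awk '
        /^[ \t]*syncer[ \t]*[{]/  { skipping = 1; next }  # block opens
        skipping && /^[ \t]*[}]/  { skipping = 0; next }  # block closes
        !skipping                 { print }               # keep the rest
    '
}

# Possible use (not run here; review the result before installing it):
#   strip_syncer < /etc/drbd.conf > /tmp/drbd.conf.nosyncer
```

Writing to a scratch file first keeps the original config untouched until the trimmed version has been checked by eye and copied to both nodes; after that, `drbdadm adjust all` applies it as described in the answer.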


I'll try the config modification followed by 'drbdadm adjust all'. I'm sure upgrading r01 would work for me, but to do that I first need a working r02 to run on, which is a huge pain. I'll let you know what the end result is. –


Modifying the config and running adjust made no difference. Very strange: 8.3.7 and 8.3.13 have worked fine together for me many times, but in this case they won't. –
