...
Code Block |
---|
00000102 10/27/2020 13:34:16.2073112 [NetlinkWorkThread] [NETLINK:20] bsr_info<6> [0xFFFFE001428903C0] BSR_ADM_PRIMARY:14 command execution terminated. (pending command:1) 00000103 10/27/2020 13:34:16.3024186 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0) 00000104 10/27/2020 13:34:16.3024298 [bsr_resync_authoritative] [RESYNC/OV:93] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Becoming WFBitMapS due to authoritative node changed 00000105 10/27/2020 13:34:16.3024466 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): change_repl_state, pdsk( UpToDate -> Consistent ) repl( Established -> WFBitMapS ) 00000106 10/27/2020 13:34:16.3024729 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): Receive new repl state "WFBitMapS" 00000107 10/27/2020 13:34:16.3024751 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): send_bitmap (WFBitMapS), worker.task(FFFFE001426CD990), current(FFFFE001426CD990) 00000108 10/27/2020 13:34:16.3024839 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): receive_state, pdsk( Consistent -> Outdated ) 00000109 10/27/2020 13:34:16.5009976 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% 00000110 10/27/2020 13:34:16.5036471 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% 00000111 10/27/2020 13:34:16.5036520 [receive_bitmap_finished] [BITMAP:52] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): Send that bitmap exchange has been completed 00000112 10/27/2020 13:34:16.7011344 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): bsr_start_resync, pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource ) 00000113 10/27/2020 13:34:16.7011903 [bsr_start_resync] [RESYNC/OV:144] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Began resync as SyncSource (will sync 0 KB [0 bits set]). 00000114 10/27/2020 13:34:16.7049274 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Receive new repl state "SyncSource" 00000115 10/27/2020 13:34:17.1004158 [__receive_uuids] [UUID:16] bsr_warn<4> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): receive new current but not update UUID: CE5CC8B0672E645C 00000116 10/27/2020 13:34:17.1004241 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0) 00000117 10/27/2020 13:34:17.6882020 [bsr_resync_finished] [RESYNC/OV:116] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Resync done (total 1 sec; paused 0 sec; 0 K/sec), hit bit (in sync 0; marked rl 0) 00000118 10/27/2020 13:34:17.6882313 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): bsr_resync_finished, pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established ) 00000119 10/27/2020 13:34:17.6883402 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): write from resync_finished, worker.task(FFFFE001426CD990), current(FFFFE001426CD990) 00000120 10/27/2020 13:34:17.6891134 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Receive new repl state "Established" |
다음 로그를 보면 연결 종료 시점에 send timeout 로그를 확인 할 수 있으며 로그 분류는 SOCKET, 로그 인덱스는 47인걸 확인 할 수 있습니다.
Code Block |
---|
00000124 10/27/2020 13:34:32.0972611 [Send] [SOCKET:47] bsr_info<6> [0xFFFFE00143DB5040] Send not completed in time-out(5000ms). current state 8(0xFFFFE00143F82198) size(4096) |
로그 분류 SOCKET, 로그 인덱스 47 의 로그 사전 설명은 사전은 다음과 같습니다.
47 | info | Send not completed in time-out(%dms). current state %d(0x%p) size(%lu) | 전송 대기시간을 초과했습니다. | bsr 소켓 전송 대기시간은 resource 설정중 net 섹션에서 timeout 으로 설정 할 수 있습니다. |
로그 사전 설명을 보면 전송 대기 시간을 초과했을때 발생하는 로그이며 로그에서 5000ms을 대기했지만 전송이 완료되지 않아 연결이 종료되는걸 알 수 있습니다. 또한 5000ms 대기했지만 전송이 완료되지 않는다면 원인으로는 네트워크 대역폭이 부족하거나 많은 네트워크 트래픽이 과도하게 발생했을 가능성이 높아 보입니다높습니다.
send timeout을 해결을 위해서는 위해서 네트워크 대역폭 확보가 필요하지만 임시방편으로 timeout 시간을 늘려 bsr에서 적용 할 수 있는 임시방편으로는 timeout 설정 시간을 증가시켜 해결 할 수 있으며 있습니다. timeout 시간을 늘리는 방법은 로그 사전 비고에 작성되어있는 resource 설정 중 net 섹션에서 timeout설정을 통해 증가시킬 수 있습니다.
Code Block |
---|
00000124 10/27/2020 13:34:32.0972611 [Send] [SOCKET:47] bsr_info<6> [0xFFFFE00143DB5040] Send not completed in time-out(5000ms). current state 8(0xFFFFE00143F82198) size(4096)
00000125 10/27/2020 13:34:32.0973540 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): bsr_sender, conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
00000126 10/27/2020 13:34:32.0973579 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): bsr_sender, pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
00000127 10/27/2020 13:34:32.0973703 [Receive] [SOCKET:66] bsr_info<6> [0xFFFFE00143F91880] receiver => Recv multiWait error. err(STATUS_CONNECTION_ABORTED) wsk(0xFFFFE00143F82198) size(16)
00000128 10/27/2020 13:34:32.0973749 [bsr_recv] [SOCKET:14] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): Data stream socket reset by peer.
00000129 10/27/2020 13:34:32.9202027 [bsr_ack_receiver] [CONNECTION:16] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): ack_receiver terminated
00000130 10/27/2020 13:34:32.9202081 [bsr_thread_setup] [THREAD:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Terminating ack_recv thread
00000131 10/27/2020 13:34:32.9203724 [w_resync_timer] [RESYNC/OV:105] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(DUnknown), prpl(Off), pdvflag(0x0): The resync is complete because it is not in VerifyS or SyncTarget replication state.
00000135 10/27/2020 13:34:32.9217385 [conn_disconnect] [CONNECTION:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Connection closed
00000136 10/27/2020 13:34:32.9217777 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x22): conn_disconnect, conn( NetworkFailure -> Unconnected )
00000137 10/27/2020 13:34:32.9218018 [bsr_thread_setup] [THREAD:7] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x0): Restarting receiver thread
00000138 10/27/2020 13:34:32.9218062 [state_change_lock] [STATE:40] bsr_info<6> [0xFFFFE00143F91880] Worker should not initiate state changes with CS_SERIALIZE current:FFFFE00143831DB0 resource->worker.task:FFFFE001426CD990
00000139 10/27/2020 13:34:32.9218231 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x2a): conn_connect, conn( Unconnected -> Connecting )
00000140 10/27/2020 13:34:32.9219750 [__bsr_uuid_new_current] [UUID:3] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x6): ensure_current_uuid, 5350137EA9A809C3 UUID has been generated. weak nodes FFFFFFFFFFFFFFFE |
...