로그 사전을 활용한 로그 분석의 예 입니다.
send timeout 연결 단절
00000102 10/27/2020 13:34:16.2073112 [NetlinkWorkThread] [NETLINK:20] bsr_info<6> [0xFFFFE001428903C0] BSR_ADM_PRIMARY:14 command execution terminated. (pending command:1) 00000103 10/27/2020 13:34:16.3024186 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0) 00000104 10/27/2020 13:34:16.3024298 [bsr_resync_authoritative] [RESYNC/OV:93] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Becoming WFBitMapS due to authoritative node changed 00000105 10/27/2020 13:34:16.3024466 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): change_repl_state, pdsk( UpToDate -> Consistent ) repl( Established -> WFBitMapS ) 00000106 10/27/2020 13:34:16.3024729 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): Receive new repl state "WFBitMapS" 00000107 10/27/2020 13:34:16.3024751 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): send_bitmap (WFBitMapS), worker.task(FFFFE001426CD990), current(FFFFE001426CD990) 00000108 10/27/2020 13:34:16.3024839 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): receive_state, pdsk( Consistent -> Outdated ) 00000109 10/27/2020 13:34:16.5009976 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% 00000110 10/27/2020 13:34:16.5036471 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0% 00000111 10/27/2020 13:34:16.5036520 [receive_bitmap_finished] [BITMAP:52] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): Send that bitmap exchange has been completed 00000112 10/27/2020 13:34:16.7011344 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): bsr_start_resync, pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource ) 00000113 10/27/2020 13:34:16.7011903 [bsr_start_resync] [RESYNC/OV:144] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Began resync as SyncSource (will sync 0 KB [0 bits set]). 00000114 10/27/2020 13:34:16.7049274 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Receive new repl state "SyncSource" 00000115 10/27/2020 13:34:17.1004158 [__receive_uuids] [UUID:16] bsr_warn<4> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): receive new current but not update UUID: CE5CC8B0672E645C 00000116 10/27/2020 13:34:17.1004241 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0) 00000117 10/27/2020 13:34:17.6882020 [bsr_resync_finished] [RESYNC/OV:116] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Resync done (total 1 sec; paused 0 sec; 0 K/sec), hit bit (in sync 0; marked rl 0) 00000118 10/27/2020 13:34:17.6882313 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): bsr_resync_finished, pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established ) 00000119 10/27/2020 13:34:17.6883402 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): write from resync_finished, worker.task(FFFFE001426CD990), current(FFFFE001426CD990) 00000120 10/27/2020 13:34:17.6891134 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Receive new repl state "Established"
다음 로그를 보면 연결 종료 시점에 send timeout 로그를 확인 할 수 있으며 로그의 분류는 SOCKET 이며 인덱스는 47 입니다.
00000124 10/27/2020 13:34:32.0972611 [Send] [SOCKET:47] bsr_info<6> [0xFFFFE00143DB5040] Send not completed in time-out(5000ms). current state 8(0xFFFFE00143F82198) size(4096)
로그 사전에서 로그 분류 SOCKET 에서 로그 인덱스 47을 조회하면 다음과 같습니다.
47 | info | Send not completed in time-out(%dms). current state %d(0x%p) size(%lu) | 전송 대기시간을 초과했습니다. | bsr 소켓 전송 대기시간은 resource 설정중 net 섹션에서 timeout 으로 설정 할 수 있습니다. |
설명에 있듯이 전송 대기 시간을 초과 했을 때 발생하는 로그이며 출력 로그에서 5000ms을 대기 했지만 전송이 완료되지 않았습니다. 여러가지 원인이 있을 수 있지만 보통 네트워크 대역폭이 부족해지거나 트래픽이 과도하게 발생했을 때 전송 타임아웃이 유발될 수 있습니다.
00000125 10/27/2020 13:34:32.0973540 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): bsr_sender, conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown ) 00000126 10/27/2020 13:34:32.0973579 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): bsr_sender, pdsk( UpToDate -> DUnknown ) repl( Established -> Off ) 00000127 10/27/2020 13:34:32.0973703 [Receive] [SOCKET:66] bsr_info<6> [0xFFFFE00143F91880] receiver => Recv multiWait error. err(STATUS_CONNECTION_ABORTED) wsk(0xFFFFE00143F82198) size(16) 00000128 10/27/2020 13:34:32.0973749 [bsr_recv] [SOCKET:14] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): Data stream socket reset by peer. 00000129 10/27/2020 13:34:32.9202027 [bsr_ack_receiver] [CONNECTION:16] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): ack_receiver terminated 00000130 10/27/2020 13:34:32.9202081 [bsr_thread_setup] [THREAD:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Terminating ack_recv thread 00000131 10/27/2020 13:34:32.9203724 [w_resync_timer] [RESYNC/OV:105] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(DUnknown), prpl(Off), pdvflag(0x0): The resync is complete because it is not in VerifyS or SyncTarget replication state. 00000135 10/27/2020 13:34:32.9217385 [conn_disconnect] [CONNECTION:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Connection closed 00000136 10/27/2020 13:34:32.9217777 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x22): conn_disconnect, conn( NetworkFailure -> Unconnected ) 00000137 10/27/2020 13:34:32.9218018 [bsr_thread_setup] [THREAD:7] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x0): Restarting receiver thread 00000138 10/27/2020 13:34:32.9218062 [state_change_lock] [STATE:40] bsr_info<6> [0xFFFFE00143F91880] Worker should not initiate state changes with CS_SERIALIZE current:FFFFE00143831DB0 resource->worker.task:FFFFE001426CD990 00000139 10/27/2020 13:34:32.9218231 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x2a): conn_connect, conn( Unconnected -> Connecting ) 00000140 10/27/2020 13:34:32.9219750 [__bsr_uuid_new_current] [UUID:3] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x6): ensure_current_uuid, 5350137EA9A809C3 UUID has been generated. weak nodes FFFFFFFFFFFFFFFE