Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

개요

오류 로그는 사용자의 커널 로그 분석 시 접근성을 높이기 위해 로그 사전을 이용한 오류 발생 시 로그 분석을 설명합니다.

...

로그 사전을 활용한 로그 분석의 예 입니다.

send timeout 연결 단절

Code Block
00000102 10/27/2020 13:34:16.2073112 [NetlinkWorkThread] [NETLINK:20] bsr_info<6> [0xFFFFE001428903C0] BSR_ADM_PRIMARY:14 command execution terminated. (pending command:1)
00000103 10/27/2020 13:34:16.3024186 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0)
00000104 10/27/2020 13:34:16.3024298 [bsr_resync_authoritative] [RESYNC/OV:93] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Becoming WFBitMapS due to authoritative node changed
00000105 10/27/2020 13:34:16.3024466 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): change_repl_state, pdsk( UpToDate -> Consistent ) repl( Established -> WFBitMapS )
00000106 10/27/2020 13:34:16.3024729 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): Receive new repl state "WFBitMapS"
00000107 10/27/2020 13:34:16.3024751 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): send_bitmap (WFBitMapS), worker.task(FFFFE001426CD990), current(FFFFE001426CD990)
00000108 10/27/2020 13:34:16.3024839 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): receive_state, pdsk( Consistent -> Outdated )
00000109 10/27/2020 13:34:16.5009976 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
00000110 10/27/2020 13:34:16.5036471 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
00000111 10/27/2020 13:34:16.5036520 [receive_bitmap_finished] [BITMAP:52] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): Send that bitmap exchange has been completed
00000112 10/27/2020 13:34:16.7011344 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): bsr_start_resync, pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
00000113 10/27/2020 13:34:16.7011903 [bsr_start_resync] [RESYNC/OV:144] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Began resync as SyncSource (will sync 0 KB [0 bits set]).
00000114 10/27/2020 13:34:16.7049274 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Receive new repl state "SyncSource"
00000115 10/27/2020 13:34:17.1004158 [__receive_uuids] [UUID:16] bsr_warn<4> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): receive new current but not update UUID: CE5CC8B0672E645C
00000116 10/27/2020 13:34:17.1004241 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0)
00000117 10/27/2020 13:34:17.6882020 [bsr_resync_finished] [RESYNC/OV:116] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Resync done (total 1 sec; paused 0 sec; 0 K/sec), hit bit (in sync 0; marked rl 0)
00000118 10/27/2020 13:34:17.6882313 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): bsr_resync_finished, pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
00000119 10/27/2020 13:34:17.6883402 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): write from resync_finished, worker.task(FFFFE001426CD990), current(FFFFE001426CD990)
00000120 10/27/2020 13:34:17.6891134 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Receive new repl state "Established"

다음 로그를 보면 연결 종료 시점에 send timeout 로그를 확인 할 수 있으며 로그의 분류는 SOCKET 이며 인덱스는 47 입니다.

Code Block
00000124 10/27/2020 13:34:32.0972611 [Send] [SOCKET:47] bsr_info<6> [0xFFFFE00143DB5040] Send not completed in time-out(5000ms). current state 8(0xFFFFE00143F82198) size(4096)

로그 사전에서 로그 분류 SOCKET 에서 로그 인덱스 47을 조회하면 다음과 같습니다.

47

info

Send not completed in time-out(%dms). current state %d(0x%p) size(%lu)

전송 대기시간을 초과했습니다.

bsr 소켓 전송 대기시간은 resource 설정중 net 섹션에서 timeout 으로 설정 할 수 있습니다.

로그 사전의 설명에 따라 전송 대기 시간을 초과 했을 때 발생하는 로그이며 출력 로그에서 5000ms을 대기 했지만 전송이 완료되지 않았습니다. 여러가지 원인이 있을 수 있지만 보통 네트워크 대역폭이 부족해지거나 트래픽이 과도하게 발생했을 때 전송 타임아웃이 유발될 수 있습니다.

Code Block
00000125 10/27/2020 13:34:32.0973540 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): bsr_sender, conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
00000126 10/27/2020 13:34:32.0973579 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): bsr_sender, pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
00000127 10/27/2020 13:34:32.0973703 [Receive] [SOCKET:66] bsr_info<6> [0xFFFFE00143F91880] receiver => Recv multiWait error. err(STATUS_CONNECTION_ABORTED) wsk(0xFFFFE00143F82198) size(16)
00000128 10/27/2020 13:34:32.0973749 [bsr_recv] [SOCKET:14] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): Data stream socket reset by peer.
00000129 10/27/2020 13:34:32.9202027 [bsr_ack_receiver] [CONNECTION:16] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): ack_receiver terminated
00000130 10/27/2020 13:34:32.9202081 [bsr_thread_setup] [THREAD:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Terminating ack_recv thread
00000131 10/27/2020 13:34:32.9203724 [w_resync_timer] [RESYNC/OV:105] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(DUnknown), prpl(Off), pdvflag(0x0): The resync is complete because it is not in VerifyS or SyncTarget replication state.
00000135 10/27/2020 13:34:32.9217385 [conn_disconnect] [CONNECTION:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Connection closed
00000136 10/27/2020 13:34:32.9217777 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x22): conn_disconnect, conn( NetworkFailure -> Unconnected )
00000137 10/27/2020 13:34:32.9218018 [bsr_thread_setup] [THREAD:7] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x0): Restarting receiver thread
00000138 10/27/2020 13:34:32.9218062 [state_change_lock] [STATE:40] bsr_info<6> [0xFFFFE00143F91880] Worker should not initiate state changes with CS_SERIALIZE current:FFFFE00143831DB0 resource->worker.task:FFFFE001426CD990
00000139 10/27/2020 13:34:32.9218231 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x2a): conn_connect, conn( Unconnected -> Connecting )
00000140 10/27/2020 13:34:32.9219750 [__bsr_uuid_new_current] [UUID:3] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x6): ensure_current_uuid, 5350137EA9A809C3 UUID has been generated. weak nodes FFFFFFFFFFFFFFFE