Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 8 Next »

개요

로그 분석 방법과 로그 사전 이용 방법을 사용자와 공유하기 위해 이슈 또는 오류 발생 시 수집한 로그의 로그 분석 방법과 로그 사전 이용 방법을 제공합니다.

오류

send timeout 반복 발생

다음은 지속적으로 연결이 종료되는 로그의 일부입니다.

00000102 10/27/2020 13:34:16.2073112 [NetlinkWorkThread] [NETLINK:20] bsr_info<6> [0xFFFFE001428903C0] BSR_ADM_PRIMARY:14 command execution terminated. (pending command:1)
00000103 10/27/2020 13:34:16.3024186 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0)
00000104 10/27/2020 13:34:16.3024298 [bsr_resync_authoritative] [RESYNC/OV:93] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Becoming WFBitMapS due to authoritative node changed
00000105 10/27/2020 13:34:16.3024466 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): change_repl_state, pdsk( UpToDate -> Consistent ) repl( Established -> WFBitMapS )
00000106 10/27/2020 13:34:16.3024729 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): Receive new repl state "WFBitMapS"
00000107 10/27/2020 13:34:16.3024751 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): send_bitmap (WFBitMapS), worker.task(FFFFE001426CD990), current(FFFFE001426CD990)
00000108 10/27/2020 13:34:16.3024839 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Consistent), prpl(WFBitMapS), pdvflag(0x100600): receive_state, pdsk( Consistent -> Outdated )
00000109 10/27/2020 13:34:16.5009976 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
00000110 10/27/2020 13:34:16.5036471 [INFO_bm_xfer_stats] [BITMAP:50] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
00000111 10/27/2020 13:34:16.5036520 [receive_bitmap_finished] [BITMAP:52] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): Send that bitmap exchange has been completed
00000112 10/27/2020 13:34:16.7011344 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Outdated), prpl(WFBitMapS), pdvflag(0x100600): bsr_start_resync, pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
00000113 10/27/2020 13:34:16.7011903 [bsr_start_resync] [RESYNC/OV:144] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Began resync as SyncSource (will sync 0 KB [0 bits set]).
00000114 10/27/2020 13:34:16.7049274 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Receive new repl state "SyncSource"
00000115 10/27/2020 13:34:17.1004158 [__receive_uuids] [UUID:16] bsr_warn<4> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): receive new current but not update UUID: CE5CC8B0672E645C
00000116 10/27/2020 13:34:17.1004241 [bsr_uuid_compare] [RESYNC/OV:197] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): Local and peer current UUIDs are the same. rule(40), res(0)
00000117 10/27/2020 13:34:17.6882020 [bsr_resync_finished] [RESYNC/OV:116] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): Resync done (total 1 sec; paused 0 sec; 0 K/sec), hit bit (in sync 0; marked rl 0)
00000118 10/27/2020 13:34:17.6882313 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(Inconsistent), prpl(SyncSource), pdvflag(0x100600): bsr_resync_finished, pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
00000119 10/27/2020 13:34:17.6883402 [bsr_queue_bitmap_io] [RESYNC/OV:33] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x80): write from resync_finished, worker.task(FFFFE001426CD990), current(FFFFE001426CD990)
00000120 10/27/2020 13:34:17.6891134 [receive_state] [STATE:8] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): Receive new repl state "Established"
  • 다음 로그를 보면 연결 종료 시점에 send timeout 로그를 확인 할 수 있으며 로그 분류는 SOCKET, 로그 인덱스는 47인걸 확인 할 수 있습니다.

00000124 10/27/2020 13:34:32.0972611 [Send] [SOCKET:47] bsr_info<6> [0xFFFFE00143DB5040] Send not completed in time-out(5000ms). current state 8(0xFFFFE00143F82198) size(4096)
  • 로그 분류 SOCKET, 로그 인덱스 47 의 로그 사전은 다음과 같습니다.

47

info

Send not completed in time-out(%dms). current state %d(0x%p) size(%lu)

전송 대기시간을 초과했습니다.

bsr 소켓 전송 대기시간은 resource 설정중 net 섹션에서 timeout 으로 설정 할 수 있습니다.

  • 로그 사전 설명을 보면 전송 대기 시간을 초과했을때 발생하는 로그이며 로그에서 5000ms을 대기했지만 전송이 완료되지 않아 연결이 종료되는걸 알 수 있습니다. 원인으로는 네트워크 대역폭이 부족하거나 많은 트래픽이 과도하게 발생했을 가능성이 높습니다.

  • send timeout을 해결을 위해서 네트워크 대역폭 확보가 필요하지만 bsr에서 적용 할 수 있는 임시방편으로는 timeout 설정 시간을 증가시켜 해결 할 수 있습니다. timeout 시간을 늘리는 방법은 로그 사전 비고에 작성되어있는 resource 설정 중 net 섹션에서 timeout설정을 통해 증가시킬 수 있습니다.

00000125 10/27/2020 13:34:32.0973540 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): bsr_sender, conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
00000126 10/27/2020 13:34:32.0973579 [print_state_change] [STATE:21] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(UpToDate), prpl(Established), pdvflag(0x100600): bsr_sender, pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
00000127 10/27/2020 13:34:32.0973703 [Receive] [SOCKET:66] bsr_info<6> [0xFFFFE00143F91880] receiver => Recv multiWait error. err(STATUS_CONNECTION_ABORTED) wsk(0xFFFFE00143F82198) size(16)
00000128 10/27/2020 13:34:32.0973749 [bsr_recv] [SOCKET:14] bsr_info<6> bsr r0 pnode-id:1, cs(Connected), prole(Secondary), cflag(0xe), scf(0x21): Data stream socket reset by peer.
00000129 10/27/2020 13:34:32.9202027 [bsr_ack_receiver] [CONNECTION:16] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): ack_receiver terminated
00000130 10/27/2020 13:34:32.9202081 [bsr_thread_setup] [THREAD:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Terminating ack_recv thread
00000131 10/27/2020 13:34:32.9203724 [w_resync_timer] [RESYNC/OV:105] bsr_info<6> bsr r0/0 minor 2 pnode-id:1, pdsk(DUnknown), prpl(Off), pdvflag(0x0): The resync is complete because it is not in VerifyS or SyncTarget replication state.
00000135 10/27/2020 13:34:32.9217385 [conn_disconnect] [CONNECTION:9] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x0): Connection closed
00000136 10/27/2020 13:34:32.9217777 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(NetworkFailure), prole(Unknown), cflag(0xe), scf(0x22): conn_disconnect, conn( NetworkFailure -> Unconnected )
00000137 10/27/2020 13:34:32.9218018 [bsr_thread_setup] [THREAD:7] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x0): Restarting receiver thread
00000138 10/27/2020 13:34:32.9218062 [state_change_lock] [STATE:40] bsr_info<6> [0xFFFFE00143F91880] Worker should not initiate state changes with CS_SERIALIZE current:FFFFE00143831DB0 resource->worker.task:FFFFE001426CD990
00000139 10/27/2020 13:34:32.9218231 [print_state_change] [STATE:19] bsr_info<6> bsr r0 pnode-id:1, cs(Unconnected), prole(Unknown), cflag(0xe), scf(0x2a): conn_connect, conn( Unconnected -> Connecting )
00000140 10/27/2020 13:34:32.9219750 [__bsr_uuid_new_current] [UUID:3] bsr_info<6> bsr r0/0 minor 2, ds(UpToDate), dvflag(0x6): ensure_current_uuid, 5350137EA9A809C3 UUID has been generated. weak nodes FFFFFFFFFFFFFFFE

  • No labels