[MIP-52] Request for a resolution to DK-related errors

Subject
Request for a resolution to DK-related errors

APPLIES TO:
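"- Four physical servers paired 1:1 with XenServer (four virtual OS instances); the virtual instances were created by P2V conversion
- MCCS version: the February 17 build (93946); the existing installation was upgraded about a month ago
- DK version: 7.0.5
- OS environment: Windows 2003 (Korean, Japanese, and other language editions)"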




SYMPTOMS
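"- In DK, the mirroring state changes at irregular intervals (see item 1 below), and an Error has been occurring since MCCS was deployed (see item 2 below).
- DK was not upgraded from 7.0.5 to 7.2 because the cluster configuration was completed only recently and a sync takes about 4 to 6 hours, so finishing the configuration took priority.
- The upgrade has not been clearly communicated to the customer's contact; if an upgrade resolves the issue, we will raise the request with the contact again.
- Because of the customer's circumstances, an upgrade cannot proceed as soon as it is requested; internal approval takes a very long time (the customer is a Japanese bank).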


< 1. The Warning about the mirroring state change occurs on all 4 sets >
The mirrored state of a Source volume has been changed.
Current State: Paused
Current State: Resync
Current State: Mirror

< 2. The Error below occurs on the App1 and App2 sets >
During mirror resynchronization, a large number of passes to update the mirror has been made due to incoming writes. In order to ensure that the resynchronization will complete soon, please stop all write access to this volume. Volume Device: Source Volume: D Target Machine: 192.168.30.19 Target Volume: D Resync Pass: 201 The resynchronization will stop and the mirror will be broken. Please resynchronize it manually.

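We told the customer that the Warning in item 1 is, as the name says, only a warning and not a significant problem. For the Error, we said we would check internally, and then left the site.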


"



SOLUTION
"Bob's reply


No, there is no problem.

Your original question had to do with the event log message about too many passes through the bitmap file without being able to return to the mirroring state. In the case where there are many writes continually being written to the SOURCE volume, you may see this event.

You will also see the cycle of events I explained below, where the ASYNC queue fills up temporarily and we have to PAUSE the mirror, then RESYNC before finally returning to the mirror state.

All of this behavior is consistent with a sustained period of heavy writes.

There is one other scenario where you will see a similar PAUSE, RESYNC, MIRROR cycle of events in the event log: when the system temporarily runs out of Non Paged Memory. We use Non Paged Memory to process writes that are being sent to the TARGET system. Again, if many writes come down to the SOURCE for a long period of time, it is possible to run out of Non Paged Memory before the Async Queue hits the High Water Mark. If this happens, you will see the same cycle of events as hitting the High Water Mark.

In Win2003, there is 256 MB of Non Paged Memory available to the system. If you run with the /3GB switch enabled in Win2003, that reduces the amount of Non Paged Memory to 128 MB so you are much more likely to run out of Non Paged Memory with that switch enabled. That is why I was asking if your customer was running with it.

In Win2008, there is a lot more Non Paged Memory available to the system, so we rarely run out of Non Paged Memory with Win2008.

I hope all that makes sense.

I believe the customer is running into a scenario where a lot of writes are being written to the SOURCE volume for a long period of time, and the messages that DK is logging are normal.

Bob
"
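
The Paused, Resync, Mirror cycle described in the reply, and the broken mirror reported in item 2, can be illustrated with a toy model. The Python sketch below is not DK's implementation: the function name, the tick-based queue model, and every parameter value are assumptions chosen only to show how a sustained burst of writes could produce the logged sequence of states.

```python
def simulate_mirror(writes_per_tick, drain_per_tick, high_water_mark, max_resync_passes):
    """Toy model of a mirror's state over time (all rules here are hypothetical).

    writes_per_tick   -- list of write counts arriving on the source, one per tick
    drain_per_tick    -- how many queued writes reach the target per tick
    high_water_mark   -- queue depth above which the mirror is paused
    max_resync_passes -- resync passes allowed before the mirror is broken
    """
    state = "Mirror"
    queued = 0
    passes = 0
    history = []
    for writes in writes_per_tick:
        if state == "Mirror":
            # Writes enter the async queue; some are drained to the target.
            queued = max(0, queued + writes - drain_per_tick)
            if queued > high_water_mark:
                state = "Paused"   # queue overflowed: stop mirroring
                queued = 0         # changes are now tracked elsewhere (e.g. a bitmap)
        elif state == "Paused":
            state = "Resync"       # start copying the changed blocks
            passes = 0
        elif state == "Resync":
            passes += 1
            if writes <= drain_per_tick:
                state = "Mirror"   # the resync caught up with incoming writes
            elif passes > max_resync_passes:
                state = "Broken"   # too many passes: manual resync required
        # "Broken" is terminal in this model.
        history.append(state)
    return history


# A short burst of writes: the mirror pauses, resyncs, and recovers.
burst = simulate_mirror([5, 5, 50, 50, 50, 1, 1], drain_per_tick=10,
                        high_water_mark=30, max_resync_passes=3)

# Sustained heavy writes: the resync never catches up and the mirror breaks.
sustained = simulate_mirror([50] * 8, drain_per_tick=10,
                            high_water_mark=30, max_resync_passes=3)
```

In this model a short burst yields the Warning sequence from item 1, while writes that stay above the drain rate for longer than the allowed number of passes end in the broken state that item 2 describes.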