Overview
bsrmon is a performance monitoring tool that tracks the entire replication process of the bsr engine that performs a BSR engine performing replication and records the time required for taken by each logical section of the engine's logic in real time to identify bottlenecks in the engine operation's behavior. The bsr BSR kernel engine maintains performance-related statistical information such as time records for each section and accumulated calculation values at the time the engine was operated in the form of a file log through the internal jiffies record code. Administrators can monitor the statistics information of these kernel engines by querying them statistics in the form of file logs, including a section-by-section time history of when the engine was running through internal time (jiffies) records and cumulative calculations. Administrators can monitor these statistical information of the kernel engine by querying with the bsrmon utility. bsrmon is supported since from version 1.6.1.
Functions
Features provided by bsrmon include:
/start
...
Info |
---|
In a linux environment, the Performance Monitor feature |
...
Activation of the performance monitor is performed through the following process.
windows
set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\bsr\bsrmon_run registry value to 1
Activating the performance data aggregation logic of the bsr engine
Enable bsrservice's bsrmon /file execution logic
linux
Set the /etc/bsr.d/.bsrmon_run value to 1.
Activating the performance data aggregation logic of the bsr engine
/lib/bsr/bsrmon-run Run the script. Run the bsrmon /file command periodically
/stop
Disable Performance Monitor. When disabled, the engine stops aggregating performance data and logging files.
Deactivation is performed as follows.
windows
Set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\bsr\bsrmon_run registry value to 0
Disable performance data aggregation logic of bsr engine
Disable bsrservice's bsrmon /file execution logic
linux
Set /etc/bsr.d/.bsrmon_run value to 0
Disable performance data aggregation logic of bsr engine
Stop the /lib/bsr/bsrmon-run script
/status
Query the operating status of the performance monitor.
/file
Saves performance data and memory information to a file. This is a hidden command and is used by the bsrservice and bsrmon-run scripts.
The file storage location is as follows.
windows : C:\Program Files\bsr\log\perfmon
linux : /var/log/bsr/perfmon/
Data is saved only for the inquiry time and the numerical value for each item, and when the /watch or /report command is executed, the data is parsed and output according to the format.
Code Block |
---|
2021-07-20_05:55:21.443 955 955 244480 244480 1042 1042 266752 266752 |
/watch {types} [/scroll]
Data that is being aggregated by type is output in real time. When the /scroll option is used, the output is displayed in newline format.
/report {types} [/f {filename}] [/d {YYYY-MM-DD}] [/s {hh:mm[:ss]}] [/e {hh:mm[:ss]}]
Outputs the statistics of the data recorded by the file for each type.
When generating statistics, meaningless data such as numerical values are excluded from the target sample.
Data written to zero with no I/O occurring
I/O data generated temporarily (when the I/O number is not consecutively more than 2 times)
ex) vnr0_IO_STAT
Code Block |
---|
2021-07-07_21:27:52.304 0 0 0 0 0 0 0 0 // Excluded from samples as recorded performance data is zero
2021-07-07_21:27:53.325 0 0 0 0 2 2 1 1 // Excluded from samples as transient I/O performance data
2021-07-07_21:27:54.346 0 0 0 0 0 0 0 0
...
2021-07-07_21:30:28.549 0 0 0 0 408 408 104448 104448
2021-07-07_21:30:29.580 0 0 0 0 409 409 104704 104704 // Performance data is not zero and is being recorded continuously. selected as samples
2021-07-07_21:30:30.605 0 0 0 0 404 404 103424 103424 |
[/f {filename}]
the file name for the report. Used for statistical calculation of backed up files.
[/d {YYYY-MM-DD}] [/s {hh:mm[:ss]}] [/e {hh:mm[:ss]}]
This option allows you to inquire the figures for a specific section.
/d : Outputs statistics of data recorded on that date. If not entered, statistics for the entire period are output.
/s : Outputs statistics of data recorded from that time. If not input, statistics of recorded data are output from 00:00:00.
/e : Outputs statistics of data recorded up to that time. If not input, statistics of data collected until 23:59:59 are output.
/set {period, file_size, file_cnt} {value}
Command to adjust the numerical values related to monitoring.
period
Set file saving and monitoring cycle. It is set in seconds, and the default value is 1 second.
file_size
Sets the file rolling size. Set in MB unit, the default value is 50 MB.
file_cnt
Sets the number of file rolling. The default value is three.
/io_delay_test {flag} {delay point} {delay time}
Verifies the functionality of the bsr performance monitor by intentionally incurring I/O performance degradation. This feature is for developers.
/debug cmds options
Command for windows to inquire debugfs information.
types
Performance monitoring targets.
iostat, ioclat, reqstat, peer_reqstat, alstat, network, sendbuf, memstat
iostat {resource} {vnr}
Aggregate data for read and write I/O of replication volumes (iops, ios, kbs, kb)
Expand |
---|
|
Code Block | > bsrmon /watch iostatis only available in kernels with the debugfs feature enabled, and debugfs must be mounted in the /sys/kernel/debug path. You can check if the debugfs feature is enabled using the following command. Code Block |
---|
# grep CONFIG_DEBUG_FS /boot/config-`uname -r`
CONFIG_DEBUG_FS=y |
Since debugfs is automatically mounted starting with CentOS7, and not automatically mounted in CentOS6, you can use the performance monitor feature after mounting it in the /sys/kernel/debug path. If you want to use the performance monitor feature even after a system reboot, you need to register that mount path in fstab. Code Block |
---|
# mkdir /sys/kernel/debug
# mount -t debugfs debugfs /sys/kernel/debug |
|
Types
The target types of the performance measurement. Corresponds to the types argument of bsrmon.
iostat, ioclat, reqstat, peer_reqstat, alstat, network, sendbuf, memstat
iostat {resource} {vnr}
Aggregate data for read and write I/O on the replica volume (io count, kbs, kb)
Expand |
---|
|
Code Block |
---|
> bsrmon /watch iostat r0 0
2020-12-06_08:22:36.223
read : IO count=0, BW=0kb/s (0KB)
write: IO count=216, BW=110208kb/s (220416KB)
2020-12-06_08:22:37.640
read : IO count=0, BW=0kb/s (0KB)
write: IO count=34, BW=16896kb/s (16896KB)
2020-12-06_08:22:39.32
read : IO count=0, BW=0kb/s (0KB)
write: IO count=326, BW=155176kb/s (155176KB)
2020-12-06_08:22:40.106
read : IO count=0, BW=0kb/s (0KB)
write: IO count=330, BW=164624kb/s (164624KB)
... |
| → Situations that write I/O is constantly occurring |
Code Block |
---|
> bsrmon /report iostat r0 0
Report r0 [IO STAT - vnr0]
Run: 2021-07-20_08:14:27.405 - 2021-08-04_18:59:53.573
read : io count=24855, bw=6325194kbyte
BW (kbyte/s): min=40, max=284928, avg=248578, samples=24
write: io count=8360345, bw=2139182512kbyte
BW (kbyte/s): min=12, max=401408, avg=320256, samples=6671 |
| → Data aggregated for the period 2021-07-20_08:14:27.405 - 2021-08-04_18:59:53.573 → 24 read I/O samples. 248 average, 40 minimum, 284 maximum Mbytes/s of IO processed. → 6670 write I/O samples. 320 average, 12 minimum, 401 maximum Mbytes/s of IO processed. |
Code Block |
---|
> bsrmon /report iostat r0 0 /f vnr0_IO_STAT_2021-07-20_081427.005
Report r0 [vnr0_IO_STAT_2021-07-20_081427.005]
Run: 2021-07-07_21:24:24.383 - 2021-07-20_08:14:26.371
read : ios=6332219, bw=1595360917kbyte
IOPS : min=1, max=1233, avg=980, samples=6452
BW (kbyte/s): min=4, max=428032, avg=247033, samples=6452
write: ios=22550577, bw=3961426883kbyte
IOPS : min=1, max=6478, avg=786, samples=28468
BW (kbyte/s): min=4, max=1085440, avg=138461, samples=28484 |
| → Extract data from the file vnr0_IO_STAT_2021-07-20_081427.005 → Data aggregated for the period 2021-07-07_21:24:24.383 - 2021-07-20_08:14:26.371 → 6452 read I/O samples. Measured with an average of 980 IOPS, a minimum of 1 IOPS, and a maximum of 1233 IOPS. → 28468 write I/O samples. Measured with an average of 786 IOPS, a minimum of 1 IOPS, and a maximum of 6478 IOPS. |
|
ioclat {resource} {vnr}
local/master I/O 의 complete latency (usec)
Expand |
---|
|
Code Block |
---|
> bsrmon /watch ioclat r0 0
2020-12- |
| 06082236223readIOPS0(IOs0)BW=0kb/s (0KB)
write: IOPS=216 (IOs=432), BW=110208kb/s (220416KB)avg=10347
master clat (usec): min=610, max=256852, avg=28202
2020-12- |
| 06082237640readIOPS0(IOs0)BW=0kb/s (0KB)
write: IOPS=34 (IOs=34), BW=16896kb/s (16896KB)avg=44316
master clat (usec): min=1499, max=492829, avg=114106
2020-12- |
| 0608223932readIOPS0(IOs0)BW0kb/s (0KB)writeIOPS=326 (IOs=326)BW=155176kb/s (155176KB)
2020-12-06_08:22:40.106
read : IOPS=0 (IOs=0), BW=0kb/s (0KB)
write: IOPS=330 (IOs=330), BW=164624kb/s (164624KB)
A situation in which Situations where write I/O is | continuously occurring Code Block |
---|
constantly occurring → Output minimum, maximum, and average values for the time it took for local/master I/O to complete. |
Code Block |
---|
> bsrmon /report |
| iostatioclat r0 0
Report r0 [IO |
| STAT2021-200814274052021-04185953573read:ios=24855, bw=6325194kbyte
IOPS 10111397224 BWkbyte/s4028492824857824
write: ios=8360345, bw=2139182512kbyte
IOPS : min=3, max=1568, avg=1252, samples=6670
BW (kbyte/s): min=12, max=401408, avg=320256, samples=6671→ Data aggregated during the period 2021-07-20_08:14:27.405 - 2021-08-04_18:59:53.573 → 24 read I/O samples. Measured at an average of 972 iops, a minimum of 10 iops, and a maximum of 1113 iops → 6670 write I/O samples. Measured at 1252 iops average, 3 iops min, 1568 iops max | |
Code Block |
---|
> bsrmon /report iostat r0 0 /f vnr0_IO_STAT_2021-07-20_081427.005
Report r0 [vnr0_IO_STAT_2021-07-20_081427.005]
Run: 2021-07-07_21:24:24.383 - 2021-07-20_08:14:26.371
read : ios=6332219, bw=1595360917kbyte
IOPS : min=1, max=1233, avg=980, samples=6452
BW (kbyte/s): min=4, max=428032, avg=247033, samples=6452
write: ios=22550577, bw=3961426883kbyte
IOPS : min=1, max=6478, avg=786, samples=28468
BW (kbyte/s): min=4, max=1085440, avg=138461, samples=28484 |
|
→ Extract data from file vnr0_IO_STAT_2021-07-20_081427.005 → Data aggregated for the period 2021-07-07_21:24:24.383 - 2021-07-20_08:14:26.371 → 6452 read I/O samples. Measured at an average of 980 iops, a minimum of 1 iops, and a maximum of 1233 iops → 28468 write I/O samples. Measured average 786 iops, min 1 iops, max 6478 iops |
ioclat {resource} {vnr}
Complete latency of local/master I/O (usec)
Expand |
---|
|
Code Block | > bsrmon /watch ioclat r0 0
2020-12-08_09:33:19.171
local clat | → Data aggregated for the period 2020-12-07_02:42:31.440 - 2020-12-08_09:37:44.115 → 50 local clat samples. measured to have taken an average of 8601, a minimum of 153, and a maximum of 1635570 usec. → 50 master clat samples. measured to have taken an average of 21801, a minimum of 217, and a maximum of 1667260 usecs. |
|
reqstat {resource} {vnr}
Aggregate data on the time taken to reach a specific segment based on when the request was created (usec)
requests : Number of requests processed during the monitoring period (cycles)
before_queue :Time spent before submitting work queued up
before_al_begin : Time spent before getting LRU of ACT_LOG
in_actlog : Time spent before RQ_IN_ACT_LOG was set.
submit : Time taken for request submit to be executed
bio_endio: Time taken for bio end logic to be executed
pre_send: Time spent before sending data to peer
acked: Time spent before RQ_NET_PENDING was removed after sending.
net_done: Time spent until RQ_NET_DONE
destroy : Time taken until request is released
Time for each leg of the AL UPTATE to complete (usec)
al_update : Number of active logs updated during the monitoring period (cycle)
before_bm_write : Before bsr_bm_write_hinted() execution
after_bm_write : After execution of bsr_bm_write_hinted()
after_sync_page : after execution of bsr_md_sync_page_io()
Expand |
---|
|
Code Block |
---|
> bsrmon /watch reqstat r0 0
2020-12-08_09:42:18.549
requests : 445
before_queue (usec): min=0, max=19, avg=0
before_al_begin (usec): min=0, max=4907, avg=11
in_actlog (usec): min=1, max=7191, avg=35
submit (usec): min= |
| 52011200010347masterclat61025685228202
2020-12-08_09:33:20.351localclat78047838744316masterclatbefore_bm_write (usec): min= |
| 1499492829114106
2020-12-08_09:33:21.509
local clat 0
after_bm_write (usec): min=0, max=0, avg=0
after_sync_page (usec): min= |
| 478198054523masterclat577243356303 ...→ A situation in which write I/O is continuously occurring → Outputs the minimum, maximum, and average values of the time taken for local/master I/O to complete | Code Block | > bsrmon /report ioclat r0 0
Report r0 [IO COMPLETE - vnr0]
Run: 2020-12-07_02:42:31.440 - 2020-12-08_09:37:44.115
local clat acked (usec): min=0, max=0, avg=0
net_done (usec): min= |
| 15316355708601, samples=50
master clat0
PEER 100.100.10.31:7792:
pre_send (usec): min=571, max=214862, avg=15703
acked (usec): min= |
| 217166726021801, samples=50→ Data aggregated for the period 2020-12-07_02:42:31.440 - 2020-12-08_09:37:44.115 → 50 local clat samples. Measured to have consumed an average of 8601, a minimum of 153, and a maximum of 1635570 usec → 50 master clat samples. Measured to have consumed an average of 21801, a minimum of 217, and a maximum of 1667260 usec | |
reqstat {resource} {vnr}
Aggregate data on the time taken from the time of request creation to a specific section (usec)
...
requests : The number of requests processed during the monitoring period (cycle)
...
before_al_begin : Time elapsed before getting the lru of act_log
...
in_actlog : Time taken for RQ_IN_ACT_LOG to be set
...
submit : Time taken until request submit is executed
...
bio_endio : Time taken for bio completion logic to be executed
...
pre_send : Time taken before sending data to peer
...
acked : Time elapsed until RQ_NET_PENDING is removed after send
...
net_done : Time taken to RQ_NET_DONE
...
destroy : The time it takes for the request to be released
Time required for each section until al uptate is completed (usec)
al_update : The number of active logs updated during the monitoring period (cycle)
before_bm_write : Before executing bsr_bm_write_hinted()
after_bm_write : After executing bsr_bm_write_hinted()
after_sync_page : After executing bsr_md_sync_page_io()
Expand |
---|
|
Code Block | > bsrmon /watch reqstat r0 0
2020-12-08_09:42:18.549
requests : 445
before_queue 7966
net_done (usec): min=575, max=214978, avg=15710
... |
| → Write I/O is occurring continuously → Disconnected from peer bsr03 → AL update is not performed |
Code Block |
---|
> bsrmon /report reqstat r0 0
Report r0 [REQUEST STAT - vnr0]
Run: 2020-12-06_05:46:36.678 - 2020-12-08_09:45:46.219
requests : total=38180
before_queue (usec): min=0, max=0, avg=0, samples=0
before_al_begin (usec): min=6966, max=460112, avg=4264, samples=2
in_actlog (usec): min= |
| 019589946, avg=549, samples= |
| 0before_al_begin (usec): min=0, max=4907, avg=11in_actlog 1719135
submit (usec): min=3, max=7215, avg=46 47219589987, avg=579, samples= |
| 505772149801571146380, samples=161
al_uptate |
| : total=0
before_bm_write (usec): min=0, max=0, avg=0, samples=0
after_bm_write (usec): min=0, max=0, avg=0, samples=0
after_sync_page (usec): min=0, max=0, avg=0, samples=0
PEER |
| bsr03100.100.10.31:7792:
pre_send (usec): min= |
| 00022364, samples=161
acked (usec): min= |
| 001635581, avg=15737, samples= |
| 0161
net_done (usec): min= |
| 000100.100.10.31:7792bsr03:
pre_send (usec): min= |
| 5712148621570355521000479660, samples=0
net_done (usec): min= |
| 57521497815710
...→ A situation in which write I/O is continuously occurring → Disconnect from peer bsr03 → al update is not performed | Request performance data statistics output | → 38180 requests were processed during the period 2020-12-06_05:46:36.678 - 2020-12-08_09:45:46.219 → al_update never occurred |
|
peer_reqstat {resource} {vnr}
Aggregate data on the time taken to reach a certain interval based on when the peer request was created (USEC)
peer_requests : Number of peer requests processed during the monitoring period (cycle)
submit : Time taken for a peer request submit to be executed
bio_endio: Time taken for the bio completion logic of a peer request to be executed.
destroy: Time taken for peer request to be destroyed
Expand |
---|
|
reportReport r0 [REQUEST STAT - vnr0]
Run: 2020-12-06_05:46:36.678 - 2020-12-08_09:45:46.219
2021-07-19_21:52:02.191
PEER bsr-03:
peer requests : |
| total=38180before_queuesubmit (usec): min=0, max=0, avg=0 |
| ,samples=0 before_al_begin69664601124264,samples=2 in_actlog 1589946549, samples=280
PEER bsr-02:
peer requests |
| submit: 100
submit (usec): min= |
| 2721793349,samples=72 3589987579, samples=28
83236274946380, samples=161
al_uptate : total=0
before_bm_write (usec): min=0, max=0, avg=0, samples=0
after_bm_write (usec): min=0, max=0, avg=0, samples=0
after_sync_page | → 100 peer requests for peer bsr-02 are processed → No peer requests for peer bsr-03 occurred |
Code Block |
---|
> bsrmon /report peer_reqstat r0 0
Report r0 [PEER REQUEST STAT - vnr0]
Run: 2021-07-11_22:23:08.890 - 2021-07-19_21:56:18.179
PEER bsr-02:
peer requests : total=344054
submit (usec): min= |
| 0000PEER100.100.10.31:7792:
pre_send7166718722364161acked 11163558115737161 net_done (usec): min=82, max=2362749, avg=39406, samples=161
PEER bsr03:
bsr-03:
peer requests : total=133288
submit |
| pre_send 0000acked 0000net_done0000 38180 requests processed during Aggregated data for the period | 20201206054636678 20201208094546219→ al_update never occurs |
...
179 → BIO completion logic took an average of 0.1 seconds to execute |
|
alstat {resource} {vnr}
Aggregate data (usec) about the time taken from the peer request creation time to a specific section
...
peer_requests : Number of peer requests processed during the monitoring period (cycle)
...
submit : Time taken until peer request submit is executed
...
bio_endio : Time taken until the bio completion logic of the peer request is executed
...
for usage figures in active log
Aggregate used values, maximum value of used values
Aggregate incremental values of HITS, MISSES, STARVING, LOCKED, CHANGED
Count of AL_WAIT RETRY
Aggregation of AL shortage causes
starving, pending, used, busy, wouldblock
Expand |
---|
title | Usage Exampleexamples |
---|
|
Code Block |
---|
> bsrmon /watch |
| peer_reqstat
07_215202.191PEERbsr-03peerrequests:0submit(usec)min=0,max=0,avg=0bio_endiousec): min=0, max=0, avg=0destroyusec): min=0, max=0, avg=0PEERbsr-02:peerrequests100submitusec)min=421,max=6907,avg=2184bio_endiousec): min=1021, max=7312, avg=2563total=5854)
al_wait retry : |
| destroyusec): min=17397955, avg=3244→ 100 peer requests processed for bsr-02 → No peer request for bsr-03 | Code Block | > bsrmon /report peer_reqstat r0 0
Report r0 [PEER REQUEST STAT - vnr0]
Run: 2021-07-11_22:23:08.890 - 2021-07-19_21:56:18.179
PEER bsr-02:
peer requests : total=3440540)
pending_changes : 1/64
error : 0
NOBUFS - starving : 0
- pending slot : 0
|
| submit(usec):min=1,max=36103, avg=99, samples=6902
bio_endio (usec): min=1, max=116988, avg=96161, samples=9517destroy (usec)min=47, max=117495, avg=96340, samples=9518
PEER bsr-03:
peer requests : total=133288
submit (usec): min=1, max=1670, avg=5, samples=6037
bio_endio (usec): min=1, max=117000, avg=104660, samples=8708
destroy (usec): min=63, max=125871, avg=104839, samples=8709→ Data aggregated during the period 2021-07-11_22:23:08.890 - 2021-07-19_21:56:18.179 → It took an average of 0.1 seconds for the bio completion logic to be executed. | |
alstat {resource} {vnr}
Aggregate data for usage figures in active log
Aggregation of used values, the maximum value of used values
Aggregating increments of hits, misses, starving, locked, changed
al_wait retry count count
aggregation of causes of AL shortages
starving, pending, used, busy, wouldblock
Expand |
---|
|
|
Code Block |
---|
> bsrmon /watch alstat r0 0
2021-08-04_19:10:11.463
used : 3/67 (max=8)
hits : 579 (total=2843)
misses : 2250 (total=17564)
starving: 0 (total=0)
locked : 0 (total=0)
changed : 749 (total=5854)
al_wait retry : 0 (total=0, max=0)
pending_changes : 1/64
error : 0
NOBUFS - starving : 0
- pending slot : 0
- used slot : 0
BUSY : 0
WOULDBLOCK : 0
flags : __LC_DIRTY __LC_LOCKED |
|
→ There are 3 al slots in use at the time → 1 pending al slot → The dirty, locked flag is set |
Code Block |
> bsrmon /report alstat r0 0
Report r0 [AL STAT - vnr0]
Run: 2021-08-03_04:48:33.721 - 2021-08-03_22:47:28.326
al_extents : 6001
used : max=0(all_slot_used=0), avg=0
hits : total=0
misses : total=0
starving 0
flags : __LC_DIRTY __LC_LOCKED |
|
→ 3 AL slots in use at that time → 1 al slot is pending → dirty, locked flag is set |
Code Block |
---|
> bsrmon /report alstat r0 0
Report r0 [AL STAT - vnr0]
Run: 2021-08-03_04:48:33.721 - 2021-08-03_22:47:28.326
al_extents : 6001
used : max=0(all_slot_used=0), avg=0
hits : total=0
misses : total=0
starving : total=0
locked : total=0
changed : total=0
al_wait retry count : max=0, total=0
pending_changes : max=0, total=0
error : total=0
NOBUFS - starving : total=0
- pending slot : total=0
- used slot : total=0
BUSY : total=0
WOULDBLOCK : total=0
-> al_extents changed
Run: 2021-08-03_22:47:52.895 - 2021-08-04_19:27:06.522
al_extents : 67
used : max=67(all_slot_used=59), avg=3
hits : total=337528
misses : total=2020409
starving : total=2
locked : total=0
changed : total=673370
al_wait retry count : max=2, total=501
pending_changes : max=64, total=258
error : total=1004
NOBUFS - starving : total=501
- pending slot : total=0
- used slot : total=503
BUSY : total=0
|
|
locked
changed : total=0
al_wait retry count : max=0, total=0
pending_changes : max=0, total=0
error : total=0
NOBUFS - starving : total=0
- pending slot : total=0
- used slot : total=0
BUSY : total=0
WOULDBLOCK : total=0
-> al_extents changed
Run: 2021-08-03_22:47:52.895 - 2021-08-04_19:27:06.522
al_extents : 67
used : max=67(all_slot_used=59), avg=3
hits : total=337528
misses : total=2020409
starving : total=2
locked : total=0
changed : total=673370
al_wait retry count : max=2, total=501
pending_changes : max=64, total=258
error : total=1004
NOBUFS - starving : total=501
- pending slot : total=0
|
→ al_extents value changed from 6001 to 67 on 2021-08-03_22:47:52.895 → When AL is set to 67, all AL slots are used 59 times, STARVING situation occurs 2 times, and AL_WAIT RETRY occurs 501 times |
network {resource}
Replication network sending and receiving speed (byte/s)
Expand |
---|
|
Code Block |
---|
> bsrmon /watch network r0
2020-12-08_09:47:45.84
PEER 100.100.10.31:7792:
send (byte/s): 62932184
recv (byte/s): 4820
PEER bsr03:
send (byte/s): 0
recv (byte/s): 0
2020-12-08_09:47:46.160
PEER 100.100.10.31:7792:
send (byte/s): 104896568
recv (byte/s): 12320
PEER bsr03:
send (byte/s): 0
recv (byte/s): 0
... |
| → Disconnected from peer BSR03 → Data transfer is in progress to peer 100.100.10.31:7792 |
Code Block |
---|
> bsrmon /report network r0
Report r0 [NETWORK SPEED]
Run: 2020-11-30_06:38:44.653 - 2020-12-08_09:53:31.455
PEER 100.100.10.31:7792: send=52497905byte/s, receive=5902byte/s
send (byte/s): min=3, max=115395016, avg=52497905, samples=392
recv (byte/s): min=3, max=15004, avg=5902, samples=367
PEER bsr03: send=0byte/s, receive=0byte/s
send (byte/s): min=0, max=0, avg=0, samples=0
recv (byte/s): min=0, max=0, avg=0, samples=0 |
| → On average, 52.4 MB of data is being sent to peer 100.100.10.31 node |
|
sendbuf {resource}
Transmit buffer usage (bytes)
Expand |
---|
|
Code Block |
---|
> bsrmon /watch sendbuf r0
2020-12-08_09:54:13.735
PEER 100.100.10.31:7792:
highwater: 11246, fill: 37094400bytes
ap_in_flight: 8881 (9094144bytes)
rs_in_flight: 2365 (28000256bytes)
data stream
size (bytes): 10485760000
used (bytes): 27098576
[P_DATA] - |
| used cnt: 7542 size: 8024616bytes
|
| slot:total=503BUSY[P_RS_DATA_REPLY] - cnt: 1596 size: |
| total=0WOULDBLOCK: total=0→ al_extents value changed from 6001 to 67 on 2021-08-03_22:47:52.895 → If al is set to 67, the number of times that all al slots are used 59 times, starving situation occurs 2 times, and al_wait retry occurs 501 times. | |
network {resource}
Replication network transmit/receive speed (byte/s)
Expand |
---|
|
Expand |
---|
title | Usage Example Code Block | > bsrmon /watch network r0
2020-12-08_09:47:45.84
PEER 100.100.10.31:7792:
send (byte/s): 62932184
recv (byte/s): 4820
PEER bsr03:
send (byte/s): 0
recv (byte/s): 0
2020-12-08_09:47:46.160
PEER 100.100.10.31:7792:
send (byte/s): 104896568
recv (byte/s): 12320
PEER bsr03:
send (byte/s): 0
recv (byte/s [P_BARRIER] - cnt: 549 size: 13176bytes
[P_UNPLUG_REMOTE] - cnt: 7406 size: 118496bytes
control stream
size (bytes): 5242880
used (bytes): 0
PEER bsr03:
highwater: 0, fill: 0bytes
ap_in_flight: 0 (0bytes)
rs_in_flight: 0 (0bytes)
data stream
size (bytes): 10485761
used (bytes): 0
control stream
size (bytes): 5242881
used (bytes): 0
... |
| → peer 100.100.10.31:7792, bsr03 has a transmit buffer assigned to it → Disconnected from peer bsr03 → Data transmission is in progress to peer 100.100.10.31:7792 | is in progress Code Block | > → The number of replication and synchronization data pending on peer 100.100.10.31:7792 is 11246 (8881 replication data, 2365 synchronization data) |
Code Block |
---|
> bsrmon /report |
| networkNETWORKSPEED11300638446535331455718
PEER 100.100.10.31:7792: data |
| send=52497905byte/s, receive=5902byte/sstream size=10485761byte, control stream size=5242881byte
|
| sendbyte/s311539501652497905392recvbyte/s3150045902367PEERbsr03send0byte/sreceive=0byte/s
max=8014, avg=760, samples=999
PEER bsr03: data stream size=10485761byte, control stream size=5242881byte
|
| sendbyte/sbytes): min=0, max=0, avg=0, samples=0
|
| recvbyte/sbytes): min=0, max=0, avg=0, samples=0 |
| → On average, | 524 MB of data is being transferred to the peer 100.100.10.31 node. |
---|
sendbuf {resource}
Send buffer usage (bytes)
7MB of the 10MB buffer is being used |
|
memstat
Memory usage by users and modules
In user space, the memory used by the bsradm, bsrsetup, bsrcon, bsrmon, and bsrservice processes is counted.
The following information about the memory used by the kernel module is aggregated by OS.
windows
linux
slab cache information
/sys/kernel/slab/bsr_req
/sys/kernel/slab/bsr_al
/sys/kernel/slab/bsr_bm
/sys/kernel/slab/bsr_ee
Expand |
---|
|
Code Block |
---|
> bsrmon /watch |
| sendbufr0
5413735PEER 100.100.10.31:7792:datastream size(bytes)10485761used(bytes):204920 [P_RS_DATA_REPLY]-cnt:3size:209016bytescontrolstreamsize(bytes):5242881used(bytes):0PEERbsr03:datastreamsize(bytes):10485761used(bytes):0controlstreamsize (bytes): 5242881used(bytes):0...→ The send buffer is allocated to peer 100.100.10.31:7792, bsr03 → Disconnected from peer bsr03 → Data transmission to peer 100.100.10.31:7792 is in progress | |
Code Block |
---|
> bsrmon /report sendbuf r0
Report r0 [SEND BUFFER]
Run: 2020-12-05_13:26:59.969 - 2020-12-08_09:56:33.718
PEER 100.100.10.31:7792: data stream size=10485761byte, control stream size=5242881byte
data-used (bytes): min=2097232, max=7603084, avg=4787174, samples=5
cntl-used (bytes): min=0, max=0, avg=0, samples=0
PEER bsr03: data stream size=10485761byte, control stream size=5242881byte
data-used (bytes): min=0, max=0, avg=0, samples=0
cntl-used (bytes): min=0, max=0, avg=0, samples=0 |
|
→ On average, 4.7 MB of buffer out of 10 MB is used |
memstat
Memory usage by users and modules
In user space, the memory used by the bsradm, bsrsetup, bsrcon, bsrmon, and bsrservice processes is aggregated.
As for the memory used by the kernel module, the following information is aggregated by OS.
windows
linux
slab cache informations
/sys/kernel/slab/bsr_req
/sys/kernel/slab/bsr_al
/sys/kernel/slab/bsr_bm
/sys/kernel/slab/bsr_ee
Expand |
---|
|
Code Block | > bsrmon /watch memstat
2020-12-08_09:57:27.171
module (bytes)
BSR_REQ : 16334336
BSR_AL : 803760
BSR_BM : 4161536
BSR_EE : 2782560
user (kbytes)
name pid rsz vsz bsrmon 113913 1068 12724
bsrmon 113978 1308 12728 |
| → Multiple bsrmon processes are running. The presence of an unterminated bsrmon process. |
Code Block |
---|
> bsrmon /report memstat
Report [MEMORY]
Run: 2020-12-06_19:04:12.177 - 2020-12-08_09:59:17.716
module (bytes)
BSR_REQ: 16303104 - 16459264
BSR_AL : 803760
BSR_BM : 4063232 - 5271552
BSR_EE : 2782560
user (kbytes)
name rsz vsz
bsradm 0 0
bsrsetup 0 |
| bsrmon29304119212724bsrmon37474120012720bsrmon112177119212724bsrmon113913106812724bsrmon113978130812728 → Multiple bsrmon processes running. A non-terminating bsrmon process exists | Code Block | > bsrmon /report memstat
Report [MEMORY]
Run: 2020-12-06_19:04:12.177 - 20201068 - 1368 12720 - 12732 |
| → Print out the memory usage range of data collected during the period 2020-12-06_19:04:12.177 - 2020-12-08_09:59:17.716 |
module (bytes)
: 16303104 - 16459264
BSR_AL : 803760
BSR_BM : 4063232 - 5271552
BSR_EE : 2782560
user (kbytes)
name rsz vsz
bsradm 0 0
bsrsetup 0 0
bsrmeta 0 0
bsrmon 1068 - 1368 12720 - 12732 → Output the memory usage range of the data collected during the period 2020-12-06_19:04:12.177 - 2020-12-08_09:59:17.716 → BSR_REQ uses 16303104 ~ 16459264 bytes → Considering that there is no data collected in user space except for bsrmon, it can be seen that the cli commands were executed within 1 second(The measurement period of bsrmon is 1 second.)used 16303104 ~ 16459264 bytes → We can see that there is no data collected in the user space except bsrmon, which means that the CLI commands were executed in less than 1 second. |
|
resync_ratio {resource} {vnr}
Real-time replication, the amount of synchronization transfers and the ratio of synchronization transfers to it.
Updated only on the source node when synchronization is in progress.
Expand |
---|
|
Code Block |
---|
> bsrmon /watch resync_ratio r0 0
2022-04-12_15:34:52.206
svr06
replcation(144100kb)/resync(18508kb), resync ratio 11% |
| → The connected svr06 node was sending 144100 KB of replication data and 18508 KB of synchronization data per second, for a synchronization transfer rate of 11%. |
|
Commands
/start
Enables the Performance Monitor feature and starts aggregating performance data and logging files. By default, Performance Monitor is enabled.
Enabling Performance Monitor is done through the following process
windows
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\bsr\bsrmon_run registry value to 1
Enable the BSR Engine's performance data aggregation logic
Enable bsrmon /file execution logic in bsrservice
linux
Set the value of /etc/bsr.d/.bsrmon_run to 1
Enable BSR Engine's performance data aggregation logic
Run the /lib/bsr/bsrmon-run script. Run the bsrmon /file command periodically
/stop
Disable the performance monitor. Disabling stops the engine's performance data aggregation and file logging.
Deactivation is performed as follows
windows
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\bsr\bsrmon_run Set the registry value to 0
Disable the BSR Engine's performance data aggregation logic
Disable the bsrmon /file run logic in bsrservice
linux
Set the value of /etc/bsr.d/.bsrmon_run to 0
Disable BSR Engine's performance data aggregation logic
Exit the /lib/bsr/bsrmon-run script
/status
Query the behavioral status of the performance monitor.
/file
Save performance data and memory information to a file. This is a hidden command and is used by the bsrservice and bsrmon-run scripts.
The file is stored in the following location
windows : C:\Program Files\bsr\log\perfmon
linux : /var/log/bsr/perfmon/
Only the lookup time and the numbers for each item are stored, and the data is parsed and formatted for output when you run the /show, /watch, or /report command.
Code Block |
---|
2021-07-20_05:55:21.443 955 955 244480 244480 1042 1042 266752 266752 |
/show
[/t {types[,...]|all}] [/r {resource[,...]|all}] [/j|/json] [/c|/continue]
Outputs performance data for all entries for all resources at once. The output target is the last data written to the performance file.
[/t {types[,...]|all}] [/r {resource[,...]|all}]
Provides the option to specify specific performance items and resources for output. Enter performance item and resource names separated by ',' and without spaces.
[/j|/json]
Outputs data in JSON format.
[/c|/continue]
Outputs performance data periodically. The output period follows the period value set in bsrmon.
Usage examples
Output everything on all resources
Expand |
---|
|
timestamp is output by memory and resource. IOSTAT, IOCLAT, REQSTAT, PEER_REQSTAT, AL_STAT, and RESYNC_RATIO are output separated by VNR. MEMORY performance data is output differently depending on the OS (WINDOWS/LINUX).
Code Block |
---|
[root@cent79_01 bsr-utils]# bsrmon /show
bsrmon {
memory {
system {
total_memory 7990028; # kbytes
used_memory 1843756; # kbytes
free_memory 2511096; # kbytes
buff/cache 3635176; # kbytes
}
module {
slab {
bsr_req 15982; # kbytes
bsr_al 1020; # kbytes
bsr_bm 7880; # kbytes
bsr_ee 2384; # kbytes
total_bio_set 520; # kbytes
kmalloc 157; # kbytes
}
vmalloc 0; # kbytes
total_page_pool 33344; # kbytes
}
user {
top_process {
name gnome-shell;
pid 2482;
rsz 250212; # kbytes
vsz 4082020; # kbytes
}
bsr_process {
name bsrmon;
pid 19370;
rsz 1364; # kbytes
vsz 12780; # kbytes
}
}
timestamp 2022-10-04_00:10:55.805;
}
resource r0 {
vnr 0 {
iostat {
read_iops 0;
read_iocnt 0;
read_kbs 0; # kbytes/second
read_kb 0; # kbytes
write_iops 0;
write_iocnt 0;
write_kbs 0; # kbytes/second
write_kb 0; # kbytes
}
ioclat {
local_min 0; # usec
local_max 0; # usec
local_avg 0; # usec
master_min 0; # usec
master_max 0; # usec
master_avg 0; # usec
}
reqstat {
requests {
count 0;
before_queue_min 0; # usec
before_queue_max 0; # usec
before_queue_avg 0; # usec
before_al_begin_min 0; # usec
before_al_begin_max 0; # usec
before_al_begin_avg 0; # usec
in_actlog_min 0; # usec
in_actlog_max 0; # usec
in_actlog_avg 0; # usec
submit_min 0; # usec
submit_max 0; # usec
submit_avg 0; # usec
bio_endio_min 0; # usec
bio_endio_max 0; # usec
bio_endio_avg 0; # usec
destroy_min 0; # usec
destroy_max 0; # usec
destroy_avg 0; # usec
}
al_update {
count 0;
before_bm_write_min 0; # usec
before_bm_write_max 0; # usec
before_bm_write_avg 0; # usec
after_bm_write_min 0; # usec
after_bm_write_max 0; # usec
after_bm_write_avg 0; # usec
after_sync_page_min 0; # usec
after_sync_page_max 0; # usec
after_sync_page_avg 0; # usec
}
peer cent79_03 {
pre_send_min 0; # usec
pre_send_max 0; # usec
pre_send_avg 0; # usec
acked_min 0; # usec
acked_max 0; # usec
acked_avg 0; # usec
net_done_min 0; # usec
net_done_max 0; # usec
net_done_avg 0; # usec
}
}
peer_reqstat {
peer cent79_03 {
count 0;
submit_min 0; # usec
submit_max 0; # usec
submit_avg 0; # usec
bio_endio_min 0; # usec
bio_endio_max 0; # usec
bio_endio_avg 0; # usec
destroy_min 0; # usec
destroy_max 0; # usec
destroy_avg 0; # usec
}
}
al_stat {
al-extents 6001;
al_used 0;
al_used_max 0;
hits 0;
hits_total 2064;
misses 0;
misses_total 1998;
starving 0;
starving_total 0;
locked 0;
locked_total 0;
changed 0;
changed_total 249;
al_wait_retry_cnt 0;
al_wait_total_retry_cnt 0;
al_wait_max_retry_cnt 0;
pending_changes 0;
max_pending_changes 64;
error {
nobufs_starving 0;
nobufs_pending_slot 0;
nobufs_used_slot 0;
busy 0;
wouldblock 0;
}
flags __LC_DIRTY,__LC_LOCKED;
}
resync_ratio {
peer cent79_02 {
replication 0; # byte/second
resync 0; # byte/second
resync_ratio 0; # percent
}
peer cent79_03 {
replication 0; # byte/second
resync 0; # byte/second
resync_ratio 0; # percent
}
}
vnr 1 {
iostat {
read_iops 0;
read_iocnt 0;
read_kbs 0; # kbytes/second
read_kb 0; # kbytes
write_iops 0;
write_iocnt 0;
write_kbs 0; # kbytes/second
write_kb 0; # kbytes
}
ioclat {
local_min 0; # usec
local_max 0; # usec
local_avg 0; # usec
master_min 0; # usec
master_max 0; # usec
master_avg 0; # usec
}
reqstat {
requests {
count 0;
before_queue_min 0; # usec
before_queue_max 0; # usec
before_queue_avg 0; # usec
before_al_begin_min 0; # usec
before_al_begin_max 0; # usec
before_al_begin_avg 0; # usec
in_actlog_min 0; # usec
in_actlog_max 0; # usec
in_actlog_avg 0; # usec
submit_min 0; # usec
submit_max 0; # usec
submit_avg 0; # usec
bio_endio_min 0; # usec
bio_endio_max 0; # usec
bio_endio_avg 0; # usec
destroy_min 0; # usec
destroy_max 0; # usec
destroy_avg 0; # usec
}
al_update {
count 0;
before_bm_write_min 0; # usec
before_bm_write_max 0; # usec
before_bm_write_avg 0; # usec
after_bm_write_min 0; # usec
after_bm_write_max 0; # usec
after_bm_write_avg 0; # usec
after_sync_page_min 0; # usec
after_sync_page_max 0; # usec
after_sync_page_avg 0; # usec
}
peer cent79_03 {
pre_send_min 0; # usec
pre_send_max 0; # usec
pre_send_avg 0; # usec
acked_min 0; # usec
acked_max 0; # usec
acked_avg 0; # usec
net_done_min 0; # usec
net_done_max 0; # usec
net_done_avg 0; # usec
}
}
peer_reqstat {
peer cent79_03 {
count 0;
submit_min 0; # usec
submit_max 0; # usec
submit_avg 0; # usec
bio_endio_min 0; # usec
bio_endio_max 0; # usec
bio_endio_avg 0; # usec
destroy_min 0; # usec
destroy_max 0; # usec
destroy_avg 0; # usec
}
}
al_stat {
al-extents 6001;
al_used 0;
al_used_max 0;
hits 0;
hits_total 0;
misses 0;
misses_total 0;
starving 0;
starving_total 0;
locked 0;
locked_total 0;
changed 0;
changed_total 0;
al_wait_retry_cnt 0;
al_wait_total_retry_cnt 0;
al_wait_max_retry_cnt 0;
pending_changes 0;
max_pending_changes 64;
error {
nobufs_starving 0;
nobufs_pending_slot 0;
nobufs_used_slot 0;
busy 0;
wouldblock 0;
}
flags NONE;
}
resync_ratio {
peer cent79_02 {
replication 0; # byte/second
resync 0; # byte/second
resync_ratio 0; # percent
}
peer cent79_03 {
replication 0; # byte/second
resync 0; # byte/second
resync_ratio 0; # percent
}
}
network {
peer cent79_02 {
send 180; # byte/second
recv 384; # byte/second
}
peer cent79_03 {
send 0; # byte/second
recv 0; # byte/second
}
}
sendbuf {
peer cent79_02 {
ap_in_flight {
size 17301504; # bytes
count 33;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 33;
fill 17301504; # bytes
data_stream {
size 20971520; # bytes
used 0; # bytes
packet {
name P_DATA
count 1;
size 0; # bytes
}
}
control_stream {
size 5242880; # bytes
used 0; # bytes
}
}
peer cent79_03 {
ap_in_flight {
size 0; # bytes
count 0;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 0;
fill 0; # bytes
data_stream {
size 20971520; # bytes
used 0; # bytes
}
control_stream {
size 5242880; # bytes
used 0; # bytes
}
}
}
timestamp 2022-09-29_22:54:25.064;
}
resource r1 {
...
timestamp 2022-10-04_00:10:55.805;
}
} |
|
Output specific items from all resources
Expand |
---|
title | bsrmon /show /t iostat |
---|
|
Code Block |
---|
[root@cent79_01 bsr-utils]# bsrmon /show /t iostat
bsrmon {
resource r0 {
vnr 0 {
iostat {
read_iops 0;
read_iocnt 0;
read_kbs 0; # kbytes/second
read_kb 0; # kbytes
write_iops 0;
write_iocnt 0;
write_kbs 0; # kbytes/second
write_kb 0; # kbytes
}
}
vnr 1 {
iostat {
read_iops 0;
read_iocnt 0;
read_kbs 0; # kbytes/second
read_kb 0; # kbytes
write_iops 0;
write_iocnt 0;
write_kbs 0; # kbytes/second
write_kb 0; # kbytes
}
}
timestamp 2022-09-29_22:54:25.064;
}
resource r1 {
vnr 0 {
iostat {
read_iops 0;
read_iocnt 0;
read_kbs 0; # kbytes/second
read_kb 0; # kbytes
write_iops 0;
write_iocnt 0;
write_kbs 0; # kbytes/second
write_kb 0; # kbytes
}
}
timestamp 2022-10-04_00:17:52.249;
}
} |
|
Output specific items from specific resources
Expand |
---|
title | bsrmon /show /t network,sendbuf /r r0,r1 |
---|
|
Code Block |
---|
[root@cent79_01 bsr-utils]# bsrmon /show /t network,sendbuf /r r0,r1
bsrmon {
resource r0 {
network {
peer cent79_02 {
send 180; # byte/second
recv 384; # byte/second
}
peer cent79_03 {
send 0; # byte/second
recv 0; # byte/second
}
}
sendbuf {
peer cent79_02 {
ap_in_flight {
size 17301504; # bytes
count 33;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 33;
fill 17301504; # bytes
data_stream {
size 20971520; # bytes
used 0; # bytes
packet {
name P_DATA
count 1;
size 0; # bytes
}
}
control_stream {
size 5242880; # bytes
used 0; # bytes
}
}
peer cent79_03 {
ap_in_flight {
size 0; # bytes
count 0;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 0;
fill 0; # bytes
data_stream {
size 20971520; # bytes
used 0; # bytes
}
control_stream {
size 5242880; # bytes
used 0; # bytes
}
}
}
timestamp 2022-09-29_22:54:24.021;
}
resource r1 {
network {
peer cent79_02 {
send 0; # byte/second
recv 0; # byte/second
}
peer cent79_03 {
send 0; # byte/second
recv 0; # byte/second
}
}
sendbuf {
peer cent79_02 {
ap_in_flight {
size 0; # bytes
count 0;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 0;
fill 0; # bytes
data_stream {
size 0; # bytes
used 0; # bytes
}
control_stream {
size 0; # bytes
used 0; # bytes
}
}
peer cent79_03 {
ap_in_flight {
size 0; # bytes
count 0;
}
rs_in_flight {
size 0; # bytes
count 0;
}
highwater 0;
fill 0; # bytes
data_stream {
size 0; # bytes
used 0; # bytes
}
control_stream {
size 0; # bytes
used 0; # bytes
}
}
}
timestamp 2022-10-04_00:19:53.026;
}
} |
|
/watch
Displays the data being aggregated by type in real-time. When using the /scroll option, the output is displayed in a line-by-line fashion.
/report
{types} [/f {filename}] [/p {peer_name[,...]}] [/d {YYYY-MM-DD}] [/s {timestamp}] [/e {timestamp}]
Output statistics of data written to files (including data from backed up files) by type.
Enter the name of the file to report on. This is used to calculate statistics for a specific file.
Enter the name of the peer to report on. This is used to calculate statistics for a specific peer. Separate multiple PEER entries with commas (,) without spaces. If not entered, output statistics to the PEERs of the resource configured on the server.
[/d {YYYY-MM-DD}] [/s {YYYY-MM-DD|hh:mm[:ss]|YYYY-MM-DD_hh:mm[:ss]}] [/e {YYYY-MM-DD|hh:mm[:ss]|YYYY-MM-DD_hh:mm[:ss]}]
This option allows you to query figures for a specific period. The /d option outputs statistics of data recorded on that date. /s specifies the start date and time of the query. /e specifies the end date and time of the query. When entering the /s, /e options, the date and time are separated by an underscore (_).
Expand |
---|
|
bsrmon /report iostat r0 0 /d 2022-11-01 bsrmon /report iostat r0 0 /s 2022-11-01 /e 2022-11-10 bsrmon /report iostat r0 0 /s 2022-11-01_09:00 /e 2022-11-10_20:00 bsrmon /report iostat r0 0 /s 09:00 /e 20:00
|
/set
{period, file_size, file_cnt} {value}
Commands to adjust numbers related to monitoring.
period
Sets the file saving and monitoring interval. Set in seconds, the default value is 1 second.
file_size
Set the file rolling size. Set in MB, the default value is 50 MB.
file_cnt
Set the number of file rolls. The default value is 3.
/get
{all, period, file_size, file_cnt}
Commands to get numbers related to monitoring.
/io_delay_test
{flag} {delay point} {delay time}
Intentionally cause I/O performance degradation to validate the functionality of the bsr performance monitor. This is a developer feature.
/debug cmds options
Commands for windows to retrieve debugfs information.