
The Voting Disk "Majority Available" Algorithm

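To protect the voting disk, several voting disks should be configured. Voting disks follow a "majority available" algorithm: when there is more than one voting disk, more than half of them must be accessible at the same time for Clusterware to keep working. If the voting disks are stored in an ASM disk group (DG), they can live in one and only one disk group; with NORMAL redundancy, create that disk group with three disks, each in its own failure group.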
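The majority rule can be written down as a short check. This is only a minimal sketch of the arithmetic, not Clusterware's actual implementation; the function name `cluster_has_quorum` is made up for illustration.

```python
def cluster_has_quorum(online: int, configured: int) -> bool:
    """Return True when strictly more than half of the voting disks are accessible.

    Illustrative only: models the "more than half" rule described in the text.
    """
    if configured <= 0 or not 0 <= online <= configured:
        raise ValueError("need configured > 0 and 0 <= online <= configured")
    # Integer form of online > configured / 2, so exactly half is not enough.
    return 2 * online > configured


if __name__ == "__main__":
    for online in range(4):
        print(f"{online} of 3 voting disks online -> quorum: "
              f"{cluster_has_quorum(online, 3)}")
```

With three voting disks, losing one still leaves a majority, while losing two does not. This is also why an even number of voting disks adds no extra tolerance: with two disks, losing either one leaves exactly half.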

Test environment: RHEL 6.4, Oracle 11g R2 RAC

OCR environment: gridnewdg and gridmirdg, in a mirror relationship

Voting disk environment:

Creation statement for datadg:

CREATE DISKGROUP datadg NORMAL REDUNDANCY

FAILGROUP datafail01 DISK '/dev/mapper/dataA01','/dev/mapper/dataB01'

FAILGROUP datafial02 DISK '/dev/mapper/dataA02','/dev/mapper/dataB02'

ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';

Creation statement for gridnewdg:

CREATE DISKGROUP gridnewdg NORMAL REDUNDANCY

FAILGROUP gnewfail01 DISK '/dev/mapper/crsA12'

FAILGROUP gnewfial02 DISK '/dev/mapper/crsA13'

FAILGROUP gnewfial03 DISK '/dev/mapper/crsB13'

ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';

Creation statement for gridmirdg:

CREATE DISKGROUP gridmirdg NORMAL REDUNDANCY

FAILGROUP gmirfail01 DISK '/dev/mapper/crsB11'

FAILGROUP gmirfial02 DISK '/dev/mapper/crsB12'

ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';

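The crsA* disks belong to storage array A; the crsB* disks belong to storage array B.

Test steps:

1: Check that the OCR mirror disk groups sit on different storage arrays

2: Before shutting down any storage, check which disks hold the voting files

3: Shut down storage B normally and wait 2 minutes

4: Start storage B again while the cluster is running normally

5: Fix GRIDMIRDG not being mounted

6: Fix the voting disk problem

7: Shut down storage A normally and wait about 2 minutes

8: The voting disk "majority available" algorithm is confirmed

Note:

In what follows, the "SQL>" prompt means the statement is run in the ASM instance.

For how to get into the ASM administration tool, see my earlier write-up:

Entering the ASM administration tool with sqlplus sys as sysasm

/shuma/2049642hzg.html

Test steps in detail:

1: Check that the OCR mirror disk groups sit on different storage arrays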

$ ocrcheck

Status of Oracle Cluster Registry is as follows :

Version : 3

Total space (kbytes) : 262120

Used space (kbytes) : 2780

Available space (kbytes) : 259340

ID : 1679364394

Device/File Name : +gridnewdg

Device/File integrity check failed

Device/File Name : +gridmirdg

Device/File integrity check failed

Device/File not configured

Device/File not configured

Device/File not configured

$ more /etc/oracle/ocr.loc

#Device/file getting replaced by device +gridmirdg

ocrconfig_loc=+gridnewdg

ocrmirrorconfig_loc=+gridmirdg    (mirror location)

local_only=false


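The output above shows that the OCR is stored in gridnewdg and gridmirdg, which mirror each other.

Next, check which disks belong to each of the two disk groups.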

SQL> select name ,path from v$asm_disk order by name;

NAME PATH

-------------------- ------------------------------

DATADG_0000 /dev/mapper/dataA01

DATADG_0001 /dev/mapper/dataB01

DATADG_0002 /dev/mapper/dataA02

DATADG_0003 /dev/mapper/dataB02

GRIDMIRDG_0000 /dev/mapper/crsB11

GRIDMIRDG_0001 /dev/mapper/crsB12

GRIDNEWDG_0000 /dev/mapper/crsB13

GRIDNEWDG_0002 /dev/mapper/crsA12

GRIDNEWDG_0004 /dev/mapper/crsA13

/dev/mapper/dataA04

/dev/mapper/crsA11

NAME PATH

-------------------- ------------------------------

/dev/mapper/crsA15

/dev/mapper/dataA03

/dev/mapper/dataB05

/dev/mapper/dataB03

/dev/mapper/dataB04

/dev/mapper/crsB15

/dev/mapper/crsB14

/dev/mapper/dataA05

/dev/mapper/crsA14

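The output above shows that the OCR should stay available whichever single storage array is shut down: gridmirdg sits entirely on storage B (crsB11, crsB12), while gridnewdg has two of its three disks on storage A (crsA12, crsA13).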
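To make the placement easier to read, the named disks from the query can be grouped by disk group and storage array. This is a small sketch, assuming the naming convention stated earlier (crsA* on storage A, crsB* on storage B) also applies to the dataA*/dataB* disks; the helper names are hypothetical.

```python
from collections import Counter, defaultdict

# NAME / PATH pairs copied from the v$asm_disk output (named disks only).
ASM_DISKS = [
    ("DATADG_0000", "/dev/mapper/dataA01"),
    ("DATADG_0001", "/dev/mapper/dataB01"),
    ("DATADG_0002", "/dev/mapper/dataA02"),
    ("DATADG_0003", "/dev/mapper/dataB02"),
    ("GRIDMIRDG_0000", "/dev/mapper/crsB11"),
    ("GRIDMIRDG_0001", "/dev/mapper/crsB12"),
    ("GRIDNEWDG_0000", "/dev/mapper/crsB13"),
    ("GRIDNEWDG_0002", "/dev/mapper/crsA12"),
    ("GRIDNEWDG_0004", "/dev/mapper/crsA13"),
]


def storage_of(path: str) -> str:
    """Return the storage array letter, assuming names like crsA12 or dataB01."""
    device = path.rsplit("/", 1)[-1]        # e.g. "crsB13"
    prefix = device.rstrip("0123456789")    # e.g. "crsB"
    return prefix[-1]                       # e.g. "B"


def disks_per_array(disks):
    """Count disks per storage array for every disk group."""
    layout = defaultdict(Counter)
    for name, path in disks:
        group = name.rsplit("_", 1)[0]      # "GRIDNEWDG_0000" -> "GRIDNEWDG"
        layout[group][storage_of(path)] += 1
    return layout


layout = disks_per_array(ASM_DISKS)

if __name__ == "__main__":
    for group in sorted(layout):
        print(group, dict(layout[group]))
```

The result shows DATADG split evenly across both arrays, GRIDMIRDG entirely on B, and GRIDNEWDG with two disks on A and one on B.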


2: Before shutting down any storage, check which disks hold the voting files

# crsctl query css votedisk

## STATE File Universal Id File Name Disk group

-- ----- ----------------- --------- ---------

1. ONLINE 84711683bef84f64bfcee1ba4130bd1b (/dev/mapper/crsA13) [GRIDNEWDG]

2. ONLINE 18a84823cb424fd6bfc009014cb99bfe (/dev/mapper/crsA12) [GRIDNEWDG]

3. ONLINE 2b2e9291c9b24f5dbfc979db5c0bbdd9 (/dev/mapper/crsB13) [GRIDNEWDG]
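Given this listing and the majority rule stated at the top, it is possible to predict what the loss of each storage array would mean for the voting files. The sketch below parses the three lines shown above and applies the "more than half" check. The regular expression and the function names are illustrative assumptions, not part of any Oracle tool, and the array letter is derived from the crsA*/crsB* naming convention.

```python
import re

# Voting file lines copied from the `crsctl query css votedisk` output above.
VOTEDISK_OUTPUT = """\
 1. ONLINE 84711683bef84f64bfcee1ba4130bd1b (/dev/mapper/crsA13) [GRIDNEWDG]
 2. ONLINE 18a84823cb424fd6bfc009014cb99bfe (/dev/mapper/crsA12) [GRIDNEWDG]
 3. ONLINE 2b2e9291c9b24f5dbfc979db5c0bbdd9 (/dev/mapper/crsB13) [GRIDNEWDG]
"""

# number, state, file universal id, (path), [disk group]
LINE_RE = re.compile(
    r"^\s*\d+\.\s+(\w+)\s+([0-9a-f]+)\s+\(([^)]+)\)\s+\[(\w+)\]"
)


def online_vote_paths(text: str) -> list:
    """Return the device paths of all voting files reported as ONLINE."""
    paths = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) == "ONLINE":
            paths.append(match.group(3))
    return paths


def storage_of(path: str) -> str:
    """Storage array letter, assuming device names like crsA13 / crsB13."""
    return path.rsplit("/", 1)[-1].rstrip("0123456789")[-1]


def survives_loss_of(array: str, paths: list) -> bool:
    """True if a strict majority of voting files remains without `array`."""
    remaining = [p for p in paths if storage_of(p) != array]
    return 2 * len(remaining) > len(paths)


paths = online_vote_paths(VOTEDISK_OUTPUT)

if __name__ == "__main__":
    for array in ("A", "B"):
        print(f"storage {array} down -> majority kept: "
              f"{survives_loss_of(array, paths)}")
```

With two voting files on storage A and one on storage B, losing B leaves two of three files (a majority), while losing A leaves only one of three.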

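3: Shut down storage B normally

After storage B is shut down, the cluster keeps running normally. Check the voting disk and disk group information at this point:

# crsctl query css votedisk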

## STATE File Universal Id File Name Disk group

-- ----- ----------------- --------- ---------

1. ONLINE 84711683bef84f64bfcee1ba4130bd1b (/dev/mapper/crsA13) [GRIDNEWDG]

2. ONLINE 18a84823cb424fd6bfc009014cb99bfe (/dev/mapper/crsA12) [GRIDNEWDG]

As shown above, crsB13 on storage B is no longer listed.

SQL> select name,path from v$asm_disk order by name;

NAME PATH

-------------------- ------------------------------

DATADG_0000 /dev/mapper/dataA01

DATADG_0001 /dev/mapper/dataB01

DATADG_0002 /dev/mapper/dataA02

DATADG_0003 /dev/mapper/dataB02

GRIDNEWDG_0002 /dev/mapper/crsA12

GRIDNEWDG_0004 /dev/mapper/crsA13

/dev/mapper/crsB13

/dev/mapper/crsB12

/dev/mapper/crsA11

/dev/mapper/crsA15

/dev/mapper/crsB11

NAME PATH

-------------------- ------------------------------

/dev/mapper/dataA05

/dev/mapper/dataA03

/dev/mapper/crsB15

/dev/mapper/crsA14

/dev/mapper/dataB03

/dev/mapper/dataB05

/dev/mapper/dataA04

/dev/mapper/crsB14

/dev/mapper/dataB04


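4: Start storage B again while the cluster is running normally

Once storage B is back and its devices are visible again, the following can be observed:

1. All datadg disks are normal.

2. GRIDMIRDG is not mounted; the mount command has to be run on both nodes.

3. There is no longer a voting file on crsB13, and gridnewdg has already dropped the crsB13 disk.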


5: Fix GRIDMIRDG not being mounted

Running the mount command on both nodes is enough:

alter diskgroup gridmirdg mount;


6: Fix the voting disk problem

SQL> select NAME,FAILGROUP,PATH,MOUNT_STATUS from v$asm_disk order by name;

The query shows the disk /dev/mapper/crsB13 in the CLOSED state.

# crsctl query css votedisk

## STATE File Universal Id File Name Disk group

-- ----- ----------------- --------- ---------

1. ONLINE 84711683bef84f64bfcee1ba4130bd1b (/dev/mapper/crsA13) [GRIDNEWDG]

2. ONLINE 18a84823cb424fd6bfc009014cb99bfe (/dev/mapper/crsA12) [GRIDNEWDG]

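crsB13 has to be added back to gridnewdg, but the add fails because crsB13 still carries leftover information from its earlier use as a voting file. It has to be cleared with dd first. There is no need to wipe the whole disk, which would take a long time; estimate the size of the voting file and clear only that much.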

SQL> alter diskgroup gridnewdg add failgroup gnewfial04 disk '/dev/mapper/crsB13';

alter diskgroup gridnewdg add failgroup gnewfial04 disk '/dev/mapper/crsB13'

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15033: disk '/dev/mapper/crsB13' belongs to diskgroup "GRIDNEWDG"

# dd if=/dev/zero of=/dev/mapper/crsB13 bs=8K count=200000    (clears the start of the disk)

200000+0 records in

200000+0 records out

1638400000 bytes (1.6 GB) copied, 69.0531 s, 23.7 MB/s
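The figures in the dd output can be checked with simple arithmetic, and the same arithmetic gives the `count` for a smaller wipe. This is only a sketch: the 100 MiB target in the example is a hypothetical value chosen for illustration, not a size recommended by the original test, and the helper names are made up.

```python
BLOCK_SIZE = 8 * 1024  # dd's bs=8K means 8192 bytes per block


def bytes_wiped(count: int, block_size: int = BLOCK_SIZE) -> int:
    """Total bytes written by dd for a given bs and count."""
    return count * block_size


def blocks_for(target_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Smallest count whose total size is at least target_bytes (ceiling division)."""
    return -(-target_bytes // block_size)


if __name__ == "__main__":
    total = bytes_wiped(200000)
    print(total)                       # bytes written by the command above
    print(total / 69.0531 / 1e6)       # throughput in MB/s for the reported time
    # Hypothetical smaller target of 100 MiB, for illustration only.
    print(blocks_for(100 * 1024 * 1024))
```

The first value matches the 1638400000 bytes reported by dd, so the command cleared roughly the first 1.6 GB of the device rather than the whole disk.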

SQL> alter diskgroup gridnewdg add failgroup gnewfial04 disk '/dev/mapper/crsB13';

Diskgroup altered.

$ crsctl query css votedisk

## STATE File Universal Id File Name Disk group

-- ----- ----------------- --------- ---------

1. ONLINE 84711683bef84f64bfcee1ba4130bd1b (/dev/mapper/crsA13) [GRIDNEWDG]

2. ONLINE 18a84823cb424fd6bfc009014cb99bfe (/dev/mapper/crsA12) [GRIDNEWDG]

3. ONLINE 2f845578e55a4f4bbf25d0864b0bc26b (/dev/mapper/crsB13) [GRIDNEWDG]

After the disk is added successfully, a voting file is stored on crsB13 again.


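7: Shut down storage A normally and wait about 2 minutes

Both servers rebooted automatically and the cluster could not start normally: with storage A down, only one of the three voting files (the one on crsB13) was still accessible, which is less than a majority.

To recover the cluster:

1. Stop the cluster on both servers: crsctl stop crs -f

2. Start storage A

3. Reboot both database servers

After that, the cluster starts normally and all disk groups are healthy.

8: The voting disk "majority available" algorithm is confirmed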
