Thursday, 30 June 2016

Golden Gate Interview questions and answers Part -2

1) What are processes/components in GoldenGate?

Ans:

Manager, Extract, Replicat, Data Pump

2) What is the Data Pump process in GoldenGate?

The Data Pump (not to be confused with the Oracle Export/Import Data Pump) is an optional secondary Extract group that is created on the source system. When a Data Pump is not used, the primary Extract process writes directly to a remote trail located on the target system over TCP/IP. When a Data Pump is configured, the primary Extract process writes to a local trail, and the Data Pump reads that local trail and sends the data over the network to the remote trail on the target system.

The main advantage is protection against network failure: without a local trail, the Extract process holds captured data in memory until it is sent over the network, so a network outage can cause the Extract process to abort (abend). With a Data Pump, the captured data is already persisted in the local trail and transmission simply resumes once the network recovers. The Data Pump can also perform complex data transformation or filtering, offloading that work from the primary Extract. It is also useful when consolidating data from several sources into one central target, where the Data Pump on each source system writes to a common trail on the target.
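A minimal sketch of this layout, assuming purely illustrative names (Extract group EXT1, pump group PMP1, schema SCOTT, target host tgt01):

-- ext1.prm: primary Extract captures from the source and writes to a local trail
EXTRACT ext1
USERID ogg_user, PASSWORD ogg_pwd
EXTTRAIL ./dirdat/lt
TABLE scott.emp;

-- pmp1.prm: Data Pump reads the local trail and ships it to the remote trail
EXTRACT pmp1
PASSTHRU
RMTHOST tgt01, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE scott.emp;

The corresponding GGSCI setup would look roughly like: ADD EXTRACT ext1, TRANLOG, BEGIN NOW; ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1; ADD EXTRACT pmp1, EXTTRAILSOURCE ./dirdat/lt; ADD RMTTRAIL ./dirdat/rt, EXTRACT pmp1.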


3) What is the command line utility in GoldenGate (or) what is ggsci?


ANS: GGSCI – the (Oracle) GoldenGate Software Command Interface.

GGSCI is the command-line utility used to create, configure, and manage GoldenGate processes (Manager, Extract, Replicat, etc.).
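For example, a few everyday GGSCI commands (the group names ext1/rep1 are illustrative):

GGSCI> INFO ALL
GGSCI> START MANAGER
GGSCI> INFO EXTRACT ext1, DETAIL
GGSCI> STATS REPLICAT rep1
GGSCI> VIEW REPORT ext1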


4) What is the default port for GoldenGate Manager process?

ANS:

7809
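The port is set in the Manager parameter file (dirprm/mgr.prm); a minimal sketch, with the purge rule shown only as a common optional addition:

PORT 7809
-- optional housekeeping of old trail files
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3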

5) What are the important files in GoldenGate?

GLOBALS, ggserr.log, the parameter files under dirprm, etc.


6) What is a checkpoint table?

ANS:

GoldenGate maintains its own checkpoints: a checkpoint is a known position in the trail file from which the Replicat process resumes processing after any kind of error or shutdown.
This ensures data integrity. A record of these checkpoints is kept either in checkpoint files stored on disk or, preferably, in a checkpoint table in the target database.
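A typical way to set this up (the schema and table names are illustrative) is to record the table name in the GLOBALS file and create it from GGSCI:

-- GLOBALS
CHECKPOINTTABLE ogg_user.ggs_checkpoint

GGSCI> DBLOGIN USERID ogg_user, PASSWORD ogg_pwd
GGSCI> ADD CHECKPOINTTABLE ogg_user.ggs_checkpoint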


7) How can you see GoldenGate errors?

ANS:

GGSCI> VIEW GGSEVT
(or inspect the ggserr.log file directly in the GoldenGate home directory)


8) What type of topology does GoldenGate support?

ANS: GoldenGate supports the following topologies:



  • Unidirectional
  • Bidirectional
  • Peer-to-peer
  • Broadcast
  • Consolidation
  • Cascading


What are the main components of the Goldengate replication?

The replication configuration consists of the following processes.



  • Manager
  • Extract
  • Pump
  • Replicat

What transaction types does Goldengate support for Replication?

Goldengate supports both DML and DDL Replication from the source to target.
What are the supplemental logging pre-requisites?



The following supplemental logging is required:

  • Database supplemental logging
  • Object level logging

Why is Supplemental logging required for Replication?

When a transaction is committed on the source database, only the new data is written to the redo log. For GoldenGate to apply these transactions on the destination database, however, the before-image key values are required to identify the affected rows. This key data is also placed in the trail file and is used to locate the corresponding rows on the destination so the transactions can be executed against them.
List important considerations for bi-directional replication?

The customer should consider the following points in an active-active replication environment.
Primary keys: Help to identify and resolve conflicts.
Sequences: Not supported. The workaround is to use odd/even values, ranges, or concatenated sequences.
Triggers: Should be disabled or suppressed on the target to avoid uniqueness issues.
Data looping: Can easily be avoided using OGG itself.
Lag: Should be minimized. If a customer claims there will be no lag due to the network or heavy load, CDR need not be deployed; in practice, however, some lag is usually present and can cause conflicts.
CDR (Conflict Detection and Resolution): OGG has built-in CDR for all kinds of DML that can be used to detect and resolve conflicts.
Packaged applications: Generally not supported, as they may contain data types not supported by OGG or may not allow the application modifications needed to work with OGG.

Are OGG binaries supported on ASM Cluster File System (ACFS)?

Yes, you can install and configure OGG on ACFS.

Are OGG binaries supported on the Database File System (DBFS)? What files can be stored in DBFS?

No, OGG binaries are not supported on DBFS. You can however store parameter files, data files (trail files), and checkpoint files on DBFS.

What is the default location of the GLOBALS file?

The GLOBALS file is located under the Oracle GoldenGate installation directory (OGG HOME).

Where can filtering of data for a column be configured?

Filtering of the columns of a table can be set at the Extract, Pump or Replicat level.

Is it a requirement to configure a PUMP extract process in OGG replication?

A Pump Extract is optional, but it is highly recommended as a safeguard against network failures. Normally it is configured when you are setting up OGG replication across the network.

What are the differences between the Classic and integrated Capture?

Classic Capture:
The Classic Capture mode is the traditional Extract process that accesses the database redo logs (and optionally the archive logs) to capture the DML changes occurring on the objects specified in the parameter files.
At the OS level, the GoldenGate user must be part of the same group that owns the database redo logs.
This capture mode is available for other RDBMSs as well.
There are some data types that are not supported in Classic Capture mode.
Classic Capture cannot read data from compressed tables or tablespaces.
Integrated Capture (IC):
In Integrated Capture mode, GoldenGate works directly with the database log mining server to receive data changes in the form of logical change records (LCRs).
IC mode does not require any special setup for databases using ASM, Transparent Data Encryption, or Oracle RAC.
This feature is only available for Oracle databases version 11.2.0.3 or higher.
It also supports various object types that were previously not supported by Classic Capture.
This capture mode supports extracting data from source databases that use compression.
Integrated Capture can be configured in online or downstream mode.

List the minimum parameters that can be used to create the extract process?

The following are the minimum required parameters which must be defined in the extract parameter file (a minimal sample follows the list).



  • EXTRACT NAME
  • USERID
  • EXTTRAIL
  • TABLE
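
A minimal sketch of such a parameter file (all names are illustrative):

EXTRACT ext1
USERID ogg_user, PASSWORD ogg_pwd
EXTTRAIL ./dirdat/lt
TABLE scott.emp;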


I want to configure multiple extracts to write to the same exttrail file? Is this possible?

Only one Extract process can write to one exttrail at a time. So you can’t configure multiple extracts to write to the same exttrail.

What type of Encryption is supported in Goldengate?

Oracle GoldenGate provides three types of encryption:
Data encryption using Blowfish
Password encryption
Network encryption

What are the different password encryption options available with OGG?

You can encrypt a password in OGG using:
the Blowfish algorithm, or
the Advanced Encryption Standard (AES) algorithm

What are the different encryption levels in AES?

You can encrypt the password/data using AES with three different key lengths:

a) 128 bit

b) 192 bit and
c) 256 bit
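
As an example, a password can be encrypted in GGSCI and the resulting ciphertext placed in the parameter file (the user name, password, and <encrypted_string> placeholder are illustrative):

GGSCI> ENCRYPT PASSWORD ogg_pwd AES128 ENCRYPTKEY DEFAULT

-- in the Extract/Replicat parameter file:
USERID ogg_user, PASSWORD <encrypted_string>, AES128, ENCRYPTKEY DEFAULT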

Golden Gate Interview Question Part -1

What type of Topology does Goldengate support?

GoldenGate supports the following topologies:

Unidirectional
Bidirectional
Peer-to-peer
Broadcast
Consolidation
Cascading

What are the main components of the Goldengate replication?

The replication configuration consists of the following processes.


  1. Manager
  2. Extract
  3. Pump
  4. Replicat

What databases does GoldenGate support for replication?

  1. Oracle Database
  2. TimesTen
  3. MySQL
  4. IBM DB2
  5. Microsoft SQL Server
  6. Informix
  7. Teradata
  8. Sybase
  9. Enscribe
  10. SQL/MX



What transaction types does Goldengate support for Replication?

Goldengate supports both DML and DDL Replication from the source to target.

What are the supplemental logging pre-requisites?

The following supplemental logging is required:


  • Database supplemental logging
  • Object level logging

Why is Supplemental logging required for Replication?

When a transaction is committed on the source database, only the new data is written to the redo log. For GoldenGate to apply these transactions on the destination database, however, the before-image key values are required to identify the affected rows. This key data is also placed in the trail file and is used to locate the corresponding rows on the destination so the transactions can be executed against them.


List important considerations for bi-directional replication?

The customer should consider the following points in an active-active replication environment.


  • Primary keys: Help to identify and resolve conflicts.
  • Sequences: Not supported. The workaround is to use odd/even values, ranges, or concatenated sequences.
  • Triggers: Should be disabled or suppressed on the target to avoid uniqueness issues.
  • Data looping: Can easily be avoided using OGG itself.
  • Lag: Should be minimized. If a customer claims there will be no lag due to the network or heavy load, CDR need not be deployed; in practice, however, some lag is usually present and can cause conflicts.
  • CDR (Conflict Detection and Resolution): OGG has built-in CDR for all kinds of DML that can be used to detect and resolve conflicts.
  • Packaged applications: Generally not supported, as they may contain data types not supported by OGG or may not allow the application modifications needed to work with OGG.

Are OGG binaries supported on ASM Cluster File System (ACFS)?

Yes, you can install and configure OGG on ACFS.

Are OGG binaries supported on the Database File System (DBFS)? What files can be stored in DBFS?

No, OGG binaries are not supported on DBFS. You can however store parameter files, data files (trail files), and checkpoint files on DBFS.

What is the default location of the GLOBALS file?

The GLOBALS file is located under the Oracle GoldenGate installation directory (OGG HOME).

Where can filtering of data for a column be configured?

Filtering of the columns of a table can be set at the Extract, Pump or Replicat level.

Is it a requirement to configure a PUMP extract process in OGG replication?

A Pump Extract is optional, but it is highly recommended as a safeguard against network failures. Normally it is configured when you are setting up OGG replication across the network.

What are the differences between the Classic and integrated Capture?

Classic Capture:

The Classic Capture mode is the traditional Extract process that accesses the database redo logs (and optionally the archive logs) to capture the DML changes occurring on the objects specified in the parameter files.
At the OS level, the GoldenGate user must be part of the same group that owns the database redo logs.
This capture mode is available for other RDBMSs as well.
There are some data types that are not supported in Classic Capture mode.
Classic Capture cannot read data from compressed tables or tablespaces.

Integrated Capture (IC):

In Integrated Capture mode, GoldenGate works directly with the database log mining server to receive data changes in the form of logical change records (LCRs).
IC mode does not require any special setup for databases using ASM, Transparent Data Encryption, or Oracle RAC.
This feature is only available for Oracle databases version 11.2.0.3 or higher.
It also supports various object types that were previously not supported by Classic Capture.
This capture mode supports extracting data from source databases that use compression.
Integrated Capture can be configured in online or downstream mode.

List the minimum parameters that can be used to create the extract process?

The following are the minimum required parameters which must be defined in the extract parameter file.


  • EXTRACT NAME
  • USERID
  • EXTTRAIL
  • TABLE

What are macros?

A macro is an easier way to build your parameter files. Once a macro is written it can be called from different parameter files. Common parameters such as username/password and other settings can be included in these macros. A macro can be defined either within a parameter file or in a separate macro library.

Where can macros be invoked?

The macros can be called from the following parameter files.


  • Manager
  • Extract
  • Replicat
  • GLOBALS

How is a macro defined?

A macro statement consists of the following:

  • Name of the macro
  • Parameter list
  • Macro body

Sample:
MACRO #macro_name
PARAMS (#param1, #param2, …)
BEGIN
< macro_body >
END;
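
A concrete sketch (the macro and credential names are illustrative): a connection macro defined once and invoked from a parameter file.

MACRO #dbconnect
PARAMS (#user, #pwd)
BEGIN
USERID #user, PASSWORD #pwd
END;

-- invocation later in the parameter file:
#dbconnect (ogg_user, ogg_pwd)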

I want to configure multiple extracts to write to the same exttrail file? Is this possible?

Only one Extract process can write to one exttrail at a time. So you can’t configure multiple extracts to write to the same exttrail.

What type of Encryption is supported in Goldengate?

Oracle GoldenGate provides three types of encryption:


  • Data encryption using Blowfish
  • Password encryption
  • Network encryption


What are the different password encryption options available with OGG?

You can encrypt a password in OGG using:

the Blowfish algorithm, or
the Advanced Encryption Standard (AES) algorithm

What are the different encryption levels in AES?

You can encrypt the password/data using AES with three different key lengths:

a) 128 bit
b) 192 bit and
c) 256 bit

Is there a way to check the syntax of the commands in the parameter file without actually running the GoldenGate process?

Yes. You can place the CHECKPARAMS parameter in the parameter file and start the process; it verifies the syntax, writes the results to the report file, and then stops without processing any data.

How can you increase the maximum size of the read operation into the buffer that holds the results of the reads from the transaction log?

If you are using Classic Extract, you may use the TRANLOGOPTIONS ASMBUFSIZE parameter to control the read size for ASM databases.

What information can you expect when there is data in the discard file?

When data is discarded, the discard file can contain:
1. Discard row details
2. Database Errors
3. Trail file number

What command can be used to switch writing the trail data to a new trail file?

You can use the following command to write the trail data to a new trail file.
SEND EXTRACT ext_name, ROLLOVER

How can you determine if the parameters for a process were recently changed?

Whenever a process is started, the parameters in its .prm file are written to the process report. You can look at older process reports to see the parameters that were used to start the process; by comparing the older and current reports you can identify any changes in the parameters.
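
For example (the group name exta is illustrative; aged reports are typically kept in dirrpt as EXTA0.rpt, EXTA1.rpt, and so on):

GGSCI> VIEW REPORT exta
$ ls dirrpt
EXTA.rpt  EXTA0.rpt  EXTA1.rpt ...
$ diff dirrpt/EXTA0.rpt dirrpt/EXTA.rpt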



Tuesday, 28 June 2016

Block Corruption and how to identify corrupted blocks


Block corruption occurs while data is being written to a data block: if the write fails abruptly (a partial write), for example because of a power disruption or an I/O problem, the block header may not be updated or the row data may not be fully populated, and Oracle leaves the block corrupt. With a corrupt block you can normally continue to use the database; the block corruption error is raised only when you try to read that particular block. Generally block corruption occurs when a write to the block fails while the transaction is being committed. Detailed information about block corruption is written to the alert.log file.

Block corruption can happen at two levels:

1)Physical Level corruption (which means media corrupt)

2)Logical Level corruption (which means soft corrupt)

Physical corruption can be caused by defective memory boards, controllers, or broken sectors on a hard disk;


Logical corruption can, among other reasons, be caused by an attempt to recover through a NOLOGGING operation.


To differentiate between the two corruption levels (logical and physical):


Logical corruption checking compares the block header and footer; that is one of the checks (it looks for fractured blocks, and when RMAN hits one it simply re-reads the block, which is why there is no need for "alter tablespace begin backup" with RMAN).

Physical corruption means "we cannot read the block from disk; something is physically preventing us from doing so."

The methods to detect block corruption are:


1) DBVerify utility

2) Block checking parameters (DB_BLOCK_CHECKSUM and DB_BLOCK_CHECKING) – in Oracle 10g, DB_BLOCK_CHECKSUM defaults to TYPICAL and DB_BLOCK_CHECKING can be set to FULL for the most thorough checking.
3) ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE SQL statement
4) RMAN BACKUP command with the VALIDATE option.


SQL> DESC V$DATABASE_BLOCK_CORRUPTION;


SQL> SELECT object_name, block_id, corrupt_type, marked_corrupt,
            corrupt_description, repair_description
     FROM   repair_table
     WHERE  block_id = **;
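
The REPAIR_TABLE referenced above is created and populated with the DBMS_REPAIR package; a minimal sketch, assuming an illustrative SCOTT.EMP table and the USERS tablespace:

BEGIN
  -- create the repair table in the current schema
  DBMS_REPAIR.ADMIN_TABLES(
    table_name => 'REPAIR_TABLE',
    table_type => DBMS_REPAIR.REPAIR_TABLE,
    action     => DBMS_REPAIR.CREATE_ACTION,
    tablespace => 'USERS');
END;
/

SET SERVEROUTPUT ON
DECLARE
  corrupt_count PLS_INTEGER;
BEGIN
  -- scan the object and record any corrupt blocks in REPAIR_TABLE
  DBMS_REPAIR.CHECK_OBJECT(
    schema_name       => 'SCOTT',
    object_name       => 'EMP',
    repair_table_name => 'REPAIR_TABLE',
    corrupt_count     => corrupt_count);
  DBMS_OUTPUT.PUT_LINE('Corrupt blocks found: ' || corrupt_count);
END;
/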

1. DBVERIFY


The primary tool for checking for corruption in an Oracle database is DBVERIFY. It can be used to perform a physical data structure integrity check on data files whether the database is online or offline. The big benefit of this is that DBVERIFY can be used to check backup data files without adding load to the database server. You invoke DBVERIFY from the operating system command line like this:

$ dbv file=data01.dbf logfile=verify.log blocksize=8192 feedback=100
In this example data01.dbf is the data file to check, and the tablespace this file belongs to has a block size of 8192 bytes. The feedback parameter tells DBVERIFY to draw a period on the screen after every 100 pages (blocks) of the file are verified.
In the log file you’ll see output like this:

DBVERIFY – Verification starting : FILE = data01.dbf


DBVERIFY – Verification complete


Total Pages Examined : 640

Total Pages Processed (Data) : 631
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 9
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0

The Total Pages Failing values show the number of blocks that failed either the data block or index block checking routine. The Total Pages Marked Corrupt figure shows the number of blocks for which the cache header is invalid, thereby making it impossible for DBVERIFY to identify the block type. And the Total Pages Influx is the number of blocks for which DBVERIFY could not get a consistent image. (This could happen if the database is open when DBVERIFY is run. DBVERIFY reads blocks multiple times to try to get a consistent image, but DBVERIFY cannot get a consistent image of pages that are in flux.)

If you want to verify only a portion of a data file, you can specify a starting and ending block when running DBVERIFY. If you want to verify the entire database, you can generate a short shell script to run DBVERIFY on every data file in the database. You can do this easily using SQL*Plus:

SQL> SPOOL dbv_on_all_files.sh

SQL> SELECT 'dbv file=' || file_name ||
  2         ' logfile=file' || ROWNUM ||
  3         '.log blocksize=8192'
  4  FROM dba_data_files;
SQL> SPOOL OFF

After running the shell script you can quickly scan all of the DBVERIFY log files with Unix commands like:


$ grep Failing file*.log

$ grep Corrupt file*.log
$ grep Influx file*.log
You can also use DBVERIFY to validate a single data or index segment. To do this you must be logged onto the database with SYSDBA privileges. During the verification the segment is locked; if the segment is an index then the parent table is also locked.
There are other ways to check for database corruption besides DBVERIFY. You can take a full database export, with the dump file optionally specified as a null device. This will read every row in every user table in the database, discovering any corrupted data blocks along the way. However, this technique does not access every index entry or the entire data dictionary.

If you want to check one table and all of its indexes, you can use the ANALYZE statement to read every row of the table, read every entry in each of the table’s indexes, and make sure the table and index data are consistent with each other:


SQL> ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE;


This will lock the table, preventing DML on the table, unless you specify the ONLINE keyword. Online validation reduces the amount of validation performed to allow for concurrency.
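
For example (emp is just an illustrative table):

SQL> ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE ONLINE;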


There are several ways to check for corruption in an Oracle database, but the DBVERIFY tool is the most versatile. DBVERIFY does not limit concurrency or DML while it is running, and it can be run against a database backup. Just remember that if DBVERIFY detects corruption in your database and you are planning to recover the corrupt file from a backup, you should perform a DBVERIFY validation on the backup file before beginning the recovery. This will tell you if the corruption exists in the backup also.


DB_BLOCK_CHECKSUM (default is true in 9i, false in 8i) determines whether DBWn and the direct loader calculate a checksum (a number computed from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read, but only if this parameter is true and the last write of the block stored a checksum. In addition, Oracle gives every log block a checksum before writing it to the current log.


If this parameter is set to false, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces.


Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. Turning on this feature typically causes only an additional 1% to 2% overhead. Therefore, Oracle Corporation recommends that you set DB_BLOCK_CHECKSUM to true.


DB_BLOCK_CHECKING (default value is false) controls whether Oracle performs block checking for data blocks. When this parameter is set to true, Oracle performs block checking for all data blocks. When it is set to false, Oracle does not perform block checking for blocks in the user tablespaces. However, block checking for the SYSTEM tablespace is always turned on.



Oracle checks a block by going through the data on the block, making sure it is self-consistent. Block checking can often prevent memory and data corruption. Block checking typically causes 1% to 10% overhead, depending on workload. The more updates or inserts in a workload, the more expensive it is to turn on block checking. You should set DB_BLOCK_CHECKING to true if the performance overhead is acceptable.
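
For example, on 10g and later both parameters can be changed dynamically; the values shown are just one possible choice:

SQL> ALTER SYSTEM SET db_block_checksum = TYPICAL SCOPE=BOTH;
SQL> ALTER SYSTEM SET db_block_checking = FULL SCOPE=BOTH;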





RMAN (BACKUP VALIDATE, RESTORE VALIDATE, VALIDATE)

Oracle Recovery Manager (RMAN) can validate the database using the BACKUP VALIDATE command.

RMAN> BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

The process outputs the same information you would see during a backup, but no backup is created. Any block corruptions are visible in the V$DATABASE_BLOCK_CORRUPTION view, as well as in the RMAN output.

By default the command only checks for physical corruption. Add the CHECK LOGICAL clause to include checks for logical corruption.

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

RMAN can validate the contents of backup files using the RESTORE VALIDATE command.

RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;

In a similar way to the BACKUP VALIDATE command, the RESTORE VALIDATE command mimics the process of a restore, without actually performing the restore.

Prior to 11g, the straight VALIDATE command could only be used to validate backup related files. In Oracle 11g onward, the VALIDATE command can also validate datafiles, tablespaces or the whole database, so you can use it in place of the BACKUP VALIDATE command.

RMAN> VALIDATE DATAFILE 1;
RMAN> VALIDATE DATAFILE '/u01/app/oracle/oradata/ORCL/system01.dbf';

RMAN> VALIDATE CHECK LOGICAL DATAFILE 1;
RMAN> VALIDATE CHECK LOGICAL DATAFILE '/u01/app/oracle/oradata/ORCL/system01.dbf';

RMAN> VALIDATE TABLESPACE users;
RMAN> VALIDATE CHECK LOGICAL TABLESPACE users;

RMAN> VALIDATE DATABASE;
RMAN> VALIDATE CHECK LOGICAL DATABASE;


Any block corruptions are visible in the V$DATABASE_BLOCK_CORRUPTION view. You can identify the objects containing a corrupt block using a query like this.

COLUMN owner FORMAT A20
COLUMN segment_name FORMAT A30

SELECT DISTINCT owner, segment_name
FROM   v$database_block_corruption dbc
       JOIN dba_extents e ON dbc.file# = e.file_id AND dbc.block# BETWEEN e.block_id and e.block_id+e.blocks-1
ORDER BY 1,2;
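
Once V$DATABASE_BLOCK_CORRUPTION is populated, RMAN (11g and later) can repair the listed blocks from existing backups; a sketch, where the file and block numbers are illustrative:

RMAN> RECOVER CORRUPTION LIST;

or, for a specific block:

RMAN> RECOVER DATAFILE 4 BLOCK 123;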

Sunday, 26 June 2016

Managing an ASM Disk Group


  1. Adding disks to an ASM Disk group
  2. Dropping Disks From an ASM Disk group
  3. Undropping disks from an ASM Disk group
  4. Resizing disks assigned to an ASM Disk group
  5. Manually rebalance disks assigned to an ASM Disk group
  6. Manually Mount and Unmount an ASM Disk group
  7. Check the consistency of a disk group


1 Adding disks to an ASM Disk group

As databases grow you need to add disk space. The ALTER DISKGROUP command allows you to add disks to a given disk group to increase the amount of space available. Adding a disk to an existing disk group can be done with the ALTER DISKGROUP command, as seen in this example:

ALTER DISKGROUP cooked_dgroup1
add disk 'c:\oracle\asm_disk\_file_disk3'
name new_disk;
When you add a disk to a disk group, Oracle will start to rebalance the load on that disk group. Also, notice in the example above that we did not assign the disk to a specific failgroup. As a result, each disk will be assigned to its own failgroup when it is created. For example, when we added the disk above to the cooked_dgroup1 disk group, a new failgroup called cooked_dgroup1_0002 was created, as seen in this output:

SQL> select disk_number, group_number, failgroup from v$asm_disk;

DISK_NUMBER GROUP_NUMBER failgroup
----------- ------------ ------------------------------
          1            0
          0            1 DISKCONTROL1
          1            1 DISKCONTROL2
          2            1 COOKED_DGROUP1_0002

We can add a disk to an existing failgroup by using the failgroup parameter as seen in this example:

ALTER DISKGROUP cooked_dgroup1
add failgroup DISKCONTROL1
disk 'c:\oracle\asm_disk\_file_disk4'
name new_disk;

2. Dropping Disks From an ASM Disk group

The ALTER DISKGROUP command allows you to remove disks from an ASM disk group using the drop disk parameter. ASM will first rebalance the data on the disks to be dropped, assuming enough space is available. If insufficient space is available to move the data from the disk to be dropped to another disk, then an error will be raised. You can use the force parameter to force ASM to drop the disk, but this can result in data loss. Here is an example of dropping a disk from a disk group:

ALTER DISKGROUP cooked_dgroup1
drop disk 'c:\oracle\asm_disk\_file_disk4';

Another option that the ALTER DISKGROUP command gives you is the ability to drop all disks from a failgroup that are assigned to a disk group. Use the in failgroup clause to indicate the name of the failgroup, as seen in this example:

ALTER DISKGROUP cooked_dgroup1
drop disks in failgroup diskcontrol1;

When you drop a disk from a disk group, the operation is an asynchronous operation. Therefore when the SQL prompt returns this does not indicate that the operation has completed. To determine if the operation has completed you will need to review the V$ASM_DISK view. When the disk drop is complete the column HEADER_STATUS will take on the value of FORMER as seen in this example:

SQL> select disk_number, header_status from v$asm_disk;

DISK_NUMBER HEADER_STATU
----------- ------------
          0 FORMER
          1 FORMER
          1 MEMBER
          2 MEMBER
If the drop is not complete (the v$asm_disk column STATE will read dropping), you can check the V$ASM_OPERATION view and it will give you an idea of how long the operation is expected to take before it is complete. Here is an example query that will provide you with this information:

SQL>select group_number, operation, state, power, est_minutes
from v$asm_operation;

3. Undropping disks from an ASM Disk group

If you accidentally drop a disk from a disk group and you realize your mistake only after the drop operation has completed, you can recover. If you have accidentally dropped a disk, simply use the ALTER DISKGROUP command using the undrop disks parameter as seen here:

ALTER DISKGROUP sp_dgroup2 undrop disks;

This will cancel the pending drop of disks from that disk group. You cannot use this command to restore disks dropped if you dropped the entire disk group with the DROP DISKGROUP command.


4) Resizing disks assigned to an ASM Disk group

Sometimes when you need more space, all a disk administrator needs to do is add that additional space to the disk devices that are being presented for ASM to use. If this is the case, you will want to indicate to ASM that it needs to update its metadata to reflect the correct size of the disks it is using, so you get the benefit of the additional space. This is accomplished using the ALTER DISKGROUP command with the resize all clause, as seen in this example:

ALTER DISKGROUP cooked_dgroup1 resize all;

This command will query the operating system for the current size of all of the disk devices attached to the disk group and will automatically resize all disks in that disk group accordingly. You can indicate that a specific disk needs to be resized by including the disk name (from the NAME column in V$ASM_DISK) as seen in this example:

ALTER DISKGROUP cooked_dgroup1 resize disk FILE_DISKB1;

You can also resize an entire failgroup at one time:

ALTER DISKGROUP cooked_dgroup1 resize disks in failgroup DISKCONTROL2;

5) Manually rebalance disks assigned to an ASM Disk group:

Manually rebalancing disks within ASM is typically not required since ASM will perform this operation automatically. However, in cases where you might wish to have some more granular control over the disk rebalance process, you can use the ALTER DISKGROUP command along with the rebalance parameter to manually rebalance ASM disks.

When we discuss rebalancing disks in ASM we often discuss the power that is assigned to that rebalance operation. The power setting of a rebalance operation defines the urgency of that operation with respect to other operations occurring on the system (e.g. other databases or applications). When a rebalance operation runs with a low power (e.g. 1, the typical default), it is not given a high priority on the system and, as a result, can take some time. When a higher power setting is used (e.g. 11, the maximum), the rebalance is given higher priority, which can have an impact on other operations on the system. If you use a power of 0, this has the effect of suspending the rebalance operation. You can set the default power limit for the ASM instance by changing the asm_power_limit parameter.

Here is an example of starting a manual rebalance of a disk group:

ALTER DISKGROUP cooked_dgroup1 rebalance power 5 wait;

In this example, notice that we used the wait parameter. This makes the rebalance operation synchronous for our session, so when the SQL prompt returns we know the rebalance operation has completed. The default is nowait, which makes the operation asynchronous. You can check the status of the rebalance operation using the V$ASM_OPERATION view during asynchronous rebalance operations. If you use the wait parameter and you wish to convert the operation to an asynchronous operation, you can simply hit Ctrl-C on most platforms; an error will be returned along with the SQL prompt, but the rebalance operation will continue.

If you do not use the power parameter during a manual rebalance operation, or if an implicit rebalance operation is occurring (because you are dropping a disk, for example), you can affect the power of that rebalance operation by dynamically changing the ASM_POWER_LIMIT parameter to a higher value with the alter system command.
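
For example, from the ASM instance (the value 8 is arbitrary):

SQL> ALTER SYSTEM SET asm_power_limit = 8;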

Finally, you can also use the rebalance parameter along with the power parameter when adding, dropping or resizing disks within a disk group as seen in this example:

ALTER DISKGROUP cooked_dgroup1 resize all rebalance power 5;


6. Manually Mount and Unmount an ASM Disk group

If an ASM disk group is not assigned to the ASM_DISKGROUPS parameter, or if the disk group is unmounted for some other reason, you will need to mount the ASM disk group. You can use the ALTER DISKGROUP command with the mount clause to mount the disk group.

Additionally if you need to dismount an ASM disk group, you can use the ALTER DISKGROUP command to unmount that disk group. Here are some examples:

ALTER DISKGROUP sp_dgroup2 dismount;
ALTER DISKGROUP sp_dgroup2 mount;

Note that when you dismount a disk group, that disk group will be automatically removed from the ASM_DISKGROUPS parameter if you are using a SPFILE. This means that when ASM is restarted, that diskgroup will not be remounted. If you are using a regular text parameter file, you will need to remove the disk group manually (assuming it's in the parameter to begin with) or ASM will try to remount the disk group when the system is restarted.

7. Check the consistency of a disk group

On occasion you might wonder if there is some problem with an ASM disk group, and you will want to check the consistency of the ASM disk group metadata. This need might arise because of an error that occurs when the ASM instance is started, or as the result of an Oracle Database error that might be a result of some ASM corruption. To perform this check simply use the ALTER DISKGROUP command with the check all parameter as seen in this example:

ALTER DISKGROUP sp_dgroup2 check all;

When you execute an ALTER DISKGROUP ... CHECK ALL command, the results are written to the alert log of the ASM instance. ASM will attempt to correct any errors that are detected. Here is an example of the ASM instance alert log entry after a successful ALTER DISKGROUP ... CHECK ALL run:

Sun Jan 28 08:08:09 2007

SQL> ALTER DISKGROUP cooked_dgroup1 check all

Sun Jan 28 08:08:09 2007
NOTE: starting check of diskgroup COOKED_DGROUP1
SUCCESS: check of diskgroup COOKED_DGROUP1 found no errors

Saturday, 25 June 2016

Data Protection in Oracle

In Oracle, the data files are written asynchronously, at intervals unrelated to the data changes and commits. In other words, when you commit a change, the data files may not yet have that changed data. In fact the change occurs only in memory (in the buffer cache) and may not reach the data files for hours afterwards. Similarly, when you make a change but do not commit it, the changed data can still be persisted to the data files. Let me repeat that: the data files are updated with the changed data even if you didn't commit yet. This is the reason why, if you have a storage or operating system level replication solution replicating the data files (even a synchronous one), the remote site may or may not have the data, even hours after the change.

How does Oracle protect the data that was changed and committed but in the memory, if the data files do not have them? It captures the pre- and post-change data and packages them into something called redo blocks. Remember, these have nothing to do with data blocks. These are merely changes created by activities performed on the database. This redo data—also known as redo vector—is written to a special area in memory called log buffer. When you commit, the relevant redo blocks from the log buffer are written to special files in the database called redo log files, also known as online redo log files. The commit waits until this writing—known as redo flushing—has ended. You can check the Oracle sessions waiting for this flushing to complete by looking at the event “log file sync”. Since the changes—most importantly—committed changes are recorded in the redo log files, Oracle does not need to rely on the memory alone to know which changes are committed and which are not. In case of a failure, Oracle examines the redo logs to find these changes and updates the data files accordingly. Redo logs are very small compared to the data files.
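
For example, a quick way to see sessions currently waiting on this event (a minimal sketch):

SQL> SELECT sid, event, seconds_in_wait
     FROM   v$session_wait
     WHERE  event = 'log file sync';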

By the way, redo flushing also occurs at other times: every three seconds, every filled 1 MB of log buffer, when a third of the log buffer is full and some other events; but those additional flushes merely make sure the redo log file is up to date, even if there is no commit.

As you can see, this redo log file becomes the most important thing in the data recovery process. When a disaster occurs, you may have copies of the data files at the remote site (thanks to the replication); but as you learned in the previous section the copies are not useful yet since they may not have all the committed changes and may even have uncommitted changes. In other words, this copy is not considered “consistent” by Oracle. After a disaster, Oracle needs to check the redo log files and apply the changes to make the data files consistent. This is known as a “media recovery”. You have to initiate the media recovery process. In case of a synchronous replication at storage or Operating System level, the redo logs are perfectly in sync with the primary site and Oracle has no trouble getting to the last committed transaction just before the failure. There will be no data lost as a result of the recovery process. In case of Data Guard with maximum protection, this is not required since the changes are updated at the remote site anyway. But what about your cheaper, commodity network with asynchronous replication? The redo logs at the remote site will not be up to date with the primary site’s redo logs. When you perform media recovery, you can’t get to the very last change before the failure, simply because you may not have them. Therefore you can perform a recovery only up to the last available information in the redo at the remote site. This is known as “incomplete” media recovery, distinguished from the earlier described “complete” media recovery. To complete the media recovery, you need the information in the redo log files at the primary site; but remember, the primary site is destroyed. This information is gone. You end up with a data loss. Perhaps even worse, you won’t even know exactly how much you lost, since you don’t have the access to the primary redo log files.

Now, consider this situation carefully. All you need is the last redo log file from the primary database to complete the recovery. Unfortunately the file is not available because it’s destroyed or otherwise inaccessible since the site itself is inaccessible. This tiny little file is the only thing that stays between you and complete recovery. What if you somehow magically had access to this file, even though the rest of the data center is gone? You would have been able to complete the recovery with no data loss and looked like a hero and that too without a synchronous replication solution with super expensive network.

Friday, 24 June 2016

Rebalancing act ASM_POWER_LIMIT




Property              Description

Parameter type        Integer
Default value         1
Modifiable            ALTER SESSION, ALTER SYSTEM
Range of values       0 to 11
Oracle RAC            Multiple instances can have different values.


Beginning with Oracle Database 11g Release 2 (11.2.0.2), if the COMPATIBLE.ASM disk group attribute is set to 11.2.0.2 or higher, then the range of values is 0 to 1024.


Note:


This parameter may only be specified in an Automatic Storage Management instance.

ASM_POWER_LIMIT specifies the maximum power on an Automatic Storage Management instance for disk rebalancing. The higher the limit, the faster rebalancing will complete. Lower values will take longer, but consume fewer processing and I/O resources.

If the POWER clause of a rebalance operation is not specified, then the default power will be the value of ASM_POWER_LIMIT.



Re-balancing act-



ASM ensures that file extents are evenly distributed across all disks in a disk group. This is true for the initial file creation and for file resize operations. That means we should always have a balanced space distribution across all disks in a disk group.

Rebalance operation

Disk group rebalance is triggered automatically on ADD, DROP and RESIZE disk operations and on moving a file between hot and cold regions. Running rebalance by explicitly issuing ALTER DISKGROUP ... REBALANCE is called a manual rebalance. We might want to do that to change the rebalance power for example. We can also run the rebalance manually if a disk group becomes unbalanced for any reason.

The POWER clause of the ALTER DISKGROUP ... REBALANCE statement specifies the degree of parallelism of the rebalance operation. It can be set to a minimum value of 0, which halts the current rebalance until the statement is either implicitly or explicitly re-run. A higher value may reduce the total time it takes to complete the rebalance operation.

The ALTER DISKGROUP ... REBALANCE command by default returns immediately so that we can run other commands while the rebalance operation takes place in the background. To check the progress of the rebalance operations we can query V$ASM_OPERATION view.
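
For example:

SQL> SELECT group_number, operation, state, power, sofar, est_work, est_rate, est_minutes
     FROM   v$asm_operation;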

Three phase power

The rebalance operation has three distinct phases. First, ASM has to come up with the rebalance plan. That will depend on the rebalance reason, disk group size, number of files in the disk group, whether or not disk partnership has to be modified, etc. In any case this shouldn't take more than a couple of minutes.

The second phase is moving or relocating the extents among the disks in the disk group. This is where the bulk of the time will be spent. As this phase progresses, ASM keeps track of the number of extents moved and the actual I/O performance. Based on that it calculates the estimated time to completion (GV$ASM_OPERATION.EST_MINUTES). Keep in mind that this is an estimate and that the actual time may change depending on the overall (mostly disk related) load. If the reason for the rebalance was a failed disk(s) in a redundant disk group, at the end of this phase the data mirroring is fully re-established.

The third phase is disk compacting (ASM version 11.1.0.7 and later). The idea of the compacting phase is to move the data as close to the outer tracks of the disks as possible. Note that at this stage of the rebalance, EST_MINUTES will keep showing 0; this is a 'feature' that will hopefully be addressed in the future. The time to complete this phase will again depend on the number of disks, the reason for the rebalance, etc. Overall it should take a fraction of the time of the second phase.

Some notes about rebalance operations

Rebalance is a per-file operation.
An ongoing rebalance is restarted if the storage configuration changes either when we alter the configuration, or if the configuration changes due to a failure or an outage. If the new rebalance fails because of a user error a manual rebalance may be required.
There can be one rebalance operation per disk group per ASM instance in a cluster.
Rebalancing continues across a failure of the ASM instance performing the rebalance.
The REBALANCE clause (with its associated POWER and WAIT/NOWAIT keywords) can also be used in ALTER DISKGROUP commands for ADD, DROP or RESIZE disks.

Tuning rebalance operations

If the POWER clause is not specified in an ALTER DISKGROUP statement, or when rebalance is implicitly run by an ADD/DROP/RESIZE disk operation, then the rebalance power defaults to the value of the ASM_POWER_LIMIT initialization parameter. We can adjust the value of this parameter dynamically. A higher power limit should result in a shorter time to complete the rebalance, but this is by no means linear and it will depend on the (storage system) load, available throughput and underlying disk response times.

The power can be changed for a rebalance that is in progress. We just need to issue another ALTER DISKGROUP ... REBALANCE command with a different value for POWER. This interrupts the current rebalance and restarts it with the modified POWER.
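
For example (the disk group name data and power 8 are illustrative):

SQL> ALTER DISKGROUP data REBALANCE POWER 8;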

Relevant initialization parameters and disk group attributes

ASM_POWER_LIMIT

The ASM_POWER_LIMIT initialization parameter specifies the default power for disk rebalancing in a disk group. The range of values is 0 to 11 in versions prior to 11.2.0.2. Since version 11.2.0.2 the range of values is 0 to 1024, but that still depends on the disk group compatibility (see the notes below). The default value is 1. A value of 0 disables rebalancing.
For disk groups with COMPATIBLE.ASM set to 11.2.0.2 or greater, the operational range of values is 0 to 1024 for the rebalance power.
For disk groups that have COMPATIBLE.ASM set to less than 11.2.0.2, the operational range of values is 0 to 11 inclusive.
Specifying 0 for the POWER in the ALTER DISKGROUP REBALANCE command will stop the current rebalance operation (unless you hit bug 7257618).
_DISABLE_REBALANCE_COMPACT

Setting initialization parameter _DISABLE_REBALANCE_COMPACT=TRUE will disable the compacting phase of the disk group rebalance - for all disk groups.

_REBALANCE_COMPACT

This is a hidden disk group attribute. Setting _REBALANCE_COMPACT=FALSE will disable the compacting phase of the disk group rebalance - for that disk group only.

_ASM_IMBALANCE_TOLERANCE

This initialization parameter controls the percentage of imbalance between disks. Default value is 3%.

Background Processes involved during disk re - balancing -


ARBn - ASM Rebalance Process. Rebalances data extents within an ASM disk group. Possible processes are ARB0-ARB9 and ARBA.
RBAL  - ASM Rebalance Master Process. Coordinates rebalance activity. In an ASM instance, it coordinates rebalance activity for disk groups. In database instances, it manages ASM disk groups.

Xnnn  - Exadata only - ASM Disk Expel Slave Process. Performs ASM post-rebalance activities. This process expels dropped disks at the end of an ASM rebalance.
