Data Consistency on Multiple Systems

When mirroring or other mechanisms are used to maintain a copy of data on another system, you may want to check the consistency of that data between the two systems. DataCheck provides this checking and includes provisions to recheck transient discrepancies.

DataCheck Overview

DataCheck provides a mechanism to compare the state of data on two systems — the DataCheck source and the DataCheck destination — to determine whether or not they match. All configuration, operational controls and results of the check are provided on the destination system; the source system is essentially passive.

On the instance of InterSystems IRIS® that is to act as the DataCheck destination, you must create a DataCheck destination configuration. You can create multiple destination configurations on the same instance, which you can configure to check data against multiple source systems (or configure them to check different data against a single source). If you are using DataCheck to check the consistency of a mirror, see DataCheck for Mirror Configurations for more details.

The following subsections describe DataCheck topics in more detail:

DataCheck Queries

The destination system submits work units called DataCheck “queries” to the source system. Each query specifies a database, an initial global reference, a number of nodes, and a target global reference. Both systems calculate an answer by traversing the specified number of global nodes starting with the initial global reference, and hashing the global keys and values. If the answers match, the destination system records the results and resubmits the query with a larger number of nodes and the initial global reference advanced; if they don’t match, the query is resubmitted with a smaller number of nodes until the discrepancy is isolated down to the configured minimum query size.
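The query progression described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the DataCheck implementation: the names hash_nodes, take, and find_unmatched, the use of SHA-256, the doubling and halving of the query size, and the max_size cap are all hypothetical, and each system is modeled as a Python dictionary of node keys and values.

```python
import bisect
import hashlib


def hash_nodes(nodes):
    """Hash the keys and values of a list of (key, value) pairs."""
    digest = hashlib.sha256()
    for key, value in nodes:
        digest.update(repr((key, value)).encode("utf-8"))
    return digest.hexdigest()


def take(keys, data, after, count):
    """Return up to `count` nodes that follow `after` (None means the start)."""
    start = 0 if after is None else bisect.bisect_right(keys, after)
    return [(k, data[k]) for k in keys[start:start + count]]


def find_unmatched(source, dest, min_size=32, max_size=1024):
    """Return inclusive (first_key, last_key) pairs bounding each discrepancy."""
    src_keys, dst_keys = sorted(source), sorted(dest)
    unmatched = []
    after, size = None, min_size
    while True:
        src = take(src_keys, source, after, size)
        dst = take(dst_keys, dest, after, size)
        if not src and not dst:
            return unmatched
        if hash_nodes(src) == hash_nodes(dst):
            # Answers match: advance and try a larger query next time.
            after = src[-1][0]
            size = min(size * 2, max_size)
        elif size > min_size:
            # Answers differ: retry from the same start with a smaller query.
            size = max(size // 2, min_size)
        else:
            # Already at the minimum query size: record the discrepancy.
            first = min(chunk[0][0] for chunk in (src, dst) if chunk)
            last = max(chunk[-1][0] for chunk in (src, dst) if chunk)
            unmatched.append((first, last))
            after = last
```

In this model, a smaller min_size isolates a discrepancy to a narrower range at the cost of more queries. For example, with 100 nodes keyed 0 through 99, a single differing value at key 50, and min_size=4, the sketch reports the range from key 48 through key 51.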

You can display information about the queries submitted by the destination system using the View Queries option of the View Details submenu of the ^DATACHECK routine, including the globals that remain to be processed (or global ranges if subscript include/exclude ranges are used), and the active queries currently being worked on by DataCheck.

DataCheck Jobs

The answer to each query is calculated by DataCheck worker jobs running on both the source system and the destination system. The number of worker jobs is determined by the dynamically tunable performance settings of the destination system; for more information, see “Performance Considerations” in this chapter.

In addition to the worker jobs, there are other jobs on each system. The following additional jobs run on the destination system:

  • Manager job — Loads and dispatches queries, compares query answers, and manages the progression through the workflow phases; this job is connected to the source system Manager job.

  • Receiver job — Receives answers from the source system.

The following additional jobs run on the source system:

  • Manager job — Receives requests from the destination system Manager job and sends them to worker jobs.

  • Sender job — Receives query answers from the worker jobs and sends them to the destination system Receiver job; this job is connected to the destination system Receiver job.

DataCheck Results

The results of the check list global subscript ranges, each with one of the following states:

  • Unknown — DataCheck has not yet checked this range.

  • Matched — DataCheck has found that this range matches.

  • Unmatched — DataCheck has found a discrepancy in this range.

  • Collation Discrepancy — Global was found to have differing collation between the source system and the destination system.

  • Excluded — This range is excluded from checking.

You can view the results from the current check and the final results from the last check on the destination system; for more information, see the SYS.DataCheck.RangeList class. For all subscript ranges within DataCheck, the beginning of a range is inclusive and the end exclusive. See Specifying Globals and Subscript Ranges to Check in this chapter for information about subscript ranges.

The following provides a sample check result:

c:\InterSystems\iris\mgr\mirror2 ^XYZ       Unmatched
        ^XYZ --Matched--> ^XYZ(3001,4)
        ^XYZ(3001,4) --Unmatched--> ^XYZ(5000)
        ^XYZ(5000) --Matched--> [end]

This result indicates that the nodes in the range starting at ^XYZ up to but not including ^XYZ(3001,4) are matched, while there is at least one discrepancy in the range of nodes from ^XYZ(3001,4) up to but not including ^XYZ(5000). The nodes in the range from ^XYZ(5000) to the end are matched.

The minimum number and frequency of discrepancies in the unmatched range depends on the minimum query size (see Performance Considerations). For example, if the minimum query size is set to the default of 32 in this case, there is at least one discrepancy every 32 nodes from ^XYZ(3001,4) until ^XYZ(5000); if there were a sequence within this range of more than 32 nodes without a discrepancy, it would appear in the results as a separate matched range.
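The inclusive-start, exclusive-end convention used in the sample result can be modeled as a sorted list of range starts. The following is a minimal sketch, assuming numeric subscripts represented as Python tuples (the empty tuple stands for the unsubscripted ^XYZ); RESULT_RANGES and state_of are hypothetical names, and real global collation of string subscripts is not modeled.

```python
import bisect

# Each entry is (inclusive start subscripts, state); a range ends where the
# next one begins. The empty tuple stands for the unsubscripted global ^XYZ.
RESULT_RANGES = [
    ((), "Matched"),
    ((3001, 4), "Unmatched"),
    ((5000,), "Matched"),
]


def state_of(ranges, subscripts):
    """Return the state of the range that contains the given subscripts."""
    starts = [start for start, _ in ranges]
    index = bisect.bisect_right(starts, subscripts) - 1
    return ranges[index][1]
```

Under this model, ^XYZ(3001,3) falls in the first matched range, ^XYZ(3001,4) is the first node of the unmatched range, and ^XYZ(5000) belongs to the final matched range because the end of the unmatched range is exclusive.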

DataCheck Workflow

During the check, data may be changing and transient discrepancies may be recorded. Rechecking may be required to eliminate these transient discrepancies. The destination system has a workflow that defines a strategy for how to check the globals.

A typical workflow begins with the “Check” phase as phase #1. (Phase #1 should always be defined as the logical starting point of the check cycle, since it is used by the workflow timeout and the Start dialog of the ^DATACHECK routine to indicate a "reset" from the beginning, as described in the next section.) At the beginning of this phase, the current set of results is saved as the last completed results and a new set of active results is established. DataCheck makes an initial pass through all globals specified for inclusion in the check.

Following the Check phase, the “Recheck Discrepancies” phase is typically specified with the desired number of iterations. Each iteration rechecks all unmatched ranges in an effort to eliminate transient discrepancies.

As each phase of the workflow is completed, DataCheck moves to the next phase. The workflow is implicitly restarted from phase #1 after the last phase is complete. The “Stop” phase shuts down all DataCheck jobs and the “Idle” phase causes DataCheck to wait for you to manually specify the next phase.

Starting/Stopping/Reconnecting DataCheck

You can stop and start DataCheck at any time; when you start DataCheck, it resumes the workflow from where it left off. In addition, you can specify a different workflow phase to follow the current phase and/or abort the current phase at any time.

If, during a check, DataCheck is stopped, becomes disconnected, or pauses due to mirroring, the routine reports why the system was stopped, what phase it stopped in, and what it will do when it starts (for example, resume processing, move to the next phase, change phase due to user request or restart at phase #1 due to workflow timeout). If, upon starting, DataCheck is going to resume processing the current phase or make a transition to any phase other than phase #1, you are offered the option of restarting at phase #1, as in the following example:

Option? 4

Configuration Name: test

State:  Stopped due to Stop Requested
Current Phase: 1 - Check
Workflow Phases:
  1 - Check
  2 - RecheckDiscrepancies, Iterations=10
  3 - Stop
  (restart)
Workflow Timeout: 432000
New Phase Requested: 2
Abort Current Phase Requested

DataCheck is set to abort the current phase and transition to phase #2.

You may enter RESTART to restart at phase #1

Start Datacheck configuration 'test'? (yes/no/restart)

In cases in which DataCheck becomes disconnected and reconnects only after an extended period, it may be more desirable to restart from phase #1 of the workflow instead. For example, if the systems were disconnected for several weeks in the middle of a check and then the check is resumed, the results are of questionable value, having been collected in part from several weeks prior and in part from the present time. The workflow has a Timeout property that specifies the time, in seconds, within which DataCheck may resume a partially completed workflow phase. If the timeout is exceeded, DataCheck restarts from phase #1 the next time it reaches the running state. The default value is five days (432000 seconds), based on the assumption that a large amount of data is checked by this DataCheck configuration and the check may take hours or days to complete normally; a smaller value may be preferable for configurations that complete a check in a shorter amount of time. A value of zero means no timeout.
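The timeout rule can be summarized in a few lines. This is a minimal sketch, not DataCheck code: the function name phase_on_start and the parameter seconds_since_interruption are hypothetical, and exactly which moment the timeout is measured from is an assumption.

```python
# Five days, expressed in seconds (the documented default).
DEFAULT_TIMEOUT = 5 * 24 * 60 * 60


def phase_on_start(current_phase, seconds_since_interruption,
                   timeout=DEFAULT_TIMEOUT):
    """Return the phase number to run when DataCheck reaches the running state."""
    # A timeout of zero means the partially completed phase never expires.
    if timeout > 0 and seconds_since_interruption > timeout:
        return 1  # results are too stale: restart the check cycle
    return current_phase  # resume the partially completed phase
```

For a configuration whose check normally completes in an hour, a much smaller timeout than the default would make an interrupted check restart rather than resume with stale partial results.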

Note:

As noted, you should define phase #1 to be the logical starting point of the check cycle, since it is used by the workflow timeout and the Start dialog of the ^DATACHECK routine to indicate a "reset" from beginning, as shown in the previous example.

DataCheck for Mirror Configurations

Upon creating a DataCheck destination configuration, if the system is a member of a mirror (see the “Mirroring” chapter of the High Availability Guide), you are given the option to configure DataCheck to check the mirrored data. If you choose this option, you need only select the mirror member to act as the DataCheck source, and the rest of the configuration is automatic.

When a check begins, all mirrored databases are included in the check; you do not have to map databases individually. You can specify which globals are checked or exclude entire databases, as described in Specifying Globals and Subscript Ranges to Check. A mirror-based DataCheck configuration cannot be used to check nonmirrored databases, but a separate nonmirrored DataCheck configuration can be created for such purposes.

This section discusses the following topics:

Planning DataCheck within the Mirror

Each DataCheck destination configuration connects to one source mirror member. Although the source member should not be changed, additional DataCheck configurations can be created to check against more than one source mirror member (or to check different sets of data from the same source).

This section includes the following member-specific subsections:

Checking Data Between Failover Members

When checking between failover mirror members, the check is typically run with the backup failover member configured as the DataCheck destination for the following reasons:

  • The DataCheck destination uses more resources than the source in order to maintain the results of the check and other state information (which is itself journaled).

  • If the backup failover member is the DataCheck destination, the results are available for review on the backup if the primary failover member goes down.

    Note:

    In most configurations, it is assumed that the failover has already occurred and any review of the results probably happens after the failover decision point.

Whenever DataCheck loses its connection to the source, it retries the connection, waiting indefinitely for the source machine to become available again. If a mirror-based DataCheck is started on the destination when it was not the primary failover member, and that member becomes the primary, DataCheck stops rather than automatically trying to reconnect. This prevents DataCheck from unintentionally running on the primary. For more information about reconnecting, see Starting/Stopping/Reconnecting DataCheck in this chapter.

Checking Data on Async Members

When mirror-based DataCheck is checking between a failover member and an async member, the async member is typically the destination. The reasons are the same as those given for checking between failover members (see Checking Data Between Failover Members), but the primary one is that the results of the check should be stored on the async member during disaster recovery.

When there are two failover members, it is often desirable to create one DataCheck destination configuration on an async member for each of the two failover members as sources. The ^DATACHECK routine offers to create both for you, and offers settings for how they behave with respect to which of the two is the primary failover member.

Each DataCheck configuration has a setting to govern how it behaves based on the source failover member’s status as the primary member. The settings are:

  • No restriction

    Checking both without restriction (the default) is desirable because it uses the async member as an agent to check both failover members without needing to run DataCheck between the failover members.

  • Check primary only (pause until DataCheck source is primary)

    Checking against the primary only is desirable because the primary is the true source of the data for this async member.

  • Do not check primary (pause when DataCheck source is primary)

    Checking against the backup is desirable because it does not consume resources on the production primary system.
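The three settings reduce to a simple pause decision based on whether the source is currently the primary. The following is a minimal sketch; the SourceRestriction enum and the should_pause function are hypothetical names used only for illustration and are not part of any DataCheck API.

```python
from enum import Enum


class SourceRestriction(Enum):
    NO_RESTRICTION = "No restriction"
    PRIMARY_ONLY = "Check primary only"
    NOT_PRIMARY = "Do not check primary"


def should_pause(restriction, source_is_primary):
    """Return True if the check should pause given the source's current role."""
    if restriction is SourceRestriction.PRIMARY_ONLY:
        # Pause until the DataCheck source is the primary.
        return not source_is_primary
    if restriction is SourceRestriction.NOT_PRIMARY:
        # Pause while the DataCheck source is the primary.
        return source_is_primary
    return False
```

With two configurations on an async member, one per failover member, using the same restricted setting on both means that exactly one of them is active at a time and the active one switches when a failover occurs.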

Note:

For information about reconnecting after a pause, see Starting/Stopping/Reconnecting DataCheck in this chapter.

For DataCheck configurations that are run manually (on demand) by a system administrator, these settings may not be of particular importance; they are more important for DataCheck configurations that are run continuously (or nearly so).

Any member may check another member without any particular relation. For example, if an async member is being used to check both failover members, it could also be used as the source of a check for other async members, thus avoiding the need to have any other async members check against the failover members.

Selecting Globals to Check

All mirrored databases that exist when DataCheck is run are checked automatically; for information about controlling which globals and databases are checked, see Specifying Globals and Subscript Ranges to Check in this chapter.

DataCheck Setup Procedure

You can set up DataCheck destination systems with the ^DATACHECK routine and enable DataCheck source systems through the Management Portal. To set up a new DataCheck system, do the following:

  1. Create a new destination system.

  2. Set up/edit destination system configurations, as follows:

    1. For non-mirror-based configurations, specify the hostname/IP address, superserver port, and optional TLS configuration for the TCP connection to the source system.

      For mirror-based configurations, specify the mirror member you want to check.

    2. For non-mirror-based configurations, specify the set of databases to be checked and their corresponding paths on the source system.

      For mirror-based configurations, all mirrored databases are included.

    3. Optionally, specify global selection masks and subscript ranges for fine-grained control over which databases, globals, and global ranges to include or exclude. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.

    4. Optionally, adjust the dynamically tunable settings to control the performance and system resource consumption for the check. For more information, see Performance Considerations in this chapter.

    5. Optionally, modify the workflow specifying the strategy for the check. For more information, see “DataCheck Workflow” in this chapter.

  3. Enable the %Service_DataCheck service on the source system. For more information, see “Enabling the DataCheck Service” in this chapter.

  4. Start the destination system, which controls the checking.

  5. Monitor the status of the check, as follows:

    • On the source system, view the status and log file.

    • On the destination system, view the status and log file, as well as lists of queries and results.

Enabling the DataCheck Service

Use the Management Portal from the InterSystems IRIS instance running on the source system to enable the data checking service and, optionally, restrict connections:

  1. Navigate to the Services page (System Administration > Security > Services) of the Management Portal.

  2. Click %Service_DataCheck in the list of service names to edit the data checking service properties.

  3. Select the Service enabled check box. Before clicking Save, you may want to first restrict which IP addresses can connect to this database source. If so, perform the next step, and then click Save.

    Note:

    When configured to check a mirror, DataCheck uses TLS if the mirror is set to use TLS (for more information, see DataCheck for Mirror Configurations in this chapter). The DataCheck service, however, does not automatically restrict access only to mirror members. If you wish to restrict DataCheck connections from other systems, you must configure the Allowed Incoming Connections for the %Service_DataCheck service.

  4. Optionally, to restrict access to the service, in the Allowed Incoming Connections box (which displays previously entered server addresses), click Add to add an IP Address. Repeat this step until you have entered all permissible addresses.

    You may delete any of these addresses individually by clicking Delete in the appropriate row, or click Delete All to remove all addresses, thereby allowing connections from any address.
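The effect of the Allowed Incoming Connections list can be pictured as follows. This is a minimal sketch of the behavior described in the steps above (an empty list allows any address); the function connection_allowed is hypothetical, and how InterSystems IRIS actually matches addresses is not modeled here.

```python
import ipaddress


def connection_allowed(client_ip, allowed_addresses):
    """Return True if a client address may connect to the service."""
    # An empty list places no restriction on incoming connections.
    if not allowed_addresses:
        return True
    client = ipaddress.ip_address(client_ip)
    return any(client == ipaddress.ip_address(a) for a in allowed_addresses)
```

This makes the consequence of Delete All explicit: removing every address does not block the service, it opens it to all clients.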

Specifying Globals and Subscript Ranges to Check

DataCheck lets you specify global names and subscript ranges to include in or exclude from checking using the following options.

Note:

Only literal values are accepted as global names and subscripts when specifying global and subscript ranges.

  • Check All Globals in All [Mapped / Mirrored] Databases — Checks all globals in all mapped databases in non-mirror-based configurations; in mirror-based configurations, checks all globals in all mirrored databases.

  • Include/Exclude Some Globals/Databases — Checks globals in the selected database based on the specified mask(s). Subscripts are not allowed.

    You can add/edit a mask or comma-separated list of masks, as follows:

    • * — Checks all globals (default).

    • * as the last character — Checks all globals starting with the preceding character(s).

    • ' before a mask — Excludes globals from being checked.

    For example:

    • ABC* — all global names starting with ABC

    • A:D — all global names from A to D

    • A:D,Y* — all global names from A to D, and starting with Y

    • *,'C*,'D* — all globals except those starting with C or D

    • '* — exclude all globals

    In addition to defining a global selection mask for specific databases, you can explicitly set a “default global selection mask” that is used for databases for which no global selection mask has been defined. Initially, the default mask is set to *.

    Note:

    For mirror-based DataCheck, newly added mirrored databases are included in the next check. Therefore, if you do not want newly added mirrored databases to be checked automatically, set the default mask to '*.

    For example, to specify a default mask for all databases for which no mask is defined (*,'^DontCheckMe) as well as specify a global selection mask (A:D) specifically for the USER and USER2 databases, do the following from the Edit Configuration submenu of the ^DATACHECK routine (see ^DATACHECK Routine, later in this chapter):

    1) Import Settings from a Shadow
    2) Connection Settings
    3) Database Mappings
    4) Globals to Check
    5) Performance Settings
    6) Manage Workflow
     
    Option? 4
     
    1) Check All Globals in All Mapped Databases
    2) Include/Exclude Some Globals/Databases
    3) Include/Exclude Some Globals/Databases and Subscript Ranges
     
    Option? 1 => 2
    Save changes? Yes =>
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
     
    Option? 2
    Enter a mask string, * to include all, '* to exclude all, ? for help
    Mask: * => *,'^DontCheckMe
    Save changes? Yes =>
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
     
    Option? 3
     
    1) C:\InterSystems\IRIS\mgr\docbook\ [no mask defined, use default]
    2) C:\InterSystems\IRIS\mgr\user\ [no mask defined, use default]
    3) C:\InterSystems\IRIS\mgr\user2\ [no mask defined, use default]
     
    Database (multiple selections allowed): 2,3
    Enter a mask string, * to include all, '* to exclude all, ? for help
                         ! to delete this mask and revert to default
    Mask: A:D
     
    1) C:\InterSystems\IRIS\mgr\docbook\ [no mask defined, use default]
    2) C:\InterSystems\IRIS\mgr\user\ [A:D]
    3) C:\InterSystems\IRIS\mgr\user2\ [A:D]
     
    Database (multiple selections allowed):
    Save changes? Yes =>
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
     
    Option?
    
  • Include/Exclude Some Globals/Databases and Subscript Ranges — In addition to letting you perform the same tasks as the Include/Exclude Some Globals/Databases option, this option lets you identify subscript ranges in specific globals; global subscript ranges marked for inclusion are included whether or not the global is included in the global selection mask. For all subscript ranges within DataCheck, the beginning of a range is inclusive and the end exclusive.

    Note:

    DataCheck may mark data in an excluded range as matched if this is determined in the course of its operation. Discrepancies in excluded ranges, however, are never marked.

    For example, continuing with the preceding example, in which you specified a global selection mask (A:D) for the USER2 database, you can include a subscript range in the ^NAME global by responding to the prompts as follows:

    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
     
    Option? 1
     
    1) Check All Globals in All Mapped Databases
    2) Include/Exclude Some Globals/Databases
    3) Include/Exclude Some Globals/Databases and Subscript Ranges
     
    Option? 2 => 3
    Save changes? Yes =>
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
    5) Add/Edit Subscript Ranges for a Global
    6) Delete All Subscript Ranges for a Global
    7) Delete All Subscript Ranges
    8) View Defined Subscript Ranges
     
    Option? 5
     
    1) C:\InterSystems\IRIS\mgr\docbook\
    2) C:\InterSystems\IRIS\mgr\user\
    3) C:\InterSystems\IRIS\mgr\user2\
     
    Database: 3 C:\InterSystems\IRIS\mgr\user2\
    Global Name: ^NAME
    There are no subscript ranges defined for this global.
    You may start by including all or excluding all subscripts.
    Answer YES to include, NO to exclude: no
     
    C:\InterSystems\IRIS\mgr\user2\        ^NAME
            ^NAME --Excluded--> [end]
     
    From (inclusive):  ?
     
      Enter a global reference with or without subscripts or null for end.
      The leading ^ may be omitted.  For subscripted references the entire
      global name may be omitted and simply begin with open parentheses
     
    From (inclusive):  (10)
    To (exclusive):  (20)
    Answer YES to include, NO to exclude: yes
     
    C:\InterSystems\IRIS\mgr\user2\        ^NAME
            ^NAME --Excluded--> ^NAME(10)
            ^NAME(10) --Included--> ^NAME(20)
            ^NAME(20) --Excluded--> [end]
     
    From (inclusive):
    Continue editing subscript ranges for this global? Yes => no
     
    C:\InterSystems\IRIS\mgr\user2\        ^NAME
            ^NAME --Excluded--> ^NAME(10)
            ^NAME(10) --Included--> ^NAME(20)
            ^NAME(20) --Excluded--> [end]
     
    Save changes? Yes =>
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
    5) Add/Edit Subscript Ranges for a Global
    6) Delete All Subscript Ranges for a Global
    7) Delete All Subscript Ranges
    8) View Defined Subscript Ranges
     
    Option? 
    

    You can view the mask information as follows:

    Option? 4
    The default include/exclude mask is:
        *,'^DontCheckMe
     
    The following databases are using non-default global selection criteria
     
      C:\InterSystems\IRIS\mgr\user\
        A:D
      C:\InterSystems\IRIS\mgr\user2\
        * Has additional global subscript ranges to include/exclude that apply
          regardless of whether those globals are included in this mask.
        A:D
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
    5) Add/Edit Subscript Ranges for a Global
    6) Delete All Subscript Ranges for a Global
    7) Delete All Subscript Ranges
    8) View Defined Subscript Ranges
     
    Option? 
    

    Since the mask information includes a note about subscript ranges, you can display that information, as follows:

    Option? 8
    Device:
    Right margin: 80 =>
     
    DataCheck Destination System: GLOBTEST
    Global Selection Subscript Ranges
     
    C:\InterSystems\IRIS\mgr\user2\        ^NAME
            ^NAME --Excluded--> ^NAME(10)
            ^NAME(10) --Included--> ^NAME(20)
            ^NAME(20) --Excluded--> [end]
     
    1) Options for selecting globals to check
    2) Set default include/exclude mask for databases with no mask defined
    3) Add or remove include/exclude mask for databases
    4) View include/exclude masks
    5) Add/Edit Subscript Ranges for a Global
    6) Delete All Subscript Ranges for a Global
    7) Delete All Subscript Ranges
    8) View Defined Subscript Ranges
     
    Option?
    

^DATACHECK Routine

You can use the ^DATACHECK routine (in the %SYS namespace) to configure and manage the data checking. To obtain Help at any prompt, enter ?.

To start the ^DATACHECK routine, do the following:

  1. Enter the following commands in the Terminal:

    set $namespace = "%SYS"
    do ^DATACHECK
    
    
  2. The main menu is displayed. Enter the number of your choice or press Enter to exit the routine:

    1) Create New Configuration
    2) Edit Configuration
    3) View Details
    4) Start
    5) Stop
    6) Delete Configuration
    7) Incoming Connections to this System as a DataCheck Source
    
    Option? 
    
    
    Note:

    For options 2 through 6, if you created multiple destination systems, a list is displayed so that you can select the destination system on which to perform the action.

    The main menu lets you select DataCheck tasks to perform as described in the following table:

    Option Description
    1) Create New Configuration

    Prompts for the name of a new DataCheck destination system configuration via the Create New Configuration prompt.

    2) Edit Configuration

    Displays the Edit Configuration submenu.

    3) View Details

    Displays the View Details submenu.

    4) Start

    Starts/restarts the destination system. If you are restarting, it resumes from where you stopped it.

    5) Stop

    Stops the destination system. If you restart the destination system after stopping it, it resumes from where you stopped it.

    6) Delete Configuration

    Deletes the specified destination system configuration.

    7) Incoming Connections to this System as a DataCheck Source

    Displays the Incoming Connections to this System as a DataCheck Source submenu.

    Note:

    This option must be selected on a source system.

Create New Configuration

This submenu lets you configure the destination system. When you select this option, the following prompt is displayed:

Configuration Name: 

If you are creating a DataCheck configuration on a system that is not a mirror member, the Edit Settings submenu is displayed, and you complete the configuration manually as described in Editing DataCheck Configurations on Non-mirror-based Systems.

If you are creating a DataCheck configuration on a system that is a mirror member, you are prompted for additional information that is dependent upon whether or not you want to base the data checking on mirroring. Choosing to configure DataCheck that is not based on mirroring displays the Edit Settings submenu, which you use to complete the configuration manually as described in Editing DataCheck Configurations on Non-mirror-based Systems. However, choosing to configure DataCheck based on mirroring restricts data checking to mirrored databases, and subsequent prompts are dependent on whether the destination system is a failover or async mirror member; for more information, see DataCheck for Mirror Configurations in this chapter.

Edit Configuration

This submenu lets you modify the destination system configurations. The options in the submenus are different depending on whether you are editing mirror-based or non-mirror-based configurations. For more information, see the following subsections:

Editing DataCheck Configurations on Non-mirror-based Systems

On a non-mirror-based system, when you select this option, the following prompts are displayed:

Configuration Name: dc_test
 
1) Import Settings from a Shadow   (static)
2) Connection Settings             (static)
3) Database Mappings               (static)
4) Globals to Check                (dynamic)
5) Performance Settings            (dynamic)
6) Manage Workflow                 (dynamic)

Option? 

Note:

In edit mode, if you created multiple destination systems, a list is displayed so that you can select a destination system to edit. In addition, before you edit the settings for options 1 through 3, you must stop the system.

Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you configure the destination system as described in the following table:

Option Description
1) Import Settings from a Shadow

Deprecated; do not use.

2) Connection Settings

Information to connect to the source system.

3) Database Mappings

Lets you add, delete, or list database mappings on the source and destination systems.

4) Globals to Check

Globals to check or exclude from checking. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.

5) Performance Settings

Adjusts system resources (throttle) used and/or granularity with which DataCheck isolates discrepancies (minimum query size). For more information, see Performance Considerations in this chapter.

6) Manage Workflow

Manages the order of workflow phases. For more information, see DataCheck Workflow in this chapter.

Editing Mirror-based DataCheck Configurations

On a mirror-based system, the following submenu is displayed:

Configuration Name: MIRRORSYS2_MIRRORX201112A_1
 
1) Globals to Check     
2) Performance Settings 
3) Manage Workflow      
4) Change Mirror Settings (Advanced) 

Option? 

Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you configure the destination system as described in the following table:

Option Description
1) Globals to Check

Globals to check or exclude from checking. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.

2) Performance Settings

Adjusts system resources (throttle) used and/or granularity with which DataCheck isolates discrepancies (minimum query size). For more information, see Performance Considerations in this chapter.

3) Manage Workflow

Manages the order of workflow phases. For more information, see DataCheck Workflow in this chapter.

4) Change Mirror Settings (Advanced)

See Planning DataCheck within the Mirror in the “Mirroring Considerations” section of this chapter.

View Details

This submenu lets you monitor the status of the destination system, as well as view detailed information about the queries that are running and the results of data checking:

System Name: dc_test
 
1) View Status
2) View Results
3) View Queries
4) View Log

Option? 

Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you view information about the destination system as described in the following table:

Option Description
1) View Status

Displays information about the selected destination system, including performance metrics for the DataCheck worker jobs, the source and state, current phase, workflow timeout, new phases requested, percentage of queries completed in the current phase, and the number of discrepancies recorded in this phase.

2) View Results

Displays the results for the selected destination system. For more information, see DataCheck Results in this chapter.

3) View Queries

Displays information about the queries submitted by the selected destination system (see DataCheck Queries). This includes the globals that remain to be processed (or global ranges if subscript include/exclude ranges are used), and indicates the active queries currently being worked on by DataCheck. A summary count is displayed at the end of the list.

4) View Log

Displays the selected destination system log file.

Note:

When ^DATACHECK is run against the two copies of a mirrored database on two mirror member instances, and a whole global in that database is being rapidly set and killed, the View Status option can display results that appear inconsistent with those of the View Results option. For example, View Status may report unmatched answers, while View Results does not list the globals that caused those answers (because further passes resolved the discrepancies). In addition, displayed answer counts can be larger than the actual number of globals within the instance (as displayed in the Management Portal, and as actually reported in the results).

When View Status shows a nonzero unmatched value for Answers Rcvd but a zero value for discrepancies, this indicates transient globals, not a data issue.
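The rule in this note can be written as a small decision function. The following Python sketch is only an illustration: the function and parameter names are hypothetical, and it assumes you have read the unmatched count for Answers Rcvd and the discrepancy count from the View Status display yourself. It does not call any DataCheck API.

```python
def interpret_status(unmatched_answers: int, discrepancies: int) -> str:
    """Classify a pair of counts read manually from View Status.

    Both parameter names are hypothetical labels for the displayed values.
    """
    if unmatched_answers < 0 or discrepancies < 0:
        raise ValueError("counts cannot be negative")
    if discrepancies > 0:
        # Discrepancies were recorded in this phase; check View Results.
        return "discrepancies recorded"
    if unmatched_answers > 0:
        # Unmatched answers that later passes resolved: transient globals.
        return "transient globals"
    return "no unmatched answers"


print(interpret_status(unmatched_answers=12, discrepancies=0))
```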

Incoming Connections to this System as a DataCheck Source

This submenu lets you view information about the source system:

1) List Source Systems
2) View Log

Option? 

Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you view information about the source system as described in the following table:

Option Description
1) List Source Systems

Displays information about the DataCheck source system.

2) View Log

Displays the source system log file.

Special Considerations for Data Checking

Review the following special considerations when using DataCheck:

Performance Considerations

While data checking is useful to ensure consistency of databases on multiple systems, it consumes resources on both the source and destination systems. This could negatively impact performance of other processes on either system, depending on load and the configured DataCheck settings. DataCheck includes controls to help you manage performance.

The throttle is an integer between 1 and 10 that controls how much of the available system resources (CPU, disk I/O, database cache) DataCheck may use. The throttle value can be changed at any time, to take effect immediately; for example, the value can be increased during periods when the system load is otherwise expected to be light, and decreased during periods when system load is heavy. This is useful for checks that are expected to run for an extended period of time. (The DataCheck routine can also be stopped during periods of high load; upon being restarted, it automatically resumes at the point in the check at which it was stopped.)

The characteristics of every system are different, but the following general descriptions of throttle values apply:

  • A throttle setting of 1 uses no more resources than one process for performing DataCheck queries. In other words, it uses at most one CPU and executes only one disk I/O at a time. Whether the resources used are primarily CPU or primarily disk I/O depends on whether the data is in buffers already; this can vary as the check progresses.

  • As the throttle is raised up to 8, more system resources are consumed at each step. For systems with large amounts of resources (many CPUs, and so on), each interval is scaled to increase resource consumption by, very roughly, the same multiplicative factor, such that at a throttle setting of 8, DataCheck uses a large portion of system resources, taking into account the number of CPUs and other factors. At a throttle setting of 8, however, the system is still expected to be responsive to a light load of application activity, and settings of 6, 7, or 8 may be appropriate on a typical system at off-peak hours.

  • A throttle setting of 9 is like 8, but allows DataCheck jobs to use the entire buffer pool (unsets the batch flag).

  • A throttle setting of 10 attempts to utilize nearly all system resources for completing the check.

The View Status option on the ^DATACHECK View Details submenu shows performance metrics for the DataCheck worker jobs, helping you understand performance characteristics and how they relate to the throttle setting.

The implementation of the throttle may differ over time as software and hardware characteristics evolve.
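As an illustration of raising the throttle when load is light and lowering it when load is heavy, the following Python sketch picks a throttle value from the hour of day. Everything in it is an assumption made for the example: the function name, the off-peak window of 00:00 to 05:59, and the peak value of 2 do not come from DataCheck. The sketch only computes a number; applying it would still be done through the Performance Settings option.

```python
def pick_throttle(hour, off_peak_hours=range(0, 6), off_peak=7, peak=2):
    """Return a throttle value (1-10) for the given hour of day (0-23).

    The off-peak window and both throttle values are illustrative
    assumptions; tune them to the load profile of your own system.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for value in (off_peak, peak):
        if not 1 <= value <= 10:
            raise ValueError("throttle must be between 1 and 10")
    return off_peak if hour in off_peak_hours else peak


for h in (3, 12):
    print(h, pick_throttle(h))
```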

The minimum query size is the minimum number of global nodes that a query is allowed to traverse; in other words, it determines the smallest range of global nodes to which DataCheck isolates discrepancies. Lower values make discrepancies easier to locate, while higher values significantly improve the speed of the check through unmatched sections. For example, if the minimum query size were set to 1 (not recommended), each discrepant node could be reported as a separate unmatched range, or at least as a range of all unmatched globals, precisely identifying the discrepancies but greatly impacting performance. If the minimum query size were set to 1000 (also not recommended), one or more discrepancies would be reported as a range of at least 1000 unmatched nodes, making them difficult to find, but the check would be much faster. The default is 32, which is small enough to allow for relatively easy visual inspection of the global nodes in a range using the Management Portal (see the “Managing Globals” chapter of Using Globals) while not greatly impacting performance.
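The trade-off controlled by the minimum query size can be seen in a simplified model. The Python sketch below is not DataCheck's implementation: it assumes each system is a dictionary of node keys to values, uses SHA-256 as a stand-in hash, and halves an unmatched range until it holds no more than min_query_size keys. All names are hypothetical.

```python
import hashlib


def side_hash(data, keys):
    """Hash the keys and values that one system holds within a key range."""
    h = hashlib.sha256()
    for k in keys:
        if k in data:
            h.update(repr((k, data[k])).encode("utf-8"))
    return h.digest()


def isolate(source, dest, min_query_size=32):
    """Return (first_key, last_key) ranges whose hashes do not match.

    Keys must be mutually comparable (for example, all ints or all strings).
    """
    keys = sorted(set(source) | set(dest))
    unmatched = []
    stack = [(0, len(keys))]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        chunk = keys[lo:hi]
        if side_hash(source, chunk) == side_hash(dest, chunk):
            continue  # this range matches; nothing to isolate
        if hi - lo <= min_query_size:
            unmatched.append((chunk[0], chunk[-1]))  # cannot narrow further
        else:
            mid = (lo + hi) // 2
            stack.append((mid, hi))  # pushed first so the lower half pops first
            stack.append((lo, mid))
    return unmatched


source = {i: i for i in range(1000)}
dest = dict(source)
dest[500] = -1  # a single discrepant node
print(isolate(source, dest, min_query_size=32))
print(isolate(source, dest, min_query_size=1))
```

In this model, a smaller minimum size pins the discrepancy to a narrower range at the cost of more hash comparisons, which mirrors the trade-off described above.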

Security Considerations

The destination system stores subscript ranges for globals that it has checked and is checking (results and queries). (See Specifying Globals and Subscript Ranges to Check in this chapter.) This subscript data is stored in the ^SYS.DataCheck* globals in the %SYS namespace (in the IRISSYS database by default). Global values are not stored; only subscripts are stored. These global subscripts from other databases that are stored in the %SYS namespace may contain sensitive information that may not otherwise be visible to some users, depending on the security configuration. Therefore, some special care is needed in secured deployments.

Use of the ^DATACHECK routine, including the ability to configure, start, and stop, requires both %Admin_Operate:Use privilege and Read/Write privilege (Write for configuring a check, Read for all other tasks) on the database containing the ^SYS.DataCheck* globals which, by default, is IRISSYS. The configuration and results data stored in the ^SYS.DataCheck* globals can be viewed and manipulated outside of the routine by anyone with sufficient database privileges.
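The privilege rule above can be modeled as a simple set check. The Python sketch below is a simplified illustration and does not query InterSystems IRIS security: the function name and the task label "configure" are hypothetical, privileges are plain "Resource:Permission" strings, and the default resource name %DB_IRISSYS assumes the ^SYS.DataCheck* globals have not been mapped to another database.

```python
def can_use_datacheck(privileges, task, db_resource="%DB_IRISSYS"):
    """Return True if the privilege set covers the given DataCheck task.

    privileges: set of "Resource:Permission" strings (a simplified model).
    task: "configure" needs Write on the database; any other task needs Read.
    """
    db_permission = "Write" if task == "configure" else "Read"
    required = {"%Admin_Operate:Use", f"{db_resource}:{db_permission}"}
    return required <= set(privileges)


operator = {"%Admin_Operate:Use", "%DB_IRISSYS:Read"}
print(can_use_datacheck(operator, "start"))
print(can_use_datacheck(operator, "configure"))
```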

For any secure deployment in which %DB_IRISSYS:Read privilege is given to users that should not have access to DataCheck data, you can add a global mapping to the %SYS namespace to map ^SYS.DataCheck* globals to a separate database other than IRISSYS. This database can be assigned a new resource name; read permission for the resource can then be restricted to those roles authorized to use DataCheck.

The ability for another destination system to connect to this system as a source is governed by this system's %Service_DataCheck service. This service is disabled by default on new installations and can be configured with a list of allowed IP addresses. For more information, see Enabling the DataCheck Service in this chapter.

For encryption of the communication between the two systems, the destination system can be configured to use TLS to connect to the source. See Configuring the InterSystems IRIS Superserver to Use TLS for details.
