Recover from Startup Errors Using ^STURECOV

During the InterSystems IRIS startup procedure, if the journal or transaction restore process encounters errors such as <FILEFULL> or <DATABASE>, the procedure logs the errors in the messages log (messages.log) and starts the system in single-user mode.

InterSystems IRIS provides a utility, ^STURECOV, to help you recover from these errors and start InterSystems IRIS in multiuser mode. The routine offers several options: you can retry the failed operation and bring the system up, or ignore the errors and bring the system up. The journal restore phase tries to do as much work as possible before it aborts. If a database triggers more than three errors, the journal restore aborts the recovery of that database and leaves the database dismounted.

Note:

The ^STURECOV utility does not work on a mirror member on which transaction rollback is pending or in progress, because the system does not activate a mirrored database read/write until transaction rollback has completed. In this case, run the Manage^JRNROLL routine instead. It lets you force the system to come up and stores the transaction rollback information, which you can use to roll back transactions after the system is up and running. For more information, see Manage Transaction Rollback Using Manage^JRNROLL.

During transaction rollback, the first error in a database causes the rollback process to skip that database in the future. The process does not fully replay transactions that reference that database; it stores them for rollback during the recovery process.

When InterSystems IRIS encounters a problem during the dejournaling phase of startup, it writes a series of entries to the messages log similar to the following:

08/10-11:19:47:024 ( 2240) System Initialized. 
08/10-11:19:47:054 ( 2256) Write daemon started. 
08/10-11:19:48:316 ( 1836) Performing Journal Recovery 
08/10-11:19:49:417 ( 1836) Error in JRNRESTB: <DATABASE>restore+49^JRNRESTB 
     C:\MyIRIS\mgr\journal\20230810.004 addr=977220 
     ^["^^C:\MyIRIS\mgr\jo1666\"]test(4,3,28) 
08/10-11:19:49:427 ( 1836) Error in JRNRESTB: <DATABASE>restore+49^JRNRESTB 
     C:\MyIRIS\mgr\journal\20230810.004 addr=977268 
     ^["^^C:\MyIRIS\mgr\test\"]test(4,3,27) 
08/10-11:19:49:437 ( 1836) Error in JRNRESTB: <DATABASE>restore+49^JRNRESTB 
     C:\MyIRIS\mgr\journal\20230810.004 addr=977316 
     ^["^^C:\MyIRIS\mgr\test\"]test(4,3,26) 
08/10-11:19:49:447 ( 1836) Error in JRNRESTB: <DATABASE>restore+42^JRNRESTB 
     C:\MyIRIS\mgr\journal\20230810.004 addr=977748 
     ^["^^C:\MyIRIS\mgr\test\"]test(4,2,70) 
08/10-11:19:50:459 ( 1836) Too many errors restoring to C:\MyIRIS\mgr\test\. 
 Dismounting and skipping subsequent records 
08/10-11:19:50:539 ( 1836) 4 errors during journal restore, 
see console.log file for details. 
Startup aborted, entering single user mode. 
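
When a listing like the one above is long, it can help to tally the errors per database before opening ^STURECOV. The following Python script is a minimal sketch that runs outside InterSystems IRIS against a copy of the messages log text. It is not part of the product. The function names and regular expressions are hypothetical and assume only the line layout shown in the excerpt above (an "Error in ..." line followed later by a global reference line naming the database directory). The limit of three comes from the rule described earlier in this topic.

```python
import re
from collections import Counter

# Assumption: an error entry starts with a line such as
#   08/10-11:19:49:427 ( 1836) Error in JRNRESTB: <DATABASE>restore+49^JRNRESTB
ERROR_LINE = re.compile(r"Error in (?P<routine>%?\w+): <(?P<error>\w+)>")

# Assumption: the affected global is reported on a later line such as
#   ^["^^C:\MyIRIS\mgr\test\"]test(4,3,27)
GLOBAL_REF = re.compile(r'^\s*\^\["\^\^(?P<db>[^"]+)"\]')

# From the text: a database that triggers more than three errors is dismounted.
ERROR_LIMIT = 3


def count_errors_by_database(log_text):
    """Return a Counter mapping database directory to number of logged errors."""
    counts = Counter()
    pending = False  # True after an "Error in ..." line, until its global reference
    for line in log_text.splitlines():
        if ERROR_LINE.search(line):
            pending = True
            continue
        match = GLOBAL_REF.match(line)
        if pending and match:
            counts[match.group("db")] += 1
            pending = False
    return counts


def over_error_limit(counts, limit=ERROR_LIMIT):
    """Return the database directories whose error count exceeds the limit."""
    return sorted(db for db, n in counts.items() if n > limit)
```

To use the sketch, read the messages log into a string, pass it to count_errors_by_database, and pass the result to over_error_limit to list the databases that recovery is likely to have left dismounted. Treat the result as a triage aid only. The "Too many errors restoring to ..." entry in the log remains the authoritative indication.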
 

If the errors are from transaction rollback, then the output looks similar to this:

08/11-08:55:08:732 ( 428) System Initialized. 
08/11-08:55:08:752 ( 1512) Write daemon started. 
08/11-08:55:10:444 ( 2224) Performing Journal Recovery 
08/11-08:55:11:165 ( 2224) Performing Transaction Rollback 
08/11-08:55:11:736 ( 2224) Max Journal Size: 1073741824 
08/11-08:55:11:746 ( 2224) START: C:\MyIRIS\mgr\journal\20230811.011 
08/11-08:55:12:487 ( 2224) Journaling selected globals to 
     C:\MyIRIS\mgr\journal\20230811.011 started. 
08/11-08:55:12:487 ( 2224) Rolling back transactions ... 
08/11-08:55:12:798 ( 2224) Error in %ROLLBACK: <DATABASE>set+2^%ROLLBACK 
     C:\MyIRIS\mgr\journal\20230811.010 addr=984744 
     ^["^^C:\MyIRIS\mgr\test\"]test(4,1,80) 
08/11-08:55:12:798 ( 2224) Rollback of transaction for process id #2148 
 aborted at offset 984744 in C:\MyIRIS\mgr\journal\20230811.010. 
08/11-08:55:13:809 ( 2224) C:\MyIRIS\mgr\test\ dismounted - 
      Subsequent records will not be restored 
08/11-08:55:13:809 ( 2224) Rollback of transaction for process id #924 
 aborted at offset 983464 in C:\MyIRIS\mgr\journal\20230811.010. 
08/11-08:55:14:089 ( 2224) STOP: C:\MyIRIS\mgr\journal\20230811.011 
08/11-08:55:14:180 ( 2224) 1 errors during journal rollback, 
see console.log file for details. 
Startup aborted, entering single user mode. 
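
The rollback listing above records each aborted transaction by process ID, journal offset, and journal file. The following Python script is a minimal sketch for collecting those entries from a copy of the messages log text. It runs outside InterSystems IRIS and is not part of the product. The function name and the regular expression are hypothetical, and they assume the two-line "Rollback of transaction ... aborted at offset ..." layout shown in the excerpt and a journal path that contains no spaces.

```python
import re

# Assumption: each aborted rollback is logged across two lines, for example
#   ... Rollback of transaction for process id #2148
#    aborted at offset 984744 in C:\MyIRIS\mgr\journal\20230811.010.
ABORTED = re.compile(
    r"Rollback of transaction for process id #(\d+)\s+"
    r"aborted at offset (\d+) in (\S+)"
)


def aborted_rollbacks(log_text):
    """Return one dict per aborted rollback found in the log text."""
    results = []
    for pid, offset, journal in ABORTED.findall(log_text):
        results.append({
            "pid": int(pid),
            "offset": int(offset),
            # The log ends the sentence with a period; drop it from the path.
            "journal": journal.rstrip("."),
        })
    return results
```

The returned list gives a compact view of which transactions still need attention after you correct the underlying problem and retry the rollback from ^STURECOV.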
 

Both output listings end with instructions such as:

Enter IRIS with 
     C:\MyIRIS\bin\irisdb -sC:\MyIRIS\mgr -B 
and D ^STURECOV for help recovering from the errors. 

When InterSystems IRIS cannot start properly, it starts in single-user mode. While in this mode, execute the commands indicated by these instructions to enter InterSystems IRIS (see Administrator Terminal Session).

You are now in the manager’s namespace and can run the startup recovery routine, ^STURECOV:

Do ^STURECOV

The ^STURECOV journal recovery menu appears as follows:


Journal recovery options 
-------------------------------------------------------------- 
1) Display the list of errors from startup 
2) Run the journal restore again 
3) Bring down the system prior to a normal startup 
4) Dismount a database 
5) Mount a database 
6) Database Repair Utility 
7) Check Database Integrity 
8) Reset system so journal is not restored at startup 
9) Display instructions on how to shut down the system 
10) Display Journaling Menu (^JOURNAL)
-------------------------------------------------------------- 
H) Display Help
E) Exit this utility
--------------------------------------------------------------
 
Enter choice (1-10) or [Q]uit/[H]elp?

Option 9 appears on the menu only on UNIX®/Linux systems.

Before starting the system in multiuser mode, correct the errors that prevented the journal restore or transaction rollback from completing. You have several options regarding what to do:

  • Option 1 — The journal restore and transaction rollback procedure tries to save the list of errors in the ^%SYS() global. This is not always possible depending on what is wrong with the system. If this information is available, this option displays the errors.

  • Option 2 — This option performs the same journal restore and transaction rollback that was performed when the system was started. The amount of data is small, so it should not be necessary to try to restart from where the error occurred.

  • Option 3 — When you are satisfied that the system is ready for use, use this option to bring the instance down prior to restarting it in a normal fashion.

  • Option 4 — This option lets you dismount a database (^DISMOUNT utility). Generally, use this option if you want to let users back on a system but prevent them from accessing a database that still has problems.

  • Option 5 — This option lets you mount a database (^MOUNT utility).

  • Option 6 — This option lets you edit the database structure (^REPAIR utility).

  • Option 7 — This option lets you validate the database structure (^INTEGRIT utility).

  • Option 8 — This updates the system so that it does not attempt journal restore or transaction rollback at startup. This applies only to the next time the startup process is run. Use this in situations where you cannot get journal recovery to complete and you need to allow users back on the system. Consider dismounting the databases which have not been recovered. This operation is not reversible. You can perform journal restore manually using the ^JRNRESTO utility.

  • Option 9 — It is not possible to shut down the system from this utility, but this option displays instructions on how to shut the system down from the UNIX® command line.

  • Option 10 — This option brings up the journaling menu which allows you to browse and restore journal files. There are options which start and stop journaling but these are not generally of interest when resolving problems with journaling at startup.

Take whatever corrective action is necessary to resolve the problem. This may involve using the ^DATABASE routine to extend the maximum size of the database, or it may require freeing space on the file system or using the ^INTEGRIT and ^REPAIR utilities to find and correct database degradation. As you do this work, you can use Option 2 of the ^STURECOV utility to retry the journal replay/transaction rollback as many times as necessary. You can display any errors you encounter, including those from when the system started, using Option 1. When you correct all the problems, and run Option 2 without any errors, use Option 3 to bring the system up in multiuser mode.

If you find that you cannot resolve the problem, but you still want to bring the system up, use Option 8 to clear the information in the InterSystems IRIS image journal (.wij file) that triggers journal restore and transaction rollback at startup. The option also logs the current information in the messages log. Once this completes, use Option 3 to start the system. Use this facility with care, as it is not reversible.

If InterSystems IRIS was unable to store the errors during startup in the ^%SYS() global for ^STURECOV to display, you may get an initial message before the menu that looks like this:


There is no record of any errors during the prior startup 
This could be because there was a problem writing the data 
Do you want to continue ? No => yes 
Enter error type (? for list) [^] => ? 

Supported error types are: 
     JRN - Journal and transaction rollback 

Enter error type (? for list) [^] => JRN 
 

Journaling errors are one type of error that this utility tries to handle, and they are the only type this topic covers. Other error types are discussed in the appropriate sections of the documentation.

Caution:

Use the ^STURECOV utility only when the system is in single-user mode following an error during startup. Using it while the system is in any other state (for example, up and running normally) can cause serious damage to your data: if you ask it to, the utility restores journal information, and this information may not be the most current data. The ^STURECOV utility warns you, but it lets you force it to run.
