CUP Server Recovery – The Proper Way

Howdy!

If you are ever tasked with recovering a CUCM IM & Presence server (the newer name for CUP), perhaps this post will help you.

Disclaimer: The following recovery process worked for me (and I have a number of successful recoveries under my belt), and while every step has been taken to provide my readers with accurate information, please use your discretion before taking any decisions based on the contents of this post. You may want to validate some or all of the steps with a Cisco TAC engineer.

Step 1: Preserve your existing backups! If you have DRF backups in place, copy the backup files to a safe place. Why? Existing backup copies can be overwritten by newer backup jobs (say, if the restore process takes longer than expected and you chose to keep only a couple of the most recent backups when you configured the Backup Device in DRS).
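Here is a minimal sketch of Step 1, assuming the DRF backup files sit in a folder you can reach from the machine running the script (for example, on the SFTP server used as the Backup Device). The function name and both paths are hypothetical and not part of any Cisco tooling. It simply copies every file into a new timestamped folder so that later backup jobs cannot overwrite them.

```python
import shutil
from datetime import datetime
from pathlib import Path


def preserve_backups(backup_dir: Path, safe_root: Path) -> Path:
    """Copy every file in backup_dir into a new timestamped folder under safe_root."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = safe_root / f"drf-preserved-{stamp}"
    # exist_ok=False: refuse to reuse a folder, so an earlier copy is never overwritten.
    target.mkdir(parents=True, exist_ok=False)
    for item in sorted(backup_dir.iterdir()):
        if item.is_file():
            # copy2 also preserves file timestamps, which helps tell backup sets apart.
            shutil.copy2(item, target / item.name)
    return target


# Example call (hypothetical paths, adjust to your environment):
# preserve_backups(Path("/srv/sftp/cucm-backups"), Path("/srv/safe"))
```

Subfolders are skipped on purpose in this sketch; if your backup location is nested, you may prefer shutil.copytree instead.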

Step 2: In CUCM, unassign all users from the existing Presence Group.

Step 3: Delete the Presence Group, then delete the failed CUP server from System -> Server in CUCM.

Step 4: Add the CUP node back to CUCM with the same name under System -> Server. A default Presence Group is created and the CUP node is added to it – that’s fine.

Step 5: Proceed with a fresh install of the CUP node. Note: the version must exactly match that of the failed node.
Hint: all CUP ISOs that are available for download on CCO are bootable, so you do not have to use any tricks to turn a non-bootable ISO into a bootable one.

Step 6: Proceed with DRS recovery. Now, this is important: you must perform a full cluster recovery (restore both CUP and CUCM) from your backup. Why? Well, since the CUP node has been deleted and re-added in steps 3 and 4 above, the CUP server now has a new PKID in the CUCM database. If you recover only the CUP node without recovering the CUCM database, the node will come back with its old PKID, which no longer matches the new PKID recorded in CUCM. As a result, certain services will not start in CUP and you will see the following error in CUP: “The IM&P Publisher node was deleted from the CUCM server list. This node needs to be reinstalled.”
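The PKID reasoning in Step 6 can be illustrated with a toy model. This is only a sketch: the class below is a made-up stand-in for the CUCM server list, not a real API, and it assumes a PKID behaves like a freshly generated unique key each time a server entry is added.

```python
import uuid


class ToyCucmServerList:
    """Made-up stand-in for the server list in the CUCM database."""

    def __init__(self):
        self._pkid_by_name = {}

    def add(self, name):
        # Assumption: every add creates a brand-new key, even for a reused name.
        self._pkid_by_name[name] = str(uuid.uuid4())
        return self._pkid_by_name[name]

    def delete(self, name):
        del self._pkid_by_name[name]

    def pkid_of(self, name):
        return self._pkid_by_name.get(name)

    def snapshot(self):
        return dict(self._pkid_by_name)

    def restore(self, snap):
        self._pkid_by_name = dict(snap)


cucm = ToyCucmServerList()
cup_backup_pkid = cucm.add("cup-pub")   # PKID the CUP node had when the backup was taken
cucm_backup = cucm.snapshot()           # CUCM side of the same backup

cucm.delete("cup-pub")                  # step 3
cucm.add("cup-pub")                     # step 4: same name, new PKID

# Restoring only CUP brings back the old PKID, which CUCM no longer knows.
cup_only_matches = cup_backup_pkid == cucm.pkid_of("cup-pub")

# Restoring CUCM as well rolls its server list back, so both sides agree again.
cucm.restore(cucm_backup)
full_cluster_matches = cup_backup_pkid == cucm.pkid_of("cup-pub")

print(cup_only_matches, full_cluster_matches)  # False True
```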

Step 7: Once the restore process completes, restart CUCM Pub first (utils system restart), wait for it to come up, then restart CUCM Sub and CUP Pub.

Step 8: Perform typical health checks of your CUCM and CUP nodes:

  • utils dbreplication status, followed by utils dbreplication runtimestate on your CUCM Pub to verify database replication between Pub and Sub nodes;
  • Launch RTMT, connect to CUCM Pub and review the alarms;
  • Perform diagnostics in CUP Pub (Diagnostics -> System Troubleshooter).
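If you save the output of utils dbreplication runtimestate to a text file, a small script can flag nodes whose replication setup state is not 2. The sketch below assumes each node row starts with the server name and ends with a state like “(2) Setup Completed”, and that 2 is the healthy value. The sample rows are made up and simplified, so check the pattern against your own output before relying on it.

```python
import re

# Matches a trailing "(<digits>) <description>"; a group ID such as "(g_2)" is skipped
# because it does not consist of digits only.
STATE_RE = re.compile(r"\((\d+)\)\s+(.+?)\s*$")


def replication_states(text):
    """Map server name -> numeric replication setup state for each node row found."""
    states = {}
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if match is None:
            continue  # header, blank, or unrelated line
        states[line.split()[0]] = int(match.group(1))
    return states


def all_setup_completed(states):
    """True only if at least one node was found and every node reports state 2."""
    return bool(states) and all(code == 2 for code in states.values())


# Made-up, simplified sample; real output has a header and more detail.
SAMPLE = """\
cucm-pub  10.0.0.1  0.030  Y/Y/Y  0  (g_2)  (2) Setup Completed
cucm-sub  10.0.0.2  0.250  Y/Y/Y  0  (g_3)  (2) Setup Completed
"""

states = replication_states(SAMPLE)
print(states, all_setup_completed(states))
```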

That’s it! Hope this helps someone.

CUP Publisher Installation Stalls/Fails

It looks like I’m hitting another Cisco bug, this time during the installation of a CUP Publisher node for a recently recovered two-node CUCM cluster. The installation from a bootable ISO goes smoothly for most of the process, but then stalls with no error right after completing Security Configuration (the step right after Call Manager Connectivity Validation):

CUP_Error

This bug is described in CSCuv74715. First, CUP installer will try to buy some time:

CUP_Time

Then, after you click “More Time”, it will run the validation tests for about 20 minutes, ultimately resulting in the blank Error page as seen above.

Problem is, there is no version 10.5(2.230000.1) of CUCM or CUP available on CCO. So the solution is to obtain the latest available build of CUP for the same version (10.5.2 in this case) from Cisco TAC (or create your own bootable ISO from the non-bootable ISO that can be downloaded from CCO) and retry the installation process.

UPDATE: I will save you some time: you do not need to download a new ISO if the only installer you have available is 10.5.2.20000-1. The error message that should have appeared as the 10.5.2.20000-1 installer stalls is as follows:

CUP_NTP

In my case, the NTP synchronization on CUCM Publisher had status of “unsynchronised, time server re-starting”. In this particular environment, the NTP reference servers were Domain Controllers (so Windows-based). Cisco UC VMs are quite finicky when it comes to synchronizing time with Windows-based NTP servers; the solution is to point the CUCM Publisher to a Linux-based NTP server.

To change the NTP server references for CUCM Pub, SSH to it and first confirm that NTP is operational by issuing the “utils ntp status” command (just like it says in the error message). If the status is anything other than “synchronised to NTP server (xxx.xxx.xxx.xxx) at stratum x”, try restarting NTP by issuing the “utils ntp restart” command and checking the status again. If you do have a Linux-based NTP server, you can change the NTP server reference for Pub by doing the following:

  1. Add the new NTP server by issuing “utils ntp server add xxx.xxx.xxx.xxx” command.
  2. Remove old NTP server(s) by issuing “utils ntp server delete” command and selecting the old NTP server(s) at the prompt.
  3. Confirm the NTP service restart when prompted.
  4. Check the NTP status again by issuing “utils ntp status” command.

Once the NTP status shows synchronized, proceed with CUP server installation.
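If you capture the “utils ntp status” output (for example from a saved SSH session log), the check can be scripted. This is a rough sketch that relies only on the two status strings quoted in this post; the function name is mine, and the rest of the command output is not assumed. Note that a plain substring test for “synchronised” would be wrong, because “unsynchronised” contains it too.

```python
import re

# The lookbehind rejects "unsynchronised"; the rest follows the healthy status
# string quoted above: "synchronised to NTP server (<address>) at stratum <n>".
SYNCED_RE = re.compile(
    r"(?<![A-Za-z])synchronised to NTP server \((?P<server>[^)]+)\) at stratum (?P<stratum>\d+)"
)


def parse_ntp_sync(status_text):
    """Return (server, stratum) if the text reports a synchronised state, else None."""
    match = SYNCED_RE.search(status_text)
    if match is None:
        return None
    return match.group("server"), int(match.group("stratum"))


print(parse_ntp_sync("synchronised to NTP server (10.0.0.5) at stratum 3"))
print(parse_ntp_sync("unsynchronised, time server re-starting"))
```

The address 10.0.0.5 and stratum 3 are placeholder values for illustration. A return value of None means the installer's NTP validation is likely to fail, so fix NTP on the CUCM Publisher before retrying the CUP installation.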

Hope this helps someone.