Unable to communicate with ‘cup-server.fqdn’. AXL query HTTP error “HTTPError: 404”

Okay, folks, if you are see the following error when you add the CUP node to your Expressway-C while implementing MRA, then you are probably not alone:

unable-to-communicate-with-cups

The solution is quite simple: make sure your CUP node(s) (and every other servers in your cluster, for that matter) are specified as FQDNs. Navigate to CUCM Publisher, select System -> Server and update the server names from hostnames to FQDNs by appending your domain name.

server-listed-as-fqdn

This would also ensure that the certificates obtained for your CUCM/CUP servers are trusted, as CN/SANs are going to match what’s recorded under Server Configuration.

It’s a step that is easy to forget and not many docs are mentioning this requirement (here’s the one that does: http://www.cisco.com/c/en/us/support/docs/unified-communications/unified-presence/116917-technote-certificate-00.html).

CUP Server Recovery – The Proper Way

Howdy!

If you ever happen to be tasked with recovery of CUCM IM & Presence (a newer name for CUP) server, perhaps this post will help you.

Disclaimer: The following recovery process worked for me (and I have a number of successful recoveries under my belt), and while every step has been taken to provide my readers with accurate information, please use your discretion before taking any decisions based on the contents of this post. You may want to validate some or all of the steps with a Cisco TAC engineer.

Step 1: Preserve your existing backups! If you have DRF backups in place, save them by copying the backup files to a safe place. Why? Existing backup copies can be overwritten by newer backup jobs (say, in case the restore process takes you longer than expected and you have selected to keep only a couple of most recent backups when you configured Backup Device in DRS).

Step 2: In CUCM, unassign all users from the existing Presence Group.

Step 3: Delete Presence Group and delete the failed CUP server from System -> Server in CUCM.

Step 4: Add the CUP node back to CUCM with the same name under System -> Server. A default Presence Group is created and the CUP node is added to it – that’s fine.

Step 5: Proceed with a fresh install of the CUP node. Note: the version should match exactly the one of the failed node.
Hint: all CUP ISOs that are available for download on CCO are bootable, so you do not have to use any tricks to turn non-bootable ISO into a bootable one.

Step 6: Proceed with DRS recovery. Now, this is important: you must perform full cluster recovery (restore both CUP and CUCM) from your backup. Why? Well, since the CUP node has been deleted and re-added in steps 3 and 4 above, the CUP server will have a new PKID in CUCM database. If you just recover the CUP node without recovering CUCM database, the node will have a different (old) PKID and thus would no longer match new PKID recorded in CUCM. As a result, certain services will not start in CUP and you will see the following error in CUP: “The IM&P Publisher node was deleted from the CUCM server list. This node needs to be reinstalled.”

Step 7: Once the restore process completes, restart CUCM Pub first (utils system restart), wait for it to come up, then restart CUCM Sub and CUP Pub.

Step 8: Perform typical health checks of your CUCM and CUP nodes:

  • utils dbreplication status, followed by utils dbreplication runtimestate on your CUCM Pub to verify database replication between Pub and Sub nodes;
  • Launch RTMT, connect to CUCM Pub and review the alarms;
  • Perform diagnostics in CUP Pub (Diagnostics – > System Troubleshooter)

That’s it! Hope this helps someone.

How to Dump Install Logs to Serial Port – The Proper Way

Howdy! Those of you are struggling to follow the instructions of dumping the install logs to a serial port of a Cisco UC (CUCM/CUP/CUC) VM using any of the official public guides (see here and here), perhaps this post will help you. Here’s how it works in vSphere 5.x VMware environment:

  1. When the installation fails the first time, the system will prompt you to dump diagnostic information. Select “Yes”:
    prompt_to_dump_diag_info
  2. The installer will further prompt you to connect the serial port before continuing. Hit “Continue”:
    attach_serial_prompt
  3. The installer will confirm that the dumping is complete (no, it ain’t, we know that). Hit “Continue” to halt the VM:
    continue_to_halt
  4. Once the VM shuts down, edit the hardware to add Serial Port:
    • Select “Serial Port” -> Next: 
      add_serial_port_1
    • Select “Output to file” -> Next:
      add_serial_port_2
    • Browse the Datastores to the location you want to save the output to and type the name of the file (can be anything; in my example it is called “serial0.log”):
      add_serial_port_3
    • Accept the default settings (“Connect at power on” and “Yield CPU on poll”) -> Next:
      add_serial_port_4
    • Confirm the additional hardware settings and hit “Finish”:
      add_serial_port_5
  5. Power on the VM that failed the install. The VM will boot and prompt you to dump the installation logs again:
    installation_failed
  6. Confirm that the serial port is attached and proceed with the dump of the diagnostic info to the Serial Port:
    dump_is_complete
  7. Important: Shutdown the VM. If you don’t, you will get the “File operation failed” error when you try to download the dump file:
    file_operation_failed
  8. Once downloaded, use 7-Zip to extract the logs (you will need to “unzip” the logs twice: extract serial0.log to serial0~ file, then extract that file to another folder to reveal the logs):
    contents_of_typical_install_dump

That’s it! Review the install logs (under \common\log\install\install.log) to figure out the problem with your installation or submit the logs to Cisco TAC engineer for analysis and further troubleshooting.

Hope this helps someone.

Two Issues, One Fix: SELinux Enforced vs. Permissive Mode

So I’m performing an upgrade of yet another CUCM and CUC clusters from 10.5.2 to 11.0.1 (my fifth this month!) and I get two separate issues post upgrade:

Issue #1: High CPU utilization on both Pub and Sub nodes post-upgrade to 11.0.1. The command “show process using-most cpu” shows “/usr/bin/python -Es /usr/sbin/setroubleshootd -f” process as using most CPU:

admin:show process using-most cpu
PCPU PID CPU NICE STATE CPUTIME ARGS
%CPU PID CPU NI S TIME COMMAND
64.5 18193 - 0 S 00:23:06 /usr/bin/python -Es /usr/sbin/setroubleshootd -f

Issue #2: VMware Tools are shown as “Not Running (Not Installed)” for one of the Unity Connection nodes. Re-installation of VMware Tools using any of the acceptable methods has no effect.

So what’s the fix?

Fix for Issue #1: Change the SELinux mode to permissive (utils os secure permissive) on all affected nodes (Note: this can also apply to Cisco Unity Connection appliance).

Fix for Issue #2: Change the SELinux mode to permissive (utils os secure permissive) and reinstall VMware Tools (utils vmtools refresh).
Note: DO NOT change security back to ‘enforced’ if you are running VMware Tools 10.0 or higher. Read more about this issue on Cisco’s Bug Search Tool: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCux90747.

So what is SELinux anyway? Since Cisco UC servers are built on RHEL6, it’s best to turn to documentation from the source: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/index.html.

By changing the SELinux mode from the default ‘enforced’ to ‘permissive’, you are not disabling SELinux, but rather instruct SELinux to log rather than block access to files and/or processes.

Hope this helps someone.

CUP Publisher Installation Stalls/Fails

It looks like I’m hitting another Cisco bug, this time at the installation of CUP Publisher node for a recently recovered CUCM cluster of two nodes. The installation from a bootable ISO goes smooth for the most part of the process, but then stalls with no error right after completing Security Configuration (which is a step right after Call Manager Connectivity Validation):

CUP_Error

This bug is described in CSCuv74715. First, CUP installer will try to buy some time:

CUP_Time

Then, after you you click “More Time” it will run the validation tests for about 20 minutes, ultimately resulting in the blank Error page as seen above.

Problem is, there is no version 10.5(2.230000.1) of CUCM or CUP available on CCO.  So the solution is to obtain the latest available build of the CUP for the same version (10.5.2 in this case) from Cisco TAC (or create your own bootable iso from non-bootable ISO that can be downloaded from CCO) and retry the installation process.

UPDATE: I will save you some time: you do not need to download a new ISO if the only installer you have available is 10.5.2.20000-1. The error message that should have appeared as the 10.5.2.20000-1 installer stalls is as follows:

CUP_NTP

In my case, the NTP synchronization on CUCM Publisher had status of “unsynchronised, time server re-starting”. In this particular environment, the NTP reference servers were Domain Controllers (so Windows-based). Cisco UC VMs are quite finicky when it comes to synchronizing time with Windows-based NTP servers; the solution is to point the CUCM Publisher to a Linux-based NTP server.

In order to change the NTP server references for CUCM Pub, SSH to it and first confirm that NTP is operational by issuing “utils ntp status” command (just like it says in the error message). If the status is anything other than “synchronised to NTP server (xxx.xxx.xxx.xxx) at stratum x”, try to restart the NTP by issuing “utils ntp restart” command and checking the status again. If you do have a Linux-based NTP server, you can change the NTP server reference for Pub by doing the following:

  1. Add the new NTP server by issuing “utils ntp server add xxx.xxx.xxx.xxx” command
  2. Remove old NTP server(s) by issuing “utils ntp server detele” command and selecting the old NTP server(s) at the prompt.
  3. Confirm the NTP service restart when prompted.
  4. Check the NTP status again by issuing “utils ntp status” command.

Once the NTP status shows synchronized, proceed with CUP server installation.

Hope this helps someone.