VMware Tools fail to update due to SELinux denials

Another day, another bug. This one has to do with VMTools status showing “Not running (Not installed)” for my Call Manager running 11.5.1.14900-11 following the ESXi hypervisor upgrade to version 6.0.0U3. Attempting to update VMTools with the “utils vmtools refresh” CLI command results in the following error:

Starting VMware Tools upgrade ...
Uninstalling VMware Tools 10.0.6-3560309 ...

Execution of uninstall phase failed

Rebooting the VM and restarting the process did not help. Upon contacting Cisco TAC, it became evident that we were hitting bug CSCvb67807. There is no workaround, so you need to contact Cisco TAC, who have root access to the system, to manually uninstall and reinstall VMware Tools.

Jabber SSO: “Invalid SAML response” on logon.

I came across an interesting issue with Jabber shortly after implementing a Single Sign-On for one of the clusters. Upon launching Jabber, the following message would appear:

“Invalid SAML response. This may be caused when time is out of sync between the Cisco Unified Communications Manager and IDP servers. Please verify the NTP configuration on both servers. Run “utils ntp status” from the CLI to check this status on Cisco Unified Communications Manager.”

Jabber SSO Invalid SAML response

Naturally, one would follow the instructions and verify that time between the CUCM Pub and IdP (in this case, Microsoft ADFS) is in sync. In this case they were indeed. Time to collect some relevant logs and dig deeper.

From the Troubleshooting section of the SSO Configuration Guide, we learn that in order to get any meaningful logs on SSO, we need to set the SAML logs to “debug” level by executing the set samltrace level debug command in the CUCM Pub’s CLI. Once the issue is re-created, launch RTMT and download the “Cisco SSO” logs, just like the guide says.

In the logs, look for the SAML response returned by the IdP:

2018-03-14 12:33:40,009 DEBUG [http-bio-443-exec-16] fappend.SamlLogger - SPACSUtils.getResponse: got response=<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="_be693914-03a7-4899-8a88-49443bda3ef9" InResponseTo="s21a8227234a9c9f26a73f8627a768ee197584baa4" Version="2.0" IssueInstant="2018-03-14T12:33:39Z" Destination="https://CUCM-Pub-FQDN:8443/ssosp/saml/SSO/alias/CUCM-Pub-FQDN" Consent="urn:oasis:names:tc:SAML:2.0:consent:unspecified"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">http://IdP-URL/adfs/services/trust</saml:Issuer><samlp:Status xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
<samlp:StatusCode xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
Value="urn:oasis:names:tc:SAML:2.0:status:Success">

If you look carefully enough, you may notice a 1-second difference between the time at which CUCM Pub logged the response (12:33:40) and the IssueInstant stamped by the IdP (12:33:39Z).
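To take the eyeballing out of that comparison, here is a minimal Python sketch that pulls both timestamps out of a log line and returns the difference. It assumes the timestamp at the start of the log line is in UTC (adjust if your CUCM logs in local time), and the sample line is an abbreviated version of the one above. The function name is mine, not anything from Cisco.

```python
import re
from datetime import datetime, timezone


def issue_instant_delta(log_line: str) -> float:
    """Seconds between the log timestamp and the SAML IssueInstant.

    Assumption: the leading log timestamp is in UTC.
    """
    # Leading log timestamp, e.g. "2018-03-14 12:33:40,009"
    logged = re.match(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})", log_line)
    # IssueInstant attribute, e.g. IssueInstant="2018-03-14T12:33:39Z"
    issued = re.search(
        r'IssueInstant="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"', log_line
    )
    if not logged or not issued:
        raise ValueError("line does not contain both timestamps")

    logged_at = datetime.strptime(logged.group(1), "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc, microsecond=int(logged.group(2)) * 1000
    )
    issued_at = datetime.strptime(issued.group(1), "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return (logged_at - issued_at).total_seconds()


# Abbreviated sample of the log line shown above.
line = (
    "2018-03-14 12:33:40,009 DEBUG [http-bio-443-exec-16] fappend.SamlLogger - "
    'SPACSUtils.getResponse: got response=<samlp:Response Version="2.0" '
    'IssueInstant="2018-03-14T12:33:39Z">'
)
print(issue_instant_delta(line))  # 1.009
```

A positive value means CUCM logged the response after the IdP stamped it; a negative value would point to the CUCM clock running behind the IdP.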

So how do you fix this? Turns out there is a way to configure Microsoft ADFS to allow for a small time skew. Here’s the procedure:

  1. Log in to your primary ADFS server with sufficient permissions.
  2. Launch Windows PowerShell with elevated privileges (as Administrator).
  3. Retrieve the list of configured relying party trusts to find the one that relates to your CUCM:
    Get-AdfsRelyingPartyTrust | select *identifier*
  4. Get details on the allowable time skew for your CUCM:
    Get-AdfsRelyingPartyTrust -Identifier CUCM-Pub-FQDN | select *identifier*, *skew*
    
    Identifier NotBeforeSkew
    ---------- -------------
    {CUCM-Pub-FQDN} 0
  5. Set NotBeforeSkew to 1, which allows the request to arrive up to 1 minute earlier than ADFS expects:
    Set-AdfsRelyingPartyTrust -TargetIdentifier "CUCM-Pub-FQDN" -NotBeforeSkew 1
  6. Repeat steps 4 and 5 for the other Cisco UC nodes that are configured for SSO (e.g. CUC servers).
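Why does a skew of one minute help? Here is a simplified Python model of the validity check, not the actual ADFS or CUCM implementation. It assumes the IdP stamps NotBefore as the issue time minus the configured skew (in minutes) and that the service provider rejects an assertion whose NotBefore is still in the future according to its own clock. The function names and the "1 second behind" scenario are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def not_before(issue_instant: datetime, skew_minutes: int) -> datetime:
    """NotBefore as modelled here: issue time minus the configured skew."""
    return issue_instant - timedelta(minutes=skew_minutes)


def accepted(sp_clock: datetime, issue_instant: datetime, skew_minutes: int) -> bool:
    """True if the service provider's clock is at or past NotBefore."""
    return sp_clock >= not_before(issue_instant, skew_minutes)


issued = datetime(2018, 3, 14, 12, 33, 39, tzinfo=timezone.utc)
# Hypothetical scenario: the service provider's clock is 1 second behind the IdP.
sp_clock = issued - timedelta(seconds=1)

print(accepted(sp_clock, issued, skew_minutes=0))  # False
print(accepted(sp_clock, issued, skew_minutes=1))  # True
```

With a skew of 0, even a one-second disagreement is enough to fail the check in this model; a skew of 1 gives a full minute of slack.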

Now test. If authentication works, the job is done. Hope this helps someone.

P.S. Don’t forget to set the SAML trace level back to its default (INFO) by issuing the set samltrace level info command in the CUCM Pub’s CLI.

LDAP Directory Synchronization throws HTTP 500 error

Recently, I came across a case where LDAP Directory Synchronization stopped syncing users in one cluster. When attempting a manual sync, CUCM would throw the following error:

HTTP Status 500
The server encountered an internal error that prevented it from fulfilling this request
exception: BadPaddingException Invalid padding.
note: The Full stack trace of the root cause is available in the logs.

LDAP DirSync HTTP 500

Attempting to make any modifications to the existing LDAP Directory configurations resulted in the same error and the issue appeared both on Pub and Sub nodes. Restarting Tomcat or Cisco DirSync service did not resolve the issue and the logs did not provide any useful information.

Well, there is a fix! Remove and re-create the LDAP Directory configuration(s) with identical settings and you should be good to go.

Hope this helps someone.

Unable to communicate with ‘cup-server.fqdn’. AXL query HTTP error “HTTPError: 404”

Okay, folks, if you see the following error when you add the CUP node to your Expressway-C while implementing MRA, you are probably not alone:

unable-to-communicate-with-cups

The solution is quite simple: make sure your CUP node(s) (and every other server in your cluster, for that matter) are specified as FQDNs. Navigate to CUCM Publisher, select System -> Server and update the server names from hostnames to FQDNs by appending your domain name.

server-listed-as-fqdn

This would also ensure that the certificates obtained for your CUCM/CUP servers are trusted, as CN/SANs are going to match what’s recorded under Server Configuration.
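If you have many servers to review, a quick script can flag the entries that still need attention. The sketch below is plain Python working on a list of names you paste in yourself (it does not talk to CUCM); the server names and the example.com domain are made up for illustration.

```python
import ipaddress


def classify_server_name(name: str, domain: str) -> str:
    """Classify a System -> Server entry as 'ip', 'fqdn' or 'hostname'."""
    try:
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass  # not an IP address, so treat it as a name

    suffix = "." + domain.strip(".").lower()
    host = name.rstrip(".").lower()
    # An FQDN here means "<something>.<expected domain>".
    if host.endswith(suffix) and len(host) > len(suffix):
        return "fqdn"
    return "hostname"


# Hypothetical entries as they might appear under System -> Server.
servers = ["cucm-pub", "cup-server.example.com", "10.1.1.5"]
for server in servers:
    print(server, "->", classify_server_name(server, "example.com"))
```

Anything reported as "hostname" or "ip" is a candidate for renaming to an FQDN.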

It’s a step that is easy to forget, and not many docs mention this requirement (here’s one that does: http://www.cisco.com/c/en/us/support/docs/unified-communications/unified-presence/116917-technote-certificate-00.html).

Unable to Sign-in to WebEx PT/WebEx Assistant or Launch WebEx Meetings

For those of you who have upgraded your CWMS environments to version 2.7(1), please note that this version changes how WebEx handles TLS support: TLS 1.0 is no longer supported. “Big deal!”, you say, as TLS 1.1, TLS 1.2 and even the draft of TLS 1.3 are all out, with the first two being widely supported. But then you get a user who, the day after the CWMS upgrade, complains that he/she is unable to log in to WebEx Assistant (a.k.a. WebEx Productivity Tools), getting the following error:

WebEx Assistant "The system is having difficulty processing your request"

The system is having difficulty processing your request. Try again a little later.

The user can log in to the CWMS site, but is unable to join an existing meeting or launch a new one, getting the following error:

WebEx "Setup was unsuccessful. Please try again"

Setup was unsuccessful. Please try again.

Error [5]

Note: I’ve also seen “Error [103]” and “Error [104]” codes.

If this wasn’t enough, the user may also see this when attempting to use WebEx PT from within Outlook:

WebEx PT Cannot Access your WebEx site now

Cannot access your WebEx site now. Try again later.

The solution is to enable TLS 1.1 (and TLS 1.2, for that matter) under the “Advanced” tab of IE’s Internet Options:

Internet Explorer Advanced Options TLS

The settings take effect immediately, allowing the user to log in to WebEx Assistant and launch/join a WebEx meeting.

Hope this helps someone.

Cisco Phone Security: End-to-End Signalling and Media Encryption with SME

Hello folks!

For those of you who are in charge of a large VoIP environment with multiple CUCM clusters, I dedicate this post. This is going to be a multi-part document, since the topic being covered is rather large and I want to be as detailed as possible.

The Environment

Topology Diagram

We are dealing with two CUCM clusters that have SIP trunks to a Cisco SME cluster. In reality, the environment is a much larger one, consisting of 12 multi-node CUCM clusters scattered around the globe. I have intentionally simplified the topology to include just three CUCM clusters, with one of them being used as SME.

The Challenge

In this particular case, the client would like to implement end-to-end phone security (signalling and media encryption) on all endpoints that support it. Because the traffic is traversing SME, we need to make sure that the SIP trunks between CUCM and SME clusters are secure. In a traditional two-cluster scenario, all you need to do is to follow this awesome guide by Jason Burns, where we exchange CallManager.pem self-signed certificates between all nodes, configure SIP Trunk Security Profile and off we go. But imagine doing that certificate exchange with 12 multi-node clusters!

The Solution

We are going to use our own Enterprise CA to issue new CallManager certificates for all CUCM clusters and import the Root CA certs only to trust the issuer. Here’s the detailed guide on how to achieve just that.

Part 1: Preparing Enterprise CA and Issuing the Certs

Note: it is assumed that you have all the necessary rights to work with your Windows Server-based Certificate Authority. 

Step 1: Using the Certification Authority snap-in, connect to your Root or Subordinate CA, navigate to ‘Certificate Templates’, right-click and select ‘Manage’:

Certificate Authority - Manage

Step 2: In the ‘Certificate Templates Console’ that will open, right-click on any existing certificate and select ‘Duplicate Template’. When prompted, select ‘Windows Server 2003 Enterprise’ version for the duplicate.

Step 3: In the ‘Properties of New Template’ window, give the certificate template a name (e.g. “CallManager”), choose a validity period (longer is good, but note that it must be shorter than that of the issuing CA’s own certificate), and put a check mark in the ‘Publish certificate in Active Directory’ box:

New Certificate Template Properties

Step 4: Under the ‘Request Handling’ tab, make sure that ‘Signature and encryption’ is selected for the certificate purpose and that the minimum key size is 2048 bits or greater.

Certificate Template Request Handling

Step 5: Under ‘Subject Name’ tab, select the ‘Supply in request’ radio button:

Certificate Template Subject Name

Step 6: Under ‘Extensions’ tab, click on the ‘Edit…’ button and ensure that ‘Client Authentication’ and ‘Server Authentication’ application policies are selected:

Certificate Template Extensions

Step 7: Under ‘Security’ tab, make sure that your user account has the necessary permissions, allowing you to Read, Write, and Enroll certificates using this template.

Step 8: Leave all other values at their defaults and click “OK” to create the new certificate template. Close the ‘Certificate Templates Console’ window and return to the ‘Certification Authority’ console.

Step 9: Back in the ‘Certification Authority’ console, right-click on ‘Certificate Templates’ and select ‘New’ -> ‘Certificate Template to Issue’. Select the new template that was created in the previous steps (“CallManager”):

Certification Authority - New Template

Now you are ready to issue the actual certificates for your CallManager clusters using the CA’s web-based AD Certificate Services (https://your-CA-FQDN/certsrv).

Part 2: Requesting, Issuing and Installing CallManager Certificates

The following steps are required to be completed on all CUCM nodes, including the SME ones.

Step 1: Navigate to Cisco Unified OS Administration site of your first cluster’s publisher node (https://CUCM-1/cmplatform).

Step 2: Go to Security -> Certificate Management and click ‘Find’ to display a list of current certificates.

Step 3: To enable SIP trunk encryption, we are going to generate a new certificate signing request (CSR) for the CallManager certificate type, so click on ‘Generate CSR’, select ‘CallManager’ for the certificate purpose and ‘Multi-server (SAN)’ for the distribution type:

Generate CallManager CSR

Note: for my Multi-Server (SAN) certificates, I typically edit the CN (Common Name) to match the Publisher’s FQDN. Why? This reduces the required number of SANs, which is important if you are using a third-party CA that limits the number of alternative names per certificate.

Step 4: Download the newly generated CSR, open it in Notepad and copy the Base-64-encoded certificate request.
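If you would rather not copy and paste by hand, the request block can be pulled out of the downloaded file with a few lines of Python. This is a minimal sketch that assumes the file contains a standard PEM block delimited by BEGIN/END CERTIFICATE REQUEST lines; the sample below wraps placeholder bytes, not a real CSR, and the function name is my own.

```python
import base64
import re

# Matches a PEM request block, with or without the optional "NEW " label.
CSR_BLOCK = re.compile(
    r"-----BEGIN (?P<label>(?:NEW )?CERTIFICATE REQUEST)-----\s*"
    r"(?P<body>.+?)\s*"
    r"-----END (?P=label)-----",
    re.DOTALL,
)


def extract_csr(text: str) -> str:
    """Return the PEM request block found in text, header and footer included."""
    match = CSR_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM certificate request found")
    # Sanity check: the body must be valid Base64 once whitespace is removed.
    body = "".join(match.group("body").split())
    base64.b64decode(body, validate=True)
    return match.group(0)


# Placeholder content standing in for a downloaded CSR file.
sample_body = base64.b64encode(b"placeholder bytes, not a real request").decode()
sample = (
    "text before the block\n"
    "-----BEGIN CERTIFICATE REQUEST-----\n"
    f"{sample_body}\n"
    "-----END CERTIFICATE REQUEST-----\n"
    "text after the block\n"
)
print(extract_csr(sample))
```

The returned text is what goes into the request textbox in the next step.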

Step 5: Navigate to your CA’s Active Directory Certificate Services web-based UI (https://FQDN-of-your-CA/certsrv/), click on “Request a certificate” -> “Advanced certificate request” and paste the certificate request in the textbox. Select “CallManager” certificate template that was created in Part 1 of this guide and then click “Submit >”:

Submit Certificate Request

Step 6: Once the certificate has been generated, download it in Base-64-Encoded format.

Step 7: Back in the CA’s AD Certificate Services web GUI, click on the “Home” link in the upper-right corner to return to the main page, then click on the “Download a CA certificate, certificate chain, or CRL” link. Select the current CA certificate and ‘Base 64’ for the encoding method, then click “Download CA certificate”.

Download CA Certificate

Important: If the certificate has been issued by your subordinate CA, you need to separate your Root CA certificate from Subordinate CA certificate. Here’s how:

  1. Open the CA certificate that was downloaded in Step 7 above and navigate to “Certification Path” tab.
  2. Select the “Root CA for [yourdomain]”, then click “View Certificate”:
    Viewing Root CA certificate
  3. In the new ‘Certificate’ window that will open, click on “Details” tab and then click “Copy to File…” button that would open Certificate Export Wizard.
  4. In the ‘Certificate Export Wizard’, click “Next” -> select “Base-64 encoded X.509 (.CER)” format and provide a path to save the file.

Step 8: Back on your CallManager’s OS Administration page, click on “Upload Certificate/Certificate Chain”.

  1. Upload the Root CA certificate as “CallManager-trust” type.
  2. If applicable, upload the Subordinate CA certificate as “CallManager-trust” type.
  3. Upload the CA-generated certificate as “CallManager” certificate.

Step 9: For the new certificate to take effect, you will need to restart the Cisco TFTP and CallManager services on all CallManager nodes in the cluster from the Cisco Unified Serviceability page. Hold off on that for now.

Part 3: Switching the cluster to Mixed-Mode

For the encryption to work on CallManager endpoints and trunks, you need to ensure that your CUCM clusters are switched from the default “Non-secure” mode to “Mixed-mode”. First, verify the cluster mode on all of your CallManager clusters by navigating to System -> Enterprise Parameters -> ‘Cluster Security Mode’:

Verifying cluster security mode

If the value is “0”, then the cluster is in “Non-secure” mode and needs to be switched to “Mixed-mode” by following these steps.

Step 1: Open an SSH session with your CallManager Publisher in Cluster 1.

Step 2: Issue “utils ctl set-cluster mixed-mode” command:

admin: utils ctl set-cluster mixed-mode
This operation will set the cluster to Mixed mode. Do you want to continue? (y/n): y
Moving Cluster to Mixed Mode
Cluster set to Mixed Mode
Please Restart Cisco Tftp, Cisco CallManager and Cisco CTIManager services on all nodes in the cluster that run these services.

Step 3: Restart Cisco TFTP, Cisco CallManager and Cisco CTI Manager on all nodes in the cluster.

Important: If your cluster was already in Mixed-mode, you need to regenerate the CTL file after replacing the CallManager certificates, which is what we did in Part 2:

admin:utils ctl update CTLFile 
This operation will update the CTLFile. Do you want to continue? (y/n): y

Updating CTL file
CTL file Updated
Please Restart the TFTP and Cisco CallManager services on all nodes in the cluster that run these services

If you are using Cisco Jabber in your environment and you omit the above step, the first indication that something went wrong after CallManager certificate replacement would be your Jabber’s phone services not working for any device types (CSF, TCT, etc.). If you review the jabber.log in Jabber’s PRT report, you may see the following errors:
2016-09-09 09:39:07,736 ERROR [0x00001e14] [ice\TelephonyAdapterServerHealth.cpp(66)] [jcf.tel.adapter] [CSFUnified::TelephonyAdapter::getConnectionIpProtocol] - No connected ConnectionInfo of type: [eSIP]. Could not determine connection IP Protocol
2016-09-09 09:39:07,736 DEBUG [0x00001e14] [\impl\TelephonyServerHealthImpl.cpp(279)] [jcf.tel.health] [CSFUnified::TelephonyServerHealthImpl::updateHealth] - updating health with serverType [CucmSoftphone] serverHealthStatus [Unhealthy] serverConnectionStatus [Disconnected] serverAddress [CUCM1.domain.com (CCMCIP)] serviceEventCode [UnknownConnectionError] transportProtocol [SIP] ipProtocol [Unknown]

This is fixed by regenerating the CTL file and restarting the TFTP and CallManager services on all nodes in the cluster.
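When a PRT contains thousands of lines, it helps to pull the health fields out programmatically. The following Python sketch parses the "updating health with" entry shown above into a dictionary. It assumes the "name [value]" layout seen in that sample holds for other entries as well, which I have not verified across Jabber versions.

```python
import re

# One "name [value]" pair, e.g. "serverHealthStatus [Unhealthy]".
FIELD = re.compile(r"(\w+) \[([^\]]*)\]")


def parse_health(line: str) -> dict:
    """Return the name/value pairs that follow 'updating health with'."""
    _, found, tail = line.partition("updating health with")
    if not found:
        return {}
    return dict(FIELD.findall(tail))


# The DEBUG line from the jabber.log excerpt above.
sample = (
    r"2016-09-09 09:39:07,736 DEBUG [0x00001e14] "
    r"[\impl\TelephonyServerHealthImpl.cpp(279)] [jcf.tel.health] "
    r"[CSFUnified::TelephonyServerHealthImpl::updateHealth] - updating health with "
    r"serverType [CucmSoftphone] serverHealthStatus [Unhealthy] "
    r"serverConnectionStatus [Disconnected] serverAddress [CUCM1.domain.com (CCMCIP)] "
    r"serviceEventCode [UnknownConnectionError] transportProtocol [SIP] "
    r"ipProtocol [Unknown]"
)

health = parse_health(sample)
print(health["serverHealthStatus"], health["serverConnectionStatus"])
```

Filtering a whole jabber.log for entries where serverHealthStatus is not Healthy is then a one-line list comprehension.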

We shall continue the setup with Part 4 in the next post. Stay tuned!

Jabber for Windows: “Cannot communicate with the server”

Hello folks!

I was delaying this post for a while, hoping to find a resolution to the issue that I’ve been working on for over a month now. This is a somewhat unique case which may not be experienced by many Cisco customers, but there is a chance that there are others that are hitting the same defect. Below is a quick overview of the environment, the description of the actual problem and current workarounds as discovered through independent troubleshooting processes and through Cisco TAC.

Overview/Conditions:

  • The client is a multinational company with presence at some very remote locations. The internal communication between different sites is within MPLS with highly heterogeneous connectivity (a whole variety of fiber, copper, microwave and satellite communications).
  • Due to such a vastly distributed environment with varying network latencies, there is little opportunity to centralize call processing in a few regional clusters. Hence, the client has a number of CUCM clusters, some of them in very close geographic proximity to one another. This is especially true for one region where the only form of communication is via a high-latency satellite connection.
  • Cisco Jabber is used throughout the organization, so each CUCM cluster would have a CUP server (or two) to support IM & Presence capabilities. The ILS is used for Inter-cluster lookup and a centralized UDS for user directory. (BTW, if anyone is interested in seeing a separate post on an end-to-end configuration of a multi-cluster environment (with MRA!) to support Jabber – please drop me a line in the comments section).
  • The majority of Cisco Jabber clients are running version 11.0 or above, and all CUCM clusters were recently upgraded to version 11.0.1. Prior to the upgrade, the client was running CUCM version 10.5.2.

Problem Description:

  • The issue affects users who are located in remote areas where communication between the site and the rest of the corporate network happens over a high-latency satellite link. Locally, Cisco Jabber users connect to their home clusters just fine. When a user with a working Cisco Jabber travels to another remote location and tries to connect, the client shows the all-too-common “Cannot communicate with the server” error.
  • It has been observed that the maximum tolerable latency between the Cisco Jabber client and the user’s Home Cluster is somewhere between 600 and 700 ms (round-trip delay). With latency of 1000 ms or more, Jabber fails to connect and shows the above “Cannot communicate with the server” error.
  • The PRT may show the following errors:
    • 2016-06-21 07:18:26,226 INFO  [0x000018ac] [ls\src\http\BasicHttpClientImpl.cpp(448)] [csf.httpclient] [csf::http::executeImpl] - *-----* HTTP response code 0 for request #21 to https://cucm.example.com:6972/CSFdevice.cnf.xml
    • 2016-06-21 07:18:26,345 WARN  [0x000018ac] [mpl\ucm-config\tftp\TftpFileSet.cpp(113)] [csf.config] [csf::ucm90::TftpFileSet::fetchInitialTftpFile] - Failed to connect to Tftp server : result : UNKNOWN_ERROR
      Note how the above reveals that the Jabber client is requesting a configuration file from TFTP using port 6972 rather than 6970. This change was introduced with CUCM version 11.x and Jabber 11.x. 
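To spot these failed fetches in a PRT quickly, the "HTTP response code" lines can be scanned with a short Python sketch like the one below. It assumes that a response code of 0 means no HTTP response was received at all (which matches the "Failed to connect" warning that follows it above); the function name is mine.

```python
import re
from urllib.parse import urlsplit

# "HTTP response code <code> for request #<n> to <url>"
HTTP_RESULT = re.compile(r"HTTP response code (\d+) for request #(\d+) to (\S+)")


def failed_requests(lines):
    """Yield (host, port, path) for requests logged with HTTP response code 0.

    Assumption: code 0 means the client never received an HTTP response.
    """
    for line in lines:
        match = HTTP_RESULT.search(line)
        if match and match.group(1) == "0":
            url = urlsplit(match.group(3))
            yield url.hostname, url.port, url.path


# Tail of the INFO line from the PRT excerpt above.
sample = [
    "[csf.httpclient] [csf::http::executeImpl] - *-----* HTTP response code 0 "
    "for request #21 to https://cucm.example.com:6972/CSFdevice.cnf.xml",
]
for host, port, path in failed_requests(sample):
    print(host, port, path)
```

Seeing port 6972 in the output confirms that the client is using the newer port for its configuration download.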

Problem Resolution/Workaround:

Currently, there is no fix for this issue, but as always with Cisco, there are workarounds:

  • Downgrade affected Cisco Jabber clients to any 10.x version (e.g. the latest build for version 10.6 that is currently offered on CCO is 10.6(7)). Prior to version 11.x, Jabber was using port 6970 to grab the configuration file off TFTP server. CUCM 11.0 is backward compatible with older versions of Cisco Jabber clients and would allow Jabber to connect on that port. Don’t ask me how the difference in port for the same service (TFTP) could alter the Cisco Jabber’s behaviour, but this workaround actually works.
  • If users who experience the problem do not care about phone services and just want IM & Presence functionality to be working, provide instructions on how to connect Cisco Jabber to the CUP Server manually (in Cisco Jabber for Windows, click “Advanced Settings”, choose “Cisco IM & Presence” for Account Type, select “Use the following server” for Login Server and type FQDN of the home CUP server).
    Note: since Jabber client is not connecting to CUCM’s TFTP to grab its config files, any customized configurations specified in the jabber-config.xml file are not going to apply.
  • Downgrade your CUCM environment to 10.5.2 (I wouldn’t).
  • Upgrade your CUCM environment to version 11.5 (apparently, it has just become available for download on CCO).
    Note that although the latter was suggested by Cisco TAC, this workaround has yet to be verified by yours truly. 

This post will be updated once a formal resolution is available. I would also expect Cisco TAC to file the bug in its Bug Tracker. When they do, I will publish an update with the link to the bug ID.

Hope this helps someone.

Issues with TMSPE after upgrading to TMS version 15.2.1

OK, folks, so you want to keep your Cisco UC systems up-to-date and decided to upgrade your TelePresence Management Suite (and extensions for it) to the latest-and-greatest. You’ve done your due diligence and followed the Install and Upgrade guides and Release Notes for all systems that can be potentially affected (TMS, TMSPE, TMSXE, VCS, etc.) to ensure you cover all your bases with regard to inter-dependencies (there are plenty). However, after the upgrade, you notice a few new alarms on the VCS and TMS. The errors may look something like the following:

On TMS:

"((-1) Importer Error : TypeError('__init__() takes exactly 4 arguments (2 given)',))"

On VCS Control:

"The VCS is unable to communicate with the TMS Provisioning Extension services. Phone book service failures can also occur if TMS does not have any users provisioned against this cluster."

In the event log on the VCS, you would see some additional details:

"...provisioning: Level="ERROR" Detail="Import from TMS Provisioning Extension services failed" Service="device" Status="{"reason": "Importer Error : TypeError('__init__() takes exactly 4 arguments (2 given)',)", "reason_code": -1, "detail": "Traceback (most recent call last):\n File \"/share/python/site-packages/ni/externalmanagerinterface/control/importcontrol.py\", line 766, in run\n File \"/share/python/site-packages/ni/utils/web/restclient.py\", line 345, in send_get\n File \"/share/python/site-packages/ni/utils/web/restclient.py\", line 308, in send_request\n File \"/share/python/site-packages/ni/utils/web/restclient.py\", line 320, in http_request\n File \"/share/python/site-packages/ni/utils/web/httplib2ssl.py\", line 399, in request\n File \"/lib64/python2.7/site-packages/httplib2/__init__.py\", line 1608, in request\n File \"/lib64/python2.7/site-packages/httplib2/__init__.py\", line 1359, in _request\n File \"/lib64/python2.7/site-packages/httplib2/__init__.py\", line 1247, in _auth_from_challenge\n File \"/lib64/python2.7/site-packages/httplib2/__init__.py\", line 523, in __init__\nTypeError: __init__() takes exactly 4 arguments (2 given)\n", "success": false, "error": "InternalServerError"}"

What gives? Well, apparently, there has been a change in the way the TMSPE is authenticating with TMS in the newest version of the suite. Navigate to your TMS server and open IIS Manager. Expand Sites -> Default Web Site; click on ‘tmsagent’, then select ‘Authentication’. Ensure that ‘Digest Authentication’ is disabled.

TMS Agent settings in IIS

If it is enabled, disable it and then restart your web server (iisreset /noforce). Next, verify that Provisioning Extension is operating successfully (you may need to restart TMS Provisioning Extension service).
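When checking the VCS event log before and after the change, note that the Status field of the provisioning entry is a JSON document, so it can be parsed rather than read by eye. Here is a minimal Python sketch; it assumes the Status value always runs from the first brace after Status=" to the last closing brace on the line, and the sample is an abbreviated copy of the entry shown earlier in this post.

```python
import json


def parse_status(event: str) -> dict:
    """Decode the JSON document embedded in the Status="..." field."""
    start = event.index('Status="') + len('Status="')
    end = event.rindex("}") + 1  # assumption: JSON ends at the last brace
    return json.loads(event[start:end])


# Abbreviated copy of the VCS event log entry (traceback shortened).
event = (
    r'''provisioning: Level="ERROR" Service="device" Status="{"reason": '''
    r'''"Importer Error : TypeError('__init__() takes exactly 4 arguments (2 given)',)", '''
    r'''"reason_code": -1, "detail": "Traceback (most recent call last):\n '''
    r'''File \"/lib64/python2.7/site-packages/httplib2/__init__.py\", line 523, in __init__\n'''
    r'''TypeError: __init__() takes exactly 4 arguments (2 given)\n", '''
    r'''"success": false, "error": "InternalServerError"}"'''
)

status = parse_status(event)
print(status["success"], status["error"])
# The last line of the embedded traceback is usually the most telling one.
print(status["detail"].strip().splitlines()[-1])
```

Whether a healthy import logs a Status entry with "success": true is something I have not confirmed, so treat the absence of the error entry as the primary sign that the fix worked.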

Hope this helps someone.

CUP Server Recovery – The Proper Way

Howdy!

If you are ever tasked with recovering a CUCM IM & Presence (a newer name for CUP) server, perhaps this post will help you.

Disclaimer: The following recovery process worked for me (and I have a number of successful recoveries under my belt), and while every step has been taken to provide my readers with accurate information, please use your discretion before taking any decisions based on the contents of this post. You may want to validate some or all of the steps with a Cisco TAC engineer.

Step 1: Preserve your existing backups! If you have DRF backups in place, save them by copying the backup files to a safe place. Why? Existing backup copies can be overwritten by newer backup jobs (say, in case the restore process takes you longer than expected and you have selected to keep only a couple of most recent backups when you configured Backup Device in DRS).

Step 2: In CUCM, unassign all users from the existing Presence Group.

Step 3: Delete Presence Group and delete the failed CUP server from System -> Server in CUCM.

Step 4: Add the CUP node back to CUCM with the same name under System -> Server. A default Presence Group is created and the CUP node is added to it – that’s fine.

Step 5: Proceed with a fresh install of the CUP node. Note: the version must exactly match that of the failed node.
Hint: all CUP ISOs that are available for download on CCO are bootable, so you do not have to use any tricks to turn non-bootable ISO into a bootable one.

Step 6: Proceed with DRS recovery. Now, this is important: you must perform full cluster recovery (restore both CUP and CUCM) from your backup. Why? Well, since the CUP node has been deleted and re-added in steps 3 and 4 above, the CUP server will have a new PKID in CUCM database. If you just recover the CUP node without recovering CUCM database, the node will have a different (old) PKID and thus would no longer match new PKID recorded in CUCM. As a result, certain services will not start in CUP and you will see the following error in CUP: “The IM&P Publisher node was deleted from the CUCM server list. This node needs to be reinstalled.”

Step 7: Once the restore process completes, restart CUCM Pub first (utils system restart), wait for it to come up, then restart CUCM Sub and CUP Pub.

Step 8: Perform typical health checks of your CUCM and CUP nodes:

  • utils dbreplication status, followed by utils dbreplication runtimestate on your CUCM Pub to verify database replication between Pub and Sub nodes;
  • Launch RTMT, connect to CUCM Pub and review the alarms;
  • Perform diagnostics in CUP Pub (Diagnostics -> System Troubleshooter)

That’s it! Hope this helps someone.

How to Dump Install Logs to Serial Port – The Proper Way

Howdy! For those of you who are struggling to follow the instructions for dumping the install logs to a serial port of a Cisco UC (CUCM/CUP/CUC) VM using any of the official public guides (see here and here), perhaps this post will help. Here’s how it works in a vSphere 5.x VMware environment:

  1. When the installation fails the first time, the system will prompt you to dump diagnostic information. Select “Yes”:
    prompt_to_dump_diag_info
  2. The installer will further prompt you to connect the serial port before continuing. Hit “Continue”:
    attach_serial_prompt
  3. The installer will confirm that the dumping is complete (no, it ain’t, we know that). Hit “Continue” to halt the VM:
    continue_to_halt
  4. Once the VM shuts down, edit the hardware to add Serial Port:
    • Select “Serial Port” -> Next: 
      add_serial_port_1
    • Select “Output to file” -> Next:
      add_serial_port_2
    • Browse the Datastores to the location you want to save the output to and type the name of the file (can be anything; in my example it is called “serial0.log”):
      add_serial_port_3
    • Accept the default settings (“Connect at power on” and “Yield CPU on poll”) -> Next:
      add_serial_port_4
    • Confirm the additional hardware settings and hit “Finish”:
      add_serial_port_5
  5. Power on the VM that failed the install. The VM will boot and prompt you to dump the installation logs again:
    installation_failed
  6. Confirm that the serial port is attached and proceed with the dump of the diagnostic info to the Serial Port:
    dump_is_complete
  7. Important: Shut down the VM. If you don’t, you will get the “File operation failed” error when you try to download the dump file:
    file_operation_failed
  8. Once downloaded, use 7-Zip to extract the logs (you will need to “unzip” the logs twice: extract serial0.log to serial0~ file, then extract that file to another folder to reveal the logs):
    contents_of_typical_install_dump
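If you do not have 7-Zip at hand, the double extraction can probably be done in one go with Python’s standard library. The sketch below assumes the dump is a gzip-compressed tar archive, which is my reading of the "unzip twice" behaviour and not something I have confirmed from Cisco documentation. The demo builds a small synthetic archive as a stand-in for a real serial0.log so that the snippet runs on its own.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_dump(dump_path, dest_dir):
    """Extract a (presumed) compressed tar dump and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # mode "r:*" lets tarfile detect the compression and unpack in one step.
    with tarfile.open(dump_path, mode="r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names


# Demo with a synthetic archive standing in for a real serial0.log.
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "serial0.log"
    payload = b"sample install log\n"
    with tarfile.open(dump, mode="w:gz") as archive:
        info = tarfile.TarInfo("common/log/install/install.log")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    names = extract_dump(dump, Path(tmp) / "out")
    extracted = (Path(tmp) / "out" / "common/log/install/install.log").read_bytes()

print(names)
```

If tarfile refuses to open your dump, the file is not in the format assumed here and 7-Zip remains the safer route.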

That’s it! Review the install logs (under \common\log\install\install.log) to figure out the problem with your installation, or submit the logs to a Cisco TAC engineer for analysis and further troubleshooting.

Hope this helps someone.