VMware Tools fail to update due to SELinux denials

Another day, another bug. This one has to do with VMTools status showing “Not running (Not installed)” for my Call Manager running 11.5.1.14900-11 following the ESXi hypervisor upgrade to version 6.0.0U3. Attempting to update the VMTools wuth “utils vmtools refresh” command in CLI results in the following error:

Starting VMware Tools upgrade ...
Uninstalling VMware Tools 10.0.6-3560309 ...

Execution of uninstall phase failed

Rebooting the VM and restarting the process did not help. Upon contacting Cisco TAC, it was evident that we are hitting CSCvb67807 bug.  There is no workaround to this, so you need to contact Cisco TAC who has root access to the system to perform manual uninstall and reinstall of VMware Tools.

LDAP Directory Synchronization throws HTTP 500 error

Recently, I came across a case when LDAP Directory Synchronization stopped syncing users in one cluster. When attempting to do a manual sync, the CUCM would throw the following error:

HTTP Status 500
The server encountered an internal error that prevented it from fulfilling this request
exception: BadPaddingException Invalid padding.
note: The Full stack trace of the root cause is available in the logs.

LDAP DirSync HTTP 500

Attempting to make any modifications to the existing LDAP Directory configurations resulted in the same error and the issue appeared both on Pub and Sub nodes. Restarting Tomcat or Cisco DirSync service did not resolve the issue and the logs did not provide any useful information.

Well, there is a fix! Remove and re-create the LDAP Directory configuration(s) with identical settings and you should be good to go.

Hope this helps someone.

Cisco Phone Security: End-to-End Signalling and Media Encryption with SME

Hello folks!

For those of you who are in charge of a large VoIP environment with multiple CUCM clusters, I dedicate this post. This is going to be a multi-part document, since the topic being covered is rather large and I want to be as detailed as possible.

The Environment

Topology Diagram

We are dealing with two CUCM clusters that have SIP trunks to Cisco SME cluster. In reality, the environment is a much larger one, consisting of 12 multi-node CUCM clusters scattered around the globe. I have intentionally simplified the topology to include just three CUCM clusters, with one of them being used as SME.

The Challenge

In this particular case, the client would like to implement end-to-end phone security (signalling and media encryption) on all endpoints that support it. Because the traffic is traversing SME, we need to make sure that the SIP trunks between CUCM and SME clusters are secure. In a traditional two-cluster scenario, all you need to do is to follow this awesome guide by Jason Burns, where we exchange CallManager.pem self-signed certificates between all nodes, configure SIP Trunk Security Profile and off we go. But imagine doing that certificate exchange with 12 multi-node clusters!

The Solution

We are going to use our own Enterprise CA to issue new CallManager certificates for all CUCM clusters and import the Root CA certs only to trust the issuer. Here’s the detailed guide on how to achieve just that.

Part 1: Preparing Enterprise CA and Issuing the Certs

Note: it is assumed that you have all the necessary rights to work with your Windows Server-based Certificate Authority. 

Step 1: Using Certificate Authority Add-In, connect to your Root or Subordinate CA, navigate to ‘Certificate Templates’, right-click and select ‘Manage’:

Certificate Authority - Manage

Step 2: In the ‘Certificate Templates Console’ that will open, right-click on any existing certificate and select ‘Duplicate Template’. When prompted, select ‘Windows Server 2003 Enterprise’ version for the duplicate.

Step 3: In the ‘Properties of New Template’ window, give certificate template a name (e.g. “CallManager”), choose validity period (higher is good, but note that the certificate validity period should be less than of the issuing CA’s), and put a check mark on ‘Publish certificate in Active Directory’ box:

New Certificate Template Properties

Step 4: Under ‘Request Handling’ tab, make sure that ‘Signature and encryption’ is selected for the certificate purpose and the minimum key size is 2048 or greater bits.

Certificate Template Request Handling

Step 5: Under ‘Subject Name’ tab, select the ‘Supply in request’ radio button:

Certificate Template Subject Name

Step 6: Under ‘Extensions’ tab, click on the ‘Edit…’ button and ensure that ‘Client Authentication’ and ‘Server Authentication’ application policies are selected:

Certificate Template Extensions

Step 7: Under ‘Security’ tab, make sure that your user account has the necessary permissions, allowing you to Read, Write, and Enroll certificates using this template.

Step 8: Leave all other values at their default and click “OK” to create the new certificate template. Close the ‘Certificate Template Management’ window and return to the ‘Certification Authority’ console.

Step 9: Back in the ‘Certification Authority’ console, right-click on the “Certificate Templates” and select ‘New’ -> ‘Certificate Template to Issue”. Select the new template that was created in the previous steps (“CallManager”):

Certification Authority - New Template

 

Now you are ready to issue the actual certificate for your CallManager clusters using CA’s web-based AD Certificate Services (https://your-CA-FQDN/certsrv).

Part 2: Requesting, Issuing and Installing CallManager Certificates

The following steps are required to be completed on all CUCM nodes, including the SME ones.

Step 1: Navigate to Cisco Unified OS Administration site of your first cluster’s publisher node (https://CUCM-1/cmplatform).

Step 2: Go to Security -> Certificate Management and click ‘Find’ to display a list of current certificates.

Step 3: To enable SIP trunk encryption, we are going to generate a new certificate request file (CSR) for CallManager certificate type, so click on ‘Generate CSR’, select ‘CallManager’ for the certificate purpose, select ‘Multi-server (SAN)’ for distribution type:

Generate CallManager CSR

Note: for my Multi-Server (SAN) certificates, I typically edit the CN (Common Name) to match the Publisher’s FQDN. Why? This reduces the required number of SANs, which is important if you are using third-party CA that limits the number of alternative names for the cert.

Step 4:  Download the newly-generated CSR, open it in notepad and copy the generated Base-64-encoded certificate request.

Step 5: Navigate to your CA’s Active Directory Certificate Services web-based UI (https://FQDN-of-your-CA/certsrv/), click on “Request a certificate” -> “Advanced certificate request” and paste the certificate request in the textbox. Select “CallManager” certificate template that was created in Part 1 of this guide and then click “Submit >”:

Submit Certificate Request

Step 6: Once the certificate has been generated, download it in Base-64-Encoded format.

Step 7: Back to CA AD Certificate Services Web GUI, click on “Home” link in the upper-right corner to return to the main page and click on “Download a CA certificate, certificate chain, or CRL” link. Select the current CA certificate, and ‘Base 64’ for the encoding method, then click “Download CA certificate”.

Download CA Certificate

Important: If the certificate has been issued by your subordinate CA, you need to separate your Root CA certificate from Subordinate CA certificate. Here’s how:

  1. Open the CA certificate that was downloaded in Step 7 above and navigate to “Certification Path” tab.
  2. Select the “Root CA for [yourdomain]”, then click “View Certificate”:
    Viewing Root CA certificate
  3. In the new ‘Certificate’ window that will open, click on “Details” tab and then click “Copy to File…” button that would open Certificate Export Wizard.
  4. In the ‘Certificate Export Wizard’, click “Next” -> select “Base-64 encoded X.509 (.CER)” format and provide a path to save the file.

Step 8: Back to your CallManager’s OS Administration page, click on “Upload Certificate/Certificate Chain”.

  1. Upload the Root CA certificate as “CallManager-trust” type.
  2. If applicable, upload the Subordinate CA certificate as “CallManager-trust” type.
  3. Upload the CA-generated certificate as “CallManager” certificate.

Step 9: You will need to restart Cisco TFTP and CallManager services under Cisco Unified Serviceability page on all CallManager nodes in the cluster for the new certificate to take effect. Hold on to that just for now.

Part 3: Switching the cluster to Mixed-Mode

For the encryption to work on CallManager endpoints and trunks, you need to ensure that your CUCM clusters are switched from the default “Non-secure” mode to “Mixed-mode”. First, verify the cluster mode on all of your CallManager clusters by navigating to System -> Enterprise Parameters -> ‘Cluster Security Mode’:

Verifying cluster security mode

If the value is “0”, then the cluster is in “Non-secure” mode and need to be switched to “Mixed-mode” by following these steps.

Step 1: Open an SSH session with your CallManager Publisher in Cluster 1.

Step 2: Issue “utils ctl set-cluster mixed-mode” command:

admin: utils ctl set-cluster mixed-mode
This operation will set the cluster to Mixed mode. Do you want to continue? (y/n): y
Moving Cluster to Mixed Mode
Cluster set to Mixed Mode
Please Restart Cisco Tftp, Cisco CallManager and Cisco CTIManager services on all nodes in the cluster that run these services.

Step 3: Restart Cisco TFTP, Cisco CallManager and Cisco CTI Manager on all nodes in the cluster.

Important: If your cluster was already in Mixed-mode, you need to regenerate CTL certificates after replacing CallManager certificates on your CallManager cluster that we did in Part 2.

admin:utils ctl update CTLFile 
This operation will update the CTLFile. Do you want to continue? (y/n): y

Updating CTL file
CTL file Updated
Please Restart the TFTP and Cisco CallManager services on all nodes in the cluster that run these services

If you are using Cisco Jabber in your environment and you omit the above step, the first indication that something went wrong after CallManager certificate replacement would be your Jabber’s phone services not working for any device types (CSF, TCT, etc.). If you review the jabber.log in Jabber’s PRT report, you may see the following errors:
2016-09-09 09:39:07,736 ERROR [0x00001e14] [ice\TelephonyAdapterServerHealth.cpp(66)] [jcf.tel.adapter] [CSFUnified::TelephonyAdapter::getConnectionIpProtocol] - No connected ConnectionInfo of type: [eSIP]. Could not determine connection IP Protocol
2016-09-09 09:39:07,736 DEBUG [0x00001e14] [\impl\TelephonyServerHealthImpl.cpp(279)] [jcf.tel.health] [CSFUnified::TelephonyServerHealthImpl::updateHealth] - updating health with serverType [CucmSoftphone] serverHealthStatus [Unhealthy] serverConnectionStatus [Disconnected] serverAddress [CUCM1.domain.com (CCMCIP)] serviceEventCode [UnknownConnectionError] transportProtocol [SIP] ipProtocol [Unknown]

This is fixed by regenerating CTL files and restarting TFTP and CallManager cervices on all nodes in the cluster.

We shall continue the setup with Part 4 in the next post. Stay tuned!

How to Dump Install Logs to Serial Port – The Proper Way

Howdy! Those of you are struggling to follow the instructions of dumping the install logs to a serial port of a Cisco UC (CUCM/CUP/CUC) VM using any of the official public guides (see here and here), perhaps this post will help you. Here’s how it works in vSphere 5.x VMware environment:

  1. When the installation fails the first time, the system will prompt you to dump diagnostic information. Select “Yes”:
    prompt_to_dump_diag_info
  2. The installer will further prompt you to connect the serial port before continuing. Hit “Continue”:
    attach_serial_prompt
  3. The installer will confirm that the dumping is complete (no, it ain’t, we know that). Hit “Continue” to halt the VM:
    continue_to_halt
  4. Once the VM shuts down, edit the hardware to add Serial Port:
    • Select “Serial Port” -> Next: 
      add_serial_port_1
    • Select “Output to file” -> Next:
      add_serial_port_2
    • Browse the Datastores to the location you want to save the output to and type the name of the file (can be anything; in my example it is called “serial0.log”):
      add_serial_port_3
    • Accept the default settings (“Connect at power on” and “Yield CPU on poll”) -> Next:
      add_serial_port_4
    • Confirm the additional hardware settings and hit “Finish”:
      add_serial_port_5
  5. Power on the VM that failed the install. The VM will boot and prompt you to dump the installation logs again:
    installation_failed
  6. Confirm that the serial port is attached and proceed with the dump of the diagnostic info to the Serial Port:
    dump_is_complete
  7. Important: Shutdown the VM. If you don’t, you will get the “File operation failed” error when you try to download the dump file:
    file_operation_failed
  8. Once downloaded, use 7-Zip to extract the logs (you will need to “unzip” the logs twice: extract serial0.log to serial0~ file, then extract that file to another folder to reveal the logs):
    contents_of_typical_install_dump

That’s it! Review the install logs (under \common\log\install\install.log) to figure out the problem with your installation or submit the logs to Cisco TAC engineer for analysis and further troubleshooting.

Hope this helps someone.

Two Issues, One Fix: SELinux Enforced vs. Permissive Mode

So I’m performing an upgrade of yet another CUCM and CUC clusters from 10.5.2 to 11.0.1 (my fifth this month!) and I get two separate issues post upgrade:

Issue #1: High CPU utilization on both Pub and Sub nodes post-upgrade to 11.0.1. The command “show process using-most cpu” shows “/usr/bin/python -Es /usr/sbin/setroubleshootd -f” process as using most CPU:

admin:show process using-most cpu
PCPU PID CPU NICE STATE CPUTIME ARGS
%CPU PID CPU NI S TIME COMMAND
64.5 18193 - 0 S 00:23:06 /usr/bin/python -Es /usr/sbin/setroubleshootd -f

Issue #2: VMware Tools are shown as “Not Running (Not Installed)” for one of the Unity Connection nodes. Re-installation of VMware Tools using any of the acceptable methods has no effect.

So what’s the fix?

Fix for Issue #1: Change the SELinux mode to permissive (utils os secure permissive) on all affected nodes (Note: this can also apply to Cisco Unity Connection appliance).

Fix for Issue #2: Change the SELinux mode to permissive (utils os secure permissive) and reinstall VMware Tools (utils vmtools refresh).
Note: DO NOT change security back to ‘enforced’ if you are running VMware Tools 10.0 or higher. Read more about this issue on Cisco’s Bug Search Tool: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCux90747.

So what is SELinux anyway? Since Cisco UC servers are built on RHEL6, it’s best to turn to documentation from the source: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/index.html.

By changing the SELinux mode from the default ‘enforced’ to ‘permissive’, you are not disabling SELinux, but rather instruct SELinux to log rather than block access to files and/or processes.

Hope this helps someone.

OpenSSL January 2016 Vulnerability

Cisco has released a patch for OpenSSL January 2016 vulnerability that is described in CVE-2016-0701 and also on Cisco’s Bug Tracker: CSCuy07473. The patch ciscocm.ciscossl_11.0.1-v5_4_3.k3.cop.sgn comes in a form of a COP file can be downloaded off CCO for version 11.0(1) or 10.5(2). No reboot is required after applying the patch, but installation after business hours is recommended. For more information, please consult the published Release Notes for version 11.0(1) or 10.5(2) for this update.

CUP Publisher Installation Stalls/Fails

It looks like I’m hitting another Cisco bug, this time at the installation of CUP Publisher node for a recently recovered CUCM cluster of two nodes. The installation from a bootable ISO goes smooth for the most part of the process, but then stalls with no error right after completing Security Configuration (which is a step right after Call Manager Connectivity Validation):

CUP_Error

This bug is described in CSCuv74715. First, CUP installer will try to buy some time:

CUP_Time

Then, after you you click “More Time” it will run the validation tests for about 20 minutes, ultimately resulting in the blank Error page as seen above.

Problem is, there is no version 10.5(2.230000.1) of CUCM or CUP available on CCO.  So the solution is to obtain the latest available build of the CUP for the same version (10.5.2 in this case) from Cisco TAC (or create your own bootable iso from non-bootable ISO that can be downloaded from CCO) and retry the installation process.

UPDATE: I will save you some time: you do not need to download a new ISO if the only installer you have available is 10.5.2.20000-1. The error message that should have appeared as the 10.5.2.20000-1 installer stalls is as follows:

CUP_NTP

In my case, the NTP synchronization on CUCM Publisher had status of “unsynchronised, time server re-starting”. In this particular environment, the NTP reference servers were Domain Controllers (so Windows-based). Cisco UC VMs are quite finicky when it comes to synchronizing time with Windows-based NTP servers; the solution is to point the CUCM Publisher to a Linux-based NTP server.

In order to change the NTP server references for CUCM Pub, SSH to it and first confirm that NTP is operational by issuing “utils ntp status” command (just like it says in the error message). If the status is anything other than “synchronised to NTP server (xxx.xxx.xxx.xxx) at stratum x”, try to restart the NTP by issuing “utils ntp restart” command and checking the status again. If you do have a Linux-based NTP server, you can change the NTP server reference for Pub by doing the following:

  1. Add the new NTP server by issuing “utils ntp server add xxx.xxx.xxx.xxx” command
  2. Remove old NTP server(s) by issuing “utils ntp server detele” command and selecting the old NTP server(s) at the prompt.
  3. Confirm the NTP service restart when prompted.
  4. Check the NTP status again by issuing “utils ntp status” command.

Once the NTP status shows synchronized, proceed with CUP server installation.

Hope this helps someone.

CUCM 10.5: L2 Updates May Get Stuck When Optional Email Notification Is Selected

So I encountered another issue today: I was updating an existing 10.5 CUCM cluster with a new patch (10.5.2.10000-5) and the process would get stuck after a few minutes of the upgrade initiation. The last line that I would see in the console and the installation log file is:

upgrade_manager.sh|Upgrade (L2) Starting|

CUCM L2 Update stuck

What should follow next? Correct: an [optional] email from “ucs-installer@cisco.com” with a subject line “Upgrade (L2) CallManager 10.5.2.10000-5 Started — <node name>” should follow. It never did. Turns out, our SMTP Relay was not permitting relay from the CallManager’s IP address, so the installation could not proceed. Workaround: either opt-out of the notification email or fix your SMTP relay!

Hope this helps someone.

CUCM 10.5.1: CSR SAN and Certificate SAN Mismatch

I’ve been lucky to hit another bug today. Brand-new deployment of CUCM/CUC/CUPS version 10.5.1 and I’m unable to upload a freshly-generated SAN certificate from Starfield. I would get the following error: “CSR SAN and Certificate SAN does not match”.

CSR/Certificate SAN Mismatch

Originally, I thought the issue is a result of the CA inserting a www-prefixed name as one of the SANs in the cert (e.g. www.common_name.domain.com). So I have manually added the www-prefixed name in the CSR and re-keyed the cert. No luck. After multiple retries, I gave up and opened a TAC case. I’m glad I did, because apparently I hit another bug. The reason why CUCM can’t match the certificates’ SANs against CSR is because the hostnames are all in UPPER case, while the cert is issued for hostnames names in lower case.

The bug affects systems running version 10.5.1.10000-7 and is fixed in newer releases of CUCM, but I was given a link to download an ES (Engineering Special) version that is almost guaranteed to work.

Hope this helps someone who has been beating his/her head against the wall trying to figure this one out.

 

 

Bash Environment Variable Patch for UCM versions 8, 9 and 10

Update: The patch is also applicable to Cisco Unity Connection versions 8.5.1 and up. I have updated the post to reflect this information.

With yet another vulnerability that has become public in the recent week, vendors are scrambling to issue security patches for their systems. Cisco is no exception here, and that’s a good thing. On October 1st Cisco has released bash environment patch for CUCM/CUC versions 8, 9 and 10 to protect these systems from Shellshock bug. All future software updates for CallManager/Unity Connection versions that have not reached E-O-M will be released with the patch included. But for now, affected customers should download and install ciscocm.bashupgrade.cop.sgn available on CCO under Products > Unified Communications Call Control Cisco Unified Communications Manager (CallManager) > Cisco Unified Communications Manager Version x.x > Unified Communications Manager / CallManager / Cisco Unity Connection Utilities-COP-Files.

The update does not require system reboot, but Cisco advises to make a backup copy just in case. Be sure to check patch installation instructions and you may also want to review the CSCur00930 (CUCM) and CSCur05328 (CUC) on the Bug Tracker for more information.

Stay safe!