Jabber SSO: “Invalid SAML response” on logon.

I came across an interesting issue with Jabber shortly after implementing Single Sign-On for one of the clusters. Upon launching Jabber, the following message would appear:

“Invalid SAML response. This may be caused when time is out of sync between the Cisco Unified Communications Manager and IDP servers. Please verify the NTP configuration on both servers. Run “utils ntp status” from the CLI to check this status on Cisco Unified Communications Manager.”


Naturally, one would follow the instructions and verify that the clocks on the CUCM Pub and the IdP (in this case, Microsoft ADFS) are in sync. Here they were. Time to collect some relevant logs and dig deeper.

From the Troubleshooting section of the SSO Configuration Guide, we learn that to get any meaningful SSO logs, we need to set the SAML trace level to “debug” by executing the set samltrace level debug command in the CUCM Pub’s CLI. Once the issue is re-created, launch RTMT and download the “Cisco SSO” logs, just like the guide says.

In the logs, look for the SAML response that CUCM received from the IdP:

2018-03-14 12:33:40,009 DEBUG [http-bio-443-exec-16] fappend.SamlLogger - SPACSUtils.getResponse: got response=<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="_be693914-03a7-4899-8a88-49443bda3ef9" InResponseTo="s21a8227234a9c9f26a73f8627a768ee197584baa4" Version="2.0" IssueInstant="2018-03-14T12:33:39Z" Destination="https://CUCM-Pub-FQDN:8443/ssosp/saml/SSO/alias/CUCM-Pub-FQDN" Consent="urn:oasis:names:tc:SAML:2.0:consent:unspecified"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">http://IdP-URL/adfs/services/trust</saml:Issuer><samlp:Status xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
<samlp:StatusCode xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
Value="urn:oasis:names:tc:SAML:2.0:status:Success">

If you look carefully enough, you may notice that the response from the IdP to the CUCM Pub comes 1 second after it is expected: the IssueInstant set by ADFS is 12:33:39Z, while CUCM logged the response at 12:33:40.
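
To avoid eyeballing timestamps, the comparison can be scripted. The following is a minimal Python sketch, not part of any Cisco or Microsoft tooling. It assumes that the CUCM log timestamp is in UTC (like the SAML IssueInstant) and that the log line has the same shape as the excerpt above; the function names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Leading "YYYY-MM-DD HH:MM:SS,mmm" stamp written by the Cisco SSO log.
LOG_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")
# IssueInstant attribute of the <samlp:Response> element.
ISSUE_INSTANT = re.compile(r'IssueInstant="([^"]+)"')


def parse_instant(value: str) -> datetime:
    """Parse a SAML UTC instant, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised IssueInstant: {value!r}")


def saml_skew_seconds(log_line: str) -> float:
    """Return (CUCM log time - IdP IssueInstant) in seconds for one log line."""
    ts = LOG_TS.match(log_line)
    issued = ISSUE_INSTANT.search(log_line)
    if ts is None or issued is None:
        raise ValueError("line has no timestamp or no IssueInstant attribute")
    # Assumption: the log timestamp is UTC; adjust if your cluster logs local time.
    logged = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S").replace(
        microsecond=int(ts.group(2)) * 1000, tzinfo=timezone.utc
    )
    return (logged - parse_instant(issued.group(1))).total_seconds()


# Abridged copy of the log line shown above.
line = (
    "2018-03-14 12:33:40,009 DEBUG [http-bio-443-exec-16] fappend.SamlLogger - "
    'SPACSUtils.getResponse: got response=<samlp:Response Version="2.0" '
    'IssueInstant="2018-03-14T12:33:39Z">'
)
print(f"{saml_skew_seconds(line):.3f} s")
```

A positive result means CUCM logged the response after ADFS issued it; a negative one would suggest that the CUCM clock is behind the IdP.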

So how do you fix this? Turns out there is a way to configure Microsoft ADFS to allow for a small time skew. Here’s the procedure:

  1. Log in to your primary ADFS server with sufficient permissions.
  2. Launch Windows PowerShell with elevated privileges (as Administrator).
  3. Retrieve the list of configured relying party trusts to find the one that is related to your CUCM:
    Get-AdfsRelyingPartyTrust | select *identifier*
  4. Get details on the allowable time skew for your CUCM:
    Get-AdfsRelyingPartyTrust -Identifier CUCM-Pub-FQDN | select *identifier*, *skew*
    
    Identifier NotBeforeSkew
    ---------- -------------
    {CUCM-Pub-FQDN} 0
  5. Set “NotBeforeSkew” to 1 (the value is in minutes). This starts the assertion’s validity period 1 minute earlier, so a CUCM clock that is slightly behind ADFS will still accept it:
    Set-AdfsRelyingPartyTrust -TargetIdentifier "CUCM-Pub-FQDN" -NotBeforeSkew 1
  6. Repeat steps 4 and 5 for the relying party trusts of the other Cisco UC nodes that are configured for SSO (e.g. CUC servers).

Now test. If authentication works, the job is done. Hope this helps someone.

P.S. Don’t forget to set the SAML trace level back to its default (INFO) by issuing the command set samltrace level info in the CUCM Pub’s CLI.

Jabber for Windows: “Cannot communicate with the server”

Hello folks!

I was delaying this post for a while, hoping to find a resolution to the issue that I’ve been working on for over a month now. This is a somewhat unique case which may not be experienced by many Cisco customers, but there is a chance that there are others that are hitting the same defect. Below is a quick overview of the environment, the description of the actual problem and current workarounds as discovered through independent troubleshooting processes and through Cisco TAC.

Overview/Conditions:

  • The client is a multinational company with presence at some very remote locations. The internal communication between different sites is within MPLS with highly heterogeneous connectivity (a whole variety of fiber, copper, microwave and satellite communications).
  • Due to such a vastly distributed environment with varying network latencies, there is little opportunity to centralize call processing into a few regional clusters. Hence, the client has a number of CUCM clusters, some of them in very close geographic proximity to one another. This is especially true for one region where the only form of communication is via a high-latency satellite connection.
  • Cisco Jabber is used throughout the organization, so each CUCM cluster would have a CUP server (or two) to support IM & Presence capabilities. The ILS is used for Inter-cluster lookup and a centralized UDS for user directory. (BTW, if anyone is interested in seeing a separate post on an end-to-end configuration of a multi-cluster environment (with MRA!) to support Jabber – please drop me a line in the comments section).
  • The majority of Cisco Jabber clients are running version 11.0 or above, and all CUCM clusters were recently upgraded from version 10.5.2 to version 11.0.1.

Problem Description:

  • The issue affects users who are located in remote areas where communication between the site and the rest of the corporate network happens over a high-latency satellite link. Locally, Cisco Jabber users connect to their home clusters just fine. When a user with a working Cisco Jabber travels to another remote location and tries to connect, the client shows the all-too-common “Cannot communicate with the server” error.
  • It has been observed that the maximum allowable latency between the Cisco Jabber client and the user’s Home Cluster is somewhere between 600 and 700 ms (round-trip delay). With a latency of 1000 ms or more, Jabber does not connect and shows the above “Cannot communicate with the server” error.
  • The PRT may show the following errors:
    • 2016-06-21 07:18:26,226 INFO  [0x000018ac] [ls\src\http\BasicHttpClientImpl.cpp(448)] [csf.httpclient] [csf::http::executeImpl] - *-----* HTTP response code 0 for request #21 to https://cucm.example.com:6972/CSFdevice.cnf.xml
    • 2016-06-21 07:18:26,345 WARN  [0x000018ac] [mpl\ucm-config\tftp\TftpFileSet.cpp(113)] [csf.config] [csf::ucm90::TftpFileSet::fetchInitialTftpFile] - Failed to connect to Tftp server : result : UNKNOWN_ERROR
      Note how the above reveals that the Jabber client is requesting a configuration file from TFTP using port 6972 rather than 6970. This change was introduced with CUCM version 11.x and Jabber 11.x.
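
To check whether a given site falls into the problematic latency range, the round trip to the TFTP service can be approximated by timing a TCP handshake. Below is a minimal Python sketch under stated assumptions: tcp_connect_ms and classify are hypothetical helper names, the thresholds are simply the field observations listed above (not documented Cisco limits), and a TCP connect time is only a rough stand-in for the round-trip delay (it also includes DNS resolution when a hostname is used).

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 5.0) -> float:
    """Time one TCP connection setup to host:port, in milliseconds."""
    start = time.perf_counter()
    # create_connection returns once the three-way handshake completes,
    # which takes roughly one network round trip.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def classify(rtt_ms: float) -> str:
    """Map a round-trip time onto the ranges observed in this post."""
    if rtt_ms <= 600:
        return "likely ok"          # below the observed 600-700 ms ceiling
    if rtt_ms < 1000:
        return "borderline"         # somewhere around or past the ceiling
    return "likely to fail"         # 1000 ms or more did not connect


# Example classification of a few sample values (no network access needed).
for sample in (250.0, 650.0, 1200.0):
    print(f"{sample:.0f} ms -> {classify(sample)}")
```

From an affected site, something like classify(tcp_connect_ms("cucm.example.com", 6972)) would test the port seen in the PRT, and the same call with port 6970 would test the one used by older Jabber clients.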

Problem Resolution/Workaround:

Currently, there is no solution to resolve this issue, but as always with Cisco, there are workarounds:

  • Downgrade affected Cisco Jabber clients to any 10.x version (e.g. the latest build for version 10.6 that is currently offered on CCO is 10.6(7)). Prior to version 11.x, Jabber was using port 6970 to grab the configuration file off TFTP server. CUCM 11.0 is backward compatible with older versions of Cisco Jabber clients and would allow Jabber to connect on that port. Don’t ask me how the difference in port for the same service (TFTP) could alter the Cisco Jabber’s behaviour, but this workaround actually works.
  • If users who experience the problem do not care about phone services and just want IM & Presence functionality to work, provide instructions on how to connect Cisco Jabber to the CUP server manually (in Cisco Jabber for Windows, click “Advanced Settings”, choose “Cisco IM & Presence” for Account Type, select “Use the following server” for Login Server and type the FQDN of the home CUP server).
    Note: since the Jabber client is not connecting to CUCM’s TFTP to grab its config files, any customized configurations specified in the jabber-config.xml file are not going to apply.
  • Downgrade your CUCM environment to 10.5.2 (I wouldn’t).
  • Upgrade your CUCM environment to version 11.5 (apparently, it has just become available for download on CCO).
    Note, though, that while the latter was suggested by Cisco TAC, this workaround has yet to be verified by yours truly.

This post will be updated once a formal resolution takes place. I would also expect Cisco TAC to file the bug in its Bug Tracker. When they do, I will publish an update with the link to the bug ID.

Hope this helps someone.