Monday, April 15, 2024

Edge node vmid not found on NSX manager


 

Hello There,

Recently, we faced an issue in our NSX-T environment running version 3.2.x.

We saw the below error message while running the pre-checks for the NSX upgrade to version 4.x.

"Edge node 31c2a0ba-e10a-48eb-940d-85f1e48c811f vmId is not found on NSX manager"




To fix this vmId issue for the Edge nodes, we need to edit the DeploymentUnitInstance and EdgeNodeExternalConfig tables in the NSX Corfu DB.

Steps to be done:

1) Log in to the NSX Manager UI. Start a backup by going to System > Lifecycle Management > Backup and Restore and clicking the START BACKUP button.


2) Log in to the CLI of any NSX Manager node as admin, switch to the root account, and stop the corfu service using the following command:


   /etc/init.d/corfu-server stop

3) Stop the proton service

   /etc/init.d/proton stop

4) Start the corfu service

   /etc/init.d/corfu-server start

5) Execute the following DB commands one by one to make the changes in the DeploymentUnitInstance and EdgeNodeExternalConfig tables in the Corfu DB.


First, run the below command to update the DeploymentUnitInstance table in the Corfu DB.

/opt/vmware/bin/corfu_tool_runner.py -t DeploymentUnitInstance -n nsx -o editTable --keyToEdit '{"uuid": { "left": "12448448404996573115", "right": "10430903042528069173" } }' --newRecord '{"managedResource":{"displayName":"acc1c5de-a099-47bb-90c2-030d486d4635"},"deploymentUnitId":{"uuid":{"left":"4178938188179001301","right":"9995961568062954963"} },"deploymentProgressState":"DEPLOYMENT_PROGRESS_STATE_DEPLOYMENT_SUCCESSFUL","deploymentGoalState":"DEPLOYMENT_GOAL_STATE_ENABLED","runningVersion":"2.5.1.0.0.15314297","errorMessage":"","entityId":"vm-654","uniqueVmExternalId":"5015a23f-8cf3-e793-f662-ccb098105a98","vcDeploymentConfig":{"baseDeploymentConfiguration":{"name":"nsxedg001","computeManagerId":"3070d243-6fae-4487-8e38-4262b9c11785"},"dataStore":"datastore-76","cluster":"domain-c47","memoryReservation":-1,"cpuReservation":-1,"cpuShares":-1,"hostId":"host-106"} }'


Now, run the below command to update the EdgeNodeExternalConfig table in the Corfu DB.


/opt/vmware/bin/corfu_tool_runner.py -t EdgeNodeExternalConfig -n nsx -o editTable --keyToEdit '{"stringId": "/infra/sites/default/enforcement-points/default/edge-transport-node/db86d51c-1a54-408c-9ad0-415613bfd2b1"}' --newRecord '{"managementIp":[{"ipAddress":[{"ipv4":169673490}],"prefixLength":24}],"vmId":{"stringId":"5015a23f-8cf3-e793-f662-ccb098105a98"},"deploymentType":"VIRTUAL_MACHINE","cpu":4,"memory":7962812,"hypervisor":"VMware","managementInterface":"eth0","maintenanceMode":"MAINTENANCE_MODE_DISABLED","searchString":"biosUuid:42153f0f-4783-cc05-1161-f2da9c30c578;macAddress:00:50:56:95:bd:59","pnic":[{"name":"fp-eth2","mac":"00:50:56:95:1d:6f"},{"name":"fp-eth1","mac":"00:50:56:95:5f:ee"},{"name":"fp-eth0","mac":"00:50:56:95:2a:ff"}],"prevPnic":[{"name":"fp-eth2","mac":"00:50:56:95:1d:6f"},{"name":"fp-eth1","mac":"00:50:56:95:5f:ee"},{"name":"fp-eth0","mac":"00:50:56:95:2a:ff"}],"enableSsh":true,"hostname":"nsxedg001.abc.com","ntpServer":["10.89.50.151","10.89.50.102"],"dnsServer":["10.90.22.151","10.90.22.102"],"qatConfig":{"isVm":true,"fipsCompliant":true},"syslogServer":[{"server":"10.35.1.110","port":514,"protocol":"SYSLOG_PROTOCOL_ENUM_UDP","logLevel":"SYSLOG_LEVEL_ENUM_INFO"}]}'
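Optionally, you can dump the current contents of both tables before and after the edits to confirm the values. This is a hedged sketch; the showTable operation of corfu_tool_runner.py can differ between NSX builds, so verify it is available on your version before relying on it:

/opt/vmware/bin/corfu_tool_runner.py -t DeploymentUnitInstance -n nsx -o showTable
/opt/vmware/bin/corfu_tool_runner.py -t EdgeNodeExternalConfig -n nsx -o showTable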


Please note: Change the details in the above commands as per your environment and node details.
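6) Start the proton service again. The steps above stop proton in step 3 but never start it again, so the management plane will normally need to be brought back up before the pre-checks can succeed. The command below simply mirrors the stop command used earlier (an assumption, not a confirmed part of the original procedure):

   /etc/init.d/proton start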


Once the above steps are completed successfully, re-run the pre-checks and the vmId warning should be fixed. If not, reboot the NSX Managers one by one, then re-run the pre-checks and verify.


Note: The above procedure is critical because we are making changes directly in the Corfu DB. Please engage VMware support if you are not 100% confident about performing these steps.












Sunday, March 31, 2024

VMware VCF and vSphere Diagnostic tool-VDT




VMware VDT- VCF Diagnostic Tool Overview

VDT (developed and built by VMware Support) is a utility designed to run a series of comprehensive checks live on a target appliance. In its current state, VDT supports the vCenter Server and SDDC Manager appliances.

The VCF Diagnostic Tool (VDT) is a diagnostic tool that is run directly on the SDDC Manager or vCenter server. It runs through a series of checks on the system configuration and reports user-friendly PASS/WARN/FAIL results for known configuration issues. It also provides information (INFO) messages from certain areas which we hope will make detecting inconsistencies easier. The goal of these tests is to provide live diagnostic information to the user about their environment which might otherwise be missed.  

This tool is completely read-only for the entire environment; it makes no changes, so there is no risk in running it.

Another important point about this tool is that it is completely offline and does not reach out to the Internet or any VMware depots for information. It is therefore designed to run in offline and air-gapped environments, which makes it a very useful troubleshooting tool, since it is often not possible to share logs or screen-share with VMware support.

How to use VCF-VDT on SDDC manager:-

1. Download the latest version of the VCF Diagnostic Tool from the below VMware KB.

     https://kb.vmware.com/s/article/94141

2. Use the file-transfer utility of your choice (WinSCP, for example) to copy the ZIP file to the /home/vcf/ directory on the SDDC-Manager on which you wish to run it.
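For example, from a workstation with OpenSSH installed, the copy could look like the line below (sddc-manager.example.com and the vcf account are placeholders; substitute the hostname and user appropriate for your environment):

       scp vdt-<version>.zip vcf@sddc-manager.example.com:/home/vcf/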

3. SSH into the SDDC-Manager and elevate to root with su.

4. Change your directory to the location of the file, and unpackage the vdt zip file:

       cd /home/vcf/
       unzip vdt-<version>.zip
       cd vdt-<version>

5. Run the tool with the command:
      python vdt.py -p sddc_manager

6. You will then be prompted for the administrator@vsphere.local password.

Note: If you are unable to provide the SSO password, the script will only run checks that do not require authentication.


The utility logs to the following directory on the SDDC-Manager.
/var/log/vmware/vcf/vdt/
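To locate the most recent report after a run, you can list that directory sorted by modification time, for example:

       ls -lt /var/log/vmware/vcf/vdt/ | head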


How to use VDT on vCenter Server:-

1. Download the version of VDT 2.0 compatible with your vCenter version from below VMware KB article.
    
    https://kb.vmware.com/s/article/83896

2. Use the file-transfer utility of your choice (WinSCP, for example) to copy the ZIP file to /root on the node on which you wish to run it.
  

3. Change your directory to the location of the file, and unpack the zip:
      cd /root/
      unzip vdt-version_number.zip

4. Run the tool with the command:
      python vdt.py

You will be prompted for the password for administrator@sso.domain. Many checks will still run even if credentials are not supplied.


Please send feedback/feature requests to vcf-gcs-sa-vdt.pdl@broadcom.com

Refer below KB articles for more info:

For VCF-https://kb.vmware.com/s/article/94141
For vSphere- https://kb.vmware.com/s/article/83896

DISCLAIMER: This script is currently in its beta release phase.
As such, it may contain bugs, errors, or incomplete features. Please leverage results with caution.

Thursday, March 28, 2024

CVE-2023-48795 Impact of Terrapin SSH Attack

CVE-2023-48795 describes a vulnerability in OpenSSH v9.5 and earlier. This vulnerability, also known as the "Terrapin attack", could allow an attacker to downgrade the security of an SSH connection by manipulating information transferred during the connection's initial handshake/negotiation sequence. The attacker must have already gained access to the local network, and must be able to both intercept communications and assume the identity of both the recipient and the sender. The CVSS 3.x rating of "Medium" reflects the difficulty of successfully exploiting this vulnerability.

CVE-2023-48795 has since been resolved in OpenSSH v9.6. Its mitigation requires both client and server implementations to be upgraded to that version or later. Additionally, this vulnerability can also be addressed by disabling use of the "ChaCha20-Poly1305" cipher in affected OpenSSH implementations.

This vulnerability affects any system with OpenSSH installed, whether Linux or Windows.

VMware appliances such as NSX Managers and Edge nodes, which run on a Linux kernel, are also affected by this vulnerability.

The workaround on NSX appliances is to remove the affected cipher and MAC algorithms from the SSH and SSHD config files.

Log in to the NSX appliances (Managers and Edge nodes) via SSH (for example, PuTTY) and switch to the root account.

root@nsxmgr001:~# vi /etc/ssh/ssh_config 

root@nsxmgr001:~# vi /etc/ssh/sshd_config 

and remove the below cipher and MAC algorithms from both of these files, then save and exit.

# Cipher and MAC algorithms

chacha20-poly1305@openssh.com

hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com

Then restart the ssh service:

root@nsxmgr001:~# /etc/init.d/ssh restart
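To confirm the change took effect after the restart, you can print the effective sshd configuration and check that the removed algorithms no longer appear. This is a quick sanity check using standard OpenSSH tooling, not an official VMware step:

root@nsxmgr001:~# sshd -T | grep -iE '^(ciphers|macs)'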

After removing the vulnerable cipher and MAC algorithms, both config files will look like the example below:
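The exact algorithm lists differ between NSX versions, so the lines below are for illustration only (an assumed example, not the literal contents of your files); the point is simply that the removed entries no longer appear on the Ciphers and MACs lines:

# Example only - the remaining algorithms depend on your NSX version
Ciphers aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512,hmac-sha2-256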



Please note: There is no official update on this vulnerability from VMware as of now, so proceed at your own risk.


Refer below documentation for more info.

https://learn.microsoft.com/en-us/answers/questions/1525235/need-solution-to-terrapin-vulnerability-cve-2023-4

https://nvd.nist.gov/vuln/detail/CVE-2023-48795

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-48795

Wednesday, March 13, 2024

VMware vSAN OSA and ESA overview.

VMware vSAN 8 introduces the Express Storage Architecture (ESA). This is an optional, alternative storage architecture to the vSAN Original Storage Architecture (OSA), which is also available in vSAN 8. When running on qualified hardware in approved vSAN ReadyNodes, the vSAN Express Storage Architecture offers very high levels of performance, scalability, resilience, and data services without compromising performance. It unlocks the capabilities of modern hardware to support the workloads of today and tomorrow.


Below are some key differences between the OSA and ESA architectures.


OSA: Original Storage Architecture

1. OSA uses the vSAN Distributed File System (vDFS).

2. OSA supports SSD, HDD, and hybrid drive configurations.

3. OSA uses 1 cache drive per disk group.

4. Hardware requirements for OSA vary according to the vSAN configuration.

5. With OSA, good performance can be achieved by leveraging different RAID policies with striping options.

6. OSA uses disk groups with caching devices and can be configured with a wide range of hardware, including spinning disks.

7. OSA can run on vSphere 6.x, 7.x, and 8.x.

8. A minimum 10 Gbps network is required for host networking.

9. A minimum of 2 storage devices is required to configure OSA.

 

ESA:  Express Storage Architecture

1. ESA uses a Log-Structured File System (vSAN LFS).

2. Only certified NVMe SSDs are supported by ESA.

3. There are no disk groups, hence no cache drive is needed.

4. ESA is supported only on approved vSAN ReadyNodes.

5. Performance with ESA is excellent compared to OSA.

6. ESA uses a storage pool instead of disk groups, which makes it more storage efficient.

7. ESA is available only from vSphere 8.x onward.

8. A minimum 25 Gbps network is required for host networking.

9. A minimum of 4 storage devices is required to configure ESA.


For more information about VMware vSAN ESA, refer to the below article.

https://core.vmware.com/blog/comparing-original-storage-architecture-vsan-8-express-storage-architecture


Saturday, January 27, 2024

Update certificate/password on vRA cloud account

Symptoms:-

  • vCenter Server Cloud Account username or password has been changed.
  • An existing endpoint in VMware vRealize Automation (Now VMware Aria Automation) needs to be updated with the new credentials.
  • Credentials validation is successful, but then you see the error:
Failed to connect to vCenter: Error: Cannot login due to incorrect username and password
  • The configuration fails to load and the endpoint cannot be saved.
  • Data collection and provisioning to this endpoint fails due to the invalid credentials.

 

1. Log in to vRA, right-click anywhere on the page, and click Inspect.


2. Click on the Network tab and press CTRL+R to reload the page and capture the requests.


3. Scroll down, search for the Access-Token row, click it, and navigate to the Response section to copy the access token.



4. Browse to the Swagger API from the API Documentation and click the Authorize button.





5. Enter Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxx (where xxxxxxxxxx is the Access Token you copied) and hit Authorize.



6. After this, Swagger is authorized against vRA and you can use the API calls.

7. Press Ctrl + F and search for "Update vSphere cloud account".



In the body, pass the following (with proper inputs for the hostname, certificate info, password, and username) and execute it:

 
{
  "hostName": " ",
  "certificateInfo": {
    "certificate": " -----BEGIN CERTIFICATE-----\nMIIExxxxxxxxxxxxxxxxxxxxxCwUAMIGkMQswCQYD\nVQQDDAJDQTEXMBUGCgmSJomT8ixkARkWB3Zzxxxxxxxxxxxxxxxxxxxxxxb3JuaWExJjAkBgNV\nBAoMHWNhdmEtNi0wMDEtMTQwLmVuZy52bXdhcmUuY29tMRswGQYDVQQLDBJWTXdh\ncmUgRW5naW5lZXJpbmcwHhcNMjMwNDI4MDAzODUyWhcNMjUwNDI3MTIzODUyWjAk\nMRUwEwYDVQQDDAwxMC4yMjUuMS4xNDAxCzAJBgNVBAYTAlVTMIIBojANBgkqhkiG\n9w0BAQEFAAOCAY8AMIIBigKCAYEAvB1xuJbc9dg5WOzt3+th2/rq/Kku6mmkeaBJ\nCKetYbt21QYLEMJ68GFuU9Q/RCs0DiDCmWR3APYxBbL9Hp7cB6PAMkR5PEoQCaHA\nXXJsw3TFPbU8LVmq/VMibAuNGo++4emfUNGGX2PJm5F1S7sPadODGxxxxxxxxxxxxxxxxxxxxxxxxxxxxO9z+/NuAXnXVJwlA==\n-----END CERTIFICATE-----\n "
  },
  "password": " ",
  "username":" "
}
 
 
 

SSH to any vRA appliance node and run the below command to get the vCenter certificate info, then use it in the API call above:

 
openssl s_client -connect <vCenterHostname>:443 2> /dev/null | openssl x509 | awk 'NF {sub(/\r/, "");  printf "%s\\n",$0;}'
 
 
 

Grab the certificate starting from "-----BEGIN CERTIFICATE-----\nM" through "-----END CERTIFICATE-----\n".
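For convenience, the same command can write its output into a shell variable so the certificate is easier to copy into the request body. This is a minimal sketch built on the command above; vcenter.example.com is a placeholder hostname:

CERT=$(openssl s_client -connect vcenter.example.com:443 < /dev/null 2> /dev/null | openssl x509 | awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}')
# print the single-line, \n-escaped certificate for pasting into the "certificate" field
echo "$CERT"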

 
 
 

You can get the cloud account ID from the URL, after %2F.


 


Once all the info is pasted in the body, execute the API call.

Go back to the vRA cloud account, refresh it, and validate the account again. It should be OK now.

For more info, follow KB-https://kb.vmware.com/s/article/88531

Cheers !!



                                                                                                                    


Thursday, September 28, 2023

BGP EVPN Support in NSX-T Data Center

NSX-T Data Center leverages BGP EVPN technology to interconnect and extend NSX-managed overlay networks to other data center environments not managed by NSX. VXLAN encapsulation is used between NSX TEPs (edge nodes and hypervisors) and external network devices to ensure data plane compatibility.

Two connectivity modes are supported for EVPN implementation in NSX-T Data Center:


Inline Mode:

In this mode, the tier-0 gateway establishes MP-BGP EVPN control plane sessions with external routers to exchange routing information. In the data plane, edge nodes forward all the traffic exiting the local data center to the data center gateways, and incoming traffic from the remote data center to the hypervisors in the local data center. Since the edge nodes are in the data forwarding path, this model is called the Inline model.

""



Route Server Mode:

In this mode, the tier-0 gateway establishes MP-BGP EVPN control plane sessions to exchange routing information with the external routers or route reflectors. In the data plane, ESXi hypervisors forward the traffic to external networks either to the data center gateways or to remote ToR switches over VXLAN tunnels. The TEPs used for the data plane VXLAN encapsulation are the same as the ones used for GENEVE encapsulation.

""



Route Distinguishers and Route Targets in NSX-T Data Center:

With NSX-T Data Center's BGP implementation, route distinguishers (RD) can be set either automatically or manually. The supported RD options for the Inline and Route Server modes are detailed below.

Inline mode

Auto RD:

  • Supported; only type-1 is generated.

  • You must configure the RD Admin field. The RD Admin field must be in the format of an IP address.

  • The RD Admin field is used to fill the Administrator subfield in the RD.

  • The 2-byte Assigned Number subfield is allocated a random number in the range for each RD generation.

  • The generated auto RD is checked against other manually configured RDs to avoid any duplicates.

Manual RD:

  • Supported.

  • Both type-0 and type-1 are allowed, but type-1 is recommended.

  • No RD Admin field is required to be configured.

  • The configured manual RD is checked against other auto RDs to avoid any duplicates.

Route Server mode

Auto RD:

  • Not supported.

Manual RD:

  • Supported.

  • Both type-0 and type-1 are allowed, but type-1 is recommended.

  • No RD Admin field is required to be configured.

  • The configured manual RD is checked against other auto RDs to avoid any duplicates.
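For reference, the two RD types mentioned above differ in their Administrator subfield (the values below are hypothetical, for illustration only):

Type-0 RD: 65001:100      (2-byte AS number : 4-byte assigned number)
Type-1 RD: 192.0.2.10:10  (IPv4 address : 2-byte assigned number)

This is also why the RD Admin field used for auto RD generation must be entered as an IP address: the auto-generated RDs are type-1.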



Limitations and Caveats:

  • NSX supports L3 EVPN by advertising and receiving IP prefixes as EVPN Route Type-5.

  • NSX-T generates a unique router MAC for every NSX Edge VTEP in the EVPN domain. However, there may be other nodes in the network that are not managed by NSX-T, for example, physical routers. You must make sure that the router MACs are unique across all the VTEPs in the EVPN domain.

  • The EVPN feature supports NSX Edge nodes to be either the ingress or the egress of the EVPN virtual tunnel endpoint. If an NSX Edge node receives EVPN Route Type-5 prefixes from its eBGP peer that needs to be redistributed to another eBGP peer, the routes are re-advertised without any change to the next hop.

  • In multi-path network topologies, it is recommended that ECMP is enabled for the NSX BGP EVPN control plane, so that all the possible paths can be advertised by the tier-0 gateway. This will avoid any potential traffic blackhole due to asymmetric data path forwarding.

  • A tier-0 gateway can span across multiple edge nodes. However, specifying a unique route distinguisher for each edge node or TEP (either via auto or manual configuration) is not supported. As a result, the use of ECMP on the peer router is not supported.

  • Route maps are not supported for EVPN address family.

  • Recursive route resolution for gateway IP via default static route is not supported.

Limitations and caveats for Inline mode:

  • Only BGP Graceful Restart in Helper Mode is supported.

  • Only eBGP is supported between tier-0 SRs and external routers.

  • Only one TEP is supported per edge node. The use of loopback interfaces for TEP is highly recommended.

Limitations and caveats for Route Server mode:

  • The High Availability mode on the tier-0 must be set to active-active.

  • Only manual Route Distinguishers and manual Route Targets are supported.

  • BGP Graceful Restart, Helper Mode, and Restarted Mode are not supported.

  • Only eBGP is supported between hosted VNFs and tier-0 VRF gateways.

  • eBGP multihop using loopbacks is required between tier-0 SRs and external routers. Using uplinks for the eBGP neighbor session is not supported for EVPN Route Server mode operation.

  • The VNF uplink towards the tier-0 SR VRF must be in the same subnet as the Integrated Routing and Bridging (IRB) on the data center gateways.


Monday, September 25, 2023

SDDC manager backup task failed with error "Could not start SDDC Manager backup Backup failed : Unexpected error encountered when processing SDDC Manager Backup"

Today, I am writing this post about an SDDC Manager backup task failure with the error message "Could not start SDDC Manager backup Backup failed : Unexpected error encountered when processing SDDC Manager Backup". The SDDC Manager is running VCF version 4.5.1, and the SDDC backups are configured to an external SFTP server. Below is the screenshot of the backup task failure.
After checking the operations manager logs under

 /var/log/vmware/vcf/operationsmanager/operationsmanager.logs

we found the below entries related to the backup failure, indicating an issue with the SOS client service and a "Too many open files" condition during the backup task.

2023-09-25T11:22:34.655+0000 ERROR [vcf_om,99cc8bcd8fbb453b,d881] [c.v.v.b.helper.SosBackupApiClient,http-nio-127.0.0.1-7300-exec-5] An exception 500 INTERNAL SERVER ERROR: "{"arguments":[],"causes":[{"message":"[Errno 24] Too many open files: '/var/log/vmware/vcf/sddc-support/vcf-sos.log'","type":null}],"context":null,"errorCode":"BACKUP_OPERATION_FAILED","message":"Unexpected error encountered when processing SDDC Manager Backup","referenceToken":"","remediationMessage":null}" occurred while making a call to SOS

2023-09-25T11:22:34.655+0000 ERROR [vcf_om,99cc8bcd8fbb453b,d881] [c.v.v.b.helper.SosBackupApiClient,http-nio-127.0.0.1-7300-exec-5] Response body as string {"arguments":[],"causes":[{"message":"[Errno 24] Too many open files: '/var/log/vmware/vcf/sddc-support/vcf-sos.log'","type":null}],"context":null,"errorCode":"BACKUP_OPERATION_FAILED","message":"Unexpected error encountered when processing SDDC Manager Backup","referenceToken":"","remediationMessage":null}

2023-09-25T11:22:34.655+0000 ERROR [vcf_om,99cc8bcd8fbb453b,d881] [c.v.v.b.helper.SosBackupApiClient,http-nio-127.0.0.1-7300-exec-5] Sos client error message Unexpected error encountered when processing SDDC Manager Backup

2023-09-25T11:22:34.655+0000 DEBUG [vcf_om,99cc8bcd8fbb453b,d881] [c.v.e.s.e.h.LocalizableRuntimeExceptionHandler,http-nio-127.0.0.1-7300-exec-5] Processing localizable exception Backup failed : Unexpected error encountered when processing SDDC Manager Backup

2023-09-25T11:22:34.659+0000 ERROR [vcf_om,99cc8bcd8fbb453b,d881] [c.v.e.s.e.h.LocalizableRuntimeExceptionHandler,http-nio-127.0.0.1-7300-exec-5] [CUJHPT] BACKUP_FAILED Backup failed : Unexpected error encountered when processing SDDC Manager Backup

WORKAROUND:

1. To resolve this issue, reboot the SDDC Manager once.

OR

2. As the error is related to the SOS service, run the below command to restart the SOS REST service on the SDDC Manager:

systemctl restart sosrest
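If you want to confirm the "Too many open files" condition before or after the restart, one approach is to count the file descriptors held by the SOS REST service. This is a hedged sketch using standard Linux tooling; it assumes the systemd unit is named sosrest, as in the restart command above:

# resolve the main PID of the sosrest unit and count its open file descriptors
ls /proc/$(systemctl show -p MainPID --value sosrest)/fd | wc -l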

Once either of the above steps is done, trigger the backup again; it should then complete successfully.

....Cheers !!
