
Identifying component leaders of TKGI components

+++
author = "Shubham Sharma"
title = "Identifying component leaders of TKGI components"
menuTitle = "Identifying component leaders of TKGI components"
date = "2022-08-18"
description = "How to identify component leaders of TKGI components"
series = ["TKGI"]
+++

Several TKGI components operate in a leader/follower mode. In this high-availability pattern, the leader is the entry point for requests and coordinates tasks with the followers. The components that fall into this category are:

  • etcd
  • NCP
  • Kubernetes controller manager
  • Kubernetes Scheduler
  • CSI Components

In an environment with multiple control plane and worker nodes, tracking down the leader in order to monitor the activity or logs of these components can be tricky. The steps in this post explain how to find the leader easily.

For the components below, leader election uses the Lease API from the coordination.k8s.io API group. The leading replica holds a Lease object and keeps renewing it; the lease stays valid for the number of seconds set in leaseDurationSeconds, counted from its last renewal timestamp.

  • Kubernetes controller manager
  • Kubernetes Scheduler
  • CSI Components
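The Lease objects behind this election can also be inspected one at a time. The sketch below assumes kubectl is already pointed at the cluster. The function name lease_fields is made up for illustration, while holderIdentity, leaseDurationSeconds, and renewTime are fields of the coordination.k8s.io/v1 Lease spec.

```shell
# Print holder, lease duration (seconds), and last renew time for one lease.
# Usage: lease_fields <namespace> <lease-name>
# lease_fields is a hypothetical helper, not part of TKGI or kubectl.
lease_fields() {
  kubectl get lease "$2" -n "$1" \
    -o jsonpath='{.spec.holderIdentity}{"\t"}{.spec.leaseDurationSeconds}{"\t"}{.spec.renewTime}{"\n"}'
}
```

For example, lease_fields kube-system kube-scheduler prints one tab-separated line for the scheduler lease. Running it twice a few seconds apart should show renewTime advancing while the holder stays the same, as long as the leader is healthy.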

The leader can be identified using the steps below.

Identify the leaseholder

kubectl get leases -A | grep -v node

NAMESPACE           NAME                                              HOLDER                                                                      AGE
kube-system         kube-controller-manager                           ad975454-1101-4a24-b2fa-25705d3b9dc0_faf633cc-0d5a-4b8a-ba45-c85bbbd50024   127m
kube-system         kube-scheduler                                    ad975454-1101-4a24-b2fa-25705d3b9dc0_8109191c-1eb4-4d13-967b-1735e19086fb   127m
vmware-system-csi   csi-vsphere-vmware-com                            ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   external-attacher-leader-csi-vsphere-vmware-com   ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   external-resizer-csi-vsphere-vmware-com           ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
vmware-system-csi   vsphere-syncer                                    ad975454-1101-4a24-b2fa-25705d3b9dc0                                        127m
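  • The HOLDER column shows which node holds each lease. These values are not Kubernetes node names; they are the hostnames of the BOSH-deployed VMs (for kube-controller-manager and kube-scheduler, followed by an underscore and a unique ID). To map them to BOSH instances, print the hostname of every master VM: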
bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master hostname | egrep -v 'subject|to|use'

master/a2cb06fc-c6d2-477c-bdfb-6212591b38c6: stdout | 6e2aa260-2ec5-4537-9133-46192d858a3b
master/31c0f1f6-2104-4479-a4e3-39ed63aadc5c: stdout | f8ad35c5-198d-46c8-bdb7-bbf610b81329
master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | ad975454-1101-4a24-b2fa-25705d3b9dc0
  • As the output above shows, every lease in this environment is held by the node with hostname ad975454-1101-4a24-b2fa-25705d3b9dc0, which is master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b.
  • This means the replicas running on this node are the leaders for these components. You can bosh ssh to this node to monitor them and check their logs.
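The manual matching above can be scripted. The following is a minimal sketch, assuming the two command outputs shown earlier were saved to files. The function name lease_leaders and both file arguments are hypothetical, and the parsing relies on the column layout seen in this post.

```shell
# Map each lease to the BOSH instance that holds it.
# $1: file with the output of the kubectl lease listing
# $2: file with the output of the "bosh ... ssh master hostname" command
# lease_leaders is a hypothetical helper name.
lease_leaders() {
  awk '
    { gsub(/\r/, "") }                    # drop carriage returns, if any
    # First file (BOSH output): "master/<id>: stdout | <hostname>"
    NR == FNR {
      if ($2 == "stdout") { inst = $1; sub(/:$/, "", inst); host[$NF] = inst }
      next
    }
    # Second file (lease table): skip the header, strip "_<uuid>" from HOLDER
    $1 != "NAMESPACE" && NF >= 4 {
      h = $3; sub(/_.*/, "", h)
      print $1 "/" $2, ((h in host) ? host[h] : "unknown")
    }
  ' "$2" "$1"
}
```

Called as lease_leaders leases.txt hostnames.txt, it prints one line per lease in the form namespace/name followed by the BOSH instance, or "unknown" when the holder matches none of the listed hostnames.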

Identify the etcd leader

  • The command below returns the etcd leader, which in this environment is also master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b. In the output, the fifth comma-separated field is the "is leader" flag.
bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master/0 "ETCDCTL_API=3 /var/vcap/jobs/etcd/bin/etcdctl endpoint status" | egrep -v 'subject|to|use' | grep true

master/9ddb3dfe-a988-4249-a2e7-0ba1ec0ac47b: stdout | https://master-0.etcd.cfcr.internal:2379, 17f206fd866fdab2, 3.5.4, 5.5 MB, true, false, 4, 28536, 28536,
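Filtering with grep true matches "true" anywhere on the line, so it would also keep a member whose "is learner" column is true. A stricter filter can test the "is leader" column alone. The sketch below assumes the comma-separated layout shown above (etcd 3.5.4, where the fifth field is the leader flag); etcd_leader is a hypothetical helper name.

```shell
# Read "etcdctl endpoint status" lines (as captured through bosh ssh) on stdin
# and print the BOSH instance and endpoint of each member whose fifth
# comma-separated field (the leader flag in the output above) is "true".
# etcd_leader is a hypothetical helper name.
etcd_leader() {
  awk -F', *' '
    NF >= 6 && $5 == "true" {
      split($1, a, " ")              # "master/<id>: stdout | <endpoint>"
      inst = a[1]; sub(/:$/, "", inst)
      print inst, a[4]
    }
  '
}
```

To use it, pipe the bosh command above into etcd_leader in place of the final grep true.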

Identify the NCP master

bosh -d service-instance_aeec33f2-0c07-444f-a20e-3648d3ac18ed ssh master "sudo /var/vcap/jobs/ncp/bin/nsxcli -c get ncp-master status" | egrep -v 'subject|to|use' | grep "This instance is the NCP master"

master/31c0f1f6-2104-4479-a4e3-39ed63aadc5c: stdout | This instance is the NCP master