Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node is managed by the control plane and contains the services necessary to run Pods.
After you create a Node object, or the kubelet on a node self-registers, the control plane checks whether the new Node object is valid. For example, if you try to create a Node from the following JSON manifest:
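A minimal manifest in that shape might look like the following; the node name and label value here are only illustrative:

```json
{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "10.240.79.157",
    "labels": {
      "name": "my-first-k8s-node"
    }
  }
}
```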
Kubernetes creates a Node object internally (the representation). Kubernetes checks that a kubelet has registered to the API server that matches the metadata.name field of the Node. If the node is healthy (i.e. all necessary services are running), then it is eligible to run a Pod. Otherwise, that node is ignored for any cluster activity until it becomes healthy.
The name identifies a Node. Two Nodes cannot have the same name at the same time. Kubernetes also assumes that a resource with the same name is the same object. In the case of a Node, it is implicitly assumed that an instance using the same name will have the same state (e.g. network settings, root disk contents) and attributes like node labels. This may lead to inconsistencies if an instance was modified without changing its name. If the Node needs to be replaced or updated significantly, the existing Node object needs to be removed from the API server first and re-added after the update.
As mentioned in the Node name uniqueness section, when Node configuration needs to be updated, it is a good practice to re-register the node with the API server. For example, if the kubelet is restarted with a new set of --node-labels but the same Node name is used, the change will not take effect, because labels are only set on Node registration.
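A hedged sketch of that re-registration, assuming kubectl access and a systemd-managed kubelet (the node name is a placeholder, and how --node-labels is passed depends on how the kubelet is deployed):

```shell
# Remove the stale Node object so the kubelet can register again
# with its updated configuration.
kubectl delete node <node-name>

# Restart the kubelet after updating its --node-labels flag
# (for example via a systemd drop-in or the node's bootstrap tooling).
systemctl restart kubelet
```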
Marking a node as unschedulable prevents the scheduler from placing new pods onto that Node but does not affect existing Pods on the Node. This is useful as a preparatory step before a node reboot or other maintenance.
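For example, assuming kubectl access and a node name in $NODENAME, cordoning a node typically looks like this:

```shell
# Stop the scheduler from placing new Pods onto the node
kubectl cordon $NODENAME

# Re-enable scheduling once maintenance is complete
kubectl uncordon $NODENAME
```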
If the status of the Ready condition remains Unknown or False for longer than the pod-eviction-timeout (an argument passed to the kube-controller-manager), then the node controller triggers API-initiated eviction for all Pods assigned to that node. The default eviction timeout duration is five minutes. In some cases when the node is unreachable, the API server is unable to communicate with the kubelet on the node. The decision to delete the pods cannot be communicated to the kubelet until communication with the API server is re-established. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.
The node controller does not force delete pods until it is confirmed that they have stopped running in the cluster. You can see the pods that might be running on an unreachable node as being in the Terminating or Unknown state. In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the API server and frees up their names.
When problems occur on nodes, the Kubernetes control plane automatically creates taints that match the conditions affecting the node. The scheduler takes the Node's taints into consideration when assigning a Pod to a Node. Pods can also have tolerations that let them run on a Node even though it has a specific taint.
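As a sketch, a Pod can declare a toleration so it keeps running for a while on a node that the control plane has tainted as unreachable; the Pod name and image below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
  tolerations:
  # Tolerate the taint the control plane adds when the node becomes
  # unreachable, but only for 5 minutes before eviction.
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
```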
Describes general information about the node, such as kernel version, Kubernetes version (kubelet and kube-proxy version), container runtime details, and which operating system the node uses. The kubelet gathers this information from the node and publishes it into the Kubernetes API.
The second is keeping the node controller's internal list of nodes up to date with the cloud provider's list of available machines. When running in a cloud environment and whenever a node is unhealthy, the node controller asks the cloud provider if the VM for that node is still available. If not, the node controller deletes the node from its list of nodes.
The node eviction behavior changes when a node in a given availability zone becomes unhealthy. The node controller checks what percentage of nodes in the zone are unhealthy (the Ready condition is Unknown or False) at the same time. If the fraction of unhealthy nodes is at least --unhealthy-zone-threshold (default 0.55), then the eviction rate is reduced: if the cluster is small (that is, it has --large-cluster-size-threshold nodes or fewer, default 50), evictions are stopped; otherwise the eviction rate is reduced to --secondary-node-eviction-rate (default 0.01) per second.
A key reason for spreading your nodes across availability zones is so that the workload can be shifted to healthy zones when one entire zone goes down. Therefore, if all nodes in a zone are unhealthy, then the node controller evicts at the normal rate of --node-eviction-rate. The corner case is when all zones are completely unhealthy (none of the nodes in the cluster are healthy). In such a case, the node controller assumes that there is some problem with connectivity between the control plane and the nodes, and doesn't perform any evictions. (If there has been an outage and some nodes reappear, the node controller does evict pods from the remaining nodes that are unhealthy or unreachable.)
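The rates and thresholds mentioned above are kube-controller-manager flags; a sketch of setting them explicitly (the values shown follow the usual defaults, which may differ by version):

```shell
kube-controller-manager \
  --node-eviction-rate=0.1 \
  --secondary-node-eviction-rate=0.01 \
  --unhealthy-zone-threshold=0.55 \
  --large-cluster-size-threshold=50
```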
The node controller is also responsible for evicting pods running on nodes with NoExecute taints, unless those pods tolerate that taint. The node controller also adds taints corresponding to node problems like node unreachable or not ready. This means that the scheduler won't place Pods onto unhealthy nodes.
Node objects track information about the Node's resource capacity: for example, the amount of memory available and the number of CPUs. Nodes that self-register report their capacity during registration. If you manually add a Node, then you need to set the node's capacity information when you add it.
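As a quick way to inspect what a node reports, assuming kubectl access (the node name is a placeholder):

```shell
# Show the node's Capacity and Allocatable sections, among other details
kubectl describe node <node-name>
```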
The Kubernetes scheduler ensures that there are enough resources for all the Pods on a Node. The scheduler checks that the sum of the requests of containers on the node is no greater than the node's capacity. That sum of requests includes all containers managed by the kubelet, but excludes any containers started directly by the container runtime, and also excludes any processes running outside of the kubelet's control.
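For illustration, the requests that count toward that sum are declared per container in the Pod spec; the name, image, and values below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: requests-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        cpu: "500m"      # counted against the node's schedulable CPU
        memory: "256Mi"  # counted against the node's schedulable memory
```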
Note that by default, both configuration options described below, shutdownGracePeriod and shutdownGracePeriodCriticalPods, are set to zero, thus not activating the graceful node shutdown functionality. To activate the feature, the two kubelet config settings should be configured appropriately and set to non-zero values.
For example, if shutdownGracePeriod=30s, and shutdownGracePeriodCriticalPods=10s, kubelet will delay the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved for gracefully terminating normal pods, and the last 10 seconds would be reserved for terminating critical pods.
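A sketch of the corresponding kubelet configuration with those values (graceful node shutdown also requires the GracefulNodeShutdown feature gate in releases where it is not enabled by default):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the kubelet delays the node shutdown.
shutdownGracePeriod: "30s"
# Portion of shutdownGracePeriod reserved for critical pods.
shutdownGracePeriodCriticalPods: "10s"
```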
When pods are evicted during the graceful node shutdown, they are marked as shutdown. Running kubectl get pods shows the status of the evicted pods as Terminated, and kubectl describe pod indicates that the pod was evicted because of node shutdown.
To provide more flexibility during graceful node shutdown around the ordering of pods during shutdown, graceful node shutdown honors the PriorityClass for Pods, provided that you enabled this feature in your cluster. The feature allows cluster administrators to explicitly define the ordering of pods during graceful node shutdown based on priority classes.
When graceful node shutdown honors pod priorities, this makes it possible to do graceful node shutdown in multiple phases, each phase shutting down a particular priority class of pods. The kubelet can be configured with the exact phases and shutdown time per phase.
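A hedged sketch of that kubelet configuration using shutdownGracePeriodByPodPriority; the priority values and periods below are illustrative and should match the PriorityClasses defined in your cluster:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriodByPodPriority:
  # Pods with priority >= 100000 (e.g. critical pods) get 10 seconds.
  - priority: 100000
    shutdownGracePeriodSeconds: 10
  - priority: 10000
    shutdownGracePeriodSeconds: 180
  - priority: 1000
    shutdownGracePeriodSeconds: 120
  # All remaining pods get 60 seconds.
  - priority: 0
    shutdownGracePeriodSeconds: 60
```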
A node shutdown action may not be detected by kubelet's Node Shutdown Manager, either because the command does not trigger the inhibitor locks mechanism used by kubelet or because of a user error, i.e. the ShutdownGracePeriod and ShutdownGracePeriodCriticalPods are not configured properly. Please refer to the Graceful Node Shutdown section above for more details.
When a node is shut down but not detected by kubelet's Node Shutdown Manager, the pods that are part of a StatefulSet will be stuck in terminating status on the shutdown node and cannot move to a new running node. This is because the kubelet on the shutdown node is not available to delete the pods, so the StatefulSet cannot create a new pod with the same name. If there are volumes used by the pods, the VolumeAttachments will not be deleted from the original shutdown node, so the volumes used by these pods cannot be attached to a new running node. As a result, the application running on the StatefulSet cannot function properly. If the original shutdown node comes up, the pods will be deleted by kubelet and new pods will be created on a different running node. If the original shutdown node does not come up, these pods will be stuck in terminating status on the shutdown node forever.
To mitigate the above situation, a user can manually add the taint node.kubernetes.io/out-of-service with either NoExecute or NoSchedule effect to a Node, marking it out-of-service. If the NodeOutOfServiceVolumeDetach feature gate is enabled on kube-controller-manager, and a Node is marked out-of-service with this taint, the pods on the node will be forcefully deleted if there are no matching tolerations on it, and volume detach operations for the pods terminating on the node will happen immediately. This allows the Pods on the out-of-service node to recover quickly on a different node.
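For example (the taint value is arbitrary; substitute your node name):

```shell
# Mark the node as out of service so its pods can be force deleted and
# their volumes detached without waiting for the unreachable kubelet.
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```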
Prior to Kubernetes 1.22, nodes did not support the use of swap memory, and a kubelet would by default fail to start if swap was detected on a node. From 1.22 onwards, swap memory support can be enabled on a per-node basis.
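A sketch of turning this on for a node, assuming a 1.22-era kubelet with the NodeSwap feature gate (the swapBehavior value is illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeSwap: true
# Let the kubelet start even though swap is enabled on the node.
failSwapOn: false
memorySwap:
  # LimitedSwap restricts how much swap workloads may use;
  # UnlimitedSwap places no kubelet-imposed limit.
  swapBehavior: LimitedSwap
```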