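NodeAffinity: node affinity

Requires Kubernetes 1.6.x or later.

  • RequiredDuringSchedulingIgnoredDuringExecution: the specified rules must be satisfied before a Pod can be scheduled onto a Node; this is a hard constraint.

  • PreferredDuringSchedulingIgnoredDuringExecution: the scheduler prefers Nodes that satisfy the specified rules and tries to place the Pod on one of them, but does not insist; this is a soft constraint. When several preferences are listed, each can be given a weight.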
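To make the weight mechanism concrete, here is a minimal sketch with two preferences. The label keys disktype and zone-tier and their values are hypothetical and do not come from the original text. The weight field accepts values from 1 to 100; for each candidate Node the scheduler adds up the weights of the preferences that Node satisfies, and that sum feeds into the Node's overall score.

```yaml
# Fragment of a Pod spec (placed under spec:).
# Label keys and values below are hypothetical, for illustration only.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80              # stronger preference
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
    - weight: 20              # weaker preference
      preference:
        matchExpressions:
        - key: zone-tier
          operator: In
          values:
          - primary
```

A Node carrying both labels gets the full 100 from these preferences, a Node with only disktype=ssd gets 80, and a Node with neither label remains eligible because the rules are only preferences.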

apiVersion: v1
kind: Pod
metadata:
 name: with-node-affinity
spec:
 affinity:
  nodeAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: beta.kubernetes.io/arch
        operator: In
        values:
        - amd64
   preferredDuringSchedulingIgnoredDuringExecution:
   - weight: 1
     preference:
      matchExpressions:
      - key: gateway
        operator: In
        values:
        - "true"
 containers:
 - name: nginx-affinity
   image: nginx
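
  • operator: the operators supported by NodeAffinity are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
  • If both nodeSelector and nodeAffinity are specified, both conditions must be satisfied for the Pod to be scheduled.
  • If nodeAffinity specifies multiple nodeSelectorTerms, matching any one of them is enough.
  • If a single nodeSelectorTerm contains multiple matchExpressions, a Node must satisfy all of them to run the Pod.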
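The combination rules above can be sketched in one manifest. This is an illustrative example, not taken from the original: the Pod name and the labels disktype, env, and gpu are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-semantics-demo     # hypothetical name
spec:
  nodeSelector:
    disktype: ssd                   # must hold in addition to nodeAffinity
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: both expressions must match (AND)
        - matchExpressions:
          - key: env
            operator: In
            values:
            - prod
          - key: gpu
            operator: Exists
        # Term 2: an alternative to term 1 (terms are ORed)
        - matchExpressions:
          - key: env
            operator: In
            values:
            - staging
  containers:
  - name: nginx
    image: nginx
```

Under these assumptions a Node is eligible when it has disktype=ssd and, in addition, either (env=prod and a gpu label) or env=staging.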

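PodAffinity: Pod affinity

  • podAffinity: declares Pod affinity
  • podAntiAffinity: declares Pod anti-affinity

Affinity

If one or more Pods matching condition Y are already running in topology domain X, then the new Pod should run in that domain (for anti-affinity, it must not run there). X defines the scope of the rule, such as a single Node, a zone, or a region. It is identified by a Node label key given in the topologyKey field; the built-in values are: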

  • kubernetes.io/hostname
  • failure-domain.beta.kubernetes.io/zone
  • failure-domain.beta.kubernetes.io/region
apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: php

This means that the web-app Pods (php image) are scheduled only onto Nodes that are already running a Pod labeled app=store.
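For reference, a minimal sketch of a Pod that this rule would match. The Pod name and the redis image are assumptions for illustration; only the app=store label matters to the rule. By default the labelSelector is evaluated against Pods in the same namespace as the Pod that declares the affinity.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: store-cache        # hypothetical name
  labels:
    app: store             # the label the podAffinity rule selects on
spec:
  containers:
  - name: store
    image: redis           # assumed image, for illustration only
```

Because the rule is RequiredDuringScheduling, the web-server replicas would stay Pending until some Node is running a Pod with this label.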

Anti-affinity:

apiVersion: v1
kind: Pod
metadata:
 name: with-pod-affinity
spec:
 affinity:
  podAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:
   - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - "true"
     topologyKey: failure-domain.beta.kubernetes.io/zone
  podAntiAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:
   - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - nginx
     topologyKey: kubernetes.io/hostname
 containers:
 - name: php-affinity
   image: php
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: gcr.io/google_containers/pause:2.0
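
  • In the php-affinity example (the Pod that declares both podAffinity and podAntiAffinity), the new Pod must be scheduled into a zone that is already running a Pod labeled app=true, but must not land on the same Node as a Pod labeled app=nginx.
  • The labelSelector in Pod affinity supports the operators In, NotIn, Exists, and DoesNotExist; Gt and Lt are available only in node affinity.
  • For Pod affinity, and for RequiredDuringScheduling Pod anti-affinity, an empty topologyKey is not allowed.
  • If the admission controllers include LimitPodHardAntiAffinityTopology, the topologyKey of RequiredDuringScheduling Pod anti-affinity is restricted to kubernetes.io/hostname.
  • For PreferredDuringScheduling Pod anti-affinity, an empty topologyKey is interpreted as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone, and failure-domain.beta.kubernetes.io/region.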
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: busybox
  name: busybox
spec:
  replicas: 5
  selector:
    matchLabels:
      run: busybox
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: busybox
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: run
                  operator: In
                  values:
                  - busybox
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - command:
        - sleep
        - "3000"
        image: busybox
        imagePullPolicy: Always
        name: busybox
      dnsPolicy: ClusterFirst
      restartPolicy: Always

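This is a soft (preferred) anti-affinity: the scheduler tries to avoid placing the busybox replicas on the same Node, but the rule is not enforced. When Nodes are unavailable or there are not enough Nodes, several replicas may still share one Node.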
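For contrast, a hedged sketch of the hard variant of the same rule. It reuses the run=busybox label from the Deployment above and would replace the preferred rule in its Pod template.

```yaml
# Fragment of the Pod template spec (under spec.template.spec).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: run
          operator: In
          values:
          - busybox
      topologyKey: kubernetes.io/hostname   # at most one matching Pod per Node
```

With this form, if the cluster has fewer schedulable Nodes than replicas, the surplus replicas remain Pending instead of doubling up on a Node.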

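Taints and Tolerations

NodeAffinity is a property defined on a Pod that attracts the Pod to certain Nodes. Taints do the opposite: they let a Node repel Pods.

Taints work together with Tolerations to keep Pods away from unsuitable Nodes. Once one or more Taints are set on a Node, a Pod cannot run on that Node unless it explicitly declares that it tolerates those Taints.

A Toleration is a property of a Pod that allows (but does not require) the Pod to run on a Node marked with a matching Taint.

The kubectl taint command sets a Taint on a Node: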

kubectl taint nodes node1 key=value:NoSchedule

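This adds a Taint to node1 whose key is key, whose value is value, and whose effect is NoSchedule. A Pod is not scheduled onto node1 unless it explicitly declares that it tolerates this Taint, so the Pod then needs a Toleration such as: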

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
or
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"

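The key and effect in the Pod's Toleration must match those of the Taint, and one of the following conditions must hold:

  • operator is Exists (no value needs to be specified)
  • operator is Equal and the values are equal
  • If operator is not specified, it defaults to Equal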
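Two special cases of these matching rules are worth a short sketch. The two fragments below are alternatives, shown as separate YAML documents; the key name key follows the earlier example.

```yaml
# Alternative 1: an empty key with operator Exists matches every key,
# value, and effect, so this Toleration tolerates all Taints.
tolerations:
- operator: "Exists"
---
# Alternative 2: omitting effect matches every effect of Taints
# whose key is "key".
tolerations:
- key: "key"
  operator: "Exists"
```

The first form is very broad and is typically reserved for Pods that must run on every Node regardless of its Taints.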

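Values of effect:

  • NoSchedule: Pods that do not tolerate the Taint are not scheduled onto the Node; Pods already running there are unaffected.
  • PreferNoSchedule: a soft version of NoSchedule. If a Pod does not tolerate the Taint, the system tries to avoid scheduling it onto the Node, but this is not enforced.
  • NoExecute: when a Taint with effect=NoExecute is added to a Node, Pods running on that Node without a matching Toleration are evicted immediately, while Pods with a matching Toleration that sets no tolerationSeconds are never evicted. A Toleration with the NoExecute effect may carry an optional tolerationSeconds field, which states how long (in seconds) the Pod may keep running on the Node after the Taint is added.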

Set Taints on the Node as follows:

kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule

Define Tolerations on the Pod:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"

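As a result, the Pod cannot be scheduled onto node1, because no Toleration matches the third Taint (key2=value2:NoSchedule). However, if the Pod is already running on node1 when the third Taint is added, it keeps running there: a NoSchedule Taint affects only new scheduling decisions, and the Pod tolerates the first two Taints, including the only NoExecute one.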
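As a sketch of the complementary case, adding a third Toleration that matches key2=value2:NoSchedule would cover all three Taints set on node1 above, so the Pod could be scheduled there:

```yaml
# Fragment of a Pod spec: one Toleration per Taint on node1.
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
- key: "key2"              # covers the third Taint
  operator: "Equal"
  value: "value2"
  effect: "NoSchedule"
```

Tolerating the Taints only permits scheduling onto node1; it does not force the Pod to run there.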

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600

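This definition means that if the Pod is running and a matching Taint is added to its Node, the Pod stays on that Node for another 3600 seconds and is then evicted. If the Taint is removed within this grace period, no eviction is triggered.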
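The fragments in this section all sit under spec.tolerations of a Pod. A minimal complete manifest, assuming a hypothetical Pod name and reusing the nginx image from earlier examples, might look like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-toleration        # hypothetical name
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600    # evicted 3600 s after a matching Taint appears
  containers:
  - name: nginx
    image: nginx
```

The tolerationSeconds field is meaningful only together with the NoExecute effect.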

© w564791 all rights reserved, powered by Gitbook. File last revised: 2019-04-28 06:26:58
