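When Self Service is deployed, its containers are deployed to the Kubernetes cluster on Prism Central (MSP: Nutanix Microservices Platform).
However, the Self Service containers (Pods) get restarted fairly often, and Self Service itself can behave unstably. So I'll try adjusting the probe timeouts using the procedure from the documentation.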
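Environment used this time
Self Service has been enabled as shown below.
Logging in to Prism Central over SSH and checking with kubectl shows that Self Service-related Pods are being restarted automatically. In this example, the Pods named postgres-operator-~ and redis-standalone-~ have high restart counts (RESTARTS). Some other Pods are in Error, but I'll ignore those for now.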
nutanix@NTNX-192-168-20-11-A-PCVM:~$ kubectl get pods -A
NAMESPACE          NAME                                                    READY   STATUS        RESTARTS   AGE
kube-system        cloud-controller-manager-5xrht                          1/1     Running       0          13d
kube-system        coredns-5974f559c5-wkqz4                                1/1     Running       0          13d
kube-system        kube-apiserver-ntnx-192-168-20-11-a-pcvm                3/3     Running       0          13d
kube-system        kube-flannel-ds-6vwvv                                   1/1     Running       0          13d
kube-system        kube-proxy-ds-6kncj                                     1/1     Running       0          13d
kube-system        lb-controller-deployment-0                              1/1     Running       0          13d
kube-system        mspdns-swqkp                                            1/1     Running       0          13d
kube-system        mspserviceregistry-589b5c7bf4-65r2v                     1/1     Running       1          13d
ntnx-base          alerts-broker-d76d947b4-psdk8                           1/1     Running       0          13d
ntnx-base          backrest-backup-cape-v5ps7                              0/1     Completed     0          4h28m
ntnx-base          cape-6cccdd46d7-pwg74                                   2/2     Running       20         13d
ntnx-base          cape-backrest-shared-repo-d8f6dbb89-d2xh8               1/1     Running       0          13d
ntnx-base          cape-leadership-cron-28895765-hvc6m                     1/1     Running       0          4d20h
ntnx-base          cape-leadership-cron-28902725-864jp                     0/1     Completed     0          5m24s
ntnx-base          cape-leadership-cron-28902730-2sgzv                     1/1     Running       0          24s
ntnx-base          cape-stanza-create-vdqpc                                0/1     Completed     0          13d
ntnx-base          iam-analytics-sensor-28900080-2h6mr                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-hclvn                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-q6xm4                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-qfxf9                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-sdj4g                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-wmhms                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28900080-xxh9h                     0/1     Error         0          44h
ntnx-base          iam-analytics-sensor-28902720-6zlq5                     0/1     Error         0          9m31s
ntnx-base          iam-analytics-sensor-28902720-8jdjb                     0/1     Error         0          6m58s
ntnx-base          iam-analytics-sensor-28902720-c7vvj                     0/1     Error         0          8m30s
ntnx-base          iam-analytics-sensor-28902720-nq9sd                     0/1     Error         0          2m57s
ntnx-base          iam-analytics-sensor-28902720-qbfvd                     0/1     Error         0          10m
ntnx-base          iam-analytics-sensor-28902720-r95hp                     0/1     Error         0          5m37s
ntnx-base          iam-bootstrap-8qmzn                                     0/1     Completed     0          13d
ntnx-base          iam-pg-dr-backups-28895580-hsqq8                        1/1     Terminating   0          4d23h
ntnx-base          iam-proxy-748f46c7fc-sgt7l                              2/2     Running       0          13d
ntnx-base          iam-proxy-control-plane-7bbb96b64b-v768w                1/1     Running       0          13d
ntnx-base          iam-themis-67c4775d7d-4f6sl                             1/1     Running       0          13d
ntnx-base          iam-ui-c5c488d68-sl972                                  1/1     Running       0          13d
ntnx-base          iam-user-authn-5c4d47798-4v24j                          1/1     Running       89         13d
ntnx-base          pgo-deploy-bhm4f                                        0/1     Completed     0          13d
ntnx-base          pgo-pull-image-g9q92                                    0/29    Completed     0          13d
ntnx-base          postgres-operator-6dd9d64685-hdhmw                      3/4     Running       1313       13d
ntnx-base          redis-standalone-7fcbf666fb-n9j7b                       3/3     Running       1134       13d
ntnx-base          svcmgr-75d7f467cf-w7b6x                                 1/1     Running       0          13d
ntnx-system        alertmanager-main-0                                     2/2     Running       0          13d
ntnx-system        csi-node-ntnx-plugin-cp6xg                              2/2     Running       0          13d
ntnx-system        csi-provisioner-ntnx-plugin-0                           3/3     Running       0          13d
ntnx-system        fluent-bit-phb6g                                        1/1     Running       0          13d
ntnx-system        kube-state-metrics-f7d6b984b-njsgr                      3/3     Running       0          13d
ntnx-system        mutator-webhook-dep-858cd7968b-dwxwb                    1/1     Running       0          13d
ntnx-system        node-exporter-pc6j5                                     2/2     Running       0          13d
ntnx-system        ntnx-cluster-maintainer-755bf8b688-pfvml                1/1     Running       0          13d
ntnx-system        ntnx-k8s-cluster-maintainer-operator-5db56dcffb-5tf4l   1/1     Running       1          13d
ntnx-system        prometheus-k8s-0                                        3/3     Running       1          13d
ntnx-system        prometheus-operator-5cf777977-spfww                     1/1     Running       0          13d
pc-platform-core   batch-service-rw8xq                                     1/1     Running       0          13d
pc-platform-nci    security-dashboard-796594c96d-fb7rf                     1/1     Running       0          13d
pc-platform-other  licensing-app-6686d6576c-rgdlv                          1/1     Running       0          13d
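To pick the heavily restarted Pods out of a long listing like this, a small awk filter can help. The following is only a sketch and not part of the documented procedure: the function name high_restarts and the default threshold of 100 are my own assumptions. It expects the output of "kubectl get pods -A --no-headers" on stdin, where RESTARTS is the fifth column.

```shell
# List pods whose RESTARTS count is at or above a threshold.
# Input (stdin): output of `kubectl get pods -A --no-headers`, whose
# columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
# high_restarts and the default of 100 are hypothetical, for illustration.
high_restarts() {
  min="${1:-100}"
  # $5 is the RESTARTS column; "+ 0" forces a numeric comparison.
  awk -v min="$min" '$5 + 0 >= min { print $1 "/" $2 ": " $5 }'
}

# Usage on the Prism Central VM:
#   kubectl get pods -A --no-headers | high_restarts 100
```

Against the listing above, this would leave only the postgres-operator and redis-standalone Pods.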
These Pods are started from Deployments.
nutanix@NTNX-192-168-20-11-A-PCVM:~/tmp$ kubectl get deploy -n ntnx-base cape postgres-operator redis-standalone
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
cape                0/1     1            0           13d
postgres-operator   1/1     1            1           13d
redis-standalone    1/1     1            1           13d
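If it is not obvious which Deployment a given Pod belongs to, the ownerReferences metadata can be followed from the Pod to its ReplicaSet and then to the Deployment. This is a rough sketch under the assumption that the Pod really is managed by a Deployment through a ReplicaSet; the function name owning_deployment is hypothetical.

```shell
# Print the name of the Deployment that owns a Pod, following
# Pod -> ReplicaSet -> Deployment via metadata.ownerReferences.
# owning_deployment is a hypothetical helper, not a kubectl feature.
# Assumes the first owner reference is the ReplicaSet / Deployment.
owning_deployment() {
  ns="$1"
  pod="$2"
  rs="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[0].name}')"
  kubectl get replicaset "$rs" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[0].name}'
}

# Usage:
#   owning_deployment ntnx-base redis-standalone-7fcbf666fb-n9j7b
```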
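Changing the timeouts of the Self Service-related Pods (Deployments)
The documentation covers this around the pages below. Those pages target Self Service 4.0, or are written for the Self Service VM, but the procedure and script are the same as for 3.8.x.
Log in to Prism Central over SSH as the nutanix user. The current directory is /home/nutanix.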
nutanix@NTNX-192-168-20-11-A-PCVM:~$ pwd
/home/nutanix
Download the script.
nutanix@NTNX-192-168-20-11-A-PCVM:~$ wget https://download.nutanix.com/Calm/CalmVM-Files/380-Files/increase_readiness_probe_timeouts.sh
Grant execute permission to the script file.
nutanix@NTNX-192-168-20-11-A-PCVM:~$ chmod +x increase_readiness_probe_timeouts.sh
Run the script. It increases timeoutSeconds of the readinessProbe and livenessProbe (and, for some containers, periodSeconds as well).
nutanix@NTNX-192-168-20-11-A-PCVM:~$ ./increase_readiness_probe_timeouts.sh
Executing command: sudo kubectl patch deployment redis-standalone -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/1/readinessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/1/livenessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/2/readinessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/2/livenessProbe/timeoutSeconds", "value": 10}
]',
deployment.apps/redis-standalone patched
Executing command: sudo kubectl patch deployment cape -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds", "value": 10}
]',
deployment.apps/cape patched
Executing command: sudo kubectl patch deployment cape -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/periodSeconds", "value": 30}
]',
deployment.apps/cape patched
Executing command: sudo kubectl patch deployment cape -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 30}
]',
deployment.apps/cape patched
Executing command: sudo kubectl patch deployment postgres-operator -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/1/readinessProbe/timeoutSeconds", "value": 10}
]',
deployment.apps/postgres-operator patched
Executing command: sudo kubectl patch deployment postgres-operator -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/1/readinessProbe/periodSeconds", "value": 30}
]',
deployment.apps/postgres-operator patched
Executing command: sudo kubectl patch deployment postgres-operator -n ntnx-base --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/2/livenessProbe/timeoutSeconds", "value": 10}
]'
deployment.apps/postgres-operator patched
Success: All commands executed successfully.
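Judging from the output, the script simply runs kubectl patch with JSON Patch "replace" operations against fixed container indexes. To make that structure easier to see, here is a small sketch that builds the same kind of patch body. This is my own illustration, not the contents of the Nutanix script; build_timeout_patch is a hypothetical name. Note that a JSON Patch "replace" only succeeds when the target path already exists, so it assumes the probes are already defined on those containers.

```shell
# Build a JSON Patch that sets readinessProbe and livenessProbe
# timeoutSeconds for the given container indexes.
# Usage: build_timeout_patch <seconds> <container-index>...
# build_timeout_patch is a hypothetical helper for illustration only.
build_timeout_patch() {
  seconds="$1"
  shift
  sep=""
  printf '['
  for i in "$@"; do
    for probe in readinessProbe livenessProbe; do
      printf '%s{"op": "replace", "path": "/spec/template/spec/containers/%s/%s/timeoutSeconds", "value": %s}' \
        "$sep" "$i" "$probe" "$seconds"
      sep=", "
    done
  done
  printf ']\n'
}

# Usage, mirroring the redis-standalone patch in the output above:
#   sudo kubectl patch deployment redis-standalone -n ntnx-base \
#     --type='json' -p="$(build_timeout_patch 10 0 1 2)"
```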
If you watch with "kubectl get deploy -n ntnx-base" or "kubectl get pod -n ntnx-base", the Pods should be recreated automatically and become Ready.
nutanix@NTNX-192-168-20-11-A-PCVM:~$ kubectl get deploy -n ntnx-base cape postgres-operator redis-standalone
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
cape                1/1     1            1           13d
postgres-operator   1/1     1            1           13d
redis-standalone    1/1     1            1           13d
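To confirm that the new values actually landed in the Deployment, the probe settings can be read back with a jsonpath query. A minimal sketch follows; the function name probe_timeouts is my own, and the namespace default of ntnx-base is taken from the output above. Containers that have no probe defined simply show empty columns.

```shell
# Print, per container: name, readinessProbe timeoutSeconds,
# livenessProbe timeoutSeconds (tab-separated).
# probe_timeouts is a hypothetical helper for illustration only.
probe_timeouts() {
  deploy="$1"
  ns="${2:-ntnx-base}"
  kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.readinessProbe.timeoutSeconds}{"\t"}{.livenessProbe.timeoutSeconds}{"\n"}{end}'
}

# Usage:
#   probe_timeouts redis-standalone
#   probe_timeouts postgres-operator ntnx-base
```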
This should make things a little more stable...
That's all.