Kubernetes

k3s Kubernetes: worker node join failures caused by a wrong master IP and a duplicate hostname

angora79 2024. 12. 17. 21:40

 

When adding a worker node in k3s, you normally join it to the master node with the following command.

curl -sfL https://get.k3s.io | K3S_URL=https://<MASTER_IP>:6443 K3S_TOKEN=<TOKEN> sh -

 

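However, no matter how long I waited, the join command never finished and just hung indefinitely.

After killing it and checking the service status, I found the error message failed to get CA certs being logged over and over.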

 

angora@angora:~$ curl -sfL https://get.k3s.io | K3S_URL=https://192.168.0.25:6443 K3S_TOKEN=K1087da8d651914287d1722803a628f6dabb851f57d16ede42f8c3454eb1b2f360b::server:8eba64e53126a220483eeaf1e2db7972 sh -
[sudo] password for angora:
[INFO]  Finding release for channel stable
[INFO]  Using v1.31.3+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.31.3+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.31.3+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO]  systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO]  systemd: Starting k3s-agent



^Cangora@angora:~$ systemctl status k3s-agent
● k3s-agent.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2024-12-17 11:52:47 UTC; 3min 21s ago
       Docs: https://k3s.io
   Main PID: 1352 (k3s-agent)
      Tasks: 9
     Memory: 241.9M
     CGroup: /system.slice/k3s-agent.service
             └─1352 /usr/local/bin/k3s agent

Dec 17 11:54:13 angora k3s[1352]: time="2024-12-17T11:54:13Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:54:26 angora k3s[1352]: time="2024-12-17T11:54:26Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:54:38 angora k3s[1352]: time="2024-12-17T11:54:38Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:54:50 angora k3s[1352]: time="2024-12-17T11:54:50Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:55:03 angora k3s[1352]: time="2024-12-17T11:55:03Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:55:15 angora k3s[1352]: time="2024-12-17T11:55:15Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:55:27 angora k3s[1352]: time="2024-12-17T11:55:27Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:55:39 angora k3s[1352]: time="2024-12-17T11:55:39Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:55:52 angora k3s[1352]: time="2024-12-17T11:55:52Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>
Dec 17 11:56:04 angora k3s[1352]: time="2024-12-17T11:56:04Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.>

 

I asked ChatGPT, which kindly listed nine possible causes of this error,

and the first one was to check the master node's IP address.
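One way to see which address the agent was actually given is to read it back from the environment file the installer created (the path appears in the install log above). This is a minimal sketch, assuming the installer recorded K3S_URL in that file; the helper name configured_server_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the K3S_URL recorded in a k3s agent env file.
# Assumption: the installer wrote a line of the form K3S_URL='https://<ip>:6443'.
configured_server_url() {
    env_file="${1:-/etc/systemd/system/k3s-agent.service.env}"
    # Keep only the K3S_URL line, drop the key, and strip surrounding quotes.
    sed -n 's/^K3S_URL=//p' "$env_file" | head -n 1 | tr -d "'\""
}

# Usage on the worker node (run as root if the file is not readable):
#   configured_server_url
# Compare the printed address with the output of `hostname -I` on the master.
```

If the printed address does not match the master, the agent will keep retrying against the wrong host.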

 

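Checking the IP address showed that I should have entered 192.168.0.18 but had mistakenly entered 192.168.0.25, which caused the error.

I uninstalled the k3s-agent on the worker node and installed it again with the correct IP, but the join still did not complete and a new message appeared.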
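The uninstall-and-rejoin step can be sketched as follows. It uses the uninstall script that the installer created (see the install log above) and the same join command as before. The helper names run and rejoin_agent are hypothetical, and by default the sketch only prints the commands so they can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: remove the misconfigured agent, then join again with the right IP.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

# Hypothetical helper. Arguments: master IP, join token.
# The token can be read on the master with (default k3s server path):
#   sudo cat /var/lib/rancher/k3s/server/node-token
rejoin_agent() {
    master_ip="$1"
    token="$2"
    run "sudo /usr/local/bin/k3s-agent-uninstall.sh"
    run "curl -sfL https://get.k3s.io | K3S_URL=https://${master_ip}:6443 K3S_TOKEN=${token} sh -"
}

# Usage: review the printed commands first, then set DRY_RUN=0 to execute them.
#   rejoin_agent 192.168.0.18 "<TOKEN>"
```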

 

systemctl status k3s-agent
● k3s-agent.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2024-12-17 12:04:50 UTC; 3min 3s ago
       Docs: https://k3s.io
    Process: 3344 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=e>
    Process: 3346 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 3347 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 3348 (k3s-agent)
      Tasks: 9
     Memory: 253.0M
     CGroup: /system.slice/k3s-agent.service
             └─3348 /usr/local/bin/k3s agent

Dec 17 12:06:35 angora k3s[3348]: time="2024-12-17T12:06:35Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:06:44 angora k3s[3348]: time="2024-12-17T12:06:44Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:06:50 angora k3s[3348]: time="2024-12-17T12:06:50Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:06:57 angora k3s[3348]: time="2024-12-17T12:06:57Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:05 angora k3s[3348]: time="2024-12-17T12:07:05Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:15 angora k3s[3348]: time="2024-12-17T12:07:15Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:24 angora k3s[3348]: time="2024-12-17T12:07:24Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:32 angora k3s[3348]: time="2024-12-17T12:07:32Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:42 angora k3s[3348]: time="2024-12-17T12:07:42Z" level=info msg="Waiting to retrieve agent configuration; server>
Dec 17 12:07:47 angora k3s[3348]: time="2024-12-17T12:07:47Z" level=info msg="Waiting to retrieve agent configuration; server>

 

This time I googled the message k3s agent Waiting to retrieve agent configuration; server

 

and learned that it is the message you get when the hostname is duplicated.

 

I then gave ChatGPT the k3s agent Waiting to retrieve agent configuration; server message and asked whether the hostname issue I had found by googling was the cause, and it kindly explained how to fix it as well.
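The post does not record the exact fix, so the following is only a sketch of one common approach: confirm that the worker's hostname is already taken in the cluster, then rename the worker and restart the agent. The helper name hostname_taken and the new hostname worker-1 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NAME is present in a node list read from stdin.
# The list is expected in `kubectl get nodes -o name` format, one node/<name> per line.
hostname_taken() {
    grep -qxF "node/$1"
}

# On the master, check whether the worker's hostname (here: angora) already exists:
#   sudo k3s kubectl get nodes -o name | hostname_taken angora && echo "duplicate hostname"
#
# On the worker, if it is a duplicate (worker-1 is a placeholder name):
#   sudo hostnamectl set-hostname worker-1
#   # also update the old name in /etc/hosts if it is listed there
#   sudo systemctl restart k3s-agent
```

As an alternative to renaming the machine, k3s also accepts a node name that differs from the hostname through the K3S_NODE_NAME environment variable at install time.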

 

ChatGPT really is convenient. Hunting down causes and fixes by googling used to take a lot of time, and it cuts that time down...

 

 

Test environment

  • ASUS PN64 (13500H), 64 GB RAM, 1.6 TB SATA SSD
  • Proxmox 8.2.2
  • Ubuntu 22.04 Server VM, created and converted to a template, then used for this work

 

Lessons learned:

  • Reusing a Proxmox template is fine, but always change the hostname after creating a VM from it!
  • Don't mix up the master node IP.