Wednesday, July 8, 2015

Configure a Highly Available Kubernetes / etcd Cluster with Pacemaker on Fedora

I'm going to share some of the great work that Matt Farrellee, Rob Rati and Tim St. Clair have done with regard to building a highly available Kubernetes / etcd cluster - they get full credit for the technical details here.  It's really interesting work and I thought I'd share it with the upstream community.  Not to mention it gives me an opportunity to learn how this is all set up and configured.

In this configuration I will set up 5 virtual machines and one VIP:

fed-master1.example.com 192.168.123.100
fed-master2.example.com 192.168.123.101
fed-master3.example.com 192.168.123.102
fed-node1.example.com 192.168.123.103
fed-node2.example.com 192.168.123.104
fed-vip.example.com 192.168.123.105

If you are wondering how I set up this environment quickly and repeatably, check out omv from Purpleidea.  He's a clever guy with a great dev workflow.  In particular, have a look at the work he has done to put his great code into a package to make distribution easier.

In summary here, I used Vagrant, KVM and omv to build and destroy this environment.  I won't go into too many details about how that all works, but feel free to ask questions in the comments if needed.  My omv.yaml file is located here, which might help you get up and running quickly.  Just make sure you have a Fedora 22 Vagrant box that matches the name in the file.  Yup, I run it all on my laptop.

Global configuration:

  • Configure /etc/hosts on all nodes so that name resolution works (omv can help here)
  • Share SSH key from master to all other nodes
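If you'd rather do those two steps by hand instead of letting omv handle them, here's a minimal sketch.  The hostnames and IPs come from the table above; the short-name aliases are just a convenience I'm adding.  Append this to /etc/hosts on every VM:

192.168.123.100 fed-master1.example.com fed-master1
192.168.123.101 fed-master2.example.com fed-master2
192.168.123.102 fed-master3.example.com fed-master3
192.168.123.103 fed-node1.example.com fed-node1
192.168.123.104 fed-node2.example.com fed-node2
192.168.123.105 fed-vip.example.com fed-vip

Then, from fed-master1, generate a key if you don't already have one and push it out to the other nodes:

# ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# for host in fed-master2 fed-master3 fed-node1 fed-node2; do ssh-copy-id root@${host}; done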



To summarize what this environment will look like and which components will be running where: I have 3 master servers, which will run the Pacemaker components along with etcd and the kubernetes master services, and 2 worker nodes, which will run flanneld, Docker and the kubernetes node services.  When I mention commands below, assume they should be run on every node in that group, unless I specify otherwise.  The overall flow of the configuration will be:

  • Deploy VMs
  • Install Software
  • Configure etcd
  • Configure flannel
  • Configure kubernetes
  • Configure pacemaker
  • Confirm functionality

By the time you are finished you should have a highly available Active / Passive cluster configuration running kubernetes and all the required components.

Okay, so, put on your helmet and let's get started here.

Install Software:

Here we just need to make sure we have the appropriate packages on each node.  I've listed the versions that I used for this configuration at the end of the article.

Execute the following on each master node:

       
# yum -y install etcd kubernetes-master pcs fence-agents-all

Execute the following on each worker node:

       
# yum -y install kubernetes-node docker flannel


Configure etcd:

Our key-value store for configuration is going to be etcd.  In this case, we are creating an etcd cluster so we have a highly available deployment.  The config file and script for this are on github here and here.

Create the following script (also in github) and run it from master1:

       
#!/bin/bash
etcd0=192.168.123.100
etcd1=192.168.123.101
etcd2=192.168.123.102
INITIAL_CLUSTER="etcd0=http://$etcd0:2380,etcd1=http://$etcd1:2380,etcd2=http://$etcd2:2380"

for name in etcd0 etcd1 etcd2; do
   ssh -t ${!name} \
       sed -i -e "s#.*ETCD_NAME=.*#ETCD_NAME=$name#" \

                 -e "s#.*ETCD_INITIAL_ADVERTISE_PEER_URLS=.*#ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${!name}:2380#" \
                 -e "s#.*ETCD_LISTEN_PEER_URLS=.*#ETCD_LISTEN_PEER_URLS=http://${!name}:2380#" \
                 -e "s#.*ETCD_LISTEN_CLIENT_URLS=.*#ETCD_LISTEN_CLIENT_URLS=http://${!name}:2379,http://127.0.0.1:2379,http://127.0.0.1:4001#" \
                 -e "s#.*ETCD_ADVERTISE_CLIENT_URLS=.*#ETCD_ADVERTISE_CLIENT_URLS=http://${!name}:2379#" \
                 -e "s#.*ETCD_INITIAL_CLUSTER=.*#ETCD_INITIAL_CLUSTER=$INITIAL_CLUSTER#" \
           /etc/etcd/etcd.conf
done
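To make the effect of those sed substitutions concrete, here is roughly what the relevant lines in /etc/etcd/etcd.conf end up looking like on etcd0 (fed-master1); the variable names are the ones the stock Fedora etcd package ships, and the other two masters differ only in the name and IP:

ETCD_NAME=etcd0
ETCD_LISTEN_PEER_URLS=http://192.168.123.100:2380
ETCD_LISTEN_CLIENT_URLS=http://192.168.123.100:2379,http://127.0.0.1:2379,http://127.0.0.1:4001
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.123.100:2380
ETCD_INITIAL_CLUSTER=etcd0=http://192.168.123.100:2380,etcd1=http://192.168.123.101:2380,etcd2=http://192.168.123.102:2380
ETCD_ADVERTISE_CLIENT_URLS=http://192.168.123.100:2379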


Execute the following on all masters:

       
# systemctl enable etcd; systemctl start etcd; systemctl status etcd
# etcdctl cluster-health; etcdctl member list


Also, check out the /etc/etcd/etcd.conf file and the journal on each master to get familiar with how etcd is configured.
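For example, to peek at the last few etcd log lines via the journal:

# journalctl -u etcd --no-pager -n 20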

Configure Flannel:

We use flannel so that a container on host A can talk to a container on host B.  It provides an overlay network that the containers and kubernetes can take advantage of.  Oh, and it's really easy to configure.  An example /etc/sysconfig/flanneld config file is on my github repo.
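One thing to be aware of: flanneld reads its overlay network definition from etcd, so that key has to exist before the worker nodes start flanneld.  As a minimal sketch, run something like this once from any master; the /coreos.com/network key and the 10.20.0.0/16 subnet are example values and should line up with FLANNEL_ETCD_KEY in /etc/sysconfig/flanneld:

# etcdctl set /coreos.com/network/config '{ "Network": "10.20.0.0/16" }'   # example subnet; adjust to taste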

Execute the following on the worker nodes:

       
# echo FLANNEL_ETCD="http://192.168.123.100:2379,http://192.168.123.101:2379,http://192.168.123.102:2379" >> /etc/sysconfig/flanneld

# systemctl enable flanneld; systemctl start flanneld; systemctl status flanneld
# systemctl enable docker; systemctl start docker
# reboot


When the servers come back up, confirm that the flannel and docker interfaces are on the same subnet.
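A quick way to eyeball that (the interface names are an assumption here; flannel 0.2 typically creates flannel0, and Docker uses docker0):

# ip -4 addr show flannel0
# ip -4 addr show docker0

Both should report addresses that fall inside the flannel network you published to etcd.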

Configure Kubernetes:

Kubernetes will be our container orchestration layer.  I won't get too much into the details of the different kubernetes services, or even usage for that matter.  I can assure you it is well documented and you might want to have a look here and here.  I have posted my complete kubernetes config files here.

Execute the following on the master nodes:

       
# echo KUBE_API_ADDRESS=--address=0.0.0.0 >> /etc/kubernetes/apiserver


You can see my kubernetes master config files here.

Execute the following on the worker nodes:

       
# echo KUBE_MASTER="--master=192.168.123.105:8080" >> /etc/kubernetes/config
# echo KUBELET_ADDRESS="--address=0.0.0.0" >> /etc/kubernetes/kubelet
# echo KUBELET_HOSTNAME= >> /etc/kubernetes/kubelet
# echo KUBELET_ARGS="--register-node=true" >> /etc/kubernetes/kubelet

Keep in mind here that the .105 address is the VIP listed in the table at the beginning of the article.

In addition, on the kubelet, you'll want to comment out the line for KUBELET_HOSTNAME, so that when it checks in with the master, it uses its true hostname.
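If you'd rather script that, here's a sketch using GNU sed's 0,/regex/ address range; it comments out only the first (stock) KUBELET_HOSTNAME line and leaves the empty one we just appended alone, and assumes the stock file ships a KUBELET_HOSTNAME line with a --hostname-override default:

# sed -i '0,/^KUBELET_HOSTNAME=/ s/^KUBELET_HOSTNAME=/#&/' /etc/kubernetes/kubelet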

You can see my kubernetes node config files here.
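With the config in place, make sure the worker services themselves are running; the kubernetes-node package should give you kubelet and kube-proxy units, so on each worker node:

# systemctl enable kubelet kube-proxy; systemctl start kubelet kube-proxy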

Configure Pacemaker:

Pacemaker is going to provide our HA mechanism.  You can find more information about configuring Pacemaker on the Clusters from Scratch page of their website.  My /etc/corosync/corosync.conf file is posted on github here.

Execute the following on all masters:

This command will set the password for the hacluster user in order for cluster auth to function properly.
       
# echo hacluster | passwd -f --stdin hacluster
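Note that pcs cluster auth talks to the pcsd daemon on each node, so if it isn't running yet, enable and start it on all masters as well (pcsd.service ships with the pcs package):

# systemctl enable pcsd; systemctl start pcsd
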
Execute the following on master1:
       
# pcs cluster auth -u hacluster -p hacluster fed-master1.example.com fed-master2.example.com fed-master3.example.com
# pcs cluster setup --start --name high-availability-kubernetes fed-master1.example.com fed-master2.example.com fed-master3.example.com
# pcs resource create virtual-ip IPaddr2 ip=192.168.123.105 --group master
# pcs resource create apiserver systemd:kube-apiserver --group master
# pcs resource create scheduler systemd:kube-scheduler --group master
# pcs resource create controller systemd:kube-controller-manager --group master
# pcs property set stonith-enabled=false

Check the status of the cluster:
       
# pcs status
# pcs cluster auth

Confirm functionality:

Here we'll want to make sure everything is working.

You can check that kubernetes is functioning by making a call to the VIP, which will point to the active instance of the kubernetes API server.

Execute the following on any master node:

       
# kubectl -s http://192.168.123.105:8080 get nodes
NAME        LABELS                             STATUS
fed-node1   kubernetes.io/hostname=fed-node1   Ready
fed-node2   kubernetes.io/hostname=fed-node2   Ready


Execute the following on any master node: 
       
# pcs status
Cluster name: high-availability-kubernetes
Last updated: Wed Jul  8 15:21:35 2015
Last change: Wed Jul  8 12:38:32 2015
Stack: corosync
Current DC: fed-master1.example.com (1) - partition with quorum
Version: 1.1.12-a9c8177
3 Nodes configured
4 Resources configured


Online: [ fed-master1.example.com fed-master2.example.com fed-master3.example.com ]


Full list of resources:

 Resource Group: master
     virtual-ip (ocf::heartbeat:IPaddr2): Started fed-master1.example.com 
     apiserver (systemd:kube-apiserver): Started fed-master1.example.com 
     scheduler (systemd:kube-scheduler): Started fed-master1.example.com 
     controller (systemd:kube-controller-manager): Started fed-master1.example.com 


PCSD Status:
  fed-master1.example.com: Online
  fed-master2.example.com: Online
  fed-master3.example.com: Online
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

You can see that everything is up and running and that the resource group is running on fed-master1.example.com.  Well, we might as well put that node in standby and make sure the resource group starts on another node and that we can still execute kubernetes commands.

       
# pcs cluster standby fed-master1.example.com

Now, check the resources again:
       
# pcs status
Cluster name: high-availability-kubernetes
Last updated: Wed Jul  8 15:24:17 2015
Last change: Wed Jul  8 15:23:59 2015
Stack: corosync
Current DC: fed-master1.example.com (1) - partition with quorum
Version: 1.1.12-a9c8177
3 Nodes configured
4 Resources configured


Node fed-master1.example.com (1): standby
Online: [ fed-master2.example.com fed-master3.example.com ]


Full list of resources:


 Resource Group: master
     virtual-ip (ocf::heartbeat:IPaddr2): Started fed-master2.example.com 
     apiserver (systemd:kube-apiserver): Started fed-master2.example.com 
     scheduler (systemd:kube-scheduler): Started fed-master2.example.com 
     controller (systemd:kube-controller-manager): Started fed-master2.example.com 


PCSD Status:
  fed-master1.example.com: Online
  fed-master2.example.com: Online
  fed-master3.example.com: Online


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

You can see that the resource group moved over to fed-master2.example.com.  Now, can I still get node status?
       
# kubectl -s http://192.168.123.105:8080 get nodes
NAME        LABELS                             STATUS
fed-node1   kubernetes.io/hostname=fed-node1   Ready
fed-node2   kubernetes.io/hostname=fed-node2   Ready

Yes.  I can.  So, enjoy.  Maybe deploy some kubernetes apps?
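When you're done testing, take fed-master1 out of standby so it can host the resources again:

# pcs cluster unstandby fed-master1.example.com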

Package versions:

This tech changes quickly, so for reference, here's what I used to set this all up.

Master nodes:

       
# rpm -qa selinux* kubernetes-master etcd fence-agents-all
fence-agents-all-4.0.16-1.fc22.x86_64
kubernetes-master-0.19.0-0.7.gitb2e9fed.fc22.x86_64
etcd-2.0.11-2.fc22.x86_64
selinux-policy-3.13.1-128.2.fc22.noarch
selinux-policy-targeted-3.13.1-128.2.fc22.noarch

Worker nodes:

       
# rpm -qa kubernetes-node docker flannel selinux*
selinux-policy-3.13.1-128.2.fc22.noarch
selinux-policy-targeted-3.13.1-128.2.fc22.noarch
kubernetes-node-0.19.0-0.7.gitb2e9fed.fc22.x86_64
docker-1.6.0-3.git9d26a07.fc22.x86_64
flannel-0.2.0-7.fc22.x86_64


And that concludes this article.  I hope it was helpful.  Feel free to leave some comments or suggestions.  It would be cool to containerize Pacemaker and get this running on a Fedora Atomic host.

