Earlier today I wrote an introduction to my new blog series about migrating to Kubernetes. This is the next installment, in which we'll talk about setting up a Kubernetes storage facility so that applications can run on any qualifying node.
There is much to say about Kubernetes, as it covers a huge range of use cases and scenarios, and you can orchestrate so many things. Did I say orchestration? Yes, Kubernetes is an orchestration framework for containers. Check this out:
I am not gonna write about Kubernetes itself here, because there are a gazillion websites doing that already. Simply put: instead of using bare-metal servers or hypervisor virtual machines with full operating systems loaded to run an application server, you use containers, which share the kernel of a host operating system and are much more efficient, fast and productive.
Although not everything can or should run in containers, the vast majority of workloads can, very well and reliably.
So anyway this is my starting point:
As you can see, there are a number of hosts providing standalone applications, some of them connected to databases. All of them, though, live on single hosts, in one virtual machine each. The nodes are completely managed by Puppet (Infrastructure as Code).
The goal is for all applications to be able to live in a Kubernetes cluster, including the databases, which is probably the trickiest part. Kubernetes must be able to schedule an application container on any available node with the right resources. If a node goes down, the application must be restarted on a different node. The databases must be able to survive container restarts. Applications must not be tied to any specific node.
For that to work, the Kubernetes nodes must be able to access shared file systems somehow. We'll be using Persistent Volumes (PV) and Persistent Volume Claims (PVC) for that, and we need to have the underlying storage solutions available. It is important to know that PVs cannot be changed once configured, so planning ahead is of the essence.
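To illustrate the PV/PVC pairing, here is a minimal sketch of what a GlusterFS-backed PV and a matching claim could look like. All names, sizes and the endpoints reference are placeholders, not my actual configuration:

```yaml
# Hypothetical PV backed by a GlusterFS volume (names/sizes are placeholders)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany              # any node may mount it read-write
  persistentVolumeReclaimPolicy: Retain
  glusterfs:
    endpoints: glusterfs-cluster # Endpoints object listing the Gluster nodes
    path: example-volume         # name of the Gluster volume
---
# The PVC an application would use to claim that storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

The claim is matched to the PV by capacity and access mode; the pod then references the PVC, never the PV directly, which is exactly what decouples the application from any specific node.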
Obviously there are very good storage solutions available if you happen to be running a datacentre, on premises or even on the internet. But for one, I want to keep it simple, cost effective and secure. Data traffic between applications, databases and underlying storage should never be visible anywhere outside their respective isolated networks, let alone the internet. Thus it will all stay within the back-end network at my cloud provider, where everything is running. The only things accessible from the internet should be the front-end services, and those will be protected as well. That will be covered later down the road.
Some of the current nodes will be reused for different purposes, others simply deleted as they won't be needed anymore.
The new infrastructure will be like that:
- 3 VM nodes running GlusterFS (while it could actually run inside Kubernetes, that adds a lot of load to the nodes due to sync tasks)
- 1 VM running Gitlab
- 1 VM running mail services
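Since GlusterFS will live outside the cluster, Kubernetes has to be told where to find it. The classic way is an Endpoints object (optionally with a headless Service) pointing at the Gluster nodes. A sketch, with placeholder IP addresses standing in for the three storage VMs:

```yaml
# Hypothetical Endpoints object pointing at the three external GlusterFS VMs
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster   # referenced by the PV's glusterfs.endpoints field
subsets:
  - addresses:
      - ip: 10.0.0.11       # placeholder back-end IPs of the Gluster nodes
      - ip: 10.0.0.12
      - ip: 10.0.0.13
    ports:
      - port: 1             # a port is required, but unused for Gluster mounts
```

Keeping the addresses on the provider's back-end network matches the isolation goal above: the storage traffic never touches a public interface.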
On top of that comes the new Kubernetes cluster:
Obviously there will be many more applications added to the Kubernetes layer. These are only examples. But it will allow us to focus on the real requirements:
- Providing a shared storage facility
- Making Kubernetes use it, so that applications requiring data storage can run on any node
- Keeping databases stateful within Kubernetes, clustered, and surviving container restarts
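For that last point, the usual Kubernetes building block is a StatefulSet with a volumeClaimTemplate, which gives every replica its own stable identity and its own PVC that survives restarts and rescheduling. A rough sketch only, assuming a generic MariaDB image; names, sizes and the image are placeholders, not a finished database setup:

```yaml
# Hypothetical StatefulSet sketch: each replica gets its own PVC,
# which it keeps across container restarts and node moves
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db    # headless Service giving pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: mariadb:10.5          # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data                     # yields PVCs data-example-db-0, -1, -2
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

Unlike a Deployment, a StatefulSet reattaches pod `example-db-0` to the same claim wherever it is rescheduled, which is precisely the "surviving container restarts" requirement.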
That concludes this session about my storage requirements for today. The next post will cover setting up the storage facility and making Kubernetes use it properly.
Happy coding!