ElasticSearch Engineer Exam v7.13 (2021)

TL;DR

I passed!

Here are my revision notes: https://github.com/mohclips/Elastic-Certified-Engineer-Exam-Notes

What is the exam like?

The exam is 3 hours long, with just 10 questions. It's hands-on (you have to build things and write code) and it's open book (well, you get a copy of the Elastic documentation, but no internet access).

How did I do it?

The idea was to look at all the topics in the exam write-up, watch the Elastic webinars and then write my own training questions. Using docker to spin up various server setups, I was able to write example questions for every topic.
Another thing was to work out what data to use. Kibana comes with some example data, and if you look at the Kibana GitHub repo there used to be some other data in there too (though sadly it has now gone – my GitHub repo contains a copy). I also used https://www.kaggle.com/ to find data I liked. I find it's easier to start with data you are already familiar with, and Kaggle has pretty much everything. With the Data Visualiser you can import CSV files really easily.
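If you want a throwaway cluster to practise against, a single node plus Kibana in docker is enough – something like this (the 7.13.4 tag is just an example, use whichever 7.13.x you fancy):

$ docker network create elastic
$ docker run --rm -d --name es01 --net elastic -p 9200:9200 \
    -e discovery.type=single-node \
    docker.elastic.co/elasticsearch/elasticsearch:7.13.4
$ docker run --rm -d --name kib01 --net elastic -p 5601:5601 \
    -e ELASTICSEARCH_HOSTS=http://es01:9200 \
    docker.elastic.co/kibana/kibana:7.13.4

Point the Data Visualiser at a CSV from Kaggle and you have something realistic to practise queries on.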

I was going to take the exam in early June, but I realised that the exam was about to change to a new version the following month, with a different feature set. So a decision needed to be made on whether to step up the training/revision or delay.

At that time, projects at work (Elastic in particular) were ramping up and the kids were soon to be on school holidays. Plus, I was given the opportunity to take a course and exam in SUSE Rancher (another product I really like). So I put the Elastic exam on hold, as you get 12 months from the time of purchase to take the exam.

In July 2021, Elastic released the latest version of the exam, based on version 7.13 of the software whereas previously it was v7.2 – something I was more familiar with at work. One thing about Elastic is that they like to release features a lot, so moving to v7.13 was quite a change to the exam topics. Some topics were dropped, like how to build an Elasticsearch server from the ground up (oddly), and new ones added, like the Data Visualiser and Cross-Cluster Replication.

After finishing up the Elastic project at work and passing the Rancher exam, I was ready to revisit the Elastic training and restart my revision. I also wanted to cast around for other people's write-ups on how they had revised and passed the exam. Sure enough, people had done this, so happy days.

So how did I really do it?

As mentioned above, I created questions for the topics provided by Elastic, just by studying the documentation and Googling. Rich Raposa, who heads up training at Elastic, does a great webinar on taking the exam (I suggest you watch that, many times). I think Rich fills you with the confidence you need that it's actually possible to do this.
For example, I wanted to get my head around the enrich pipeline – well, there is a webinar for that. It seems that for most of the features in Elastic, someone there has done a webinar on it.

I also looked at various online training platforms. There is very little out there; in fact, only Linux Academy had a course that was applicable. It was based on v7.2, so it contained a lot of good stuff, but not enough for the v7.13 exam. After emailing "A Cloud Guru" (who own Linux Academy), they told me they are looking at upgrading the course. It's a good course, and if you can get it free from your work (i.e. they pay, not you) then I'd say it's worth doing – just be aware it's not going to touch all the topics you want. Once they upgrade it to v7.13 I'm sure it will be great.

I also found a number of blogs where people had written up their own revision questions for the v7.2 exam. As far as I am aware (or Google is), my blog is the only one based on v7.13 so far. It's early days, so I expect others will do the same shortly.

George Bridgeman and Kreuzwerker were very good examples of what can be done, and it was good to get questions to practise on that I hadn't written myself, even if they were for v7.2 – searches and aggregations don't change all that much of course 🙂

In the end, it was just getting down to the hands-on practice; doing the work over and over so that I was familiar with the topics and knew where I needed to look in the documentation for clues/examples.
I made sure I was familiar with things like the auto-complete in the Kibana Dev Console, and where I could shortcut certain Dev Console steps with the Kibana interface. For example, setting up Cross-Cluster Search is easier in some respects in the Kibana UI in v7.13, whereas in v7.2 you would need to know what settings to put in elasticsearch.yml – and I think that's now missing from the exam, even though you should be aware of it. That UI saved me typing in a load of JSON; I got to type in a name, copy/paste an IP/port and click Create instead.
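For reference, what the UI is doing for you is roughly this call (the cluster alias, IP and port below are made up for illustration – in the exam you could paste the body into the Dev Console rather than use curl):

$ curl -s -XPUT "http://localhost:9200/_cluster/settings" \
    -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.remote.cluster_two.seeds": [ "172.30.5.68:9300" ]
  }
}'

$ curl -s "http://localhost:9200/_remote/info"

Once the remote shows up in _remote/info, a cross-cluster search is just a search against cluster_two:my-index.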

Take a look at my GitHub repo and see what you think. I can't guarantee it will help you pass, but it might help.

Revision Notes

https://github.com/mohclips/Elastic-Certified-Engineer-Exam-Notes

There is a lot more in the repo than in this blog post. Let me know what you think.

I’m happy to take any pull requests on the github repo.

Best of luck to you!

Posted in Uncategorized

Fixing the Brave Browser Spell Check Dictionary

Brave Browser has a known fault where you can’t download the language dictionaries to enable spell checking. If like me you spend most of your life in a browser window, then this is a bit of a pain.

Read on for a very simple fix.

This works for the following: Version 1.30.89 Chromium: 94.0.4606.81 (Official Build) (64-bit)

Copy the Chrome Browser dictionaries to Brave

Of course you need to have Chrome installed, but most likely you already have it and wanted to move from spying Chrome to Brave.

$ cp ~/.config/google-chrome/Dictionaries/*.bdic ~/.config/BraveSoftware/Brave-Browser/Dictionaries/

Restart Brave Browser

Just quit the app and open it again.

Enable the spell checker

Right click on a page where you can edit some text, select “Spell check” and the language you want to use.

Now you get all those nice red squiggly lines to show your spelling is awful like mine 🙂

Didn’t work?

You may have to enable the spell checker in the settings and try the download again – it will still fail to download (a known issue).

Posted in Uncategorized

Docker and DNS

For some time now I've wanted to containerise my bind9, dnscrypt-proxy and DHCP setup, which currently runs on a very low-powered mini-PC (I believe it used to power scrolling LED boards – it has 6x D-sub 9-pin serial ports!).

Spinning bind9 up and shifting the config over is easy enough. But getting everything from containers to miniPCs, Raspberry Pis, TVs, phones and laptops to see it correctly is less so.

One issue I worked on was getting the containers to use the new dockerised bind9 container. What follows are my notes/diagnosis on how DNS works within docker and its daemon.

The pre-install config of DNS

Normally systemd-resolved runs on the host node, and this is used to populate the containers' DNS via the docker daemon. You would see nameserver 127.0.0.11 in /etc/resolv.conf within your container.

see https://docs.docker.com/config/containers/container-networking/#dns-services

But when you remove systemd-resolved – because I like better control of my own DNS – you get this sort of nice, conventional setup.

$ cat /etc/resolv.conf

#
# systemd-resolved has been disabled
#

options edns0 trust-ad
search homelan.local

nameserver 172.30.5.67
nameserver 172.30.5.253
nameserver 1.1.1.1

So, I've got 3 nameservers defined: the new one, the old one and a backup (external).
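If you're wondering how to get to that point on Ubuntu, it's roughly this (a sketch – check what your distro expects before ripping resolved out):

$ sudo systemctl disable --now systemd-resolved
$ sudo rm /etc/resolv.conf     # normally a symlink to the systemd-resolved stub
$ sudo vi /etc/resolv.conf     # then write the static file shown above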

From what I had read, and then tested, docker reads the host node's /etc/resolv.conf when you spin up a new container and inserts it into that container's /etc/resolv.conf. Pretty handy really.

Test a newly created container

I use --rm to remove the container after it has run, otherwise you get loads of stopped containers left lying around.

The busybox image is useful as it contains nslookup, which shows the DNS server it is using in its simplified output.

$ docker run --rm busybox nslookup www.fujitsu.com
Server:         172.30.5.67
Address:        172.30.5.67:53

Non-authoritative answer:
www.fujitsu.com canonical name = www.fujitsu.com.edgekey.net
www.fujitsu.com.edgekey.net     canonical name = e29247.b.akamaiedge.net
Name:   e29247.b.akamaiedge.net
Address: 2.18.66.59
Name:   e29247.b.akamaiedge.net
Address: 2.18.66.184

*** Can't find www.fujitsu.com: No answer

Ta-dah! It worked. I'm so pleased. I had updated my host node's /etc/resolv.conf and a new container has that config.

Let’s check the /etc/resolv.conf

$ docker run --rm -ti busybox cat /etc/resolv.conf
#
# systemd-resolved has been disabled
#

options edns0 trust-ad
search homelan.local

#nameserver 172.30.1.3
nameserver 172.30.5.67
nameserver 172.30.5.253
nameserver 9.9.9.9

Yup, that is our new /etc/resolv.conf from the host node.

And every time you change the underlying host file, any new container will have that replicated into it.

Let's try another image and test with dig, which I'm more familiar with using. (nslookup is so Windows, right?)

$ docker run --rm tutum/dnsutils dig www.fujitsu.com

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> www.fujitsu.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50190
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.fujitsu.com.               IN      A

;; ANSWER SECTION:
www.fujitsu.com.        592     IN      CNAME   www.fujitsu.com.edgekey.net.
www.fujitsu.com.edgekey.net. 17051 IN   CNAME   e29247.b.akamaiedge.net.
e29247.b.akamaiedge.net. 592    IN      A       23.55.58.240
e29247.b.akamaiedge.net. 592    IN      A       92.122.206.18

;; Query time: 1 msec
;; SERVER: 172.30.5.67#53(172.30.5.67)
;; WHEN: Wed Aug 04 18:29:05 UTC 2021
;; MSG SIZE  rcvd: 154

Again, we expect this, as we are using the host node's /etc/resolv.conf.

Testing an already created container

So, I picked a few containers that were already running and attached to them to see what they were doing.

$ docker exec -it ttrss cat /etc/resolv.conf
search homelan.local
nameserver 127.0.0.11
options edns0 trust-ad ndots:0

Uh oh! That doesn't look right. But, of course, it was set to that configuration when the container was created. Even if you restart the container, it won't change that config. You have to recreate the container for the new /etc/resolv.conf to be applied.
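Recreating is nothing fancy – if the container came from a compose file it's roughly this (assuming ttrss is a compose service; adjust to however you run yours), otherwise it's a docker rm and a fresh docker run. I haven't done that yet though – first let's see what the existing container does.

$ docker-compose up -d --force-recreate ttrss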

Let’s test it and see what happens.

$ docker exec -it ttrss nslookup www.fujitsu.com
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
www.fujitsu.com canonical name = www.fujitsu.com.edgekey.net.
www.fujitsu.com.edgekey.net     canonical name = e29247.b.akamaiedge.net.
Name:   e29247.b.akamaiedge.net
Address: 2.18.66.59
Name:   e29247.b.akamaiedge.net
Address: 2.18.66.184

So, it's working anyway. A thing to note is that 127.0.0.11 is the docker daemon's embedded DNS IP. But where is the daemon forwarding the DNS queries to? The old, the new or the external DNS?

Let’s find out…

Tracing docker daemon DNS queries

If we enable debug mode for the docker daemon, it will show us (amongst other things) the DNS resolution path.

see https://docs.docker.com/config/daemon/#enable-debugging

To enable docker daemon debug mode, we need to edit the daemon config and then HUP the daemon pid.

sudo vi /etc/docker/daemon.json

add in "debug": true

Remember the comma at the end if you have other items in there, or things will break.
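For example, a daemon.json with debug turned on might look something like this (the log-driver line is just a stand-in for whatever you already have in the file):

$ cat /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "debug": true
}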

Reload the daemon config by doing the HUP

sudo kill -SIGHUP $(pidof dockerd)

Now tail your syslogs to see what it says

$ tail -f /var/log/syslog

Aug  4 17:09:14 ubuntu dockerd[2648773]: time="2021-08-04T17:09:14.619043921Z" level=debug msg="Name To resolve: medium.com."
Aug  4 17:09:14 ubuntu dockerd[2648773]: time="2021-08-04T17:09:14.619282016Z" level=debug msg="[resolver] query medium.com. (A) from 172.27.0.3:49208, forwarding to udp:172.30.5.67"
Aug  4 17:09:14 ubuntu dockerd[2648773]: time="2021-08-04T17:09:14.656980949Z" level=debug msg="[resolver] received A record \"162.159.152.4\" for \"medium.com.\" from udp:172.30.5.67"
Aug  4 17:09:14 ubuntu dockerd[2648773]: time="2021-08-04T17:09:14.657041963Z" level=debug msg="[resolver] received A record \"162.159.153.4\" for \"medium.com.\" from udp:172.30.5.67"

Aug  4 17:09:11 ubuntu dockerd[2648773]: time="2021-08-04T17:09:11.605801495Z" level=debug msg="Name To resolve: www.raspberrypi-spy.co.uk."
Aug  4 17:09:11 ubuntu dockerd[2648773]: time="2021-08-04T17:09:11.605903416Z" level=debug msg="[resolver] query www.raspberrypi-spy.co.uk. (AAAA) from 172.27.0.3:36524, forwarding to udp:172.30.5.67"
Aug  4 17:09:11 ubuntu dockerd[2648773]: time="2021-08-04T17:09:11.605948183Z" level=debug msg="[resolver] query www.raspberrypi-spy.co.uk. (A) from 172.27.0.3:53243, forwarding to udp:172.30.5.67"
Aug  4 17:09:11 ubuntu dockerd[2648773]: time="2021-08-04T17:09:11.606637385Z" level=debug msg="[resolver] external DNS udp:172.30.5.67 did not return any AAAA records for \"www.raspberrypi-spy.co.uk.\""
Aug  4 17:09:11 ubuntu dockerd[2648773]: time="2021-08-04T17:09:11.606675714Z" level=debug msg="[resolver] received A record \"109.203.126.236\" for \"raspberrypi-spy.co.uk.\" from udp:172.30.5.67"

What you will notice is the forwarding to udp:172.30.5.67 in those lines above; this shows the docker daemon is in fact forwarding to the right DNS server (the first one on the list in /etc/resolv.conf).

Woo Hoo!

Now we need to revert those debug changes as debug mode is very verbose.

To disable docker debug mode, we need to edit the daemon config and then HUP the daemon pid.

sudo vi /etc/docker/daemon.json

change to "debug": false

Remember the comma at the end if you have other items in there.

Reload the daemon config by doing the HUP

sudo kill -SIGHUP $(pidof dockerd)

Check your syslog to make sure it is not sending any more verbose logs.

Fin!

Now we know where our DNS traffic is going. 🙂

Posted in Containers

Admin Admin podcast – A comedy of errors.

I'm a regular listener to the Admin Admin podcast and lurk on their Telegram channel, so I was quite chuffed when I was asked to appear with them and talk a little about one of the roles I have at work, where I build Linux "golden images" that are used globally for our customers…

Listen here: https://www.adminadminpodcast.co.uk/ep91/

Show notes: https://www.adminadminpodcast.co.uk/ep091sn/


The guys were brilliant; we had a fair few technical issues whilst doing it, but they took it on the chin and were very chilled out about it. One thing I can say is that you can really tell the quality of my mic is poor compared to the ones the guys are using, so apologies for that – I really wasn't sitting at the back of the room.

I really had a great time and loads of fun, so thanks to Al, Stuart, Jerry and of course Jon (who asked me to appear).

Posted in Automation, Security

Elastic Search daily index size view

I could never find a decent way to forecast index sizes in Elasticsearch. In the Kibana GUI, under Stack Management, you can see the total index size, which you need to divide by the number of nodes the index data is stored on; that gives you an idea, but you can't visualise it.

So, to get some sort of rough average, I wrote some Python code to do what I needed. It does the following:

  • Pick an index
  • Work out the average size of a document in that index
  • Count the number of documents in the previous day
  • daily index size = (number of docs that day) x (average doc size)

It's not 100% accurate, but it allows you to see the index sizes and forecast some trends.
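The Python in the repo does the real work, but the core calculation is simple enough to sketch with curl and jq (the host, index name and @timestamp field below are assumptions – adjust them for your data):

$ ES="192.168.0.2:9200"
$ INDEX="stats-000001"

# average doc size = primary store size / primary doc count
$ AVG=$(curl -s "http://$ES/$INDEX/_stats" | \
  jq ".indices.\"$INDEX\".primaries.store.size_in_bytes / .indices.\"$INDEX\".primaries.docs.count")

# documents ingested in the previous day (assumes an @timestamp field)
$ DOCS=$(curl -s -H 'Content-Type: application/json' "http://$ES/$INDEX/_count" \
  -d '{"query":{"range":{"@timestamp":{"gte":"now-1d/d","lt":"now/d"}}}}' | jq '.count')

$ echo "daily index size (bytes) ~= $(echo "$AVG * $DOCS" | bc)"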

A docker image

My Elastic stack is running in docker, so an image is included along with a docker-compose file, but the Python code can be run wherever you want.

https://github.com/mohclips/elastic_index_sizer

https://github.com/mohclips/elastic_index_sizer/blob/main/src/get_size.py

So you can run the code as often as you want and create a visualisation in Kibana to see the results.

Results

So, after a few days of running, I can see we are ingesting 9GB a day, and also which index is ingesting the most.

So the green index is the one that is ingesting more data each day.

Posted in Automation, Containers, DevOps

Simple view of ElasticSearch templates

I wanted a simple way to view the Elastic templates we use, to sense-check what we had in place.

The following script is what I came up with. Basically it uses the awesome 'jq' to flatten the dict/hash returned from the Elastic API.

#!/bin/bash
ES="192.168.0.2:9200"
INDEX="stats-000001"
curl -s -XGET "http://$ES/$INDEX/_mapping" |\
jq -r --arg INDEX "$INDEX" '.[$INDEX].mappings.properties |
[leaf_paths as $path |
{"key": $path | join("."), "value": getpath($path)}] |
from_entries' |\
grep -v fields |\
sed -e 's/.type": "/: "/'
{
"@timestamp: "date",
"avg_doc_size: "float",
"daily_size: "float",
"doc_count: "long",
"index: "text",
"name: "text",
"query_count: "long",
"size_in_bytes: "long",
"tags: "text",
}

As you can see, it's much easier to view.
I think, if I were being clever, a way to also show the 'keyword' sub-field of 'text' fields might be good.

onwards…

Posted in Automation, DevOps

Kubernetes and shared storage

Disclaimer

This post is aimed at those just starting to use Kubernetes; it's not a production-worthy solution, though you can build upon it.

Installing Kubernetes

I have used and tailored this post from IT Wonder Lab to create my home lab setup. It may fit the bill for you too. I like it as it uses Vagrant as the infrastructure build tool, along with Ansible to do the k8s installation and configuration.

Getting started

One of the first things you want to do with k8s, when you try out a multi-node cluster rather than the easy all-in-one single-node build, is to install a load of stuff that you have run before on a single node and scale it – as scaling is what it's all about, right 🙂 But you hit a brick wall when you realise you don't have any shared storage available. Shared storage makes life much easier when you are clustering.

There is local storage you can use to at least get some sort of storage available – HostPath volumes and Local Persistent Volumes – but these tie your pods (containers) to a specific node, which you probably didn't notice when you were using a single-node k8s build.

But of course, when you have a multi-node cluster you find you can't easily share files between your pods (containers) across those nodes without some messing around, copying files between the nodes outside of k8s.

Readily usable, simple and cheap shared storage

NFS of course is a really simple solution to this and it is simple to install and run. Not everyone has NetApp ONTAP available to them 🙂

There are a lot of tutorials on setting up an NFS share, so I won't go into that here.
For example, Tecmint do a good tutorial here.

What you will need to be aware of, and do, is make sure that your NFS exported share allows access to your nodes.

This is because the NFS provisioner we will use mounts the volume share from the node itself, not the actual pod, so it's a little like local storage: the pod talks to the local k8s node it is running on, and that node talks to the remote NFS server.

Here we can see that the NFS server (homeserver) allows access to two subnets, one of which is the k8s node subnet.

$ showmount -e homeserver
Export list for homeserver:
/data/nfs_share 192.168.50.0/24,172.30.5.0/24
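For completeness, the matching /etc/exports entry on the NFS server will look roughly like this (the export options here are a guess at sensible defaults, not gospel – re-export with exportfs -ra after editing):

$ cat /etc/exports
/data/nfs_share 192.168.50.0/24(rw,sync,no_subtree_check) 172.30.5.0/24(rw,sync,no_subtree_check)

$ sudo exportfs -ra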

Installing a k8s package manager to do the heavy lifting – helm

Next we need to install helm, which is brilliant – it allows reuse of code via a powerful package manager. Basically, someone else has done all the hard work of writing an installer for the NFS provisioner we will be using.

Install helm onto your control plane instance (manager), following the instructions on the helm website:

$ curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
$ sudo apt-get install apt-transport-https --yes
$ echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo $ tee /etc/apt/sources.list.d/helm-stable-debian.list
$ sudo apt-get update
$ sudo apt-get install helm

Installing the NFS Provisioner

Once helm is installed, we need to install the NFS provisioner.

Here is the code repository:

https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner

But don’t panic, helm takes care of this for us.

This is all it takes to make the provisioner's helm chart repository available to your control plane manager:

$ helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

Once the repository is added, we need to actually install and configure the provisioner. There are a lot of options that can be selected, but the ones below will give you a simple enough configuration to play with.

$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=172.30.5.67 \
--set nfs.path=/data/nfs_share \
--set persistence.storageClass=nfs-client \
--set persistence.size=10Gi

We have told the provisioner where our shared storage is (IP and share path), given it a k8s Storage Class name and told it how big we want it to be. Really simple.

Now, let’s check to see if we can see that Storage Class

$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client cluster.local/nfs-subdir-external-provisioner Delete Immediate true 34s

A tip here is to make that Storage Class the default for your cluster.

$ kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/nfs-client patched

Let's check again and make sure it is the default.

$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client (default) cluster.local/nfs-subdir-external-provisioner Delete Immediate true 3m3s

Now you can see that the flag (default) is set.

Uh-oh, I broke something

Uh oh, something is not right: the pod that manages the provisioner is not starting. You can see below it has been stuck in ContainerCreating for over 6 minutes now. That is not right.

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-subdir-external-provisioner-76b4bc6f7d-sxvdr 0/1 ContainerCreating 0 6m21s

Looks like we have an issue somewhere; let's check the logs (events).

$ kubectl get events
52s Warning ProvisioningFailed persistentvolumeclaim/test-claim-client failed to provision volume with StorageClass "nfs-client": unable to create directory to provision new pv: mkdir /persistentvolumes/default-test-claim-client-pvc-0335846f-3e6f-4cc8-aef3-2699365165be: permission denied

Okay, that looks bad. We need to dig further through the events.

Warning FailedMount 52s kubelet MountVolume.MountDevice failed for volume "pvc-0335846f-3e6f-4cc8-aef3-2699365165be" : NFSDisk - mountDevice:FormatAndMount failed with mount failed: exit status 32

After a bit of Googling for that exit status 32, it seems the nodes did not have the underlying NFS client installed, LOL. So let's do that now on each node – I did the control plane manager too.

sudo apt-get install -y nfs-common

This is what happens when you take someone else's Infrastructure as Code and do not check it properly. The VirtualBox images I used (ubuntu/focal64) did not have the NFS client installed. Of course, we can update that Ansible code from IT Wonder Lab to do this for us next time around. 🙂

OK, let's uninstall that helm chart and install it again.

$ helm uninstall nfs-subdir-external-provisioner
release "nfs-subdir-external-provisioner" uninstalled

$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=172.30.5.67 --set nfs.path=/data/nfs_share --set persistence.storageClass=nfs-client --set persistence.size=10Gi
NAME: nfs-subdir-external-provisioner
LAST DEPLOYED: Sat Mar 13 12:23:53 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Let's check that pod again.

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-subdir-external-provisioner-76b4bc6f7d-5bjgg 1/1 Running 0 17s

Woo Hoo! simples!

Now create a PVC and pod to use this shared storage

The next step is to allocate this storage. This is done via a PVC, or Persistent Volume Claim.

PersistentVolumeClaim (PVC) is a request for storage by a user. … Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see AccessModes).

We do this by creating some YAML to define the state we want.

Create a file like the one below. You can see that we allocated 10Gi on the NFS server, but we only want 1Mi of that for our test claim.

$ cat test-pvc.yml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim-client
  annotations:
    nfs.io/storage-path: "test-path" # not required, depending on whether this annotation was shown in the storage class description
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi

In the above we have done a few interesting things:

  • In the spec we defined the Storage Class we created above.
  • The access mode – in this case read/write, allowing other pods to access the same storage (Many).
  • Its size – 1Mi.

Now we apply it to the cluster.

$ kubectl apply -f test-pvc.yml 
persistentvolumeclaim/test-claim-client created

Let’s check

$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-claim-client Bound pvc-1213d39e-b623-44db-a1f1-f3835197a212 1Mi RWX nfs-client 14s

If we look on the actual NFS server we can see it, and the UUIDs match

/data/nfs_share/ $ ls -la
total 12
drwxrwxrwx 3 root   root    4096 Mar 13 13:12 .
drwxr-xr-x 7 root   root    4096 Feb 26 16:08 ..
drwxrwxrwx 2 nobody nogroup 4096 Mar 13 13:11 default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212

Let's check inside; as we expect, there is nothing there.

$ ls -l default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212/
total 0

Access the storage from a pod

Let's create a pod to use it and put something inside the storage.

Again, we create some YAML to describe a pod and its attributes.

$ cat test-pod.yml 
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: gcr.io/google_containers/busybox:1.24
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: nfs-pvc
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim-client

I won't go into too much detail on how this pod is defined – that's a whole blog post on its own. We use a small image (busybox) and get it to write a file to the pod's filesystem. The pod's filesystem has our new PVC mounted within it (at /mnt).

It will do this and exit (stop); the file it writes is actually written to NFS, so we should see it there.

$ kubectl apply -f test-pod.yml 
pod/test-pod created

Let’s check it

$ kubectl get pods
NAME                                               READY   STATUS      RESTARTS   AGE
nfs-subdir-external-provisioner-76b4bc6f7d-5bjgg   1/1     Running     0          72m
test-pod                                           0/1     Completed   0          36s

Yes, the pod has run and completed. Let’s check the NFS share folder on the NFS server, and we can see the file was created.

$ ls -l default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212/
total 0
-rw-r--r-- 1 nobody nogroup 0 Mar 13 13:35 SUCCESS

Let’s clear up that pod.

$ kubectl delete -f test-pod.yml 
pod "test-pod" deleted

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
nfs-subdir-external-provisioner-76b4bc6f7d-5bjgg   1/1     Running   0          76m

Now check the NFS share folder on the NFS server again and we can see the file is still there, even though the pod is gone. So depending on how you define your PVC, you may need to do housekeeping.

$ ls -l default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212/
total 0
-rw-r--r-- 1 nobody nogroup 0 Mar 13 13:35 SUCCESS

Let’s remove the PVC and see what happens then

$ kubectl delete -f test-pvc.yml 
persistentvolumeclaim "test-claim-client" deleted

$ kubectl get pvc
No resources found in default namespace.

What has happened to the folder on the NFS server?

$ ls -l default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212/
ls: cannot access 'default-test-claim-client-pvc-1213d39e-b623-44db-a1f1-f3835197a212/': No such file or directory

Oh no! What has happened?

$ ls -l 
total 4
drwxrwxrwx 2 nobody nogroup 4096 Mar 13 13:35 archived-pvc-1213d39e-b623-44db-a1f1-f3835197a212

As you can see it has been archived. Take a peek inside…

$ ls -l archived-pvc-1213d39e-b623-44db-a1f1-f3835197a212/
total 0
-rw-r--r-- 1 nobody nogroup 0 Mar 13 13:35 SUCCESS

Ah, phew! Nothing lost.
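If I remember rightly, this archiving behaviour comes from the provisioner's archiveOnDelete option. Something like the following at install time should make deletes really delete, but do check the chart's values before relying on it:

$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=172.30.5.67 \
  --set nfs.path=/data/nfs_share \
  --set storageClass.archiveOnDelete=false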

What about scaling?

Let's use a Deployment definition and scale that up.
We use the definition below as it uses some environment-variable magic to write to a file named after the pod's IP; that way, as the pods scale up in number, they won't overwrite each other's files.

$ cat scaled-test-pod.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-nfs
  name: test-nfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-nfs
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: test-nfs
    spec:
      containers:
      - name: test-pod
        image: gcr.io/google_containers/busybox:1.24
        command:
          - "/bin/sh"
        args:
          - "-c"
          - "touch /mnt/SUCCESS-$MY_POD_IP && sleep 3600 || exit 1"
        # from https://stackoverflow.com/a/58800597/7396553
        env:
            - name: MY_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: MY_POD_SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
        volumeMounts:
          - name: nfs-pvc
            mountPath: "/mnt"
      volumes:
        - name: nfs-pvc
          persistentVolumeClaim:
            claimName: test-claim-client

Let’s apply just one pod first and see the output

$  kubectl apply -f scaled-test-pod.yml 
deployment.apps/test-nfs created

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
nfs-subdir-external-provisioner-76b4bc6f7d-5bjgg   1/1     Running   0          108m
test-nfs-7d6d956c4-h9t9z                           1/1     Running   0          19s

And check the underlying NFS data store

/data/nfs_share/default-test-claim-client-pvc-451af214-10b8-48c2-ac6a-704bbf9f6339 $ ls -la
total 8
drwxrwxrwx 2 nobody nogroup 4096 Mar 13 14:11 .
drwxrwxrwx 4 root   root    4096 Mar 13 13:46 ..
-rw-r--r-- 1 nobody nogroup    0 Mar 13 13:52 SUCCESS
-rw-r--r-- 1 nobody nogroup    0 Mar 13 14:11 SUCCESS-192.168.122.139

So we have a file with the IP appended. 🙂

Let’s scale that up

$ kubectl get deployments
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
nfs-subdir-external-provisioner   1/1     1            1           110m
test-nfs                          1/1     1            1           2m16s
$ kubectl scale --replicas=4 deployment test-nfs
deployment.apps/test-nfs scaled
$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
nfs-subdir-external-provisioner-76b4bc6f7d-5bjgg   1/1     Running   0          110m
test-nfs-7d6d956c4-h9t9z                           1/1     Running   0          3m5s
test-nfs-7d6d956c4-h9tzf                           1/1     Running   0          20s
test-nfs-7d6d956c4-qvvff                           1/1     Running   0          20s
test-nfs-7d6d956c4-zpg87                           1/1     Running   0          20s

So we have 4 pods running, let’s check the NFS storage…

total 8
drwxrwxrwx 2 nobody nogroup 4096 Mar 13 14:14 .
drwxrwxrwx 4 root   root    4096 Mar 13 13:46 ..
-rw-r--r-- 1 nobody nogroup    0 Mar 13 13:52 SUCCESS
-rw-r--r-- 1 nobody nogroup    0 Mar 13 14:11 SUCCESS-192.168.122.139
-rw-r--r-- 1 nobody nogroup    0 Mar 13 14:14 SUCCESS-192.168.122.140
-rw-r--r-- 1 nobody nogroup    0 Mar 13 14:14 SUCCESS-192.168.122.141
-rw-r--r-- 1 nobody nogroup    0 Mar 13 14:14 SUCCESS-192.168.122.4

Excellent, all 4 pods are writing to the shared NFS PVC we created.

Let’s clean up…

$ kubectl delete -f scaled-test-pod.yml --force
deployment.apps "test-nfs" force deleted

$ kubectl delete -f test-pvc.yml 
persistentvolumeclaim "test-claim-client" deleted

Additional NFS checks on the worker nodes

nfsiostat is useful to see the details on the NFS shares in use

$ nfsiostat

172.30.5.67:/data/nfs_share mounted on /var/lib/kubelet/pods/c9096597-9a3d-43a0-a280-241061ea89b9/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root:

           ops/s       rpc bklog
           0.033           0.000

read:              ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000
write:             ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000

NFS checks on the NFS server

On the NFS server, checking under /proc (on later kernels) is very useful.

$ sudo ls -la /proc/fs/nfsd/clients/
[sudo] password for nick:
total 0
drw------- 3 root root 0 Mar 1 21:14 .
drwxr-xr-x 3 root root 0 Feb 26 16:05 ..
drw------- 2 root root 0 Mar 1 21:14 5

In the above we can see the client ids, and we can check these to see information about those clients as seen below.

$ sudo cat /proc/fs/nfsd/clients/5/info
clientid: 0x942ec40460391c55
address: "172.30.5.32:918"
name: "Linux NFSv4.2 k8s-n-1"
minor version: 2
Implementation domain: "kernel.org"
Implementation name: "Linux 5.4.0-66-generic #74-Ubuntu SMP Wed Jan 27 22:54:38 UTC 2021 x86_64"
Implementation time: [0, 0]

Troubleshooting

This page is very useful: https://learnk8s.io/troubleshooting-deployments

Fin!

So, that was a rather long post – hopefully it helps new users of k8s to enable a simple shared file store using NFS.

And now, as it's stopped raining, I think we will take the kids and dogs out for a walk. 🙂

Posted in Automation, Containers, DevOps, Uncategorized

Blameless Culture – High performing teams

I was pointed at this wonderful article (though quite old in internet time – 2017 😁) about how high-performing teams require a blameless culture to enable them to really fly.

https://hbr.org/2017/08/high-performing-teams-need-psychological-safety-heres-how-to-create-it

There are so many good things written in that article.

Posted in Uncategorized

Firewall rules for k8s containers

or, as they are really called, "Network Policies"…

Kudos to my DevOps colleague tenhishadow for this (I follow him on social media, you should too).

https://github.com/ahmetb/kubernetes-network-policy-recipes

Nice and simple with diagrams.

Posted in Uncategorized

Terraform pre-commit hooks

Great article here, go read it if you use Terraform…

https://levelup.gitconnected.com/produce-reliable-terraform-code-with-pre-commit-hooks-d263bc332e6a

I would have loved these as a VS Code extension. But pre-commit hooks are fine by me – wrap them into your pipeline (if you haven't already).

https://gist.github.com/guivin/7fa3e4212df2e7f1c31f04a242e8dbba#file-pre-commit-config-yaml

Posted in Uncategorized | Leave a comment