Some interesting questions asked and answered…
A great post on how Gremlin learnt from their latest service failure and made sure it never happened again. They run their own Chaos Monkey-style service.
Actually Getting the Metrics from Common Services
A great post on what you should be monitoring.
I’ve been doing a bit of Azure work of late. It makes an interesting change from AWS and OpenStack. I might get into trouble here (being a 20-year veteran of Linux) but I really like it. Really polished, good interface, fast. All the things you want so that you can concentrate on building the services and not have to fight the infrastructure. 🙂
Below is a short demo script to show how easy it is. 3 mins 49 secs to create the lot from scratch isn’t too shabby.
Don’t forget to create your authentication credentials: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/ansible-install-configure#create-azure-credentials I use an rc file, just like in OpenStack, to store them as environment variables and load them when I execute the playbook.
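A quick pre-flight check can save a failed playbook run when the rc file has not been loaded. Here is a minimal Python sketch, assuming service principal authentication through the four environment variables that the Ansible Azure modules read; check the names against the page linked above. The function name missing_azure_vars is made up for illustration.

```python
import os

# Assumed variable names for service principal auth with the Ansible
# Azure modules; verify them against the linked Microsoft page.
REQUIRED_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_SECRET",
    "AZURE_TENANT",
)


def missing_azure_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Pass a dict to check something other than the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call missing_azure_vars() before kicking off the playbook and stop if the list is not empty.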
Don’t forget to install the Ansible Azure dependencies if you need them:
pip install ansible[azure]
The Azure CLI tooling (az) is worth looking at too, especially if you are used to the OpenStack CLI. See https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-yum?view=azure-cli-latest
Technical debt is often unavoidable, but sometimes it is desirable to make conscious decisions to take on more of it. The key is making sure the benefits of technical debt outweigh the cost. That means making sure the cost of cleaning it up is either fixed or grows very, very slowly.
The most effective way to do that in my opinion is to spend time thinking things through at a high level. Thinking is a lot faster than writing code and making sure it works. By avoiding shortcuts in this thought process and making good decisions, one is actually capable of taking on more technical debt without severely impacting future development speed. That debt just happens to be planned rather than taken accidentally.
Stage 1 – DevOps Denial and Misinterpretation
Stage 2 – Automation for the Sake of Automation
Stage 3 – Collaboration and Reorganization
Stage 4 – A High Performing Organization
DevOps is all about the right balance of people, processes, and tools. Problems will always pop up between these components. With that in mind, problem-solving (bugs are kind of a problem, right?) is fundamental to a healthy agile process.
The process that you build in order to treat bugs as part of automation at scale should be designed with the following ideas in mind:
- Provide the right visibility of bugs when they are identified.
- Provide sufficient information to fix them fast.
- Triaging is essential to determine where to put focus in CI/CD.
- Communicate quality findings about bugs across teams for full alignment.
- Measure the bug-fixing process using unified metrics.
- Trust – a key factor in establishing a sustainable process between teams, which eventually leads to a stage where everyone takes responsibility for bugs.
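As one illustration of a unified metric, here is a minimal Python sketch that computes the median time to fix per team from a list of bug records. The record fields (team, opened, fixed) and the function name are assumptions made for the example, not something taken from the post.

```python
from datetime import datetime
from statistics import median


def median_hours_to_fix(bugs):
    """Return {team: median hours from 'opened' to 'fixed'}.

    Each bug is a dict with hypothetical keys 'team', 'opened' and
    'fixed' (datetime values). Bugs without a 'fixed' time are skipped.
    """
    per_team = {}
    for bug in bugs:
        if bug.get("fixed") is None:
            continue  # still open, so there is no fix time to measure
        hours = (bug["fixed"] - bug["opened"]).total_seconds() / 3600
        per_team.setdefault(bug["team"], []).append(hours)
    return {team: median(values) for team, values in per_team.items()}
```

Using the same calculation for every team is what makes the number comparable across teams; the median is used here so a single long-running bug does not dominate.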
The Fundamental Law of Repo Topology is that you must not have cyclical dependencies between repos. If you do, you are in for a world of hurt when you have to perform a series of non-atomic changes to update libraries. Going with a monorepo has the advantage that you never have this problem because there’s only one repo. On the other hand, working in a monorepo implies certain things about the rest of your development process and even philosophy of development.
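If you do go multi-repo, the no-cycles rule is easy to check mechanically. Here is a minimal Python sketch using a depth-first search, assuming you can describe your repos as a mapping from each repo name to the repos it depends on. The function name and the repo names in the usage note are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of repo names, or None.

    deps maps each repo name to an iterable of the repos it depends on.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # repos on the current depth-first search path

    def visit(repo):
        state[repo] = IN_PROGRESS
        path.append(repo)
        for dep in deps.get(repo, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we found a cycle
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[repo] = DONE
        return None

    for repo in deps:
        if state.get(repo, UNSEEN) == UNSEEN:
            cycle = visit(repo)
            if cycle:
                return cycle
    return None
```

For example, find_cycle({"app": ["lib"], "lib": ["util"], "util": ["app"]}) returns ["app", "lib", "util", "app"], while the same graph with "util" depending on nothing returns None.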
A great post on the differences and gotchas of repos…