Help Each Other

Working through a technical test for an SRE role

May 27, 2021

This is a story I experienced recently, when a friend of mine asked me to help with a technical test for a Site Reliability Engineer job application at a well-known e-commerce company whose brand color is orange. The test is divided into 4 parts, but I will group them into only 2 parts, namely:
1. Programming / algorithm tests
2. General knowledge regarding SRE

Programming Test

In my opinion, the programming test was quite easy: there was only 1 question, with 15 minutes to solve it. In short, it is a “string manipulation” problem.
Problem set:
“my-name-jansutris-apriten-ancient
street-addressable
a child who is diligent and good at saving”
Test Case 0:
change the first letter of each sentence to a capital letter
change each “-” character to a space
add a “.” at the end of each line of the string
So I offered the following complete program as a solution:

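L = ["jeeks-cikcik\n", "miss-dor\n", "bada-aeeks\n"]
dot = "."

# Writing to file
with open("myfile.txt", "w") as fp:
    fp.writelines(L)

# Reading the file back line by line using readlines()
with open("myfile.txt") as fp:
    Lines = fp.readlines()
    for line in Lines:
        # strip() removes the trailing newline (and any surrounding whitespace)
        kata = line.strip()
        # capitalize() upper-cases the first character and lower-cases the rest
        CapitalizedString = kata.capitalize()
        # replace every "-" with a space
        a_string = CapitalizedString.replace('-', ' ')
        # end each line with a full stop
        print(a_string + dot)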

Because there is only 1 problem set, I put the strings into a variable of list type, so that this variable can hold several strings, each ending with a newline (\n). Next, I write the strings into a file called myfile.txt and read them back line by line with readlines(). Then, I clean up the string on each line using Python’s built-in strip() method.

The strip() method returns a copy of the string with both the leading and the trailing characters removed. Its optional argument is a string specifying the set of characters to remove from the left and right ends; when no argument is passed, as in the program above, it removes whitespace, including the newline at the end of each line.
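A quick way to see what strip() does, with and without an argument. The first string comes from the program above; the others are made-up illustrations:

```python
# Default: strip() with no argument removes leading/trailing whitespace,
# which includes the newline that readlines() leaves at the end of each line.
raw_line = "jeeks-cikcik\n"
print(repr(raw_line.strip()))          # 'jeeks-cikcik'

# With an argument, the string is treated as a *set* of characters, not as a
# prefix/suffix: any combination of them is removed from both ends.
print(repr("--my-name--".strip("-")))  # 'my-name'
print(repr("xyhelloyx".strip("xy")))   # 'hello'

# Characters in the middle of the string are never touched.
print(repr("  a - b  ".strip()))       # 'a - b'
```

The last case is why the program still needs replace() for the “-” characters: strip() only ever looks at the two ends of the string.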

Then, following the first rule of Test Case 0, I change the first letter of each line to a capital letter using capitalize(). Next, every “-” character is replaced with a space (“ ”) using replace(). Finally, the modified string is printed with a full stop (“.”) appended to the end of the line.
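Writing to and reading back from myfile.txt isn’t required by the three rules themselves. As an alternative sketch (not the answer I submitted), the same rules can be applied directly to the problem-set string in memory. format_text is a hypothetical helper name, and skipping blank lines is my own assumption, since the test doesn’t say what to do with them:

```python
def format_text(text: str) -> str:
    """Apply the three rules of the test case to a multi-line string.

    For each line: capitalize the first letter, replace '-' with a space,
    and end the line with a full stop.
    """
    formatted = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # assumption: skip blank lines instead of emitting a lone "."
            continue
        formatted.append(line.capitalize().replace("-", " ") + ".")
    return "\n".join(formatted)


problem_set = (
    "my-name-jansutris-apriten-ancient\n"
    "street-addressable\n"
    "a child who is diligent and good at saving"
)
print(format_text(problem_set))
# My name jansutris apriten ancient.
# Street addressable.
# A child who is diligent and good at saving.
```

Returning a string instead of printing inside the loop makes the function easy to test, which helps when the time limit is only 15 minutes.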

Output

General knowledge regarding SRE

There were 2 questions in this sub-test:

1. What are the pros and cons of using a Virtual Machine (VM) compared to a Container?

Before the rise of containerization, virtual machines were the way to go if you needed to isolate environments within a physical infrastructure.

However, when Docker Inc. released its containerization software in 2013, the spotlight shifted to containers.

Since then, interest in containers has grown steadily, changing the cloud computing landscape in the process. More and more developers are interested in the agile development process offered by containers.

Both virtual machines and containers are designed with the concept of optimizing resources from an existing physical infrastructure in mind.

However, how do they differ? Is one superior to the other? Which one should you use for your next project?

In this post, we try to answer these questions by getting to know more about containers and virtual machines.

What is a Virtual Machine (VM)?

A virtual machine (VM) is an isolated environment that emulates a computer system with access to physical hardware resources.

Virtual machines run on top of hypervisor software, which imitates the physical infrastructure and divides its resources among multiple virtual machines. The hypervisor is also referred to as a virtual machine monitor (VMM); the physical computer it runs on is called the host machine.

Each VM seems to be running on bare-metal hardware, giving the impression that there are multiple PCs running when they are actually all supported by a single physical server.

VMs tend to be bulky, often many gigabytes in size, because each VM contains its own guest operating system, kernel, binaries, libraries, and applications.

What are Containers?

Containers create isolated environments in a physical server by virtualizing the host operating system and running packaged applications on top of it.

Instead of virtualizing the hardware like virtual machines, containers virtualize the OS. A container runs on top of the host OS kernel and usually shares its libraries and binaries.

Because they share most of what they need with the host, containers only package the application and its dependencies. They’re much lighter than VMs, typically only megabytes in size.

You need two parts to deploy a container-based application: software like Docker to build the containers, and a container orchestration platform, such as Amazon ECS, to run them.

There are many containerization and orchestration platforms. The most popular setup for complex applications is the pairing of Docker as the containerization platform and Kubernetes, a container orchestration platform originally developed by Google.

A Kubernetes cluster consists of a master node, worker nodes, and pods. The master node is the control layer that connects your requirements with the rest of the system. Pods are where your containers run, while worker nodes are the machines that pods are deployed to.
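To make the split between the master node, worker nodes, and pods a little more concrete, here is a deliberately simplified Python sketch of the placement decision. It is only a toy model under my own assumptions: the class names, node names, and resource figures are hypothetical, and the real Kubernetes scheduler filters and scores nodes using many more criteria than this first-fit check on CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    name: str
    cpu_millicores: int  # free allocatable CPU
    memory_mib: int      # free allocatable memory
    pods: list = field(default_factory=list)


@dataclass
class Pod:
    name: str
    cpu_millicores: int  # requested CPU
    memory_mib: int      # requested memory


def schedule(pod, nodes):
    """Place the pod on the first node with enough free capacity (first fit).

    Returns the chosen node's name, or None if the pod would stay Pending.
    """
    for node in nodes:
        if node.cpu_millicores >= pod.cpu_millicores and node.memory_mib >= pod.memory_mib:
            node.cpu_millicores -= pod.cpu_millicores
            node.memory_mib -= pod.memory_mib
            node.pods.append(pod.name)
            return node.name
    return None


nodes = [WorkerNode("worker-1", 2000, 4096), WorkerNode("worker-2", 4000, 8192)]
pods = [Pod("web-1", 1500, 1024), Pod("web-2", 1500, 1024), Pod("db-1", 3000, 6144)]
placements = {p.name: schedule(p, nodes) for p in pods}
print(placements)  # {'web-1': 'worker-1', 'web-2': 'worker-2', 'db-1': None}
```

With these numbers, web-1 lands on worker-1, web-2 no longer fits there and goes to worker-2, and db-1 fits on neither node, so it would stay Pending until capacity frees up. First fit is used only because it is the easiest rule to read.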

Containers vs. Virtual Machines

After understanding what containers and virtual machines are, now it’s time to compare the characteristics of both virtualization technologies.

Container

Pros:

Portability

As a part of a distributed system, containers are highly portable.

Because containers pack microservices and their dependencies in a small-sized package, it’s easy to move containers around, even across environments, such as the public cloud, private cloud, and hybrid cloud, as well as the multi-cloud and bare-metal environments.

Effective resource usage

Containers on the same host share most of the dependencies needed to run them, including the operating system, libraries, and frameworks.

Unlike with virtual machines, there’s only one copy of the necessary files on each physical machine, leading to more effective resource usage. This also results in lighter containers, which means you can fit more containers on a physical server.

Easier to maintain

As containers are typically used with a microservices-based architecture, your code is broken down into manageable pieces that can be handled individually. Hence, you can update and maintain a container without worrying that it will affect other parts of your application.

Highly Scalable

Container orchestration platforms are created to help you manage your containers. Container orchestrators, like Kubernetes or Docker Swarm, automate most of your container management process, including scaling, networking, and deployment.

Containerized applications are highly scalable, and they can scale up and down quickly by spinning up new nodes and/or pods when demand calls for it.
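The “scale up and down quickly” part is usually driven by a simple proportional rule. The Kubernetes Horizontal Pod Autoscaler documentation describes it as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no action taken while the ratio stays within a tolerance of 1.0 (0.1 by default). Below is a minimal Python sketch of just that formula; the function name and the min/max replica bounds are hypothetical, and the real autoscaler also applies stabilization windows and other rules not shown here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count from the HPA-style formula
    ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: don't scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 50))   # CPU at 90% vs. a 50% target: scale up to 8
print(desired_replicas(4, 20, 50))   # CPU at 20%: scale down to 2
print(desired_replicas(4, 52, 50))   # within tolerance: stay at 4
print(desired_replicas(8, 200, 50))  # would be 32, capped at max_replicas=10
```

Rounding up with ceil errs on the side of having slightly too much capacity rather than too little.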

Cons:

Lacking Security measures

Containers provide only lightweight isolation from the host OS and from other containers on the same system. This leads to a weaker security boundary compared to virtual machines.

However, mature container users are paying more attention to security, as they try to improve collaboration between DevOps and Security, according to StackRox.

Runs only one OS

Because containers share the host’s kernel, they can only run workloads built for that OS. This can be a benefit if you only use one OS, but it is a drawback if you need to work across different OSes. You can, however, run an earlier version of the same OS using lightweight virtual machines.

Popular container providers:

Red Hat OpenShift
Google Kubernetes Engine
Docker
Linux Containers (LXC, LXD, CGManager)
Apache Mesos
Windows Server Containers

Virtual Machine

Pros

Hard security boundaries

VMs provide more isolation between neighboring systems, as each one uses a separate operating system from the other machines on the same physical server. With containers, by contrast, you’re operating within one shared OS, and flaws can affect the entire system.

The complete isolation in VMs results in better security and vulnerabilities that are harder to exploit. If you’re not in control of the environment you’re in, using VMs, which have a stronger boundary, is preferable.

Even so, leading container providers are dedicating more effort to security over time, so you might soon see containers with the same level of security as VMs.

Can emulate multiple OS

As you can run any operating system you want within a virtual machine, you don’t need to buy new hardware every time you need a different OS.

For example, when you need to simulate and test your applications across different operating systems to get a clearer view of your application’s capabilities, using virtual machines saves you from buying multiple physical machines.

More resources

The resources allocated to a virtual machine are typically far greater than those allocated to a container. That’s why VMs are more suitable for resource-intensive tasks. Tasks with larger sizes and a long lifecycle are better suited to VMs than to containers.

You can run resource-intensive tasks on containers. However, you need to consider the cost of using containers as opposed to using a VM. Line up your requirements beforehand and do some research to discover which option will be more cost-effective for your application.

Cons

Not as portable

Virtual machines are gigabyte-sized chunks of software.

Naturally, a virtual machine is harder to move than a container, because the applications running on it are highly dependent on the guest OS and the emulated hardware it runs on.

Moving virtual machines across data centers or the cloud will be harder than if you’re using containers.

Ineffective resource usage

Often, the resources provided by virtual machines are too much for running a single application.

However, once resources are assigned to a VM, it reserves all of them, even when it needs less. This leaves idle capacity that could have been used elsewhere, especially if your planning is inaccurate.

Furthermore, each VM contains not only its own operating system instance but also libraries, binaries, and copies of the virtual hardware needed by the OS. These repeated files take up a large part of the servers’ RAM and CPU resources.

Harder to maintain the OS

As each VM runs its own operating system, you need to update and maintain each OS separately. This is a time-consuming and exhausting task, especially if you have many VMs.

Popular VM Providers

VMware
VirtualBox
Xen
Hyper-V
KVM

So, which one should you use?

VMs and containers have their own use cases. To decide which method you should use, you should look at the requirements of your application.

The container technology is rising in popularity thanks to its high scalability, effective resource management, and agile development cycle.

A recent study shows that the adoption of application containers will grow 30% annually by 2027.

Here are a few scenarios containers would be perfect for:

Your application has a multiservice architecture
You want to minimize the number of servers you work on
Your project moves through several different environments
You’re building cloud-native applications
Your project is developed in an environment similar to production

On the other hand, virtual machines are far from obsolete. They are still a reliable way to host your applications securely, and they have a longer life cycle than containers.

Here are a few cases where virtual machines work well:

Your application has a monolithic architecture
You need to run different operating systems
You need a platform with persistent storage
You need to separate systems for security purposes
You need the full functionality an operating system can offer

You can also combine virtual machines and containers to create a more suitable setup for your application. Combining the flexibility of VMs with the efficiency of containers also improves isolation and functionality.

Virtual machines often serve as the hosts for containers.


VMs and containers share the same goals: reducing overhead costs, enabling agile software development, and optimizing resources. However, they take different approaches to accomplish these goals.

VMs and containers have their own use cases, and sometimes it’s tricky to figure out which option you should use for your application.

2. Are there differences between the job roles of SRE and DevOps?

Google SRE vs DevOps basics: two sides of the same coin

In essence, the two methodologies do the same thing: they try to bridge the gap between development and operations teams. Both aim at improving the release cycle and achieving better product reliability. But before diving deeper into the differences and similarities between them, let’s look back at when and why SRE and DevOps appeared at all.

SRE vs DevOps comparison.

What is SRE?

Site Reliability Engineering, or SRE, is a unique, software-first approach to IT operations supported by a set of corresponding practices. It originated in the early 2000s at Google to ensure the health of a large, complex system serving over 100 billion requests per day. In the words of Ben Treynor Sloss, Google’s VP of engineering who coined the term SRE, “It’s what happens when you ask a software engineer to design an operations function.”

The primary focus of SRE is system reliability, which is considered the most fundamental feature of any product. The pyramid below illustrates elements contributing to the reliability, from the most basic (monitoring) to the most advanced (reliable product launches).

The hierarchy of service reliability needs, according to Google’s SRE book. Source: Site Reliability Engineering.

Once the system is “reliable enough,” SRE shifts its efforts to adding new features or creating new products. It also pays close attention to tracking results, making measurable performance improvements, and automating operations tasks.

What is DevOps?

The term DevOps (short for development and operations) was coined in 2009 by Patrick Debois, a Belgian IT consultant and Agile practitioner. Its core principles are similar to those of SRE: applying engineering practices to operations tasks, measuring results, and relying on automation instead of manual work. But its focus is much broader.

While SRE concentrates on keeping services running and available to users, DevOps aims to cover the entire product life cycle, from design to operations, making all processes continuous in the spirit of Agile methodologies. Such end-to-end continuity is paramount to reducing time to market and making rapid changes.

A DevOps lifecycle. Source: Medium

Another difference from SRE is that DevOps emerged in the first place as a culture and mindset that didn’t specify exactly how to implement its ideas. It’s often viewed as a generalization of key SRE methods so that they can be used by a wider range of organizations. Likewise, SRE can be seen as an embodiment of the DevOps vision. The next section describes the interactions between the two methodologies in more detail.

Practice vs mindset: how Site Reliability Engineering implements DevOps philosophies

Broadly speaking, DevOps describes what needs to be done to unify software development and operations, whereas SRE prescribes how this can be done. DevOps culture is based on several pillars that are covered by corresponding SRE practices.

What SRE offers to solve DevOps tasks.

The five key DevOps pillars are:

No more silos. The idea stems from the fact that a lack of collaboration and information flow across teams reduces productivity.
Failures are normal. DevOps prescribes learning from mistakes rather than spending resources on an unattainable goal — preventing all failures.
The change should be gradual. Changes are most effective and low-risk when they are small and frequent. This pillar combined with automated testing of small batches of code and rollback of bad ones underlies the concepts of continuous integration and continuous delivery (CI/CD).
The more automation the better. DevOps focuses on automation to deliver updates faster and free up hours of manual effort.
Metrics are crucial. Each change should be measured to understand whether it brings the results you expect.

Now let’s see what SRE offers to put these pillars into practice.

Treat operations as a software problem

Corresponds to “no more silos,” “the more automation the better”

SRE utilizes software engineering to solve operations problems. In other words, software solutions are created to instruct a computer how to perform IT operations automatically, without human intervention. SRE specialists apply the same tools that developers typically use and share responsibility for product success with a software development team.

Minimize toil

Corresponds to “the more automation the better,” “metrics are crucial”

In SRE terms, toil is manual, repetitive work that is related to running a production service and has no long-term value. Examples of toil include:

regular password resets,
manual releases,
reviewing non-critical alerts, and
manual scaling of infrastructure.

SRE’s rule of thumb is to keep toil below 50 percent of engineers’ work time. Once the threshold is exceeded, the team needs to identify the top source of toil. Then engineers develop a software solution to automate some tasks and achieve a healthy work balance. A good practice is to eliminate a bit of toil each week.
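The 50 percent rule of thumb comes down to simple arithmetic. Here is a minimal sketch, assuming toil is tracked as hours per task over a week; the function name, task names, and hour figures are hypothetical:

```python
def toil_report(hours_by_task, total_work_hours, threshold=0.5):
    """Return (toil fraction, top toil source, whether the threshold is exceeded)."""
    toil_hours = sum(hours_by_task.values())
    fraction = toil_hours / total_work_hours
    top_source = max(hours_by_task, key=hours_by_task.get)
    return fraction, top_source, fraction > threshold


# Hypothetical week for one engineer (40 working hours).
week = {
    "manual releases": 9,
    "password resets": 4,
    "non-critical alert review": 6,
    "manual scaling": 3,
}
fraction, top, over = toil_report(week, total_work_hours=40)
print(f"toil: {fraction:.0%}, top source: {top}, over threshold: {over}")
# toil: 55%, top source: manual releases, over threshold: True
```

In this made-up week the team is over the line, and the report points at manual releases as the first candidate for automation.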

Measure uptime and availability of the system

Corresponds to “metrics are crucial”

According to SRE, a key precondition for a system’s success is availability. If your service is unavailable at a certain time, it can’t perform its functions. To measure the availability and thus ensure that everything goes right, SRE provides three metrics.

1. Service-Level Indicator (SLI) is a quantitative measurement of a system’s behavior. The main SLI for most services is request latency — or the time needed to respond to a request. Other commonly used SLIs are throughput of requests per second and errors per request. These metrics are usually collected within a certain period of time and then converted into rates, averages, or percentiles.

2. Service-Level Objective (SLO) is a target range of values set by stakeholders (say, the average request latency must be under 100 milliseconds). The system is considered reliable if its SLIs consistently meet the SLOs.

3. Service-Level Agreement (SLA) is a promise to customers that your service will meet certain SLOs over a certain period. Otherwise, a provider will pay some kind of penalty. SRE isn’t directly involved in setting SLAs. However, it helps to avoid missed SLOs and the financial losses they entail.
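As a rough sketch of how SLIs and SLOs relate, the code below computes two SLIs (average latency and error rate) for one window of requests and checks them against targets. The 100 ms latency target mirrors the example above; the 1 percent error-rate target, the function names, and the request data are hypothetical. An SLA is not modelled, since it is a contract built on top of SLOs.

```python
from statistics import mean


def compute_slis(requests):
    """requests: list of (latency_ms, succeeded) tuples for one time window."""
    latencies = [latency for latency, _ in requests]
    errors = sum(1 for _, ok in requests if not ok)
    return {
        "avg_latency_ms": mean(latencies),
        "error_rate": errors / len(requests),
    }


def meets_slo(slis, max_avg_latency_ms=100.0, max_error_rate=0.01):
    """True if every SLI is within its (hypothetical) SLO target."""
    return (slis["avg_latency_ms"] < max_avg_latency_ms
            and slis["error_rate"] <= max_error_rate)


window = [(80, True), (95, True), (120, True), (85, True)]
slis = compute_slis(window)
print(slis, meets_slo(slis))  # average latency 95 ms, no errors: SLO met
```

Real systems usually compute these over much larger windows and often prefer percentiles (for example p99 latency) to averages, because averages hide slow outliers.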

Set error budget

Corresponds to “failures are normal,” “changes should be gradual,” “metrics are crucial”

SRE doesn’t aim to hit 100-percent reliability, as this goal is unrealistic. “…100 percent is not the right reliability target,” Ben Treynor confirms, “because no user can tell the difference between a system being 100 percent available and, let’s say, 99.999 percent available.” Moreover, beyond a certain level, further increases in reliability don’t benefit the system; instead, they restrict the speed and frequency of updates.

So, the goal of SRE is to deliver sufficiently good services without sacrificing the ability to deliver new features often and fast. This approach tolerates the acceptable risk of failure called the error budget.

At Google, the error budget is defined quarterly, based on SLOs. It gives a clear picture of how much risk is allowed within a quarter. Once the agreed-upon budget is exceeded, the team shifts its focus from developing updates to improving reliability.
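For an availability SLO, the error budget is the complement of the target multiplied by the length of the period. A minimal sketch, assuming availability is measured as allowed downtime and that a quarter is 90 days (real quarters are 90 to 92 days); the function names are hypothetical:

```python
from datetime import timedelta


def error_budget(slo, period=timedelta(days=90)):
    """Allowed unavailability for the period: (1 - SLO) * period length."""
    return (1 - slo) * period


def budget_remaining(slo, downtime, period=timedelta(days=90)):
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget(slo, period)
    return (budget - downtime) / budget


print(error_budget(0.999))                               # 2:09:36
print(f"{budget_remaining(0.999, timedelta(minutes=30)):.0%}")  # 77%
```

So a 99.9 percent SLO leaves about 2 hours 10 minutes of downtime per 90-day quarter; after a 30-minute outage, roughly 77 percent of that budget remains. When the remaining fraction drops to zero or below, that is the signal to shift focus from new features to reliability.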

Reduce the cost of failure

Corresponds to “failures are normal,” “changes should be gradual”

The later in the product life cycle the error is detected, the higher the cost of fixing it. SRE recognizes this fact and tries to solve problems as early as possible using the following practices.

Rollback early, rollback often. When an error is revealed or even suspected in a release, the team rolls back first and explores the problem second. This approach reduces the Mean Time to Recovery (MTTR) — or the average time needed to recover your service from a failure.

Canary all rollouts. A canary release is a method to make the rollout process safer. An update is first introduced to a small subset of users. They test it and provide feedback. After all required changes are made, the release is made available to everybody. Canary releases cut the Mean Time to Detect (MTTD), which reflects how long it usually takes your team to detect an issue. Besides, the method reduces the number of customers affected by system failures.
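One common way to pick the “small subset of users” for a canary is to hash a stable user ID into a bucket and compare it with the rollout percentage. The sketch below assumes string user IDs and uses SHA-256 from Python’s hashlib; the function name is hypothetical, and this is only one of several ways to split canary traffic (load balancers and service meshes can also split by request).

```python
import hashlib


def in_canary(user_id: str, canary_percent: float) -> bool:
    """Deterministically decide whether a user gets the canary release.

    Hashing the user ID gives each user a stable bucket in [0, 100), so the
    same user keeps seeing the same version while the percentage grows.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100  # 0.00 .. 99.99
    return bucket < canary_percent


users = [f"user-{i}" for i in range(1000)]
share = sum(in_canary(u, 5) for u in users) / len(users)
print(f"{share:.1%} of users are in the 5% canary")
```

Because the bucket depends only on the user ID, raising the percentage from 5 to 20 keeps every user who was already in the canary, so feedback comes from a consistent group.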

Create and maintain playbooks

Corresponds to “no more silos,” “the more automation the better”

Playbooks or runbooks are documents describing diagnostic procedures and ways to respond to automated alerts. They reduce Mean Time to Repair (MTTR), stress, and the risk of human error.

Entries in playbooks are out of date as soon as the environment changes. So, when it comes to daily releases, these guides need daily updates as well. Considering that creating good documentation is hard, some SREs advocate creating only general instructions that change slowly. Others insist on detailed, step-by-step playbooks to eliminate variability.

Google’s SRE Workbook recommends implementing automation if a playbook contains a list of commands engineers run every time in the case of a particular alert.
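A sketch of what automating such a playbook can look like. The alert names, commands, and the myapp.service unit below are hypothetical examples, and the function defaults to a dry run, so it only lists the steps it would execute:

```python
import subprocess

# Hypothetical playbook: alert name -> the commands an on-call engineer
# would otherwise type by hand every time that alert fires.
PLAYBOOK = {
    "DiskAlmostFull": [
        ["df", "-h", "/var"],
        ["journalctl", "--vacuum-time=7d"],
    ],
    "ServiceUnhealthy": [
        ["systemctl", "restart", "myapp.service"],
    ],
}


def run_playbook(alert_name, dry_run=True):
    """Run (or, by default, only list) the playbook steps for an alert."""
    steps = PLAYBOOK.get(alert_name)
    if steps is None:
        # No automation exists for this alert: a human still has to respond.
        raise KeyError(f"no automated playbook for alert {alert_name!r}")
    executed = []
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
        executed.append(" ".join(cmd))
    return executed


print(run_playbook("DiskAlmostFull"))
```

Keeping the steps as data rather than prose means the playbook and the automation cannot drift apart, which addresses the “out of date as soon as the environment changes” problem for this kind of entry.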

SRE vs DevOps jobs: a team of multitaskers or a cross-functional team

In recent years, SRE and DevOps roles have become very important in many companies. But this doesn’t mean everyone agrees on what exactly SRE and DevOps teams do. Likewise, there is no universal description of DevOps and Site Reliability Engineer jobs. Below, we’ll try to highlight the most essential aspects of DevOps and SRE functions.

Site Reliability Engineer role and SRE team

A typical SRE team is composed of either software developers with expertise in operations or IT operations specialists with software development skills. At Google, such teams are usually a fifty-fifty mix of those who have more of a software background and those who have more of a systems background. Other companies form SRE teams by adding software engineering skill sets and approaches to existing operations practices and personnel.

Besides operations and software engineering, areas of experience relevant to the SRE role encompass monitoring systems, production automation, and system architecture.

All members of an SRE team share responsibility for code deployment, system maintenance, automation, and change management. The functions of each individual Site Reliability Engineer may change over time, depending on the current focus of the team: the development of new features or the improvement of the system’s reliability.

DevOps Engineer role and DevOps team

Unlike an SRE team where each member is a kind of jack-of-all-trades, a DevOps team contains different professionals with specific duties.

The team structure varies from company to company and usually includes (but is not limited to) the following specialists:

a Product Owner who understands how the service should work to bring value to customers,
a Team Lead delegating tasks across other members,
a Cloud Architect building cloud infrastructure for smooth running of services in production,
a Software Developer writing code and unit tests,
a QA Engineer implementing quality methods for product development and delivery,
a Release Manager scheduling and coordinating releases, and
a System Administrator in charge of cloud monitoring.

Of course, this is not an exhaustive list of roles in DevOps. Quite often such a cross-functional team invites a Site Reliability Engineer to ensure the availability of services. Typically, when SREs work as a part of a DevOps team, they have a narrower range of responsibilities than in fully-committed SRE teams.

Regardless of the number and background of team members, DevOps, unlike SRE, is clearly not a role or a person. However, at the time of writing, there were nearly 25,000 DevOps Engineer jobs posted on Glassdoor, comparable to almost 33,000 Site Reliability Engineer positions advertised on the same website.

A brief examination of vacancies on Glassdoor reveals that background, responsibilities, and skills required for both jobs have a lot of overlap. It seems that employers often use these job titles interchangeably.

DevOps vs SRE engineer jobs comparison based on vacancies published by Glassdoor.

However, average yearly salaries are somewhat higher among SREs — thanks to the financial data submitted by employees working for tech giants like Google, LinkedIn, Twitter, Microsoft, Apple, and Adobe, to name a few.

SRE vs DevOps tools: the same solutions do for both

Matthew Flaming, VP of software engineering at New Relic Application Software Monitoring, describes SRE as “the purest distillation of DevOps principles into a single role.” That being said, SRE and DevOps toolsets can be very much the same and typically include the following categories.

Containers and microservices facilitate creating a scalable system. So, Docker for building and deploying containerized apps and Kubernetes for container orchestration are integral parts of SRE/DevOps toolchains.

CI/CD tools like Jenkins or CircleCI support the idea of gradual change, enabling teams to build, test, and deploy code faster.

Infrastructure as Code (IaC) tools correspond exactly with the “the more automation the better” pillar. Terraform, AWS CloudFormation, Puppet, Chef, and Ansible are among the most widely used solutions to automate infrastructure deployments and configurations.

Automated functional and non-functional testing in production can be performed with the help of Selenium, Zephyr, Nexus Lifecycle, Veracode, and other tools.

Resilience testing is essential to ensure the ability of the system to withstand real-life conditions. Popular options for this task are Chaos Monkey by Netflix, ChaosIQ, and Gremlin.
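Chaos tools such as Chaos Monkey work at the infrastructure level (for example, by terminating instances). The underlying idea can still be shown in a few lines of Python by injecting faults into a function call and measuring whether a retry policy keeps requests succeeding. This is a toy, in-process sketch; the function names, failure rate, and retry counts are hypothetical, and it does not reflect how any of the tools above are implemented.

```python
import random


def flaky(func, failure_rate, rng):
    """Wrap func so that each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)
    return wrapper


def with_retries(func, attempts=3):
    """Call func, retrying on ConnectionError, up to `attempts` calls in total."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except ConnectionError:
                if attempt == attempts:
                    raise
    return wrapper


def experiment(failure_rate, attempts, calls=1000, seed=42):
    """Fraction of calls that still succeed while faults are being injected."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    get_price = with_retries(flaky(lambda item: 10, failure_rate, rng), attempts)
    ok = 0
    for _ in range(calls):
        try:
            get_price("book")
            ok += 1
        except ConnectionError:
            pass
    return ok / calls


print("no retries:", experiment(0.3, attempts=1))
print("3 attempts:", experiment(0.3, attempts=3))
```

With a 30 percent injected failure rate, a single attempt succeeds roughly 70 percent of the time, while three attempts push success close to 97 percent; the experiment shows whether the resilience mechanism actually works before real failures test it.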

Monitoring systems play a crucial role in SRE and DevOps frameworks. Services delivered by Prometheus, DataDog, Broadcom, PRTG Network Monitor, and many other platforms allow for metrics-based continuous monitoring of network and application performance across cloud environments.

When do companies need DevOps and SRE?

Despite all the confusion and overlaps, one thing is pretty sure: SRE and DevOps are not conflicting factions, but rather two relatives working towards the same goal and with the same tools — but with slightly different focuses.

While SRE culture prioritizes reliability over the speed of change, DevOps instead accentuates agility across all stages of the product development cycle. However, both approaches try to find a balance between the two poles and can complement each other in terms of methods, practices, and solutions.

Depending on their size and goals, companies may implement different scenarios of DevOps, SRE, or even their combination.

SRE teams modeled after Google fit large tech-driven companies such as Adobe, Twitter, or Amazon that handle billions of daily requests and put the availability of their services before anything else.

DevOps culture and cross-functional teams benefit any business working in a highly competitive environment, where even a slightly shorter time to market gives a huge competitive advantage. Moreover, a DevOps team can be strengthened with a Site Reliability Engineer to monitor system performance and ensure its stability.

Some organizations have two teams, SRE and DevOps. The former is responsible for the support and maintenance of existing services, while the latter creates and delivers new applications.

Smaller companies typically seek a person to manage cloud infrastructure and automate operations tasks, using different job titles for the same responsibilities: DevOps Engineer, DevOps Manager, Site Reliability Engineer, or even Cloud Engineer or CI/CD Engineer.

No matter how large your company is, somebody in your organization probably already does the SRE job, promotes collaboration between developers and IT specialists, or writes scripts to automate time-consuming tasks. If you find these people and officially recognize their work, they can form the backbone of an effective SRE or DevOps team — whichever name you like more.