2016-11-29


By now you’ve likely heard of New Relic Infrastructure, our awesome new product designed to “shine a light on your dynamic infrastructure” and “reduce your MTTR.”

Those are fun phrases, but many of us have been burned in the past by early adoption of solutions that over-promised and ultimately under-delivered, so let’s jump straight into how New Relic Infrastructure works and why it’s worth your time.

Working with AWS

Specifically, let’s focus on how New Relic Infrastructure works when Amazon Web Services (AWS) is a part of your environment. We’ll look at:

How to get set up on AWS with New Relic Infrastructure (agent installation and SaaS integrations)

How New Relic Infrastructure monitors compute (Amazon EC2)

How New Relic Infrastructure monitors satellite AWS services (S3, Lambda—more on that new integration here—and many more)

How to proactively monitor your environments with New Relic Infrastructure (Alerts and Dashboards)

How to get set up on AWS with New Relic Infrastructure

Let’s start with the boring part (this is setup, after all; it’s supposed to be boring). Our goal was to make your setup story as familiar and low-impact as possible.

Agent installation

If you’ve used New Relic Servers, this is going to sound very familiar. New Relic Infrastructure requires a single lightweight agent to be installed on your hosts, using apt-get on Debian, yum on Red Hat, or an MSI on Windows (see, I told you it was boring). The initial setup automatically configures the only required field: the license key, which uniquely identifies your account. The agent also has an associated configuration file for managing optional advanced settings. Since the installation flow is so standard, it’s easy to bake into a Chef recipe, a Puppet script, or even a base AMI for mass-scale deploys, and we are looking to make those processes even easier.
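If you want a feel for what baking this into automation might look like, here is a minimal sketch in Python (a Chef recipe or Puppet script would be the more typical vehicle). The package name, config file path, and license_key field are assumptions drawn from the standard install flow described above, and the script presumes the New Relic package repository is already configured on the host; check the docs for the exact details before relying on it.

#!/usr/bin/env python3
"""Minimal sketch of automating the Infrastructure agent install on a
Debian/Ubuntu host. Package name, config path, and license_key field are
assumptions; the New Relic apt repository is assumed to be configured."""

import subprocess

LICENSE_KEY = "YOUR_NEW_RELIC_LICENSE_KEY"  # placeholder; uniquely identifies your account
CONFIG_PATH = "/etc/newrelic-infra.yml"     # assumed agent config file location


def install_agent() -> None:
    # Write the only required setting: the license key.
    with open(CONFIG_PATH, "w") as config:
        config.write("license_key: {}\n".format(LICENSE_KEY))

    # Install the agent package from the (already configured) apt repository.
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["apt-get", "install", "-y", "newrelic-infra"], check=True)


if __name__ == "__main__":
    install_agent()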

SaaS integration

New Relic Infrastructure offers a SaaS-to-SaaS integration that can pull data directly from the AWS APIs without a single agent install (more on how this manifests later). The process is pretty straightforward:

Create an IAM role with read-only permissions to a specific set of APIs

Copy the ARN of that role into New Relic’s interface

Sip Mai-Tais whilst data pours in

You can manage this entire process from your browser without logging into a single machine.



Three-step setup for the SaaS integration can be done in as little as 60 seconds—yes, we’ve timed it.
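If you’d rather script that three-step setup than click through it, here’s a hedged sketch of the IAM half using boto3. The role name is arbitrary, the New Relic account ID and external ID in the trust policy are placeholders you’d copy from the setup screen, and the AWS-managed ReadOnlyAccess policy is a broad stand-in for the narrower read-only permissions the docs enumerate.

import json

import boto3

iam = boto3.client("iam")

# Trust policy allowing New Relic's AWS account to assume this role.
# Both IDs below are placeholders; copy the real values from the setup screen.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<NEW_RELIC_ACCOUNT_ID>:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "<YOUR_EXTERNAL_ID>"}},
    }],
}

role = iam.create_role(
    RoleName="NewRelicInfrastructureIntegration",  # arbitrary example name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Grant read-only access so New Relic can pull metrics and configuration data.
iam.attach_role_policy(
    RoleName="NewRelicInfrastructureIntegration",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)

# This is the ARN you paste into New Relic's interface (step two above).
print(role["Role"]["Arn"])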

How New Relic Infrastructure monitors compute

Cloud-first architectures (or even on-premise apps lifted and shifted to public or private clouds) face unique monitoring challenges. They make you think about your underlying compute in fundamentally different ways than their on-premise ancestors did. They try to abstract away the notion of the individual server and instead work in aggregates across a series of nameless, ephemeral, and potentially immutable instances. The scaling and continuity issues this creates for monitoring technology are compounded by the fact that many enterprises live in hybrid worlds, so a useful solution has to account for both architectures simultaneously.

That’s exactly what we’ve tried to do. As with New Relic APM, we ignored the distinctions between the two environments, focused on their commonalities, and built a solution designed simply to give you the information you need rather than highlighting arbitrary differences in features.

Filtering and grouping by labels

Generally speaking, if a single host dies in the cloud … who cares? In a horizontally scaled architecture, you just spin up a new instance (I’ll spare you the pets vs. cattle analogy).

In this world, the notion of a “host” is much less important than it used to be. What we really need to care about are groups of hosts with common tasks—a chronically underserved segment of our AWS needs. New Relic Infrastructure understands this and provides the ability to use labels to dynamically dissect and group your fleet of hosts. You can, of course, look at all of your hosts or a single host, but the sweet spot definitely lies somewhere in the middle.

Whether you want to audit all of your hosts or scope to a set of 30 t2.micro instances hosted in us-west-1a running a specific application, New Relic Infrastructure’s UI experience is exactly the same. Simply apply the filters you want and the entire UI immediately adjusts to include data from only that subset. #bobsyouruncle. These labels can come directly from the agent out of the box, from the SaaS integration (yes, that’s right, New Relic Infrastructure can automatically pull EC2 metadata and correlate it with your instrumented hosts), or from custom labels you set in the configuration file, as the sketch below shows. What’s even better is that these filtered views can be saved and tweaked at any time to help you easily home in on exactly what you care about.



List of some of the available labels for filtering and faceting your Infrastructure hosts.
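For the custom-label piece, here is a minimal sketch of appending labels to the agent’s configuration file. The custom_attributes key and the config path are assumptions about the agent’s YAML format, and the label names are just examples; once the agent picks them up, they become one more axis to filter and facet on.

CONFIG_PATH = "/etc/newrelic-infra.yml"  # assumed agent config location

# Example labels: team ownership, environment, and the service running on the host.
custom_labels = {"team": "checkout", "environment": "production", "service": "cart-api"}

with open(CONFIG_PATH, "a") as config:
    config.write("custom_attributes:\n")
    for name, value in custom_labels.items():
        config.write("  {}: {}\n".format(name, value))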

Events and metrics belong together

Imagine you don’t have New Relic Infrastructure and that a group of your machines is hovering around 99% CPU usage. Let’s be generous and assume you have some way of actually being alerted to that (other than people on Twitter raging at you about slow app performance). You know this is probably a bad sign and that you should try to bring usage down to a level that better accommodates spikes. So you crack your knuckles, hop into your time machine, and head back to 1980, where you SSH into each host and run “top.” You start feeling pretty modern right around the point where you trace the issue down to a specific process (“foobar”) on the hosts. You restart the process and … the spike immediately returns! Now what? (The obvious answer: rage quit.)

Now, let’s run through that same scenario in New Relic Infrastructure. First, you receive the alert (which was set up in seconds) and click directly into a filtered view of the affected hosts. You see the CPU spike and its trend over time, so you can quickly pinpoint approximately when the issue began. Then you click on the Events tab and get a detailed list of all changes that occurred during that time window, including session logins, service starts and stops, package installations, and more. You work with the team that made the change, get a fix, deploy it, and move on with your life. You even get to keep your job!



Looking at the Events tab, we can clearly see a chronology of exactly what happened. User logged in, installed a bad package, triggered alerts, and then removed the package and logged off.

In the first scenario you don’t know why anything happened. This is why New Relic Infrastructure introduced the notion of Events to accent the numeric metrics. Think of metrics as the what is happening, the process output as the how it is happening, and the Events as the why it is happening.

As a quick summary, in the world of the cloud, access is democratized for good reasons. DevOps works, shared ownership works, rapidity works, but for all their benefits, they can also increase the chance of someone accidentally doing something stupid. Events help you immediately pinpoint when and where those moments occur.

Visibility inside the black boxes

We already covered the labels and filters that let you slice, pivot, and ultimately better understand what types of things are running in your environment, so let’s jump to New Relic Infrastructure’s ability to go a step further and show you exactly what is running on those hosts.

Imagine a new zero-day vulnerability called “Avocado” is released (no more ridiculous than Poodle, Heartbleed, or Shellshock). Knowing what hosts you have running doesn’t tell you whether you’re vulnerable—and it’s generally not practical to manually log into each instance to check.

This is where New Relic Infrastructure’s Inventory tab comes into play. Basically, it collects and makes searchable a wealth of “state information” for every host. You can search through packages, configurations, and services within seconds to easily audit your exposure.

For example: The vulnerability affects only older versions of OpenSSL? Simply type “OpenSSL” in the search box and immediately see what versions are running across all your hosts. Puppet run not doing what you expected? Verify in seconds that the script actually worked by checking the version on a specific host.

In short, see exactly what’s running, when it changes, and what it’s doing. No need to worry about rapid develop-and-deploy cycles sprinting away from your ability to keep tabs on what’s installed where.

Searching all 515 hosts for “openssl” gives a detailed breakdown of what versions are installed on what hosts, allowing for an easy audit.

How New Relic Infrastructure monitors satellite AWS services (S3, Lambda, and many more)

Earlier, I explained how setting up the AWS SaaS integration enhances New Relic Infrastructure’s compute monitoring story by using EC2 metadata and metrics to accent the data collected by the agent. But let’s not stop there. AWS offers a wealth of services that allow easy access to core application requirements such as storage, networking, queues, notifications, load balancing, and content distribution.

This is a blessing in that no one wants to manage things that don’t add differentiated business value. But it’s also a curse, because not managing the underlying hosts leaves you with limited visibility into how those services are performing. New Relic Infrastructure aims to solve that problem by using the same IAM role you provided in the setup above to pull in data about AWS services (CloudFront, DynamoDB, EBS, ElastiCache, ELB, IAM, Kinesis, Lambda, RDS, S3, SNS, SQS, VPC), including metrics, events, and more.

You may be thinking, “I’m clever; they’re just using the CloudWatch APIs that are already available in the console,” and you’d be partially right. CloudWatch is one mechanism we use to collect data (though we improve on the core metrics with a more robust ability to transform the data and a much longer retention period). We also collect other data, such as configuration changes to these services, and provide the ability to audit the individual configurations.
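To make that concrete, here is a small boto3 sketch of the kind of CloudWatch call an integration like this makes under the hood. This is purely illustrative (it is not New Relic’s collection code), and the load balancer name is a placeholder; it is handy for comparing what the raw API returns against what shows up in Infrastructure.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-1")

# Average ELB latency over the last hour, in five-minute buckets.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ELB",
    MetricName="Latency",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "my-elb"}],  # placeholder ELB name
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 4))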

This point is best understood with real-world examples, so check out these New Relic Infrastructure screenshots:


How to proactively monitor your environments with New Relic Infrastructure

Dashboards

If you made it this far, you’re a trooper, so we’ll keep things concise. All of this data is stored in NRDB, the underlying analytics engine that supports New Relic Insights. This means you can view any of this data in its raw form from Insights and issue custom queries to craft widgets and dashboards to serve your unique needs (we provide several out-of-the-box and editable dashboards to help you get up and running fast). Having a dashboard drives home awareness around “What does normal look like?” and helps surface anomalous behavior even before an alert is triggered. Check out the New Relic Infrastructure documentation for more details.

Infrastructure data is available in its raw form in New Relic Insights and can be queried just like any other events.
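As a taste of what querying that raw data looks like, here is a hedged sketch that pulls Infrastructure samples through the Insights query API. The account ID and query key are placeholders, and the event and attribute names (SystemSample, cpuPercent, awsAvailabilityZone) are assumptions about how the agent reports host samples; the Insights data explorer will show the exact names for your account.

import requests

ACCOUNT_ID = "1234567"                 # placeholder New Relic account ID
QUERY_KEY = "YOUR_INSIGHTS_QUERY_KEY"  # placeholder Insights query key

# Average CPU per availability zone over the last 30 minutes (attribute names assumed).
nrql = (
    "SELECT average(cpuPercent) FROM SystemSample "
    "FACET awsAvailabilityZone TIMESERIES SINCE 30 minutes ago"
)

response = requests.get(
    "https://insights-api.newrelic.com/v1/accounts/{}/query".format(ACCOUNT_ID),
    headers={"X-Query-Key": QUERY_KEY, "Accept": "application/json"},
    params={"nrql": nrql},
)
response.raise_for_status()
print(response.json())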

Alerts

If you are not setting up alerts on your systems, you’re not really doing monitoring. The whole point of monitoring your systems is to reduce the time that they’re down, and the biggest bottleneck is almost always just figuring out that there is an issue. New Relic Infrastructure lets users set up alerts for straightforward conditions that target all hosts or filtered groups of hosts. Setting them up is trivial and having them is absolutely crucial to a fast response time. The important thing to note here is that regardless of whether your hosts are on-premises, in the cloud, spread across both, or in your mother’s basement, the same policies, conditions, and process will apply to them equally.

And for the grand finale: if you’re a New Relic Servers veteran who has struggled to set up meaningful alerts in a dynamically load-balanced environment like Amazon’s Elastic Beanstalk, it is my pleasure to announce that all Infrastructure alert conditions automatically include any new hosts or targets that meet the filter criteria you set up. This enables true “set it and forget it” alerts for your infrastructure, letting you rest easy as your topology of hosts changes around you!

Alerts can be targeted at all or a subset of hosts and can include any metric from any chart in the UI.
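To see why filter-scoped conditions pick up new hosts automatically, here is a purely conceptual sketch; it is not New Relic’s implementation or API, just an illustration that a condition stores a label predicate and a threshold rather than a fixed list of hosts, so whatever matches the filter at evaluation time gets checked.

from dataclasses import dataclass


@dataclass
class Host:
    name: str
    labels: dict        # e.g. {"role": "web", "awsRegion": "us-west-1"}
    cpu_percent: float


@dataclass
class Condition:
    label_filter: dict  # a host must match every key/value pair to be targeted
    threshold: float    # alert when cpu_percent exceeds this value

    def matches(self, host: Host) -> bool:
        return all(host.labels.get(k) == v for k, v in self.label_filter.items())

    def violations(self, hosts: list) -> list:
        # Evaluated against whatever hosts exist right now, so new hosts that
        # match the filter are covered without ever editing the condition.
        return [h.name for h in hosts if self.matches(h) and h.cpu_percent > self.threshold]


condition = Condition(label_filter={"role": "web"}, threshold=90.0)

fleet = [
    Host("web-1", {"role": "web"}, 97.0),
    Host("worker-1", {"role": "worker"}, 99.0),  # ignored: doesn't match the filter
    Host("web-new", {"role": "web"}, 95.0),      # just spun up, still covered
]

print(condition.violations(fleet))  # ['web-1', 'web-new']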

I hope this post helps make your life a little easier. Stay tuned to the New Relic blog for future installments of the “New Relic Infrastructure in the Real World” series.
