Aggregator

Why Your Server Monitoring (Still) Sucks

1 week 5 days ago
by Mike Julian

Five observations from a monitoring specialist-turned-consultant about why your server monitoring still stinks.

Early in my career, I was responsible for managing a large fleet of printers across a large campus. We're talking several hundred networked printers. It often required a 10- or 15-minute walk to get to some of those printers physically, and many were used only sporadically. I didn't always know what was happening until I arrived, so it was anyone's guess as to the problem. Simple paper jam? Driver issue? Printer currently on fire? I found out only after the long walk. Making this even more frustrating for everyone was that, thanks to the infrequent use of some of them, a printer with a problem might go unnoticed for weeks, making itself known only when someone tried to print with it.

Finally, it occurred to me: wouldn't it be nice if I knew about the problem and the cause before someone called me? I found my first monitoring tool that day, and I was absolutely hooked.

Since then, I've helped numerous people overhaul their monitoring systems. In doing so, I noticed the same challenges repeat themselves regularly. If you're responsible for managing the systems at your organization, read on; I have much advice to dispense.

So, without further ado, here are my top five reasons why your monitoring is crap and what you can do about it.

1. You're Using Antiquated Tools

By far, the most common reason for monitoring being screwed up is a reliance on antiquated tools. You know that's your issue when you spend too much time working around the warts of your monitoring tools or when you've got a bunch of custom code to get around some major missing functionality. But the bottom line is that you spend more time trying to fix the almost-working tools than just getting on with your job.

The problem with using antiquated tools and methodologies is that you're just making it harder for yourself. I suppose it's possible to dig a hole with a rusty spoon, but wouldn't you prefer to use a shovel?

Great tools are invisible. They make you more effective, and the job is easier to accomplish. When you have great tools, you don't even notice them.

Maybe you don't describe your monitoring tools as "easy to use" or "invisible". The words you might opt to use would make my editor break out a red pen.

This checklist can help you determine if you're screwing yourself.


System76 Announces American-Made Desktop PC with Open-Source Parts

1 week 5 days ago
by Bryan Lunduke

Early in 2017—nearly two years ago—System76 invited me, and a handful of others, out to its Denver headquarters for a sneak peek at something new they'd been working on.

We were ushered into a windowless, underground meeting room. Our phones and cameras confiscated. Seriously. Every word of that is true. We were sworn to total and complete secrecy. Assumedly under penalty of extreme death...though that part was, technically, never stated.

Once the head honcho of System76, Carl Richell, was satisfied that the room was secure and free from bugs, the presentation began.

System76 told us the company was building its own desktop computers. Ones that it designed itself. From-scratch cases. With wood. And inlaid metal. What's more, these designs would be open. All built right there in Denver, Colorado.

We were intrigued.

Then they showed them to us, and we darn near lost our minds. They were gorgeous. We all wanted them.

But they were not ready yet. This was early on in the design and engineering, and they were looking for feedback—to make sure System76 was on the right track.

They were.

Flash-forward to today (November 1, 2018), and these Linux-powered, made-in-America desktop machines are finally being unveiled to the world as the Thelio line (which System76 has been teasing for several weeks with a series of sci-fi themed stories).

The Thelio comes in three sizes:

  • Thelio (aka "small") — max 32GB RAM, 24TB storage.
  • Thelio Major (aka "medium") — max 128GB RAM, 46TB storage.
  • Thelio Massive (aka "large") — max 768GB RAM, 86TB storage.

All three sport the same basic look: part black metal, part wood (with either maple or walnut options) with rounded side edges. The cases open with a single slide up of the outer housing, with easy swapping of components. Lots of nice little touches, like a spot for in-case storage of screws that can be used in securing drives.

In an awesomely nerdy touch, the rear exhaust grill shows the alignment of planets in the solar system...at UNIX Epoch time. Also known as January 1, 1970. A Thursday.
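If you want to double-check the Thursday claim, here is a small sketch using only Python's standard library. The helper name `epoch_weekday` is made up for illustration; it simply converts a Unix timestamp to a UTC date and weekday.

```python
from datetime import datetime, timezone

# Weekday names indexed the way datetime.weekday() counts (Monday == 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def epoch_weekday(seconds=0):
    """Return (ISO date, weekday name) in UTC for a Unix timestamp."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.date().isoformat(), WEEKDAYS[moment.weekday()]


# Timestamp 0 is the UNIX Epoch itself.
print(epoch_weekday(0))
```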


The Monitoring Issue

1 week 5 days ago
by Bryan Lunduke

In 1935, Austrian physicist Erwin Schrödinger, still flying high after his Nobel Prize win two years earlier, created a simple thought experiment.

It ran something like this:

If you have a file server, you cannot know if that server is up or down...until you check on it. Thus, until you use it, a file server is—in a sense—both up and down. At the same time.

This little brain teaser became known as Schrödinger's File Server, and it's regarded as the first known critical research on the intersection of Systems Administration and Quantum Superposition. (Though, why Erwin chose, specifically, to use a "file server" as an example remains a bit of a mystery—as the experiment works equally well with any type of server. It's like, we get it, Erwin. You have a nice NAS. Get over it.)

...

Okay, perhaps it didn't go exactly like that. But I'm confident it would have...you know...had good old Erwin had a nice Network Attached Storage server instead of a cat.

Regardless, the lessons from that experiment certainly hold true for servers. If you haven't checked on your server recently, how can you be truly sure it's running properly? Heck, it might not even be running at all!

Monitoring a server—to be notified when problems occur or, even better, when problems look like they are about to occur—seems, at first blush, to be a simple task. Write a script to ping a server, then email me when the ping times out. Run that script every few minutes and, shazam, we've got a server monitoring solution! Easy-peasy, time for lunch!
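For the curious, that naive script might look something like the sketch below. It assumes the Linux iputils `ping` (where `-W` is a reply timeout in seconds); the function names, the host and the email addresses are all hypothetical, and the actual mail delivery is left to whatever `send` callable you hand in.

```python
import subprocess
from email.message import EmailMessage


def host_is_up(host, timeout_s=2, run=subprocess.run):
    """Return True if a single ping to host gets a reply in time."""
    try:
        result = run(
            # Assumes Linux iputils ping: -c 1 sends one packet,
            # -W is the reply timeout in seconds.
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,  # safety net if ping itself hangs
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def build_alert(host, sender, recipient):
    """Build the email that reports a failed ping."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: {host} did not answer ping"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{host} failed a single ping check.")
    return msg


def check_and_alert(host, send, is_up=host_is_up):
    """Ping host once; call send(message) and return True if it is down."""
    if is_up(host):
        return False
    # Hypothetical addresses, for illustration only.
    send(build_alert(host, "monitor@example.com", "admin@example.com"))
    return True
```

To wire it up for real, you might pass something like `smtplib.SMTP("localhost").send_message` as `send` and have cron run the check every few minutes. Notice how little it tells you: one bit of information, up or down, and nothing about why.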

Whoa, there! Not so fast!

That server monitoring solution right there? It stinks. It's fragile. It gives you very little information (other than the results of a ping). Even for administering your own home server, that's barely enough information and monitoring to keep things running smoothly.

Even if you have a more robust solution in place, odds are it has significant shortcomings and problems. Luckily, Linux Journal has your back: this issue is chock-full of advice, tips and tricks for keeping your servers effectively monitored.

You know, so you're not just guessing if the cat is still alive in there.

Mike Julian (author of O'Reilly's Practical Monitoring) goes into detail on a bunch of the ways your monitoring solution needs serious work in his adorably titled "Why Your Server Monitoring (Still) Sucks" article.

We continue "telling it like it is" with Corey Quinn's treatise on Amazon's CloudWatch, "CloudWatch Is of the Devil, but I Must Use It". Seriously, Corey, tell us how you really feel.


GNOME 3.30.2 Released, Braiins OS Open-Source System for Cryptocurrency Embedded Devices Launched, Ubuntu 19.04 Dubbed Disco Dingo, Project OWL Wins IBM's Call for Code Challenge and Google Announces New Security Features

1 week 5 days ago

News briefs for November 1, 2018.

GNOME 3.30.2 was released yesterday. It includes several bug fixes, and packages should arrive in your distro of choice soon, but if you want to compile it yourself, you can get it here. The full list of changes is available here. This is the last planned point release of the 3.30 desktop environment. The 3.32 release is expected to be available in spring 2019.

Braiins Systems has announced Braiins OS, which claims to be "the first fully open source system for cryptocurrency embedded devices". FOSSBYTES reports that the initial release is based on OpenWrt. In addition, Braiins OS "keeps monitoring the working conditions and hardware to create reports of errors and performance. Braiins also claimed to reduce power consumption by 20%".

Ubuntu 19.04 will be called Disco Dingo, and the release is scheduled for April 2019. Source: OMG! Ubuntu!.

IBM announced that Project OWL is the winner of its first Call for Code challenge. Project OWL is "an IoT and software solution that keeps first responders and victims connected in a natural disaster". The team will receive $200,000 USD and will be able to deploy the solution via the IBM Corporate Service Corps. OWL "stands for Organization, Whereabouts, and Logistics", and it's a hardware/software solution that "provides an offline communication infrastructure that gives first responders a simple interface for managing all aspects of a disaster".

Google yesterday announced four new security features for Google accounts. According to ZDNet, Google won't allow you to sign in if you have disabled JavaScript in your browser. It plans to pull data from Google Play Protect to list all malicious apps installed on Android phones, and it also now will notify you whenever you share any data from your Google account. Finally, it has implemented a new set of procedures to help users after an account has been attacked.

Jill Franklin