Dave Josephsen | http://www.skeptech.org/

The part where I hack my github graph
2014-02-11T17:32:00-06:00 | http://www.skeptech.org/blog/2014/02/11/the-part-where-I-hack-my-github-graph

A story in 8 tropes about time travel, space invaders, and lying to git.

The part where I have a shiny new idea

All of our programming stories begin with an idea, but one thing you need to understand about me is that I have a lot of ideas. They constantly dance out from behind the scenery like pagan woodland nymphs in a Shakespearian comedy; alluring and seductive. I am smitten by every one of them – besotted – to me, when they first appear, they are all fantastic. The problem is some of them are actually very bad ideas, like.. objectively bad, and honestly, sometimes for weeks at a time, I can’t tell the difference. I follow them willingly; a happy accomplice to the amoral whims of my own, often dim-witted ideas.

To give myself a fighting chance, I like to bounce my ideas off a certain kind of person. Bright, sensible, and brutally honest sums up the personality type. They are rare, like diamonds, but not nearly as well liked (which is why I think they put up with me). These people are to me as spinach is to Popeye; together we can accomplish anything. I’m especially fond of the ones that can be counted on to forget my bad ideas and remember only the good ones (Pro tip: this is a common self-protection mechanism among sensible people and a good litmus test for detecting them).

Anyway, I’m in a bit of a pinch just now. Having recently begun a new work-from-home job, I guess you could say I find myself “between people” – in want of a bright, sensible, brutally honest person to bounce ideas off. Without a proper confidant I’m an easy mark for my own ideas, so what follows is the story of me putting into code what is probably a bad idea.

It started when I noticed that my github contributions graph almost had a space invader in it, so I spent the next week or two trying to complete the pattern.

Sometime during that week the idea of hacking up something that would help me maintain a scrolling space invader github contributions graph hit me, and I don’t know, it seemed so overwhelmingly fantastic I had to do it.

If I’d had a bright, sensible, brutally honest person around, this idea probably wouldn’t have taken off. They probably would have pointed out that many potential employers would judge me by my github profile, and that github maybe wouldn’t appreciate me messing around with it, and maybe I should run it by somebody over there first. Also there could be negative functional side-effects associated with whatever actions I might take to affect my graph, unforeseen consequences. I can see the wisdom in this advice now that I’ve had a few weeks to reflect. Oh well.

The part where I lose interest after solving the “interesting part”

My normal strategy when I have a fantastic idea for a software project is to stay up all night implementing the most interesting part of it in the most interesting language to me at the moment, and then never look at any of it ever again. It’s as if, when the interesting part is done, the “problem” is solved, which is, of course, nonsense. Solving problems in real life requires slogging drudgery and boring trivialities – all the little things that combine to form 90% of the project in earnest. I know this, but I get around it by telling myself that I can come back to that stuff “tomorrow”.

This space invader graph thing was just a little toy project for me, so it didn’t have to be elegant or feature-rich (I told myself), and there was going to be an awesome space invader graph (eventually) when I was done. So it was pretty easy to keep my eyes on the prize. I only slacked off for like two weeks after staying up all night and coding the interesting part first in the most interesting language to me at the moment.

The part where I make an off-by-one error

I won’t say that I generate a zero-indexing bug every time I work with dates or multi-dimensional arrays, but I do it a sufficiently large percentage of the time that I work with dates or multi-dimensional arrays that it may as well be every time. I mean if you round up or whatever I basically do it every time. You’d have thought that after 15 years of programming I’d understand that computers count starting at 0, but nope. Not me. I’m evidently genetically predisposed to introducing off by one errors (or just stupid (probably the latter)).

This time I inserted a particularly insidious off-by-one bug such that everything I wrote would work perfectly for 216 days (the number of days it takes the invader pattern to loop), but on the 217th day, my Go code would explode with an invalid index runtime error. If there’s anything I have become familiar with as I’ve grown older, however, it’s the inevitability of my own screw-ups, so I simulated the pattern for three hundred years or so in order to find the off-by-one error I knew I must have introduced. I no longer strive for bug-free code; I strive only to be dead by the time it blows up.

The part where I experience the setback that leads to something cool

Starting out, I had a few goals that informed my design.

1 I wanted the data in the contributions graph to be representative of real commits, no cheating.

There have been other efforts of this sort, but they use fake commits to fake accounts. My goal here was not just to draw a cool graph somewhere on github, but to find an approach that rewarded me for my commits. I wanted to earn it, like every coder earns their graph; I just wanted mine to have space invaders.

2 I didn’t want whatever I came up with to dictate my schedule

I needed to arrange things so that I didn’t have to schedule my coding time around the graph, because realistically I have deadlines to keep, etc. I wasn’t going to be able to Bogart commits from someone waiting on a fix because ZOMG PRETTY GRAPH. On the flip-side, if I don’t commit enough there will be holes in the graph. Right now most of my commits go to private repos, so I’ll have to step up my OSS game or live without space invaders; it’s not in the spirit of the thing to “save up” my commits and spread them out or whatever.

3 I didn’t want to change my tooling (much)

I wanted the process to be transparent to me, which is to say I wanted to continue using git the way I always did. This wasn’t going to work if I had to use git --prettygraph commit -am 'foo' on some days and not on others (because I’d never stick to it. See ‘stupid’ above). Also I do quite a bit of file-transfer via git, because I move between Linux workstations and Mac laptops, so whatever I did couldn’t prevent me from pushing, because that would mean never having the code where I wanted it.

So given those constraints, I figured I’d need three pieces of code to pull this off. The first piece would be a program that I could query to tell me whether I should push on a given day or not. Then I could write a shell wrapper (piece #2) for git, that would intercept my push commands, check to see if today was a “blank” day on the graph, and, if it was, push to a private repository instead. Then I’d have to come along later with the third piece, and gather up all of the delayed pushes, and merge them back to origin.
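Piece #1 doesn’t need to be anything fancy. A rough sketch of the idea (the start date, the pattern contents, and the exit-code convention below are all placeholders, not my actual code) looks something like this:

/* should_push.c -- a rough sketch of "piece #1".  Exit 0 means "push
 * today", exit 1 means "hold it, today is a blank pixel". */
#include <time.h>

#define PATTERN_DAYS 216  /* days it takes the invader pattern to loop */

int main(void)
{
    /* Placeholder start date for the scrolling pattern. */
    struct tm epoch = { 0 };
    epoch.tm_year = 2014 - 1900;
    epoch.tm_mon  = 0;   /* January */
    epoch.tm_mday = 1;

    time_t start = mktime(&epoch);
    int day = (int)((time(NULL) - start) / 86400) % PATTERN_DAYS;

    /* 1 == dark pixel (push day), 0 == blank.  In real life this would
     * come from a file describing the invaders, not a literal. */
    static const char pattern[PATTERN_DAYS] = { 1, 1, 0, 1 /* ... */ };

    return pattern[day] ? 0 : 1;
}

The shell wrapper (piece #2) just runs something like this before every push and picks origin or the private repo based on the exit code.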

I had an idea for another approach which used two different github accounts, which would simplify the git part of this, but would (probably) require me to keep two sets of credentials on all of my machines. I went with the private repo because I thought it’d teach me more about git.

Anyway, while experimenting with different branching and cherry-picking schemes, I asked some git-adept friends how they thought I should handle it, and that’s when I hit what at first seemed like a catastrophic setback. Namely: github.com doesn’t populate the graph with your push dates, it populates the graph with your commit dates.

So delaying my pushes wasn’t going to work; github would use the date that I made the commit even if I eventually pushed it three days later, and I wasn’t going to be able to spoof the commit date after the fact, because git bakes the date into the hashed commit object it uses to name things internally. If I was going to spoof the commit date, it’d have to be before git got its hands on the commit.

But hey, I’m the idea guy, so like 18 ideas occurred to me immediately. The first was to change the system clock on every commit. This was quickly disqualified because it’d trigger my screensaver every time I committed (and who knows what else). The second was to Vagrant or Docker up a VM with a broken clock for every commit. This was, of course, absurd, and for that reason alone I was sorely tempted to do it, but I didn’t because it would slow down my commit commands enormously, and therefore it was disqualified by the tooling-change constraint.

My third idea was to write a wrapper for the time() system call and have the linker PRELOAD it. That way my library could detect the name of the calling process and – if its name was ‘git’ – LIE to it about the current date. This seemed hacky and fun, and somehow kind of deliciously mischievous, so I went with it.
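If you’re wondering what that looks like in practice, the shape of it is roughly this (a sketch, not my exact code – the GRAPH_SKEW_SECONDS knob is invented for this post, and it’s glibc/Linux only):

/* liar.c -- a rough sketch of the preloaded time() shim.
 * program_invocation_short_name is a glibc extension, so this is
 * Linux-only as written. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

time_t time(time_t *tloc)
{
    /* Grab the real time() so we can adjust its answer. */
    static time_t (*real_time)(time_t *);
    if (!real_time)
        real_time = (time_t (*)(time_t *))dlsym(RTLD_NEXT, "time");

    time_t now = real_time(NULL);

    /* Only lie to git; every other caller gets the truth. */
    if (strcmp(program_invocation_short_name, "git") == 0) {
        const char *skew = getenv("GRAPH_SKEW_SECONDS");
        if (skew)
            now += atol(skew);
    }

    if (tloc)
        *tloc = now;
    return now;
}

Build it with gcc -shared -fPIC -o libliar.so liar.c -ldl, export LD_PRELOAD=/path/to/libliar.so, and git starts believing whatever you tell it about today.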

I should mention at this point that I’m aware that by doing this I may create git logs of impossible commit history down the road that might confound people. I plan to buy those people the beverage of their choice to atone for making their lives more difficult. It’s my sincere hope that the sight of my space invaders and a cold beer (or whatever) will make up for the negative energy I introduce into their lives. The date skews should never span more than three days given the invaders pattern, and I’m not a terribly prolific OSS contributor so I’m unlikely to bug you personally.

The part where I stumble across a gaping security vulnerability

Once my libc wrapper was ready, I had some fun experimenting with it and quickly realized that I could spoof commit-dates any length of time into the past or future and github will happily graph them (if you look at my github commit graph right now, it shows a commit a few years in the future, November 14th 2017).

This shouldn’t have really surprised me, but seeing it in action gave me pause. With what I had already coded, I could open an account today and give it a solid dark green contribution graph – the sort of thing Google Interns ignore the opposite sex to obtain. That’s an extreme example. It wouldn’t fool anyone who was paying attention, but I do think it would be hypothetically possible to take a random number generator and modify a mediocre looking account so that it looked orders of magnitude more impressive, in a way that would take some thorough analysis to detect.

Now, this isn’t really a security vulnerability on the github side – github can’t be expected to verify your commit dates or assess the quality of your contributions, and the graph is behaving exactly as it should. But I wonder how deeply the companies that treat your github profile as a resume ever dig beyond that graph. I’m sure it’s fine.

The part where the Internet provides

There’s another, closely related but mutually exclusive trope here, that goes “the part where I completely abandon my project to focus on a thing I made for the project that’s way cooler than the whole project”.

Once I started playing with my perfidious little time wrapper, I realized that I was really on to something cool. Every program I’d ever written that used date-based scheduling to do something could benefit from it. A few years ago I wrote a pretty great back-end processing library to do things like copy batch files hither and yon, and load them into Oracle et al. With my time wrapper I could see how my batch job processor would operate on leap-day in 2066, for example (wait.. is there a leap day in 2066? Well you get the point).

Of course, to turn my time wrapper into something that could be useful for testing arbitrary processes, I’d have to do all that slogging drudgery I alluded to above. Things like reading in options, parsing config files, and having a man page, you know, user stuff. Also, at the time I only had this working on Linux, where I have some system chops. I still had to port it to Darwin for my Macbook, which meant learning about dyld and delving into the Darwin headers and so on. So before I abandoned my little project to hack up this time-warp testery thing, I took a moment to ask Google if someone had already done it for me. Hey google, has someone done this time thing already?

Of course they did, and way better than I ever would have. Thanks Internets! Not only do I get a WAY more functional and cross-platform means to lie to git for free, I don’t have to abandon my toy project to work on something real. How did we ever get anything done before there were millions of people falling over each other to solve every functional coding problem ever conceived for free?

The part where I wind up debugging someone else’s project

This trope also has a closely related (and not at all mutually exclusive) cousin called “The part where I shave a yak”. When running ‘make’ in the libfaketime tree failed for me, I wasted some time poking at the Makefile and then the headers (looking for the OSX symbols that match the Linux ones) before realizing there’s an OSX makefile. Then, seeing that macports contributed the OSX makefile, I smacked my forehead and ran:

brew install libfaketime

Yup. Not only did the Internet provide an obscure cross-platform library that enables me to lie to arbitrary processes about what day it is, the Internet also packaged it up for me so I wouldn’t strain myself keeping track of two whole makefiles.

I proceeded to have some weird clock-skew problems using libfaketime’s ‘faketime’ wrapper. It seemed to work ok for the ‘date’ binary, but it skewed years into the future for git. I poked at the code a little bit to see how they were forking the sub-process before winding up back in the Darwin libc headers, but eventually I gave up and bypassed the faketime binary by setting the Darwin environment variables manually in my shell wrapper, and that works. Test test test, by George, I think we’ve got it!
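For the record, “setting the Darwin environment variables manually” amounts to something like this – the dylib path below is wherever your install happened to put libfaketime, and the -2d offset is just an example:

DYLD_INSERT_LIBRARIES=/usr/local/lib/faketime/libfaketime.1.dylib \
DYLD_FORCE_FLAT_NAMESPACE=1 \
FAKETIME="-2d" \
git commit -am 'a commit that thinks it happened two days ago'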

The part where I almost but not quite bother to blog about it

Whenever I do something cool, I like to write a draft of a blog post about it and then not share it with you. That way I feel like I could have shared, but you weren’t worthy. We both know you probably would have just said something mean about it and anyway it’s important that we both know who is wearing the pants in this relationship. Really, this post isn’t actually for your benefit; I only published it as a pre-emptive apology to my future potential employers and to the programmers on github who are confounded by how I could have contributed a fix to their feature three days before they contributed their feature. Also, I have a rule that I have to share every post that uses the word “perfidious”.

Take it easy -dave

The Developer Evangelist
2013-12-12T16:48:00-06:00 | http://www.skeptech.org/blog/2013/12/12/the-developer-evangelist

A few weeks ago, at Librato home base, there was a sort of state of the union meeting, wherein the marketing team gave a small presentation to let everyone know what they were up to. Well, really, it was more like it explained itself. We’ve only actually had a marketing team for a week or two you see, and many of the engineers were, unsurprisingly, unsure what to make of it.

Anyway, afterwards, everyone seemed, if not convinced, then resigned to the efforts described in that presentation, but I sensed some lingering confusion. A palpable sense of “Ok, but so what is this developer evangelist guy doing here?” seemed to hang in the air. So, probably unwisely, I offered to give a spur-of-the-moment 5-minute rundown of just exactly wtf I thought I was doing there. What follows is a somewhat fleshed-out version of what I said.

The problem is, you can’t market to dev.

It doesn’t matter how awesome your tool is, or how much you think developers will appreciate it. Really, it has nothing to do with you or your tool in general; the rules were set in motion by those who preceded you, and the nature of those rules is that engineers especially hate marketing. Like… viscerally. Just accept it.

It isn’t just that marketing and engineering are different; rather, they are deeply orthogonal by nature. Engineering is pedantic, analytical and introverted. It doesn’t talk about itself, and when it does, it does so in an academic and intentionally hostile setting, and so it naturally tends to evolve in a meritocratic way. It wants to be vetted more than it wants to be purchased.

Marketing is rarely incentivized in healthy ways, and so naturally tends to devolve no matter how good its intentions. It starts communicating and devolves to spam; it starts to be data-driven and devolves to SEO, coercion and thought control. In marketing there is no difference between being vetted and being purchased.

Marketing has only ever discovered one truly successful technique to sell things to Dev, namely: infiltration. Developer evangelism is marketing by infiltration, and I have no illusions about that. But an interesting thing happens when you attempt to infiltrate dev, and as far as I can tell, it happens pretty much every time if you are doing it right, which is the only way to be successful at it.

In order to be successful, it turns out you need to make engineers happy. This mostly means being effective at creating extra value for people who aren’t necessarily paying customers. You need to make a bigger splash for the general good of everyone. Usually, this means improving related, open-source tools that you probably weren’t focused on before. It means investing in tools that you were already getting for free.

For example, at Librato, in order to carry out effective developer evangelism as a marketing strategy, you need to create value in the community by improving the monitoring tools that surround Librato. Tools like Statsd and collectd that might very well help out our competition. But the more value you create – the awesomer you become within the community of your potential customers – the more effective your marketing efforts will be. Unlike marketing/charity undertakings in other industries, you can’t fake it. Perception doesn’t matter. Github graphs don’t lie. Developer evangelism is probably one of very few ways to positively incentivize marketing.

When you …do(?) developer evangelism correctly, Dev gets drawn out of its echo-chamber before it’s ready. It benefits from peer review earlier and more often, and it creates more value and makes a wider impact than it would have normally. Marketing, meanwhile, gets to sell to developers without resorting to tactics that are annoying and coercive. The two, when combined in this particular way, seem to make each other better.

As far as I can tell, my job as the developer evangelist breaks down into four general areas:

  • write code
  • generate content
  • give talks
  • close the loop

Write code

The code I write at Librato will mostly not be focused on the Librato core API and so on, but rather on the periphery of tools that we don’t own that sort of orbit us. That’s good, because hacking on open-source monitoring tools is super interesting to me – it’s totally what I’ve wanted to focus on for years. If you look at my github graph right now, it’s mostly empty because I’ve spent my career in private code repositories improving things for individual companies, and I’ve wanted that to change for a long time. And yes, in case you’re wondering, I find it ironic in the extreme that after 20 years of engineering I’m moving to the marketing side because it’ll enable me to do the kind of engineering I’m really interested in.

Generate content

Like this blog post and posts on the Librato blog, as well as articles and documentation – maybe even a screen-cast or two. This also aligns very well with my natural inclination to write stuff, an inclination that has almost always been distracting and worrisome to my employers. As a developer evangelist I will be encouraged to write about the cool stuff that’s going on around me. I find it hard to articulate the extent to which this is a relief to me, but I imagine it’s the same sort of feeling you’d get if you went into one of those catholic confessional booths to confess your sins only to be told to eat more bacon.

Give talks

Go to conferences, write abstracts and papers to get us speaking slots, speak sometimes and help to push the engineers into the spotlight. Sounds rough I know but someone has to do it.

Close the loop

This is probably the most important part of being a developer evangelist, because it boils down to keeping Librato sane as it gets bigger. Closing the loop means collecting, packaging and bringing back all the feedback I can get my hands on. It means being the bearer of bad news when we’ve done something silly that the community is having a hard time wrapping its head around, and following up with the community as we fix it. It means being the guy you can rage at, one engineer to another, without having to worry about signal-to-noise and whether your gripe will actually go somewhere. It also means fielding emotionally agnostic questions, and, obviously, passing back good news and kudos.

Anyway that’s what I think going in to this. I’ll let you know how it actually turns out.

Hello Librato
2013-11-11T14:23:00-06:00 | http://www.skeptech.org/blog/2013/11/11/hello-librato

I don’t remember how I found DBG, but I do recall taking the interview process pretty lightly. I considered DBG a dress rehearsal for the REAL interview at match.com. It was going to be my first real dot-com with the ping-pong table and the catered lunches and the not business casual. That was, until I heard DBG’s pitch.

Match was going to be a postfix-admin job, and a cushy one at that. I’d chill in my cube, swat at a few postfix instances, pig out on catered lunch, and read web comics. If I was feeling feisty I might tell the ping-pong kids to keep it down. The job was mine, all I had to do was sign on the line.

DBG however, needed me to build literally everything. From the ground up. Design and construct the infrastructure required to run a web hosting and application-as-a-service company. Network, servers, firewalls, load balancers, database layer – everything. Use whatever tools you want. Dress code? Clothes are required. I actually started daydreaming about it in the middle of my interview. In retrospect, it was no contest. I’d join DBG all over again today if I had the choice.

7 years later I have a palpable sense that I am in the way. The infrastructure is long since built, and I spend most of my time with auditors and policy documents. Arguing with executives and saying no to people who are trying to get real work done. DBG needs to grow in ways that I, with my open-source hackery, can only obstruct. I have never held a job longer than 5 years, and I have begun to suspect that no one should.

Maybe there are signs. I like to think there are. Maybe PCIDSS is not a hopelessly broken doctrine that begets an infestation of bean-counting parasites, desperate to inject their filthy protuberances into the productive energy of good and well-meaning people. Or at least not just that. Maybe PCIDSS is also a message from God – a message telling me that it’s time to move and grow and find the next big thing so that DBG can move and grow and find the next big thing.

Maybe it’s also providential that, while looking for a decent graphite-as-a-service company to possibly throw some metrics at, I came across Librato’s ad for a developer evangelist. It is an ad that gives me pause, makes me, I’m not ashamed to say, daydream a little bit. So I sent them an email:

Hi Librato,

I'm Dave Josephsen.  I recently came across your "Developer Evangelist"
position and I find myself Intrigued.  On one hand, I imagine it being
the best job I've ever even heard of, and on the other, a horrible and
unmitigated mistake.

Lets see if I can talk us out of it.

I guess I should start by pointing out that I've never been either a
developer or an evangelist by trade so to suddenly attempt both
simultaneously might not end well.  I've always been a devop and a
very opsy devop at that.  I did write the Prentice Hall Nagios book [1],
and I co-authored the O'Reilly book on Ganglia [2], and I publish a
semi-monthly column in the Usenix magazine (;login) on systems
monitoring, but none of that was borne of a desire to be evangelical
about things.

"Accidental", would be a more apt description; you see, I get excited
about and inspired by cool monitoring and metrics collection and
visualization systems, and then I write about them [3], and then
overworked junior editors at various publishers read them, and then
books and articles and things happen. I'm unclear on the details really.

Also, although I've never been a developer, I program in myriad
languages, and have committed patches to several open source projects,
and here's the weird part: They're all monitoring projects.. Ganglia,
Nagios, Graphite, MK... You know, now that I'm thinking about it, I have
a fascination bordering on unhealthy with monitoring and metrics
collection software; How depraved must they have been in the underworld
the day they passed out lurid obsessions that I drew "monitoring
systems". I mean really, I had to have been like the penultimate dude in
that line.

You'll forgive the religious connotation, this is, after all, an
evangelist position I'm trying not to apply for.  So anyway, there's the
rub, I might be a natural -- just the guy you're looking for.  Maybe we
should give it a shot after all, or maybe at least have a chat on the
phone about it.

I'll leave it up to you. I'm game if you are.

-dave

A few hours later I had a response from Librato’s CEO informing me that I had unfortunately not talked them out of a desire to speak with me about the position. Several weeks of Google Hangouts and a trip to San Francisco ensued. The timeline escapes me, but I have, at this point met and personally spoken with nearly every Librato employee (Ironically I’ve yet to meet Cherry, the office manager), and I have to say, they are an intimidatingly smart bunch. When I join them in a couple weeks, I will certainly be the dullest tool in that shed.

And so I find myself leaving what has become my home. A place with awesome people I love very much and an environment that I built by hand from scratch, to start a wholly new career with a new group of awesome people. I will get to hack on tools, and write, and work with engineers who-knows-where trying to measure who-knows-what. I will also probably be on the hook to give a talk or two but I’m resigned to that heh. Given the option to host the Oscars in my underwear or spend another year with the auditors I’d choose the former.

So, to summarize; If you can’t find me in a few weeks, it will be because I have ascended to some sort of worldly utopia – one that has been custom designed for me by a small group of San-Francisco geniuses. I will be doing my damnedest not to disappoint them. And I’ll be having a blast.

-dave

#monitoringsucks
2013-09-08T17:13:00-05:00 | http://www.skeptech.org/blog/2013/09/08/number-monitoringsucksless-an-anti-rant

A Counter-Rant

I was a little dismayed to find James Turnbull on the Nagios-bashing bandwagon. Honestly, what IS the world on about these days? Anyway, here’s an 8-months-late counter-rant for you James (love you man).

But before I get to all that, I want to rant a little bit about #monitoringsucks in general, now that we have a few years of hindsight.

#monitoringsucks kind of… sucked.

To be sure, we had some good discussion and built some new tools, but as someone who has spent years implementing, gluing together, and working to improve monitoring systems and infrastructure, I found #monitoringsucks almost immediately boring. At the outset I thought “ok infratalk, let’s get together and talk about furthering the state of the art in systems monitoring”, but from the start it was a movement determined to define itself by throwing the baby out with the bath-water. In proclaiming the uselessness of all that preceded it, and claiming for its own everything that came after, the movement didn’t just refuse to acknowledge the giants on whose shoulders it stood, but rather danced around on their shoulders with its pants down, openly mocking them like a middle-school brat, and making impossible a lot of very necessary conversation.

So, while I freely admit that we’ve gotten some great tools out of this, I’ll go ahead and be the one to point out that #monitoringsucks has spent the better part of two years acting like an angsty-teenage jerk who, despite whatever valid points and great ideas he had, completely alienated everyone around him with his snide, fart-in-your-general-direction snobbyness, and I can’t stand to hang out with him anymore.

You want examples? They’re myriad, but for now here’s a pretty good example of what I’m talking about. A smart guy doing good work who (probably as a direct result of #monitoringsucks) can’t help but preface it with “psh Nagios. Iknowright?”, as if there’s a tacit, universal understanding that:

  1. Polling for CPU load is silly
  2. Nagios can only do silly things like poll CPU load
  3. Nagios magically sends you pages that you didn’t ask for.

He then goes on to describe his work which, when it eventually becomes useful for problem detection, will be made into a Nagios check about 4 minutes after it hits github.

I’m not sure what people think Nagios is anymore, but it isn’t a CPU-polling program. If you think polling and capturing CPU load is silly, then don’t do it – Nagios certainly doesn’t demand it of you. So while no one is arguing that we can’t do better, it is a mistake to assume we can’t do better with existing tools; the very suggestion that we can’t implies a shallowness of comprehension or ulterior motive on your part. Baron, you had an awesome talk bro, drop the non-sequitur Nagios bashing, and let your work speak for itself.

#monitoringsucks gave us more tools, but not better ways to use them

In the same way the ZOMG PYTHON crowd gave us Shinken, Psymon, Graphite, and a gmond rewrite, and the ZOMG RUBY crowd begat Bluepill, GOD, Amon, Sensu, and so on, #monitoringsucks spewed out a bunch of new tools, some of which are great and most of which are reinventions of the wheel in the ZOMG language of the week. And ironically, along with the mind-boggling preponderance of new tools came this constant side-channel complaining about the Nagios + collectd + graphite + cacti + pnp4nagios antipattern. Are we to understand that Bluepill + Psymon + Ganglia + graphite is not only fundamentally different but objectively superior? Are we to throw away everything we already have and replace it with only Sensu? Are we to do that every time the next great monitoring system comes along?

The weird thing to me is that although NONE of us use a single monitoring tool anymore in real life (and for most of us this feels quite natural), we seem to be fixated on making some new UBERTOOL to replace all of the pieces we already have. So I think a really important thing the movement failed to provide is a good way to use together whatever combination of tools makes sense for us. Indeed, we couldn’t even have this conversation, because anyone who mentioned pre-existing tools was lolbuts laughed off the stage.

In spite of all that, I think that #monitoringsucks was healthy. It forced us to recognize that the infrastructure had changed and the monitoring hadn’t, and it focused our attention on exploring scale, and concurrency, and distributed systems, and better metrics and visualization, but it’s past time to stop bellyaching about the tools we don’t like and either make them better or adopt/build new ones. Everyone gets it. Monitoring sucks. Let’s move on.

Ok James, let’s do this

With that said, I want to talk a little (a lot) about the points made in this Kartar rant, because they typify much of the Nagios bashing that’s gone on since #monitoringsucks was born. But before I get into it in earnest, I want the record to reflect that I think James Turnbull is awesome, I’ve followed his work for years and continue to, and this is in no way a personal attack or flame against James. I just want to have the conversation, because I think it’s time, and because it’ll be a healthy conversation for us to have.

1 Nagios doesn’t scale

James says:

It doesn't scale. Despite being written in C and reasonably fast, a lot of
Nagios works in series especially checks. With thousands of hosts and tens of
thousands of checks Nagios simply can't cope.

First I need to nitpick the statement that checks work in serial. In point of fact, service checks are forked from Nagios core, and their results are injected into a queue where they are eventually reaped by a reaper process. It’s arguable whether fork was a great decision from a scalability standpoint, but really I want to take exception to the more general sentiment that no thought was put into scale in the context of the Nagios event loop, because it’s an oft-repeated fallacy, and one that is almost always accompanied by some incorrect factoid about how Nagios works internally.
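This isn’t Nagios source, but the shape of the fork-and-reap idea looks roughly like this (the plug-in path is a placeholder; adjust for wherever your distro keeps plug-ins):

/* fork_sketch.c -- a toy illustration of "fork the check, reap the
 * result later", not actual Nagios code. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: exec the plug-in; its exit code is the service state. */
        execl("/usr/lib/nagios/plugins/check_dummy", "check_dummy", "0", (char *)NULL);
        _exit(3); /* UNKNOWN if the exec itself failed */
    }

    /* Parent: the real event loop goes back to scheduling here, and a
     * reaper collects finished checks off a queue later.  In this toy
     * version we just wait for the one child. */
    int status = 0;
    waitpid(pid, &status, 0);
    printf("check exited %d\n", WEXITSTATUS(status));
    return 0;
}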

More generally, in recent years it’s become in vogue to sort of wave one’s hand in the general direction of Nagios and proclaim that it is poorly designed. This is actually a great litmus test for detecting someone who doesn’t know Nagios very well, because in fact it’s a well-engineered piece of software. Studying the 1.x source as a young sysadmin taught me most of what I know about function pointers and callbacks in C. If you want to learn about how real C programs work, you could do a lot worse than studying the Nagios Core internals.

That said, I don’t think James is guilty of misunderstanding the internals; I suspect he meant that check results are serialized by the reaper, which is a valid point. There are better ways to do it, but Ethan didn’t exactly have libevent in 2001, and we can’t fault him for not inventing it. Andreas et al. have been busy at work looking at new and improved concurrency models for Nagios 4.

And that brings me to my second point (and I’m sure James knows this): A centralized, polling-based monitoring system is only going to scale so far no matter what concurrency hacks you employ. At some point, if you want to stay with a centralized polling strategy, you’re going to need to look at distributing the load, and Nagios is ahead of the curve here compared with its direct competition in my opinion. There are eleventybillion ways to run Nagios in real life, several of which involve the use of event broker modules that make service checks (and other internal operations) distributed. These include Merlin, mod gearman, and DNX.

It is entirely possible today to create distributed Nagios infrastructure that scales to tens of thousands of hosts and hundreds of thousands of services. This is not hackish, bleeding-edge type stuff, and there are documented real-world examples.

2 Configuration is Hard

James says:

It requires complex and verbose text-based configuration files. Despite
configuration management tools like Puppet and Chef the Nagios DSL is not
easily parseable. Additionally, the service requires a restart
to recognize added, changed or removed configuration. In a virtualized or cloud
world that could mean Nagios is being restarted tens or hundreds of times in a
day. It also means Nagios can't readily auto-discover nodes or services you want
it to monitor.

I agree that Nagios configuration syntax doesn’t lend itself to being machine parsable, but I disagree that it’s incompatible with configuration management engines. This kind of thing is going on right now all over the place, and isn’t considered a big deal.
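For reference, the DSL in question looks roughly like this (a made-up service definition, of exactly the sort a Puppet or Chef template can stamp out a few thousand times without complaint):

define service {
    use                   generic-service
    host_name             web01.example.com
    service_description   Disk Usage
    check_command         check_nrpe!check_disk
    notification_interval 30
}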

Further, there are Nagios configuration parsing libs in just about every language out there, so even should you decide to roll your own, it’s not like you need to write the parser.

3 Binary views?

James says:

It has a very binary view of the world. This means it's not a useful tool for
decision support. Whilst it supports thresholds it really can only see a
resource as in a good state or in a bad state and it usually lacks any context
around that state. A commonly cited example is disk usage. Nagios can trigger
on a percentage threshold, for example the disk is 90% full. But it doesn't have
any context: 90% full on a 5Gb disk might be very different from 90% full on
1Tb drive. It also doesn't tell you the most critical piece of information you
need to make decisions: how fast is the disk growing. This lack of context and
no conception of time series data or trending means you have to investigate
every alert rather than being able to make a decision based on the data you
have. This creates inefficiency and cost.

Nagios is not the sum of its plug-ins tarball, and thresholds are very much a thing built into plug-ins and not Nagios Core. If you have something smarter than a percentage threshold detecting disk trouble, Nagios will happily execute it for you and report the result. So, by all means, write a plug-in that uses a chi-squared Bayesian computation, or holt-winters forecasting instead of a threshold. Nagios really doesn’t care (and it should be mentioned the built-in disk plug-in can use thresholds other than percent). We have oodles of clever tests that Nagios runs for us at my day job, and it can hardly be argued that the cucumber plug-in or webinject have “binary” views of the world.
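To make that concrete: the entire plug-in contract is an exit code and a line of text on stdout, so the logic behind the verdict can be as clever as you like. A skeleton (with a made-up forecast number standing in for whatever math you prefer) looks like this:

/* check_disk_forecast.c -- a skeleton plug-in, not a real one.  Nagios
 * only sees the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and
 * the first line of output. */
#include <stdio.h>

#define OK       0
#define WARNING  1
#define CRITICAL 2

int main(void)
{
    /* Stand-in for whatever model you like: a Holt-Winters forecast,
     * a rate-of-growth calculation, anything that yields a verdict. */
    double days_until_full = 12.5; /* placeholder result */

    if (days_until_full < 2.0) {
        printf("DISK CRITICAL - full in %.1f days\n", days_until_full);
        return CRITICAL;
    }
    if (days_until_full < 7.0) {
        printf("DISK WARNING - full in %.1f days\n", days_until_full);
        return WARNING;
    }
    printf("DISK OK - full in %.1f days\n", days_until_full);
    return OK;
}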

If I wanted to throw notifications on rate of disk growth I’d run Nagios checks against that metric in either Ganglia or Graphite. Again, nobody uses one monitoring tool anymore and I don’t think that’s a bad thing. I do not expect Nagios to be a metrics collection engine or decision support system any more than I wrestle knife-wielding men. Nor do I rant about the extensibility of my jiujitsu on bullshido.com, because, although I love my jiujitsu, it is the wrong tool for that job.

4 It’s not very stateful

James says:

It is not very stateful. Unless you add additional components Nagios only
retains recent state or maintains state in esoterically formatted files.
Adding an event broker to Nagios, which is the recommended way to make it
more stateful, requires considerable configuration and still does not ensure
the data is readily accessible or usable.

Getting data out of Nagios is an age-old dilemma that is really only solved today via event broker modules. If you refuse to use NEB modules, then yes, this problem is not solved for you. The thing is, nobody in the Nagios community can agree on what the ideal solution looks like. REST interfaces, MySQL databases, and even DSLs to interrogate the current state of the Nagios process in RAM have all been imagined and built as NEB modules. These solutions are robust, production-ready and widely used today. So while I might have agreed that this was a problem 3 or 4 years ago, given the preponderance of right answers today it seems silly to enforce something in core that only a subset of the users are going to be happy with. If you want enforcement of that kind, Nagios XI uses MySQL and Postgres. Personally, I prefer MK Livestatus and abhor the notion of a MySQL database, but most people disagree with me on that, and that’s perfectly fine.

As an OPS in real life, I consider the NEB modules that I use part of my Nagios installation, and I don’t think about them all that much. They deploy as Nagios deploys. I don’t think they’re particularly difficult to configure, and I’m thankful that they’re there.

5 It isn’t easily extensible

James says:

It isn't easily extensible. Nagios has a series of fixed interface points and it
lacks a full API. It's also written in C, which isn't approachable for a lot of
SysAdmins who are its principal users. It also lacks a strong community
contributing to its development.

I’m not sure what we’re comparing it to, but Nagios can be made into an endlessly scalable distributed monitoring infrastructure. It can take input from transient entities in 7 different ways, remotely execute checks on every operating system in existence, and run in the cloud, against Cygwin, and on dd-wrt and Linux wristwatches. I met an unfortunate Windows sysadmin at NagiosCon last year who literally was only allowed to run SNMP on his corporate network and had constructed a several-thousand-host, trap-only passive monitoring architecture on Nagios. It lets you define what it means to check a thing, what it means to notify when that thing breaks, and what it means to escalate that notification. It literally lets you make up your own configuration parameters, and has a hook everywhere it has a definition. The Broker API lets you inspect, interrupt, and override every action the event loop takes internally. I struggle with what extensible means if Nagios isn’t it, and would certainly like to get my hands on a “full” API, if the NEB API isn’t one.

The Nagios community is certainly non-standard, but I don’t think it’s fair to say it isn’t strong. Only a handful of guys have commit access to core, so most of the contributions are in the form of plug-ins and add-ons. The Icinga fork happened specifically because of the frustration surrounding the fact that Nagios doesn’t really “get” open source development, but forking Nagios to a more open development model hasn’t made Icinga more ‘easily extensible’, and although Nagios has been rewritten in several languages, none of those ports are what I’d call more easily extended.

IMO, this is because Nagios is about as extensible as a centralized poller can be. There is an underlying design here, a real and physical limit, and there just isn’t much more we can demand from this design (which makes complaining about it unproductive). If Nagios doesn’t extend to what you want, it isn’t what you want.

6 It is not modular

James says:

It is not modular. The core product contains monitoring, alerting, event
scheduling and event processing. It's an all or nothing proposition.

Agreed. Nagios is a special-purpose scheduling and notification engine. It does its thing and that’s it. To believe otherwise is to confuse a deficiency with a design goal.

Summary stuff

I won’t quote James here, but he goes on to bash add-ons, and in general question the long-term viability of centralized pollers like Nagios before hoping for a paradigm shift that will deliver the next big thing in monitoring.

At this point in my career the word “monitoring” is so laden and pregnant with connotation and nuance that it is effectively meaningless. It is, however, difficult for me to imagine a world where centralized polling as a monitoring strategy is wholesale replaced with an uber-technique that optimally meets literally everyone’s needs. This is not just an engineering observation but a business-needs one. Centralized polling makes more sense than the alternatives in a lot of situations, and as long as the technique has a place alongside the myriad other techniques we employ to monitor things, Nagios will be around, because it’s a good, free, centralized poller with a massive support community, a commercial version, and an annual conference of its own.

More than that: with the litany of monitoring systems out there that can execute Nagios checks out of the box, Nagios, like it or not, has become a specification language for prototyping systems monitoring solutions. Once you understand Nagios and the various ways it’s been extended, you pretty much understand the problem domain, by which I mean you know what humanity knows about how to centrally monitor computational entities. You also have a good mental model of the data structures required in the field and how much and what kind of metric and availability data need to be transmitted, parsed and stored. So even if you don’t use Nagios in your environment, learning about how it works makes you a better OPS – one who is adept at designing and communicating monitoring solutions to other engineers in an implementation-agnostic way.

But Nagios, as far as I know, has neither claimed nor aspired to be the final, ultimate solution to the monitoring problem, so please stop flogging it. I remember a time not too long ago when we could talk about new and exciting ideas in this field without having to slander the ideas from which they were derived, and I welcome a return to that time.

Thanks!

headspace
2013-08-29T21:17:00-05:00 | http://www.skeptech.org/blog/2013/08/29/headspace

10 days of experimental guided meditation.

Day 1.

An article hits my feedly by way of HN about meditation and the brain. It mentions headspace and after chuckling to myself at the realization that I live in a world where there is “an app” for guided meditation, I decide to give it a whirl.

The app will guide me in meditating for 10 minutes every day for ten days. I pick a time that I’ll likely have free for the next 10 days, and bring out my best headphones.

After watching a few short animations, I start my first session. This is not what I expected. Meditation is like, an activity, and not an easy one. I cannot imagine being able to do this on my own without someone there to tell me what to do. At the end, I’m asked by the narrator if I feel different. I don’t think I feel different, but I can’t deny: I want more.

Also, I realize that I can now preface sentences with “since I started guided meditation…”, and am therefore REALLY looking forward to tomorrow.

Day 2.

Co-workers quite annoyed at me. Line of the day was: “Since I started guided meditation, I have not put onions on my cheeseburger.” GREAT DAY.

The second session felt easier. I focused right in and didn’t wander at all despite a cat rubbing me nearly the entire time. It’s nice to sit for a while and clear my head. When asked at the end if I felt different I could honestly say yes. Not detached exactly, but maybe a bit more.. I’m not sure how to articulate it, but “comfortably situated” comes close. I’m concerned that I’m getting hooked, but I tell myself I can stop any time I want.

Day 3.

Line of the day: “Since I started guided meditation I no longer feel the need to wear shoes to meetings.”

A frustrating day of unproductive meetings spent reiterating the obvious to various combinations of people who all should know better (for 8 solid hours).

The third session was difficult. I felt tense and unfocused. I had a muscle twitch in my left eye and my body was sore and it was difficult to settle in. Earlier in the day it occurred to me that I should blog about this meditation thing I was doing, and I kept thinking about what I was going to write about meditation in the meditation. I also thought a lot about work. I was at least halfway in by the time my mind quieted down but I was able to pull it out in the end.

Despite the frustrating start I felt pretty great afterwards, but not quite as settled as day 2. I still find it hard to imagine just sitting down and doing this on my own without a guide. I’m tempted to try it, but won’t until the 10 days is up for fear of corrupting the data.

Day 4.

Line of the day: “Since I started guided meditation, I have not needed to clear the xlate table.”

A work from home day spent mostly in “managerial” pursuits rather than productive ones. The fourth session was easy and relaxed. Cat jumped on me about 30 seconds in but otherwise no distractions at all. Must remember to sit for a couple extra minutes and let the cats gather and settle before pressing play from now on.

I’m a little embarrassed that it’s taken me four days to notice this, but it became apparent today that the sessions are following a pattern that moves your focus .. inward? .. toward a sort of pivotal bit of time where you are encouraged to lose focus and let your mind wander wherever it wants. Maybe that few seconds is actually what “meditation” is, or maybe it’s the goal. By the time I get there, I’m noticeably in a different state of mind in the same way that sleep is a different state of mind. I am not asleep; I am fully aware of everything going on around me, but I’m not exactly awake either, it’s certainly different, and it feels important.

I don’t think it’s temporally the center of the session; we spend more time getting there than getting back. If I had to guess I’d say it’s about 7 minutes in, and lasts for around 40 seconds. In the previous days nothing memorable happened during this … “dreamy unfocused center time” … but today, right near the end of it, I got a sort of “flash-image” of a revolver (yeah, the handgun kind). It was powerful – it felt as if I was physically struck by it – and came seemingly out of nowhere. I don’t own a revolver, and can’t remember having seen one in a photo or movie lately, so that struck me as odd, and caused me to notice the overall pattern that the sessions seem to be following.

Anyway, I’m hooked, and do not care if it makes me a hippie, and find myself looking forward to these sessions daily. Also prefixing my sentences with ‘Since I started guided meditation’ is not even close to getting old.

Day 5.

Line of the day: “It’s OK, since I started guided meditation, babies don’t cry in my presence.”

The first Saturday and official day off. Also the first day my wife realized in earnest that I’ve been meditating; she’s totally jealous. I did the session in the same room as my wife walking on the treadmill, and the noise/presence was not a problem at all. It was a different chair and therefore a different position, which drastically altered both my breathing and my weight from what I was used to. Neither of these was a problem either, but it was kind of telling how obviously different they were once I started the focus exercises.

This made me want to meditate at the office in the position I spend the preponderant quantity of my day in. I suspect (given its similarity to the position that I used in the session today) that it’s an unhealthy repose, and meditating in it will likely result in my changing it. No strange weaponry flashes today, just a lavish 10 minutes of mindfulness. Looking forward to tomorrow.

Day 6.

Line of the day “Since I started guided meditation I encounter 14% fewer compiler errors”

I’ve completely lost interest in blogging about this. I feel like a Rock star who was too busy partying to finish the lyrics and just writes in “Nah-nah-nah-nah-nah NAAAHH” instead. I have nothing to report and yet words keep appearing on the page. Also, this post has entirely too many first person pronouns, and is boring. Meditation is pretty great. Try it if you want.

Day 7.

Line of the day “Well last week I would have agreed with you, but since I started guided meditation I’m better equipped to appreciate John Tesh.”

Still tired of this post. Also, I’m beyond tired of having to force-stop the Headspace app. Protip guys: Making an app that sits resident in memory notifying me every so often with faux-wisdom that I cannot read because you haven’t figured out how to line wrap in the notification API does NOT fill me with inner peace. I will delete you when the 10 days are up because buzzers are a horrible implementation of a stupid idea. I would have stopped this experiment on day 4 because of buzzers, but since I’ve started guided meditation I find that I am a more patient person in general, and I promised myself that I’d see this through.

Day 8.

Line of the day: “I used to be ignorant of the inner workings of node.js, but since I started guided meditation, it is a more blissful ignorance.”

Two things of note today. The first was that a cat started kneading on my forearm about 5 minutes in, and although I usually find this irresistibly uncomfortable and basically impossible not to react to, today in meditation I was able to just kind of notice that it was happening and carry on. This must be what Roland felt like in the first book of the gunslinger series when he separated himself from his own thirst to make it through the desert. So basically, since I started guided meditation, I have super powers. The cat stopped a couple minutes later because by that time my arm was bleeding enough from the scratches to gross out the cat, causing him to move on.

The second thing was that when Andy tried to guide my attention back up to normal, I refused to follow him. It was a conscious thing, but not something I’d planned going in. The option hadn’t occurred to me until that moment, but not only was I not ready, I realized that for several days I’ve felt interrupted and not ready to stop at this point in the meditation, so, annoyed and unwilling to follow Andy, I rebelled. I let Andy do his thing, and waited for him to leave, and carried on alone for a while. I don’t know how long. When I was ready I brought myself back up in the normal way.

I remember feeling like guiding myself through meditation would require a cognitive duality that I wasn’t capable of, but I no longer feel that way at all. I could do this anytime, anywhere; I’m sure of it.

Day 9.

Line of the day: “Since I started guided meditation, everything smells like beer.”

I’m sorry Andy, but I don’t need you anymore. I guess I’ve hit my angsty-Buddhist teen phase, and want to explore this on my own for a little while without your preening condescension inside my skull. It’s not you, it’s me. I just feel like I’ve done a lot of growing lately, and you seem to still be in the same place. I’d like it if we could still be friends, and we don’t need to make this awkward – I’ll meet you tomorrow for our thing, but then I think we should spend some time apart. You deserve better. Don’t be like that, this is a good thing for both of us. I’ll always cherish the time we’ve spent together.

Also, I think I have a sufficiently quiet mind that a couple sessions a week is all I’m going to want, and 10 minutes is not enough time per session. Something like 30 minutes Tuesday and Thursday would be more ideal for me, but that seems beyond silly; like “No I can’t meet you for drinks, I seek inner peace every Tuesday and Thursday from 9 to 9:30”. Really? I’m not an ordained monk, but if “seek inner peace” is on your Google calendar I think you might be doing it wrong.

Anyway, don’t let me discourage you from trying the 10 for 10 thing; I don’t know what’s going to work out for me, but every day for 10 minutes is too often and not enough.

Day 10

Line of the day: “Since I started guided meditation I’m able to give you the pity that you truly deserve.”

AAAAAnd we’re done. Not much to say about the last session, I felt a little guilty about leaving Andy. He seemed so certain that I’d be showing up for the 15-minute phase-two stuff, but I think I’m going to call this a successful experiment and leave it at that.

See you on the other side Andy.

surviving vacation
2013-08-24T00:57:00-05:00 | http://www.skeptech.org/blog/2013/08/24/surviving-vacation

I’m building a lightweight, low-cost survival/trauma kit for my 3-week backpacking vacation in Montana, and thought I’d document what’s going into it here. I’m pretty happy with the result, and will probably get a subset of this together to keep in my truck as well.

I’m setting out two options, one with some extras that still comes in under $100, and a minimal kit that comes in at around $50 (depending on what you already have lying around). I’ll include links to the exact pieces of kit I’m taking where possible.

Survival minimal:

When people get lost and die in the woods, the dying part usually happens because they’re too cold, they have heart trouble, or they get too thirsty (the dehydration is usually accompanied by an injury that prevents them from walking out). So with those in mind:

Survival extras:

  • 4 Fishing Hooks
  • 4 Fishing Sinker Weights,
  • 50’ 10lb Fishing Line
  • wire saw

Trauma minimal:

People who die from injuries in the woods do so from bleeding too much, or because something has happened to make it so they can’t breathe. So:

Now that you can buy CELOX for less than QuickClot, I can’t think of a good reason not to. QuickClot is good stuff, but causes pretty bad chemical burns, and really, you’re going to have enough to worry about as it is.

I was shocked, by the way, to discover that I could purchase a Nasopharyngeal Airway on Amazon for $7. It’s good that I’m no longer a teenager because we would have been playing a whole different kind of doctor if we had the internet when I was a kid.

Trauma extras:

It is, of course, possible to improvise a tourniquet and a splint, but they aren’t heavy items, nor are they overly expensive, and if you’re in the kind of situation where you’re seriously considering the application of a tourniquet, I think you’ll be happy not to have to risk screwing it up. Further, if you’re backpacking in Montana and need to splint a bone, you’re likely to need to move several miles to hit a road, which is not something you want to do with an improvised splint.

For trauma kits, you want the folded “zpack” or “S-rolled” gauze. It’s way easier to deal with than the classic roll, as well as less likely to become contaminated.

Anyway there it is.

]>
<![CDATA[Anonymous Time Tracking]> 2013-08-21T08:41:00-05:00 http://www.skeptech.org/blog/2013/08/21/anonymous-time-tracking The most frustrating thing about time tracking in the nerdosphere is that we aren’t allowed to fix it. I mean really, given our propensity for measuring this or that, you’d think we could be trusted with it, but it is a thing, even in the supposedly geek friendliest of shops, not so much done for us or by us but to us. It is a thing imposed, by faceless agents of “the business”, whose concerns and decrees are beyond our meager comprehension.

If I ever stumble into a parallel dimension where I can, as an individual contributor, affect time tracking process, here are a few observations I might make, and one large suggestion based on those observations:

Observation 1

There is a cognitive dissonance between the usually stated justification for time tracking and the way it’s carried out in practice.

At most shops, the need for time tracking is driven by management’s simple desire to understand how much labor went into undertaking this or that project. It’s important to know that it doesn’t cost you more to build a thing than you make when you sell it. This is not only the sort of thing I imagine you might pick up in business school, but also a valid point.

Why is it then, that when we undertake time tracking, we ask the question: How many hours did Steve work this week? When we should be asking the question: How many hours did Ops contribute to Project X?

Are we paying Steve by the hour or something? (probably not if he’s an op). Why do we care about Steve at all if our goal is to find out how much project X cost? We might care about what Steve’s time is worth, but we aren’t asking that question either. So is there a second, unstated goal? The goal of finding out how individually productive each employee is? Because that’s a very different sort of problem, bringing me to:

Observation 2

Your employees won’t report their own lack of productivity to you (no matter how intimidating you are when you demand it of them).

If you don’t hire programmers and ops guys that are idiots, you can pretty safely assume that they know where this is going. Many of them have even seen it before. Maybe you honestly don’t, so I’ll lay it out for you: When Ted reports 40 hours and Steve 10, your bean-counter genes will force you to go back to Steve and smack him for his lack of productivity. And when you do that, you’ll be ignoring a few pretty typical details in the process:

  • Steve got more done in his 10 hours than Ted will all week
  • Ted loves time tracking because it makes him look productive
  • Technical aptitude is inversely proportional to bullshit-paperwork aptitude
  • The kind of employee who will give an honest answer is the one you’re most likely to punish.

Of course, Steve saw you coming the moment you mentioned time tracking in the first place. He just wanted to see what you were going to do when he gave you good data, and you did pretty much what he expected. You punished him for it. Which leads me to:

Observation 3

If you want good data, you shouldn’t encourage lying.

Tracking employee time on an hourly basis begets garbage data. If you track per employee, you’ll inevitably make invalid assumptions about when and how they’re working, and compare them against each other based on irrelevant metrics like total hours per week. They aren’t going to come to you to explain why they work differently than other people, why would they think you care? They are going to lie to you, and you are going to thank them for it.

Even if you know this going in, and attempt to account for it, it will happen. Somewhere in the chain, some middle manager will see those numbers, and in him they will trigger that finely honed survival instinct to cover his ass, and he will, ruining your data in the process. It will happen this way until we cease to be human, because it is a human problem, which brings me to:

Observation 4

A tool isn’t going to fix it.

Buying a fancier, web-based spreadsheet isn’t going to help. You’ll just have to trust me on that one.

For Ops and Dev, context-switching is expensive. So if you can suck time-tracking data out of their commit comments, or put an API out there to which they can code their own interfaces, or have them check out blocks of time, or charge them for it via a cost-center, or put physical buttons on their desks with project names on them, that’s great. Make the process as lightweight, integrated, and clever as you can, but understand that the lying won’t stop until you clue in and stop incentivizing it.

If you try to fix it with clever tools, the lying will only get more creative. If you put up an API, they’ll write random number generators to it, if you put in a cost center, they’ll rob Peter to pay Paul, if you put buttons on their desks, they’ll hack together random number generators with pneumatic actuators. But they won’t give you real data because you’re not asking for real data. You’re asking for pretty numbers, and they’re perfectly happy to oblige you.

Observation 5

Anonymity gives you what you want: Good data.

Tossing out the employee name is the cheapest way to solve all time tracking problems. Your employees will give you the best possible data and you will know how much it costs to build a thing. You won’t be tempted to bludgeon them, and they won’t need to lie to you. If you want to measure their productivity, stop being silly about it. Instead, set real, measurable, goals for them and let them accomplish those goals by employing their time in whatever way they need. They want to be happy and productive. They want to build things. Give them a way to throw hours at a project anonymously, or better yet, let them build it for you, and live happily ever after.
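
If you want a picture of what the “suck it out of commit comments” flavor of this could look like, here’s a throwaway sketch of my own (the Time-spent: trailer format is made up for illustration): it totals hours per project straight out of git log and never records who logged them.

// A throwaway sketch of anonymous, per-project time tracking pulled from
// commit messages. The "Time-spent: <hours> <project>" trailer is a made-up
// convention for illustration; author names never enter the pipeline.
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os/exec"
    "strconv"
    "strings"
)

func main() {
    // %b prints only the commit body, so no author information is read at all.
    out, err := exec.Command("git", "log", "--pretty=format:%b").Output()
    if err != nil {
        panic(err)
    }

    totals := map[string]float64{}
    scanner := bufio.NewScanner(bytes.NewReader(out))
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        // Expect: Time-spent: <hours> <project>
        if len(fields) == 3 && fields[0] == "Time-spent:" {
            if hours, err := strconv.ParseFloat(fields[1], 64); err == nil {
                totals[fields[2]] += hours
            }
        }
    }

    for project, hours := range totals {
        fmt.Printf("%s: %.1f hours\n", project, hours)
    }
}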

]>
<![CDATA[meshnet]> 2013-08-09T17:00:00-05:00 http://www.skeptech.org/blog/2013/08/09/meshnet I wish that I could say I’ve watched the ongoing Snowden escapade with shock and horror and dismay, but I can’t quite manage any of it. That isn’t to say I find it irrelevant. Quite the opposite, as someone colocated mere yards away from Lavabit the whole thing hits close to home literally and figuratively.

But, at this point, after decades of being shocked, horrified and dismayed by crime after crime committed by our politicos in the name of the greater good, the only kind of surprise I can manage is surprise that anyone is surprised. My own emergent political awareness necessarily coincided with the realization that our government and the constitution have been orthogonal pursuits for at least a hundred years. I don’t mean to sound controversial in saying that; it’s a realization that everyone makes who has read seriously into polisci. We are an empire, and our contemporary political notions revolve mostly around the direction and magnitude in which we should abuse our power.

It took them, I think, longer than anyone expected (assuming our current understanding of the time line is accurate), but the government finally broke the internet. And it’s beyond time to accept that, despite us academics fretting at our conferences over encryption protocols and whether message traffic between Alice and Bob might be compromised, the whole network is now pretty much compromised and Alice and Bob couldn’t care less.

Once you’ve realized that the internet is broken and no one cares, here’s another interesting thing to think about: What would theoretically be more useful to a regime that wanted to enforce thought-crime laws? A computer network that didn’t let you say what you wanted, or one that encouraged you to say anything with a false sense of anonymity? So there’s some silver lining for you, we probably don’t really need to worry about internet censorship in America (and even were it to come, its effect would be to protect us from our government). Here’s an interesting corollary; the nations undertaking internet censorship are therefore either too stupid to realize that they could be spying on their citizens, or too stupid to figure out how. Maybe they just have better means.

Anyway, in spite of all of that, I don’t know many people who would point at the internet in its current incarnation and say, broken though it is, that it was a net loss. A lot of good came out of our building that network, and although we were probably a little naive in our initial protocol design, and arguably too trusting in our housing of it (although, to whom were we to entrust it? Government?), we managed to help a lot of people be extremely productive, and creative, and disruptive (and very very sexual), and all of that is not only awesome, but sort of unprecedented in human history. So good job us.

Here’s another obvious observation for you: some really important stuff has changed since we hacked together DARPANET all those years ago. We have both better transport technology (especially wireless), and the benefit of perspective. We no longer doubt that people find world-wide computer networks useful and eventually worth adopting (even if they didn’t understand at first), nor do we doubt that it will be attacked and subverted despite our best efforts at bolting-on security after the fact.

So here’s an interesting idea: let’s start over, but this time we’ll make two important changes. First, we’ll use wireless transport, and peer between teensy private entities (like you and me) instead of huge corporate sell-outs. Second, we’ll build end-to-end encryption into the data-link layer or the transport protocols, to make it a little harder to listen in.

I hear you. That’s nuts. It’ll take FOR-EV-AR. Nobody will understand or use it but kiddie-porn watching, heroin dealing, Satan worshiping terrorists. The government will just outlaw it. They’ll employ jamming systems to thwart it. They’ll adopt and subvert it. It’s too expensive. It’s too dangerous. Nobody CARES about privacy!

My first unix shell account was from Primenet, an ISP in Arizona that, in 1993, started giving free shell accounts to college kids. One wonders who but college kids would have paid for a Unix shell account in 1993. I, as a high school kid in California, couldn’t afford one, but luckily they had a dial-up number in my area code and didn’t do a thorough job of fact checking. Two of the many reasons, no doubt, that they’re not around today.

I remember that shell account, with the pine, finger, talk, kermit, and the mudding via telnet in the same way most guys my age remember their BBS days. It was slow, and buggy and ugly, and overly technical and only nerds did it, and we none of us ever felt more connected to something than we did then, despite lifetimes of connecting together a world of people.

Today I have a 100Mb/s fiber line directly connected to my house and a pocket phone that can locate the proximate source of bacon to within 10 feet of where I am currently standing, yet I feel uninspired and wary. Everything is kind of overly abstracted and dumbed down. We devops our clouds and hope not to run afoul of the secret police; to be informed by decree from a secret court that our efforts are kind of, sort of, not entirely but sometimes federally illegal. When this happens we wink and nod to our contemporaries and allude to the gag-order. Meanwhile every other LLC in the bay area knows how often I buy bacon, and they evidently chat about me with the NSA over espresso. It wasn’t supposed to end like this.

Do you know what would inspire me? A decentralized, secure, crowd-sourced replacement to the internet; a dark-net. Black-clad activist-nerds would climb radio towers and access high-rise rooftops in the dead of night to install repeaters for it. Or maybe Solar-powered stratospheric gliders with a raspberry-pi-powered WiMax darknet repeater payload. We could launch them by balloon. I imagine them silently struggling like wild salmon against the jet-stream as they forward packets – so numerous and small there is no hope of destroying them or thwarting their signal. We could custom engineer a repeater for the every-man, and mass-produce it to get the costs down. Everyone could run a repeater at their house and give one to their neighbor for $20.

If (when) it became illegal I imagine I’d do it anyway. When installing open networks becomes a crime, I guess I’ll accept my role as a criminal and dangle from radio towers until they lock me up. I’ll be in good company by then. Anyway, that’s what freedom looks like in my daydreams (well partly), and if you think me juvenile and silly I’d probably agree with you.

Except, as it turns out, they’ve already built it in Seattle. The Meshnet project has evidently been at this at least as long as I’ve been daydreaming about it. I thought I was a silly dreamer (and I’m right), but I’m not the only one. I just didn’t get the memo that we were all moving to Seattle.

]>
<![CDATA[Hearsay]> 2013-07-27T11:39:00-05:00 http://www.skeptech.org/blog/2013/07/27/hearsay #monitoringsucks gave us more tools, but didn’t make using them any easier.

Exponential Complexity (almost)

Every monitoring system has its own import and export hooks, which necessitates the reconfiguration of every monitoring system in your infrastructure every time you add a new one. There is a word that describes this arrangement where I come from – a word that rhymes with “full-spit”.

TooManyConfigs

I don’t think the systems are themselves to blame. The best each can do is provide the most elegant and simple solution for data I/O that makes sense in its own context, and most of these interfaces are, in fact, elegant and simple (mostly). No one objects to this or that interface specifically, but the burden of gluing three or four of them together is the sort of thing that sends all of us raging into the blogosphere.

Maybe another monitoring system isn’t what we need

What if every monitoring system imported and exported a common data format so that instead of that ugly picture above, they looked like this instead:

CommonConfigs

Impossible you say?

The Riemann Event Struct does a pretty great job of describing a system-agnostic blob of monitoring data. Whatever your monitoring system, I’d wager the data it collects can be imported into this struct. For most systems this struct is overkill.

In fact, when I think about any other monitoring system in the context of this struct, not only does it fit, but a procedure for performing the translation springs to mind. With Nagios, for example, I’d create a new notification command that used Nagios macros to write this struct out in JSON.
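
To make that a little more concrete, here’s a rough sketch of the sort of helper such a notification command might call, with the standard Nagios macros ($HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$, $SERVICEOUTPUT$) handed in as arguments. The program and its field names are mine, not anything that ships with Nagios or libhearsay.

// Hypothetical helper for a Nagios notification command: take host, service,
// state, and plugin output as arguments and emit one JSON blob on stdout,
// which can then be piped at whatever is collecting scraps.
package main

import (
    "encoding/json"
    "fmt"
    "os"
    "time"
)

func main() {
    if len(os.Args) < 5 {
        fmt.Fprintln(os.Stderr, "usage: nagios2scrap host service state output")
        os.Exit(1)
    }
    scrap := map[string]interface{}{
        "host":        os.Args[1],
        "service":     os.Args[2],
        "state":       os.Args[3],
        "description": os.Args[4],
        "time":        time.Now().Unix(),
    }
    if err := json.NewEncoder(os.Stdout).Encode(scrap); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}

The notification command definition would just pass the macros as arguments and point stdout at whatever is doing the collecting.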

Import is a little more difficult, but still easy enough to imagine. For Nagios, we’d take a JSON-encoded blob off the wire, parse it into a passive check result, and inject it into the CMD (external command) file. There are smarter ways, but that gets us there.
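
Something like this sketch is what I mean. The command-file path and the state-to-return-code mapping are assumptions for illustration, not libhearsay’s actual code:

// Minimal sketch: turn a scrap into a Nagios passive check result and append
// it to the external command file. Path and state mapping are assumptions.
package main

import (
    "fmt"
    "os"
    "time"
)

// Map scrap states to Nagios return codes; anything unrecognized is UNKNOWN.
var returnCodes = map[string]int{"ok": 0, "warning": 1, "critical": 2}

func injectPassiveResult(cmdFile, host, service, state, output string) error {
    code, ok := returnCodes[state]
    if !ok {
        code = 3
    }
    // Nagios external command syntax:
    // [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output
    line := fmt.Sprintf("[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
        time.Now().Unix(), host, service, code, output)

    f, err := os.OpenFile(cmdFile, os.O_WRONLY|os.O_APPEND, 0)
    if err != nil {
        return err
    }
    defer f.Close()
    _, err = f.WriteString(line)
    return err
}

func main() {
    // The command-file path is a common default; yours may differ.
    err := injectPassiveResult("/usr/local/nagios/var/rw/nagios.cmd",
        "web01", "http", "critical", "connection refused")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}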

The important point is that if every monitoring system provided native support for this struct, we wouldn’t need to think about import/export at all. If we were careful about naming our services etc., data exchange would “just work”, and all we’d need to worry about is getting blobs on the wire; queuing them, and routing them around – which really is the problem we WANT to worry about, because network architecture, and scale, and environmental specifics are the stuff that actually differs for us users.

Introducing libhearsay

I think that, before we grab our torches and pitchforks and mob the vendor floor, we should prove out the model. I want to see it working in practice and build a few broken things to make sure we get it right. So to that end I’ve written libhearsay.

Hearsay implements a common data model for monitoring systems, and includes two tools that take care of most of the messaging details. A “scrap” of hearsay is a Riemann Event Struct, plus an optional UID field (to assist with de-duplication and commuting).
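
For the curious, a scrap rendered as a Go type would look something like the following. The JSON field names here are illustrative; the real libhearsay definitions may differ slightly.

package hearsay

// Scrap mirrors the Riemann event struct, with an optional UID tacked on.
type Scrap struct {
    UID         string            `json:"uid,omitempty"` // optional; helps with de-duplication
    Host        string            `json:"host"`
    Service     string            `json:"service"`
    State       string            `json:"state"` // "ok", "warning", "critical", ...
    Time        int64             `json:"time"`  // unix timestamp
    Description string            `json:"description,omitempty"`
    Tags        []string          `json:"tags,omitempty"`
    Attributes  map[string]string `json:"attributes,omitempty"`
    Metric      float64           `json:"metric"`
    TTL         float32           `json:"ttl"`
}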

The Spewer utility takes a JSON-encoded scrap of hearsay on STDIN or a TCP socket. It then validates the scrap (adding default values if necessary), and puts it on the wire using either a ZeroMQ PUSH or PULL socket.

The generic Listener utility takes a JSON-encoded scrap of hearsay off the wire, validates it, and places it on STDOUT. It takes a “filter” string which you can use to filter out messages you don’t want, and has a ‘Nagios’ mode where it outputs passive check results instead of JSON.

I’m thinking about and writing special purpose listeners for specific monitoring tools which inject the scraps directly into various monitoring systems in the way those systems expect to receive input. The critically important part of this actually working, I think, resides in an admin’s ability to have the listeners “just work”. We should just be able to point the listener at the spewer cloud and magically start seeing updates in the monitoring system’s UI.

Here’s a short list of systems I want to make special-purpose listeners for (your help would be appreciated, if you contribute one of these I will buy you a beer at the next conference we both go to):

  • nagios
  • ganglia
  • graphite
  • rrdtool
  • riemann
  • zenoss
  • zabbix
  • reconnoiter
  • munin
  • mysql

Hearsay is written in Go and depends on the gozmq package. It’s super buzzword-compliant.

Going Forward

Using just spewer and generic listener and some shell scripts, we should be able to get some systems talking to each other, and even experiment with some messaging patterns to see where stuff breaks, and what I haven’t thought about.

Assuming this isn’t a long road to a dead-end, here’s the plan:

  • Step 1. Get a lib implemented and a few simple tools (mostly almost done kind of)
  • Step 2. Hack up special purpose listeners (and maybe spewers) to lower the barrier to entry
  • Step 3. Push for Native adoption EVERYWHERE.
  • Step 4. Narnia

See my hearsay page or the github site for more info.

take it easy

-dave

]>
<![CDATA[Unhappy Bean Factory]> 2013-03-28T14:47:00-05:00 http://www.skeptech.org/blog/2013/03/28/unhappy-bean-factory What follows is the actual conversation between myself and a project manager who noticed that a certain back-end process wasn’t working. The content has been changed slightly to protect the innocent and/or the names of internal data structures et al. It could be inferred from the following that neither of us are Java people, and those inferences would be correct.

(13:10:16) Kathy : hum… ok… VISA didn’t get transactions today

(14:38:13) dave: megha’s beans are not autowiring

(14:38:19) dave: and the factory is very upset

(14:38:33) Kathy : oh no

(14:38:43) dave: yeah it’s throwing things and refusing to nest

(14:38:55) Kathy : not good…. how much time to fix

(14:39:03) dave: it’s throwing nesting exceptions. EXCEPTIONS TO NESTING. basically, it’s anti-nesting

(14:39:06) dave: on account of her beans

(14:39:23) dave: we’re re-jarring her beans

(14:39:29) Kathy : let me get off this phone call and then i’ll call you

(14:39:29) dave: or at least she is

(14:39:46) dave: and then we’re going to see if maybe the factory will nest

(14:40:07) Kathy : are you talking jelly beans or pork and beans?

(14:40:45) dave: well clearly they aren’t autowiring beans…

(14:41:00) dave: Error creating bean with name ‘userService’: Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private ; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name ‘foo’: Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: .mailService; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name ‘mailService’: Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private org.springframework.mail.javamail.JavaMailSender .services.MailServiceImpl.mailSender; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [org.springframework.mail.javamail.JavaMailSender] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency.

(14:41:29) Kathy : what the hell…

(14:41:32) dave: you see how the factory is upset about the beans?

(14:41:40) Kathy : totally, I would be too

(14:42:01) Kathy : we don’t want to make the factory upset

(14:42:18) dave: well I think it’s too late for that Kathy

(14:42:24) Kathy : crap

]>
<![CDATA[PF Limits in OpenBSD]> 2013-01-15T14:44:00-06:00 http://www.skeptech.org/blog/2013/01/15/pf-limits-in-openbsd This article documents one of several insidious little gotchas I’ve encountered using OpenBSD systems in a core-router/firewall capacity in lieu of Cisco 2851 or Juniper j4350 class hardware. Specifically, various hard memory limits built into PF, which, when encountered, cause PF to stop accepting new connections.

Incidentally, here is the story of how I wound up replacing the preponderant quantity of my networking gear with OpenBSD and saved metric-oodles of coinage.

Anyway, the upshot is that if you use OpenBSD with PF in a production environment and you aren’t aware of PF’s memory limits (especially the state-related ones), you have a ticking time-bomb on your network. Just FYI.

I’d been playing with OpenBSD for fun, in low-budget side projects, and non-prod environments for years before that fateful day that I ran into the state-table limit like a brick wall.

It was shortly after I’d replaced the cisco-based core routing infrastructure of our Headquarters building with OpenBSD. It presented as a sort of network “glitch”. You know, the unexplainable little connectivity loss that only affects one user. Probably his cable, or wall socket. But then it was two or three users, and then it was a user whose connectivity was working fine, except that he suddenly couldn’t create new ssh connections (wha?). It was gone as quickly as it appeared, and never seemed to adhere to any sort of consistent set of symptoms. It was quite maddening.

At some point I noticed that if I was quick enough, I could catch a “no route to host” error message from PF on the console of the core routers, and that’s when I really started looking at them in earnest.

It turns out, as I’ve already said, the kernel keeps memory set aside for PF to do things like create state tables and state table entries. In my case I was hitting the limit on the total number of states PF was allowed to track at once. This meant that new connections would fail with no route to host until some other state expired and made room for the new one. This looked downright weird troubleshooting from the outside, because protocols like HTTP (whose connections are short-lived) would still work pretty well, while others like SSH (which need a persistent connection) were more likely to have problems.

You can see the default sizes of these limits using pfctl -sm:

# pfctl -sm
states        hard limit    10000
src-nodes     hard limit    10000
frags         hard limit     5000
tables        hard limit     1000
table-entries hard limit   200000

These are pretty sane defaults for most people who are running OpenBSD routers, which is to say, nerds who have wedged it onto their Soekris board or the wrt54 they found at the second-hand store, or the 8086 they found under the sink in their dad’s house.

If you’re running production routers on real hardware you’re going to want to raise those a bit. And by ‘bit’ I mean like two orders of magnitude. Do this with a line in your pf.conf that looks something like this:

set limit { states 1000000, frags 1000000, src-nodes 100000, tables 1000000, table-entries 1000000 }

You can check to see if you’ve ever hit one of these limits with pfctl -si, which displays the values for a whole bunch of counters tracked by PF:

[dave@a][~]--> sudo pfctl -si
Status: Enabled for 686 days 01:20:03            Debug: err

State Table                          Total             Rate
    current entries                    39401               
    searches                    587674569722         9914.3/s
    inserts                      23981800145          404.6/s
    removals                     23981760744          404.6/s
Counters
    match                        24166482278          407.7/s
    bad-offset                             0            0.0/s
    fragment                               0            0.0/s
    short                                  0            0.0/s
    normalize                           1282            0.0/s
    memory                                 0            0.0/s
    bad-timestamp                          0            0.0/s
    congestion                           204            0.0/s
    ip-option                         433656            0.0/s
    proto-cksum                            0            0.0/s
    state-mismatch                    135709            0.0/s
    state-insert                           0            0.0/s
    state-limit                            0            0.0/s
    src-limit                              0            0.0/s
    synproxy                               0            0.0/s
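
Watching those counters by hand gets old, so if you’d rather get paged before you hit the wall, something along these lines can compare the live state count against the hard limit and yell past a threshold. This is a sketch of mine, not part of PF; the 80% threshold and the output parsing are assumptions, and it needs to run as root (cron is fine).

// Sketch of a PF state-table watchdog: compare "current entries" from
// `pfctl -si` against the states hard limit from `pfctl -sm`.
package main

import (
    "fmt"
    "os"
    "os/exec"
    "regexp"
    "strconv"
)

// grabInt pulls the first captured integer matching re out of text.
func grabInt(re, text string) (int, error) {
    m := regexp.MustCompile(re).FindStringSubmatch(text)
    if m == nil {
        return 0, fmt.Errorf("no match for %q", re)
    }
    return strconv.Atoi(m[1])
}

func main() {
    info, err := exec.Command("pfctl", "-si").Output()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(3)
    }
    mem, err := exec.Command("pfctl", "-sm").Output()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(3)
    }

    current, err1 := grabInt(`current entries\s+(\d+)`, string(info))
    limit, err2 := grabInt(`states\s+hard limit\s+(\d+)`, string(mem))
    if err1 != nil || err2 != nil {
        fmt.Fprintln(os.Stderr, "couldn't parse pfctl output")
        os.Exit(3)
    }

    // 80% is an arbitrary threshold; tune to taste.
    if current*100 >= limit*80 {
        fmt.Printf("CRITICAL: %d of %d PF states in use\n", current, limit)
        os.Exit(2)
    }
    fmt.Printf("OK: %d of %d PF states in use\n", current, limit)
}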

If you have RRDTool installed, you can use this shell script to push some of these values into an RRD (or repurpose it to feed collectd or gmond or whatever):

#!/usr/local/bin/bash

gawk="/usr/local/bin/gawk"
pfctl="/sbin/pfctl"
rrdtool="/usr/local/bin/rrdtool"
RRDHOME='/home/pcap/rrd'

pfctl_info() {
    local output=$($pfctl -si 2>&1)
    local temp=$(echo "$output" | $gawk '
        BEGIN {BytesIn=0; BytesOut=0; PktsInPass=0; PktsInBlock=0; \
               PktsOutPass=0; PktsOutBlock=0; States=0; StateSearchs=0; \
               StateInserts=0; StateRemovals=0}
        /Bytes In/ { BytesIn = $3 }
        /Bytes Out/ { BytesOut = $3 }
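        # NB: plain getline replaces $0 mid-record, so after /Packets In/
        # fires, the first /Passed/ rule sees the "Passed" line and its own
        # getline pulls in the "Blocked" line; rule order is what reserves
        # the second /Passed/ rule for the "Packets Out" section.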
        /Packets In/ { getline;PktsInPass = $2 }
        /Passed/ { getline;PktsInBlock = $2 }
        /Packets Out/ { getline;PktsOutPass = $2 }
        /Passed/ { getline;PktsOutBlock = $2 }
        /current entries/ { States = $3 }
        /searches/ { StateSearchs = $2 }
        /inserts/ { StateInserts = $2 }
        /removals/ { StateRemovals = $2 }
        END {print BytesIn ":" BytesOut ":" PktsInPass ":" \
             PktsInBlock ":" PktsOutPass ":" PktsOutBlock ":" \
             States ":" StateSearchs ":" StateInserts ":" StateRemovals}
        ')
    RETURN_VALUE=$temp
}

### collect the data
pfctl_info

### update the database
$rrdtool update ${RRDHOME}/pf_stats_db.rrd --template BytesIn:BytesOut:PktsInPass:PktsInBlock:PktsOutPass:PktsOutBlock:States:StateSearchs:StateInserts:StateRemovals N:$RETURN_VALUE

And then use the following to draw graphs from it:

#!/bin/sh

RRDHOME='/home/pcap/rrd'
cd ${RRDHOME}

#####
######## pf state rate graph
/usr/local/bin/rrdtool graph pf_stats_states.png \
-w 785 -h 151 -a PNG \
--slope-mode \
--start -86400 --end now \
--font DEFAULT:7: \
--title "pf state rate" \
--watermark "`date`" \
--vertical-label "states/sec" \
--right-axis-label "searches/sec" \
--right-axis 100:0 \
--x-grid MINUTE:10:HOUR:1:MINUTE:120:0:%R \
--alt-y-grid --rigid \
DEF:StateInserts=pf_stats_db.rrd:StateInserts:MAX \
DEF:StateRemovals=pf_stats_db.rrd:StateRemovals:MAX \
DEF:StateSearchs=pf_stats_db.rrd:StateSearchs:MAX \
CDEF:scaled_StateSearchs=StateSearchs,0.01,* \
DEF:States=pf_stats_db.rrd:States:MAX \
CDEF:scaled_States=States,0.01,* \
AREA:StateInserts#33CC33:"inserts" \
GPRINT:StateInserts:LAST:"Cur\: %5.2lf" \
GPRINT:StateInserts:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateInserts:MAX:"Max\: %5.2lf" \
GPRINT:StateInserts:MIN:"Min\: %5.2lf\t\t" \
LINE1:scaled_StateSearchs#FF0000:"searches" \
GPRINT:StateSearchs:LAST:"Cur\: %5.2lf" \
GPRINT:StateSearchs:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateSearchs:MAX:"Max\: %5.2lf" \
GPRINT:StateSearchs:MIN:"Min\: %5.2lf\n" \
LINE1:StateRemovals#0000CC:"removal" \
GPRINT:StateRemovals:LAST:"Cur\: %5.2lf" \
GPRINT:StateRemovals:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateRemovals:MAX:"Max\: %5.2lf" \
GPRINT:StateRemovals:MIN:"Min\: %5.2lf\t\t" \
LINE1:scaled_States#00F0F0:"States" \
GPRINT:States:LAST:"Cur\: %5.2lf" \
GPRINT:States:AVERAGE:"Avg\: %5.2lf" \
GPRINT:States:MAX:"Max\: %5.2lf" \
GPRINT:States:MIN:"Min\: %5.2lf\n" 

which will yield you a pretty, two-axis graph like the one below, which should help you avoid hitting these limits in the future.

PF State Graph

]>
<![CDATA[Unscrewed; A story about OpenBSD]> 2013-01-13T15:32:00-06:00 http://www.skeptech.org/blog/2013/01/13/unscrewed-a-story-about-openbsd To put it in no uncertain terms, I was screwed. It was 5am, and the network was down. The network was down because the core routers had crashed. Yes, both of them, master and backup. And to make matters worse, this was only 12 or so hours after the network was down because the switches had crashed. Yes. All of them.

Have you ever even heard of something like that? 3 switches? core dumped?! Well, pull up a chair brother, and let me tell you the tale. I was THERE. Hell, I was even responsible.

I was well and truly screwed.

It all began in the year two thousand and seven, and I was in the midst of implementing a network design for the web-hosting company I work for, so they could migrate their 90’s era infrastructure to shiny new tech. Where before there were Alteons, there would now be application-layer balancers. While in the past they had bulky, power-hungry application servers, now they would have VM’s. Where before there was an inside and a DMZ, now there would be a proper four-tier multi-homed network with a dedicated static content tier. Can you see it brother? The promised land, where rainbows arch over free-flowing rivers of beer, and everything smells like bacon.

The only, teeny, tiny problem was that we couldn’t quite afford to implement the network we needed on Cisco gear. It would have been upwards of 150k in network gadgets to get us there with the big C, and we had lots of other stuff to buy, so I began researching alternatives.

Now around that time, Juniper Networks was just beginning in earnest to condescend to make the little routers and security appliances that guys like me need. They had a slew of really interesting looking routers and switches that were perfect for our new environment, and a very new and shiny JunOS to go along with them. Especially given the new OS, the Juniper stuff was far more flexible than the Cisco gear, and would save us all kinds of money on hardware. And bonus, everything in Juniper-land ran on the same OS. Given their reputation with the MX series, I figured what could possibly go wrong?

Well let me tell you brother; a lot. And by “a lot”, I mean pretty damn-well everything. I tell you, I built those things up like the Burj Khalifa, and down they fell like Hephaestus from the heavens; down, down, and down… for days. The thing was, the new JunOS was just too different and too new. I couldn’t buy training at the time, none of the other network operators at any of the Colo’s we frequented had used it, and the books that were available were all written to earlier versions of the OS, which were quite different. It was a perfect storm of bad timing, ironic potential, and untried tech. So there I was, right out on the razor’s edge, implementing production networks with an untested OS on brand new hardware. And oh how it failed.

But what choice had I? And indeed, now that the websites were down and every piece of network gear I’d used to implement this network had proved itself unusable, that was surely the question of the hour. What were my choices? Even if I could somehow conjure the hundred and a half thousand dollars we needed to acquire Cisco gear, it would be days before I could actually get my hands on it and configure it all. I needed uptime, and I needed it now.

Enter OpenBSD

If you’re in the packet delivery business, and you’ve never tried OpenBSD, then you’re really missing out. Pretty much everything you care about as a network guy on production networks is configured via a virtual interface. This includes CARP, IPSEC, and all manner of encapsulation and tunneling protocols. This is awesome because all the tools designed to work on interfaces, like tcpdump, work on these virtual interfaces too. So if I want to get a look at my VPN traffic, I can tcpdump enc0.

Which brings up another great point: with OpenBSD, your packet inspection and general network troubleshooting toolbox is way better. Nmap, Argus, sflow, tcpdump, snort, daemonlogger, etc… all the best tools are right there on your router if you want them. No need to use a packet tap, because your router is the packet tap.

OpenBSD has myriad built-in daemons for OSPF, BGP, and every other router protocol, as well as application-layer protocol proxies. OpenBSD is by far the fastest, easiest way to set up an ftp proxy that I know of. It also has a kernel-space packet filter called PF, which is crazy feature-rich and easy to use. If you can console configure an ASA, or are an iptables user, you’ll pick up PF’s syntax in about 15 minutes. All the normal stuff like NAT, redirection, and forwarding are there. Further, PF can do things like policy routing, where you tag packets based on criteria you choose, and then make routing decisions later based on those tags. PF has packet queuing and prioritization built-in, so you can make some classes of traffic more important than others.

Bi-directional load balancing is available, so you can tell PF to distribute out-bound traffic across different upstream paths, or distribute in-bound traffic across a farm of web servers. PF can even use a special virtual interface to share its state table with other systems running PF, making possible seamless failover to up to 255 other PF hosts (limited by CARP).

But I digress. Anyway, at the time, I’d been playing with OpenBSD for a couple years and had some pet projects going with it, but I hadn’t honestly considered replacing ASA/Pix/j4350 class core router hardware with it. Well mired as I was, in the pit of molten exigency, the possibility dangled into view like a rope ladder. I knew it would be stable. I knew it was more than sufficiently functional. But I didn’t really know how well it’d perform.

So how did it perform?

So I grabbed a few Cisco 2950 switches that we had laying around and threw up a pair of OpenBSD boxes on 1u SuperMicro hardware that we also had laying around. That stabilized things, and bought me some time to ponder next steps.

While I looked for a permanent solution that wouldn’t break the bank, I kept a nervous eye on those initial band-aid systems, and while they kept up fine (no jitter, latency equal to an ASA, less retransmission than we had on our Cisco-routed networks), they didn’t achieve the line speeds that my dedicated network gadgets did. The load-times on the sites we host weren’t noticeably higher, but it was noticeable on large file transfers between internal systems.

That was interesting, so I Googled around, and came across this article from calomel.org on tuning (free|open)BSD for network performance. A few sysctl tweaks later, and my little duct-tape routers were indistinguishable from their posh green and/or blue colored counterparts.

Well, they weren’t totally indistinguishable; they were far more flexible and inexpensive than their posh green-colored counterparts, and far more stable, flexible, and inexpensive than their posh blue-colored counterparts. Was this a final answer? Was I really going to run production infrastructure – websites that were discussed on Oprah, and plugged in TV ads during the NCAA playoffs – on a general purpose OS with commodity hardware instead of industry standard network gear?

Brother, you’re damn right I was.

I liked them so much, I returned all of the Juniper gear, and used a teeny fraction of it to buy a pair of front I/O SuperMicro boxes, with two 4-port EEPROs in the PCI slots, so they had a proper home. This has become a production standard for us. Anywhere you’d find an ASA, router or other mid-range network doohickey in somebody else’s network, you’ll find a happy little pair of SuperMicro 515’s in ours. We run OpenBSD on smaller, high-end Soekris-style hardware for dedicated tunneling stuff like OpenVPN and IPSEC (we maintain several IPSEC tunnels with 3rd parties where the other end is a Cisco ASA, and that works great). The only reason I use “real” network gear anymore is:

  • In the switch layer (duh)
  • When I need specific interface hardware for SIP, DS3, coax etc..
  • If I ever need big-iron mx480’s or whatever for core routing 10gbps links.
  • When my boss makes me.

And there you have the story of how I was unscrewed by OpenBSD. What’s wrong brother? I see you sitting there, arms crossed, with that skeptical look of disgust on your face. What’s that? I’m just a Unix-idiot who failed to plan? There’s no way OpenBSD can achieve the performance of dedicated hardware? Cisco and/or Juniper are for-the-win and I’m an ignorant windbag?

For the record, I didn’t write this post with the intention of dissing on your preferred network hardware vendors. Personally I think Juniper and Cisco make great stuff, and while I agree in theory that there’s no way OpenBSD can fling packets as fast as a 2950 on a gigabit line, in practice, I can’t tell the difference with the tools at my disposal. In fact, I can tell you from experience that whatever latency I’m introducing is more than made up for by the introduction of application-layer solutions on the core routers themselves. For example, running a caching dns server on the end-user gateway device in HQ is faster for the users than running an ASA with a caching NS somewhere else on the network. And bonus, I get to run DJBDNS because the gateway is a Unix box.

Further, for the record, I ran the Juniper j4350’s and EX4200’s for a few months in that environment to make sure they were stable. Why and how they suddenly failed within 24 hours of each other I cannot guess, but I wasn’t doing anything fancy, and after that happened I just couldn’t risk any more down time. If I tried the same setup again today with current JunOS, I would almost certainly have a different experience.

Honestly though I’m glad it happened the way it did because while it didn’t meaningfully lower my estimation of either of those vendors, it substantially changed my perspective on the use of general purpose OS’s like OpenBSD instead of embedded systems to backbone real production networks. Where before I would have been hesitant, these days – for the preponderance of situations I normally encounter – I greatly prefer to run OpenBSD routers. They’re just more efficient.

  • They save you hardware and hops because they can pull double duty as a router, firewall, FTP proxy, NAT gateway, DNS cache, VPN gateway, caching squid reverse-proxy, etc… ad-infinitum

  • Your failover systems are fully functional (not nerfed-license versions of the real thing)

  • No licenses. All of your routers have pretty much the same capabilities (no wondering if that regional 2800 has a crypto license)

  • You choose your own hardware, and have real control over optimization.

  • Better tools for everything, including troubleshooting and config backup/restore

  • Run real monitoring agents (unless you actually enjoy SNMP, in which case, feel free to continue using it)

  • Everything that comes with Unix (chef for the network layer!)

Anyway, the end.

]>
<![CDATA[The problem with PJLOTR]> 2013-01-12T15:31:00-06:00 http://www.skeptech.org/blog/2013/01/12/the-problem-with-pjlotr If you’re reading this, then you’re probably already frustrated at me. You probably really liked Peter Jackson’s Lord of the Rings movies, and can’t understand how curmudgeons like myself don’t. Instead of talking to you at length about it, I probably pointed you here. I’m pretty tired of having this argument, so I’m sorry if I came off like a pompous jerk. But if you actually want to understand why we curmudgeons feel the way we do, I offer you my own answer, though I can’t pretend to speak for us all.

If I were to give you a bucket full of all of the verbs in the English language, and you were to sort them into two piles: things that a person can do, and things an army can do, I think you might discover a thing or two about the nature of humanity.

It’s sort of an obvious distinction that we rarely think about, but there is a radical and absolute difference between these two classes of action. They don’t overlap. As Mitchell says: “A person cannot invade Normandy any more than an army can play the violin”. It’s obvious when we think about it, but we don’t, and that gets us in trouble.

That last sentence is a perfect example. In point of fact, “we” can’t think at all – thinking is something that only an individual person can do. Thought, reason, wonder and love are verbs you’ll find in the person pile. It’s (tellingly) a much larger pile. And yet the actions of groups seem to, in many people’s minds, outweigh those of individuals. It is a pervasive and dangerous fallacy.

There is a popular interpretation of LOTR as being symbolic of the struggle by free republics against turn-of-the-century authoritarianism of the sort most people associate with Nazism. But the zeitgeist of that era was not particular to that nation. Nor was it a blind hostility toward the populace so much as it was the adoption of central planning at the cost of individual liberty. The men of that time were hell-bent on shaping the world into a “better” place, a place that involved ‘citizens’ working toward a common ‘good’ (good as defined by them of course) instead of individuals working to improve their own immediate surroundings, and that vision was important enough that anyone who disagreed with it had to be… well ‘purged’, or at least re-educated. The result was mostly barbed-wire fences and dead people.

It’s arguable whether the experiments of those men were successful or not. Their undertakings have certainly shaped the minds of our generation to an extent that is difficult to describe, and we remain infected with their dystopian vision. So the central struggle in LOTR, the essence of the story and primary message of the work – that of the triumph of the individual over the collective – is a real one. And it’s going on right now (and in real life, we’re losing).

So, the biggest practical difference between Tolkien and Peter Jackson is that the former was a wise and learned man, who understood and deemed important the choices and actions of normal every-day people. This is why he designed main characters (hobbits) in such a way as to accentuate their physical weakness. This is why he doesn’t involve the elves in the conflict directly. This is why Gandalf spends hundreds of years in study and diplomatic endeavors instead of assassinating bad guys and shooting lightning bolts from his fingertips. This is why the super-hero fellowship of the ring failed, and its weakest members carried on alone. Tolkien writes stories about the actions of little people who choose to do right in the face of really big reasons not to (and also powerful people like Saruman, who choose to go along to get along). I’ll say that again because it’s important: Tolkien’s narrative is made up of little people who choose to defy impossibly powerful authoritarians simply because it’s the right thing to do. “Choose”, there’s another verb for the person pile.

Jackson by comparison is wholly a product of the collective. He lives in a world where the actions of hordes are important. This is just POUNDED into us throughout his films, where again and again he undermines the actions of Tolkien’s individual characters while taking the utmost care to recreate in detail every last pitched battle and skirmish, and even adding several of his own. Jackson’s view of history is a laundry list of wars, his view of philosophy would undoubtedly be described in so many -ists and -isms, and his understanding of literature has probably come from textbooks written by committee. He understands the ring bearer no more than he understands the cross bearer. He has no grasp of the power of an individual human intellect to shape world events by simply doing the right thing, and balks as a result, when he encounters it in LOTR.

Strong words, I know, and I could, at this point, launch into a litany of examples, but this is already running long, so instead, let’s consider a single example: that of what Jackson did to Faramir. You remember him, the brother of Boromir. Tolkien, in the appendix, describes him thus:

“He read the hearts of men as shrewdly as his father, but what he read moved him sooner to pity than to scorn. He was gentle in bearing, and a lover of lore and of music, and therefore by many in those days his courage was judged less than his brother’s. But it was not so, except that he did not seek glory in danger without a purpose.”

The most important account of his character comes through his own actions when, in The Two Towers, he is presented with an opportunity to take the One Ring, and doesn’t even consider it, saying instead:

“But fear no more! I would not take this thing, if it lay by the highway. Not were Minas Tirith falling in ruin and I alone could save her, so, using the weapon of the Dark Lord for her good and my glory. No, I do not wish for such triumphs, Frodo son of Drogo”

Aristotle defined virtue as the ability to act in accordance with what one knows to be right. This is a pivotal moment in the series, one of the many times when, but for the actions of virtuous and well-reasoned individuals, it all would have gone to shit. Faramir has before him the ultimate temptation, a weapon of limitless power, but because he is a person, and has the cognitive ability to reason, he knows right – knows it like some mathematical axiom – and would no more let temptation override his mind than he would equate 5 to 2 and 2. Again, this is the meat of Tolkien’s narrative; it is here, in the actions of individuals, that the world is saved. The battles and sieges are just dice rolls between moments like this one, for as long as the power of the individual intellect survives, there will always be hope, always another chance.

Hordes cannot reason, and they cannot know right. Hordes, by extension, cannot be virtuous, and so, in Peter Jackson’s world, actions like Faramir’s just don’t compute. Faramir, as a result, is translated by Jackson into a jack-booted thug, and an imbecile, who, after torturing Sméagol, snatches the Hobbits and proceeds to drag them back to home-base (why such a man wouldn’t just slit all of their throats and take the ring for himself is beyond me). Faramir then encounters a Jackson-invented skirmish at the river and is twice defiled when he allows an insipid (and whiny) appeal to emotion from Samwise to override his course of action. Jackson’s Faramir is now not only an imbecile, but a sentimental weakling, and Jackson has little sympathy it seems for weaklings. In his defense, Jackson claims that he needed “an extra climax”, and that Faramir’s actions as written in the book “wouldn’t translate to movie audiences”.

I don’t doubt that Jackson got a climax from unceremoniously lobotomizing the virtuous Faramir; no doubt Voldemort, Darth Vader, and Sauron himself would have experienced something akin to sexual arousal in that act, but it’s hardly a defense to prosecution in my court. As for the second justification, the elitist absurdity that people who watch movies can’t understand virtue: to that I can only wave my middle finger in the general direction of Hollywood…. … …. ….. OK, done. No, wait …. …… ……. Ok, done.

The truth is Peter Jackson undermines the contribution of individuals because he doesn’t understand the contribution of individuals. His mind has been broken by that turn-of-the-century zeitgeist toward collectivization that I referred to earlier. If a person manages to take an action that changes the course of human history, then that person is an exception; some sort of old-world superhuman badass whom we weaklings should immediately appoint dictator for life. Otherwise it must have been the result of a happy accident (and usually an accident contrary to the intention of the individual in question).

One almost feels bad for the guy. It must have been rough for such a man to adapt such a series to film. Badasses in abundance stand around in tree houses playing with pretty fountains and singing, while the actual narrative is constantly focused on the comings and goings of these insignificant weakling hobbits. Heroes like Faramir, who could grab the nukes and use them to end the war, don’t, and poor Jackson just can’t extract any meaning from it all.

Without fail, every character who isn’t a super-hero, and sometimes even groups of characters who are (like the Ents), are undermined, marginalized, nerfed, or otherwise made into twisted, collectivist shadows of their Tolkien counterparts. Sometimes, as in the case of Gimli, whom Tolkien obviously intended to be a superhuman badass, Jackson gets confused (Gimli is probably too short for Jackson to consider badass) and seems to react as if he’s encountered yet another of Tolkien’s weaklings. In Gimli’s case, Jackson relegates him to the position of comic relief. But now I’m digressing into other examples.

What can I say of a man who reads a story (one hopes he did actually read it) that figuratively grabs us by the collar and screams in our faces that the magic is going away, that the mindless hordes are gathering, that our only hope for survival is the application of well-trained individual cognitive abilities – to reason and to seek virtue – and hands us back a movie wherein the meek of the world passively cower in wait for a team of superhero badasses to deliver them their fate? A movie where world peace is handed to us by benevolent emperors and happy accidents? Of the man I’m uncertain what to say; he is certainly a man of our times. I’ve done him a great service in assuming his shortcomings are the result of ignorance and not malice. Can I objectively say that the work is worse or better? I think I can objectively say, at least, that it’s not the same story. For myself I’ll take the original, and keep a close eye on those who prefer the films.

]>
<![CDATA[kindle hate]> 2013-01-08T14:59:00-06:00 http://www.skeptech.org/blog/2013/01/08/kindle-hate Once upon a time, a man wrote a book. In fact, he wrote three volumes that comprised a book. The word ‘volume’ is uniquely apropos here; his volumes were as lengthy as they were deadly, by which I mean, when he undertook to write this work, he knew that people would die as a result of it being written. Good people. Innocent people. And die they did, slowly, painfully, and alone.

This work carries a weight that is as tangible physically as it is metaphysically. It is the permanent record of the mind of a genius, and also nothing short of the truth of humanity itself. When you read it, you will know the mind of a genius, and also you will be scarred by it, just as he is scarred. But even without reading a single word, you can hold it in your hand, and, thumbing across its pages, you can glean a sense of what an undertaking it was. You can close your eyes to its words, and as the pages fall past your hand you can see him in an oily lamplight, squinting at the page and scribbling away with a nervous ferocity. You can feel in the fall of those pages his terrified pauses, his looking over his shoulder at the softest hint of footsteps on the stairwell beyond his flimsily barred door. You can imagine him working thus, page by page, for several thousand pages.

It is a masterpiece, and I want it to exist. I want to interact with it, feel its weight, and breathe the must of its pages. When my Nephew is old enough, I want him to know that odor, and feel its burden. I want it to take up space and to be inconvenient. I want to be inconvenienced by it like we are inconvenienced by love, because I love it, and I want other people to feel it and carry it, and scribble annotations in it, and lend it to their friends, and beg for it back. I want them to spare a space for it in their home and in their lives like I have. A real, physical space.

When it all falls apart for me, and the things that surround me scatter to the winds, I hope these volumes find their way on to the shelves of a used book store, and I hope someone picks it up, and seeing my little notes in its pages, is delighted the same way I am delighted when I find such a book, with happy exclamations drawn in the margins, or a gift-givers note written in the blank space of the dedication page.

A book is more than the sum of its words and punctuation. It cannot be encompassed by bits on a memory medium any more than could a JPEG of the Mona Lisa encompass that work. If Kindle attempted this from a desire to make it more accessible, there would at least be a worthy goal behind its deadly vision. It does not. Kindle is exactly the opposite; the very implementation of a business plan dependent on making books less accessible. This is abhorrent in and of itself, but I think what really angers me about it is that the business model works because it pays lip service to convenience and cool-factor. That is why people buy it, why it will win, and why it’s SO sad.

edit 1/29/13: Evidently I’m not alone

]>
<![CDATA[New year, new blog]> 2013-01-01T00:01:00-06:00 http://www.skeptech.org/blog/2013/01/01/new-year-new-blog And here we go again. This will be my umpteenth attempt at establishing a place I can refer to as “home” on the interweb. But I’m quite convinced that this effort will succeed where the others have failed. After all,

  • I’m undertaking this as a new-years resolution, which, anyone will tell you, makes its success a veritable certainty.

  • I’m using octopress and git, both of which are made of win.

  • My publishers told me to get a better website, so I kind of have to.

So welcome to the new site. I’ll be moving most of my odds-and-ends here as time permits.

]>