Which Oar Best Rows The Boat

Ted Dziuba wrote a post that boiled down to identifying his own “language bigotry” as a step along the way to software engineering “Mastery”.

He’s absolutely correct about introspecting why one chooses to “fight” other languages. And for “languages” I could substitute OS platform, cloud computing, database manager, code editor, car manufacturer, religion, Constitutional Rights, regional sports team, patriotism, etc.

We identify ourselves by our passions and beliefs. As a means of vetting prospective employees, I have long made a habit of asking candidates “what are you?” The answers over the years include things like “mom”, “Catholic”, “firefighter”, “LARPer”, “Mac geek”, “soldier”, “Cisco network engineer”, “hunter”, “professional juggler”, “Red Sox fan”, “clergy”, “MySQL admin”, “mountain climber”, “Chevy nut”, “Java programmer”, and on and on. When someone identifies themselves as something, you can pretty safely assume some other things: A “Red Sox fan” most likely loves baseball, hates the Yankees, also likes the Patriots [American] football team, and is located or grew up in the New England region. A “hunter” most likely feels strongly about their Second Constitutional Amendment Rights, votes centrist or right on the political scale, dislikes leftists, also enjoys the outdoors, probably fishing, and most likely desires to live somewhere more rural than more urban. Everything there is a prejudice for or against something, and it’s perfectly natural. (And yes, it’s stereotyping: that’s how we templatize correlated attributes, please carry on.)

The key to finding a good employee is sussing out which of those will get in the way of delivering business value. The key to being a good employee is introspection about why you have those feelings and whether they’re relevant. As Ted says perfectly (I emboldened his italics):

I feel orders of magnitude more useful delivering business value than I feel delivering code.

A clutch statement right there: it’s not about the tools, it’s about the product and its value. I’m not advocating, and I doubt Ted is either, that any intelligent engineer should just clam up and do as they’re told. If there is value or purpose to moving against the grain, then speak up! Your value as an engineer is only partially what is produced, and constructive engagement debating how something is produced is exceptionally important as well. But know when your beliefs, your prejudices, are pointlessly in the way. Introspect on why.

My most recent position requires the use of an Apple Macintosh (gross), is completely hosted using Amazon Web Services (ick), systems scripts almost entirely written in Ruby (*twitch*): all of which I knew in advance of accepting the position. I don’t prefer a single one of those things, but would it have been a better position if they used HP laptops running Linux, had a hardware server farm, and churned out klocs of Perl? Nope. Would it be a better business? Not at all. So I use a Mac for hours a day, architect in concert with AWS’s strengths and weaknesses for hours a day, write Ruby for automation as needed, and am perfectly at peace with all of that.

Sure, I still run Linux on my own hardware, remind my colleagues that if we ran our databases on hardware vs. “the cloud” we wouldn’t have “that” problem (for whatever the problem might be), and occasionally sneak in a quick Perl script that would have taken three-to-six-times as long for me to write in Ruby (not necessarily Ruby’s fault) – but not at the cost of value.

A bad workman complains about his tools. A craftsman works with what is at hand.

Posted in Architecture, Coding, Life, Opinions

Ruby String.each

While I find Ruby to be a half-assed attempt at an object-oriented Perl, I have been using it quite a bit lately to stay consistent with a lot of existing intellectual property. One of the more maddening things is that somewhere along the way, within the 1.9.x series, the Cardinals of Ruby decided to remove the “each” method from the String class. While logically inconsistent, this method allowed one to create a function that iterated over an array, or if the item passed was a String, iterate over that one item, without extra code to detect if it was “only” a String and handle it differently. Add the below to your .rb file, or put it in a file you require, to get that feature “back”.


class ::String
  def each(&block)
    Array(self).each(&block)
  end
end

*sigh*

Posted in Coding, Linuxy, Rants/Tirades

MySQL com_select Nugget

The com_select counter isn’t a raw count of how many SELECT operations the server has performed, but rather the number of SELECT operations that did not get returned from the query cache. To see the real number of SELECTs (assuming query caching is on), you need com_select + qcache_hits.

This is in MySQL’s documentation, but I thought I’d share.

Posted in Linuxy, Work

SPDY for Apache

Mod_SPDY for Apache is out. If you don’t know what SPDY is, I’d recommend some light reading … or heavy reading if you’re that kind of person.

 

Posted in Architecture, Linuxy, Products

NConf 1.3.0 Pass-through HTTPD Auth

If you’d like to use NConf, but want your HTTPD, e.g. Apache, to do the auth for it, apply the below patch to set the NConf user to the currently authenticated user.

--- include/head.php.orig    2012-04-03 19:34:13.774594705 +0000
+++ include/head.php    2012-04-03 19:21:39.470169672 +0000
@@ -70,7 +70,12 @@
 }else{
     // NO authentication
     $_SESSION['group'] = GROUP_ADMIN;
-    $_SESSION["userinfos"]['username'] = GROUP_ADMIN;
+    # M@
+    if( isset($_SERVER['REMOTE_USER']) ){
+        $_SESSION["userinfos"]['username'] = $_SERVER['REMOTE_USER'];
+    }else{
+        $_SESSION["userinfos"]['username'] = GROUP_ADMIN;
+    }
     message($debug, 'authentication is disabled');
     message($debug, $_SESSION["group"].' access granted');
 }

Posted in Coding, Linuxy, Work

find -delete

If I hear one more person recommend piping to xargs or using -exec rm -f {} in answer to the question “how can I make ‘find’ delete the files it finds?” I’m going to scream. It’s really simple:

find /wherever -mtime +7 -type f -delete

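That’s it. No shelling (exec) or piping. Real easy. Real fast. The only thing to remember is to keep -delete at the end: find evaluates its expression left to right, so a -delete placed before the tests would remove everything under the path. Both GNU and BSD find support it.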

Posted in Coding, Rants/Tirades, Work

Video Captcha Prior Art

Nucaptcha claims to have invented video captchas. They didn’t. Neither did I, probably, but I have talked about them publicly a few times, including this blog post from 2009.

Posted in Architecture, Life, Products

Caching Functions In Perl

Synopsis

There are occasions where you write a function that takes some parameters and outputs consistent-ish data. Example:

sub add {
    my($first,$second)=@_;
    return $first + $second;
}

If you call add(1,1) you get 2: always. Consistent input yields consistent output. Now let’s say you have something much more complex, possibly slow, that will possibly be called over and over again in the same program. Yes, we could create a hash table of values and check it by hand before every call to the function… But that’s not why we have machines to work for us, and is certainly not why we have Perl.

Lesson 1 – Basics

By using three neat tricks of Perl (the symbol table, anonymous functions, and variable scope) we can do that work behind the scenes, and let our calling code be oblivious to whether it’s getting “live” data or “cached”.

Symbol Table

The symbol table lets us mess with things. Take the below two functions:

sub a { return "a"; }
sub b { return "b"; }

For whatever reason, we’ve decided that everything in our program that calls function “a”, should actually be calling function “b”. Perhaps we don’t have access to the programs making the calls. Perhaps we’re just awesome. Either way, the solution is simple:

sub a { return "a"; }
sub b { return "b"; }
*a = \&b; # assign a reference to b; a bare &b would call b instead

So you call &b and get “b”. You call &a and get “b”. Magic. We’ll be back to this.
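As a minimal runnable sketch of that swap (the `no warnings 'redefine'` block is my addition, assuming you run with `use warnings`, which otherwise reports "Subroutine main::a redefined"):

```perl
use strict;
use warnings;

sub a { return "a"; }
sub b { return "b"; }

my $before = a();          # the original a() still answers here

{
    # The glob assignment happens at run time and replaces the code
    # slot of *a; silence the "redefined" warning for this statement only.
    no warnings 'redefine';
    *a = \&b;              # backslash: take a reference, do not call b
}

my $after = a();           # now dispatches to b()

print "before: $before, after: $after\n";
```

Because named subs are looked up through the symbol table when they are called, every existing caller of a() picks up the replacement without being touched.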

Anonymous Functions

Perl allows you to do some pretty powerful things. One of those things is the very simple idea of an anonymous function. An anonymous function isn’t called traditionally, but rather crammed into a variable and possibly passed around to other functions or whatever. Example:

my $afunc = sub { return "a" };

In this case $afunc, a variable, now contains a reference to the code declared by the sub, so we can pass it around as any other variable, and invoke it ala:

$return = &$afunc; # $return eq "a"
$return = $afunc->(); # $return eq "a"
$return = $afunc; # $return now holds a second reference to the
                  # same function that $afunc refers to

Variable Scope

To many people, Perl’s scoping logic is maddening. I’m not going to talk about it at length here, but rather show a very simple example that is important for this exercise.

my $var = "x";
{
  print "$var"; # x
  $var = "y";
  print "$var"; # y
}
print "$var"; # y

If you’ve programmed in any programming language that has blocks, this makes total sense, and you’re wondering why I even bring it up.  Let’s change this slightly:

my $var = "x";
my $xfunc = sub {
  print "$var"; # x
  $var = "y";
  print "$var"; # y
};
print "$var"; # x

We’ve now turned that block into an anonymous function. The first time we call that function, we get the expected output of “xy”. Thereafter, we get “yy” because the function doesn’t have a copy of $var, rather it is using the parent’s $var. So what happens if we pass $xfunc as a variable to some other function miles away where $var doesn’t exist? It still references the existing $var which will exist in memory until nothing else in memory is pointing to it. Consider $var our cache: This is critical to how I will implement this cache.
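To make the "variable outlives its scope" point concrete, here is a small sketch. The name make_counter is hypothetical, not something from this post; each call creates a fresh $count that only the returned function can reach.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper used only for illustration.
sub make_counter {
    my $count = 0;          # stays alive after make_counter returns,
                            # because the closure below still refers to it
    return sub {
        $count++;
        return $count;
    };
}

my $first  = make_counter();
my $second = make_counter();   # gets its own, separate $count

my $f1 = $first->();
my $f2 = $first->();
my $f3 = $first->();
my $s1 = $second->();

print "first: $f1 $f2 $f3, second: $s1\n";
```

The cache in the next lesson is the same idea, with a hash in place of the counter.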

Lesson 2 – A Simple Cache

So we know how to mess with the symbol table, compose anonymous functions, and how variable scope can be perverted to our needs. So how do we combine those into a function that creates caching functions?

sub cache_it {
  my $func = shift;
  my %cache;

  my $afunc = sub {
    my $key = join ',', @_;

    unless(exists $cache{$key}) {
      # We don't have a cached value, make one
      my $val = $func->(@_);

      # ... and cache it
      $cache{$key} = $val;
    }

    # return the cached value
    return $cache{$key};
  };
  return $afunc;
}

So this function takes a function reference as its argument, and returns a caching anonymous function. Its cache is a hash that is declared in its parent’s scope, and thus will outlive individual invocations of the anonymous function itself.

We key the cache by joining the arguments with a character, in this case a simple comma. We check the cache to see if there is a matching entry, if so we return it. If not we call the real function that we’ve stored anonymously, and store its return values in the cache. In all cases, we return the cached version of the results.

So, when we have a function we want to make a caching version of, it’s as easy as:

sub myfunc {
  my $thing1=shift;
  # Do something complicated with $thing1, and generate
  # consistent result $thing2
  return $thing2;
}

*myfunc = cache_it(\&myfunc); # Magic

That’s it.  Anywhere in your code you call myfunc, you’ll now be calling the caching version. If you’re having problems and need to debug, just comment out that one little line starting with the asterisk, and your code will call the real myfunc “as usual”. It’s that easy.
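A quick way to convince yourself the wrapper is doing its job is to count how often the real function runs. This is a sketch: cache_it is restated in compact form so the snippet runs on its own, and slow_square and $calls are made-up names for illustration.

```perl
use strict;
use warnings;

# Same shape as the simple cache_it above, repeated so this runs standalone.
sub cache_it {
    my $func = shift;
    my %cache;
    return sub {
        my $key = join ',', @_;
        $cache{$key} = $func->(@_) unless exists $cache{$key};
        return $cache{$key};
    };
}

my $calls = 0;              # counts real (uncached) invocations

# slow_square is a hypothetical stand-in for an expensive function.
sub slow_square {
    my $n = shift;
    $calls++;
    return $n * $n;
}

{
    no warnings 'redefine'; # the glob assignment redefines slow_square
    *slow_square = cache_it(\&slow_square);
}

my @results = (slow_square(4), slow_square(4), slow_square(5));
print "results: @results, real calls: $calls\n";
```

One thing to watch with the comma-joined key: myfunc("a,b") and myfunc("a", "b") produce the same key, so pick a separator that cannot appear in your arguments if that matters to you.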

Lesson 3 – A Better Cache

I don’t use the cache I wrote above. I tend to write long-living infrastructure code that caches or micro-caches result sets that need to expire or that I need to control. Taking the above function and adding item expiration and some simple commands is pretty easy, however.

sub cache_it {
  my ($func,$life) = @_;
  my %cache;

  my $afunc = sub {
    my $key = join ',', @_;
    my $now=time();

    if($key eq 'CLEARCACHE') {
      %cache=();
      return;
    }

    # Should the cache be cleaned?
    if($key eq 'CLEANCACHE') {
      foreach my $ckey (keys %cache) {
        delete $cache{$ckey} if $cache{$ckey}->{expires} < $now;
      }
      return;
    }

    unless(exists $cache{$key} and $cache{$key}->{expires} >= $now) {
      # We don't have a cached value, make one
      my $val = $func->(@_);

      # ... and cache it
      $cache{$key} = {
           value => $val,
           expires => $now+$life,
      };
    }

    # return the cached value
    return $cache{$key}->{value};
  };
  return $afunc;
}

And when we call it:

*myfunc = cache_it(\&myfunc,300); # Magic

Where the second parameter is the number of seconds we want a cache item to stay valid. The anonymous function our new cache_it function returns now does three things:

  1. If you call the function with the parameter “CLEARCACHE”, it will empty the entire cache
  2. If you call the function with the parameter “CLEANCACHE”, it will purge cache items that have expired.
  3. If you call the function normally, it will return the cached result if one exists and has not expired; otherwise it calls the real function and saves both the result and an expiry timestamp (the current time plus the life you specified when you created it) into the cache entry.
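The three behaviours can be exercised with a short sketch. The expiring cache_it is restated compactly so this runs standalone; $lookup, $calls, and the 'db1' argument are placeholders I made up, standing in for a slow query or DNS call.

```perl
use strict;
use warnings;

# Compact restatement of the expiring cache_it so this runs standalone.
sub cache_it {
    my ($func, $life) = @_;
    my %cache;
    return sub {
        my $key = join ',', @_;
        my $now = time();
        if ($key eq 'CLEARCACHE') { %cache = (); return; }
        if ($key eq 'CLEANCACHE') {
            delete @cache{ grep { $cache{$_}{expires} < $now } keys %cache };
            return;
        }
        unless (exists $cache{$key} and $cache{$key}{expires} >= $now) {
            my $val = $func->(@_);
            $cache{$key} = { value => $val, expires => $now + $life };
        }
        return $cache{$key}{value};
    };
}

my $calls = 0;
# Hypothetical slow function, cached for 300 seconds.
my $lookup = cache_it(sub { $calls++; return "result for $_[0]" }, 300);

$lookup->('db1');                 # miss: calls the real function
$lookup->('db1');                 # hit: served from the cache
my $after_hit = $calls;

$lookup->('CLEARCACHE');          # command: empties the cache
my $answer = $lookup->('db1');    # miss again, so the real function runs
my $after_clear = $calls;

print "$answer (real calls: $after_hit, then $after_clear)\n";
```

The trade-off of in-band commands is that the wrapped function can never be called with 'CLEARCACHE' or 'CLEANCACHE' as its only argument; for the kind of code I use this in, that has never been a problem.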

Use

I’ve used very similar caching technology in lots of places:

  1. Anagram generator (or any recursive, consistent function, like Fibonacci or gcd)
  2. DNS resolver
  3. Database queries
  4. Surge-handling in web apps (you can “turn on” that caching when you start surging, and “turn off” when request load goes down)
  5. Smart logging functions (inverse cache – don’t bother logging something if you’ve logged it already)
  6. Distributed caching infrastructure

Prove It

Fibonacci numbers are fun:

sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

So this will make 2 calls to itself for every one call where n>=2, but by remembering what fib(2), fib(3), fib(4), etc. are, it won’t have to recompute them.

[root@mlap ~]# perl ./x 40
    102334155 : 146
[root@mlap ~]# perl ./x 40 cache
    102334155 : 0

The first is a run of the above fib function on a value of 40. The ‘answer’ is 102334155, and it took 146 seconds. The second is a run of the same fib function wrapped by a cache very similar to the “simple cache” in Lesson 2. The ‘answer’ is 102334155, and it took <1 second.
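If you’d rather not wait minutes, the same effect shows up in call counts. What follows is not the ./x script from the run above (that isn’t reproduced here); it is a sketch that counts real invocations of fib(25) with and without the simple cache. $calls is an instrumentation variable I added.

```perl
use strict;
use warnings;

# Simple cache from Lesson 2, restated so this runs standalone.
sub cache_it {
    my $func = shift;
    my %cache;
    return sub {
        my $key = join ',', @_;
        $cache{$key} = $func->(@_) unless exists $cache{$key};
        return $cache{$key};
    };
}

my $calls = 0;              # added instrumentation, not in the original fib

sub fib {
    my $n = shift;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

my $plain       = fib(25);
my $plain_calls = $calls;

{
    no warnings 'redefine';
    *fib = cache_it(\&fib); # recursive calls now go through the cache too
}

$calls = 0;
my $cached       = fib(25);
my $cached_calls = $calls;

print "plain:  $plain in $plain_calls calls\n";
print "cached: $cached in $cached_calls calls\n";
```

The recursion benefits because fib calls itself by name, and the name now points at the caching wrapper; the real function body runs only once per distinct argument.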

Gotchas

This is not a magic bullet. If your function doesn’t return consistent data given a consistent parameter list, or if your function affects something else that will be missed by returning cached results, this isn’t your solution.

The function above assumes that the function it is capturing returns a single scalar (which could be a reference). If the cached function returns an array or a hash, the cache function will not work as expected. This is very easy to fix (4 lines of code), but I wanted to point that out before you get overly frustrated.
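One possible way to handle list returns, offered as a sketch and not necessarily the fix I had in mind: always call the wrapped function in list context, store the results in an array reference, and use wantarray to decide what to hand back. The names cache_it_list and $divmod are hypothetical.

```perl
use strict;
use warnings;

# A variant of the simple cache that stores whole return lists.
sub cache_it_list {
    my $func = shift;
    my %cache;
    return sub {
        my $key = join ',', @_;
        # Call in list context and keep everything that came back.
        $cache{$key} = [ $func->(@_) ] unless exists $cache{$key};
        # List callers get the full list; scalar callers get the last element.
        return wantarray ? @{ $cache{$key} } : $cache{$key}[-1];
    };
}

# Hypothetical function returning a two-element list.
my $divmod = cache_it_list(sub {
    my ($n, $d) = @_;
    return (int($n / $d), $n % $d);
});

my ($quot, $rem) = $divmod->(17, 5);
print "$quot remainder $rem\n";
```

This assumes the wrapped function does not itself change behaviour based on calling context; one that checks wantarray would always see list context here.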

 

Posted in Architecture, Coding, Linuxy

Sorting Strings Ending In Numbers In Perl

Synopsis

I deal with a lot of names that look like “somedumbserver2” and “somedumbserver15”. Using Perl’s default sort, “somedumbserver15” comes before “somedumbserver2” because the character “1” sorts before the character “2”, and that’s where the comparison stops. This sort, “snsort” or “string-number sort”, is a little wiser: when the non-numeric prefixes are equal, it compares the number 2 to the number 15 and orders them properly. It’s also safe to use if some strings don’t conform to this pattern.

I wrote this to be a Vmethod to list objects in Template Toolkit, so I kept the surrounding code intact in case anyone wants to use it as-is, but “sub sns” can be taken out and put in any Perl program, and used in conjunction with the Perl “sort” function.

Code

$Template::Stash::LIST_OPS->{ snsort } = sub {
        my $list = shift;
        return sort sns @$list;
        
        sub sns {
                my @aparts=$a =~ /^(\D+)(\d+)$/;
                my @bparts=$b =~ /^(\D+)(\d+)$/;
                
                return $a cmp $b unless @aparts and @bparts; # Safety
                return $aparts[0] cmp $bparts[0] unless $aparts[0] eq $bparts[0];
                return $aparts[1] <=> $bparts[1];
        }
};
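Outside of Template Toolkit, “sub sns” works directly with sort. A small sketch, with host names made up for illustration:

```perl
use strict;
use warnings;

# The comparator from above, lifted out unchanged. $a and $b are the
# package variables that sort sets for each comparison.
sub sns {
    my @aparts = $a =~ /^(\D+)(\d+)$/;
    my @bparts = $b =~ /^(\D+)(\d+)$/;

    return $a cmp $b unless @aparts and @bparts; # Safety
    return $aparts[0] cmp $bparts[0] unless $aparts[0] eq $bparts[0];
    return $aparts[1] <=> $bparts[1];
}

# "gateway" has no trailing number, so it exercises the safety fallback.
my @hosts  = qw(somedumbserver15 web10 somedumbserver2 gateway web9);
my @sorted = sort sns @hosts;

print "@sorted\n";   # gateway somedumbserver2 somedumbserver15 web9 web10
```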

Posted in Coding, Linuxy

Fixing Mis-cased URIs Under Apache

Synopsis

This is rather old code, but it has saved my bacon more than once. It runs under Apache with mod_perl, and corrects the requested URI when it is given lazily. Thus a request for “/INDEX.HTML” is rewritten to “/index.html” as appropriate.

Code

=head1 NAME

M::Apache::fixcase - Want to fix case, showing the user the error of their ways?

=head1 SYNOPSIS

PerlModule M::Apache::fixcase

<VirtualHost>

PerlTransHandler M::Apache::fixcase

</VirtualHost>

=cut

package M::Apache::fixcase;

use Apache2::Const qw(DECLINED);
use Apache2::RequestRec ();  # provides $r->uri
use Apache2::RequestUtil (); # provides $r->document_root

sub handler {
        my $r = shift;
        my $file=$r->document_root() . $r->uri();
        unless(-e $file) {
                # File doesn't exist, let's try to find it!
                my @uribits=split(/\//,$r->uri());
                shift @uribits; # shift off the beginning ''
                my $newuri=$r->document_root(); # $newuri is the uri we're building
                my $sofar=0;
                foreach my $bit (@uribits) {
                        $bit =~ s/\(|\)|\`//g; # stupid url tricks
                        $sofar=0; # reset sofar, so we know if we're still on track
                        opendir(FD,$newuri) or last;
                        foreach my $dthing (readdir(FD)) {
                                if($dthing =~ /^\./) { next; } # safety first;
                                if($dthing =~ /^\Q$bit\E$/i) { # case-insensitive match; \Q..\E keeps URI characters literal
                                        # We have a match!
                                        $sofar=1;
                                        $newuri .= "/" . $dthing;
                                        last;
                                }
                        }
                        closedir(FD);
                        unless($sofar) { last; } # we missed this bit, don't bother recursing further
                }
                if($sofar) {
                        # We made it!
                        my $dr=$r->document_root();
                        $newuri =~ s/^\Q$dr\E//; # strip off the document_root from the new uri
                        $r->uri($newuri); # Set the uri to the new uri
                } # else we can't do anything...
        }
        return DECLINED; # Always pass it on
}

1;
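
The directory walk at the heart of the handler can be tried without Apache. This is a standalone sketch, not part of the module: resolve_case is a hypothetical helper, and the temporary directory tree is created just for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# resolve_case is a hypothetical standalone helper, not part of the module.
# It walks the URI one path segment at a time, matching each segment
# case-insensitively against the real directory entries.
sub resolve_case {
    my ($root, $uri) = @_;
    my $path = $root;
    for my $bit (grep { length } split m{/}, $uri) {
        opendir(my $dh, $path) or return;
        # Skip dotfiles; \Q..\E treats the segment as a literal string.
        my ($match) = grep { !/^\./ and /^\Q$bit\E$/i } readdir($dh);
        closedir($dh);
        return unless defined $match;   # no entry matches this segment
        $path .= "/$match";
    }
    return substr($path, length $root); # strip the root back off
}

my $root = tempdir(CLEANUP => 1);
make_path("$root/Docs");
open(my $fh, '>', "$root/Docs/index.html") or die "cannot create file: $!";
close($fh);

my $fixed = resolve_case($root, '/DOCS/INDEX.HTML');
print "$fixed\n";
```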

Posted in Coding, Linuxy