About the Author

Chris Shiflett

Hi, I’m Chris: web craftsman, community leader, husband, father, and partner at Fictive Kin.


PHP Advent Calendar Day 24

Today's entry is provided by Nate Abele.

Nate Abele

Name
Nate Abele
Blog
cake.insertdesignhere.com
Biography
Nate Abele of OmniTI has been a core developer of the CakePHP web framework for over two years. He is known in some circles as the Johnny Cash (or "Man in Black") of the PHP community.
Location
New York, New York

Today, dear readers, I offer you no lofty words of wisdom, no dishwasher analogies, and no deep thoughts to ponder. What I can offer is a piece of simple, practical advice that you can use today, and some bits of code that I humbly submit to you below. Without further ado, let's begin.

Late last year, Chris and I (mostly Chris) came up with a clever defense-in-depth strategy for session security over a couple of beers. (We were sober most of the time; why do you ask?) The basic idea is that when dealing with session security, you want to do your best to ensure that the authenticated user you started talking to is the same user you're talking to now. You're probably familiar with at least some of the various forms of session attacks, such as session hijacking and session fixation, so I won't bore you with the details.

Each request contains all sorts of details that you can potentially use to help make sure you're still talking to the same user, such as the user's IP address and the User-Agent header. Hopefully you know that most of these details aren't reliable: proxies and various other factors can change them over the course of a legitimate user's session, which can cause you to mistake a good guy for a bad guy. Common wisdom is therefore that you should never use such information.

The answer might not be so black and white. If the user's IP address has been consistent over the last 50 requests, is it reasonable to assume that it's going to be the same for the next request? If it suddenly changes, it seems reasonable to treat the request with a modicum of suspicion. To mitigate the risk, all you need to do is ask the user to re-authenticate. This can be as simple as prompting for the password again: a minor inconvenience for legitimate users, but a big hindrance to attackers.

In order to implement this simple system, you need two things:

  • A piece of information (key) you believe can be consistent, at least for a reasonably large portion of your users. (For example, an IP address.)

  • A threshold, after which you believe you can rely on this piece of information to be consistent in future requests.

The threshold is the important part; it's what makes this idea work. The threshold can be either a number of requests or an interval of time. Once it has been met, the key is considered trustworthy, and any change to its value raises a red flag. It's important to tune your threshold rules so that they're appropriate for your application, your users, and the piece of information you're tracking. Analyze your log files, and find a sensible balance.
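To make the two ingredients concrete, here is a minimal, framework-agnostic sketch of the tracking logic. TrendTracker and its observe() method are hypothetical names, not part of CakePHP or of the component shown later. Per-rule state lives in a plain array (in practice, perhaps a slice of $_SESSION), and the current time is passed in explicitly so the logic is easy to test. An integer threshold counts consistent requests; a string threshold is assumed to be a strtotime() offset measured from the first time the current value was seen.

```php
<?php
/**
 * Minimal sketch of the trending idea, independent of any framework.
 * TrendTracker is a hypothetical name; it is not part of CakePHP.
 */
final class TrendTracker
{
    /**
     * Records one observation of $value for a single rule.
     *
     * @param array|null $state     Per-rule state ['value' => ..., 'count' => ...]; null starts fresh
     * @param string     $value     The key seen in this request (e.g. the client IP)
     * @param int|string $threshold int: number of requests; string: strtotime() offset
     * @param int        $now       Current Unix timestamp (pass time() in real code)
     * @return bool True if the trend was established and the value has now changed
     */
    public static function observe(?array &$state, string $value, $threshold, int $now): bool
    {
        if (!isset($state['value'])) {
            // First observation: start a new trend with this value.
            $state = self::start($value, $threshold, $now);
            return false;
        }

        if (self::established($state, $threshold, $now)) {
            // The key is trusted now, so any change is a red flag.
            return $state['value'] !== $value;
        }

        if ($state['value'] !== $value) {
            // Changed before the threshold was met: not suspicious, just start over.
            $state = self::start($value, $threshold, $now);
        } elseif (is_int($threshold)) {
            // Same value again: one more consistent request.
            $state['count']++;
        }
        return false;
    }

    private static function start(string $value, $threshold, int $now): array
    {
        if (is_int($threshold)) {
            // 'count' holds the number of consistent requests seen so far.
            return ['value' => $value, 'count' => 1];
        }
        // 'count' holds the timestamp at which the trend becomes established.
        $deadline = strtotime($threshold, $now);
        if ($deadline === false) {
            throw new InvalidArgumentException("Cannot parse threshold: $threshold");
        }
        return ['value' => $value, 'count' => $deadline];
    }

    private static function established(array $state, $threshold, int $now): bool
    {
        return is_int($threshold)
            ? $state['count'] >= $threshold
            : $state['count'] <= $now;
    }
}
```

In a front controller you might call TrendTracker::observe($_SESSION['trends']['ip'], $_SERVER['REMOTE_ADDR'], 50, time()) and prompt for the password again whenever it returns true. Restarting the trend when the value changes early is a choice made for this sketch: a key that never settles down never becomes trusted, so it never triggers re-authentication.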

The concept itself is simple and can be implemented in any number of ways; I've implemented it as a CakePHP component, which you can use as follows:

<?php
 
class AccountsController extends AppController {
    var $components = array('Trending');
 
    function beforeFilter () {
        $this->Trending->track(array('User-Agent' => 20,
                                     'Net:!' => '+30 minutes'));
    }
 
    function index() {
        /* ... */
    }
 
    function edit($id = null) {
        /* ... */
    }
 
    /* ... */
}
 
?>

This example sets up two rules: one for the User-Agent header, and one for the IP address. In the small domain-specific language I've constructed for these rules, a colon (:) denotes a type of matching other than HTTP headers (e.g., Net), and ! means an exact match. Plain header names are mapped to their $_SERVER equivalents, so User-Agent is read from HTTP_USER_AGENT. So, the Net:! rule only establishes a trend if the IP address is an exact match throughout a 30-minute period. For specifying thresholds, use strings to represent time relative to now (anything strtotime() understands, such as '+30 minutes'), and integers to represent a number of requests. So, the User-Agent rule establishes a trend once that header has remained consistent for 20 requests.
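The key parsing can be sketched in a few lines. normalizeRule() below is a hypothetical, standalone helper loosely modeled on TrendingComponent::track() in Listing 1. It uses plain str_replace() and strtoupper() in place of the r() and up() shortcuts used in the listing, and it borrows the listing's default threshold of 20 requests.

```php
<?php
/**
 * Hypothetical helper showing how rule keys such as 'User-Agent' or 'Net:!'
 * could be normalized. Loosely modeled on TrendingComponent::track().
 */
function normalizeRule(string $key, $threshold = 20): array
{
    if (strpos($key, ':') === false) {
        // A plain HTTP header: 'User-Agent' becomes 'HTTP_USER_AGENT',
        // the name under which PHP exposes it in $_SERVER.
        $type = 'header';
        $match = null;
        $criteria = 'HTTP_' . strtoupper(str_replace(['-', ' '], '_', $key));
    } else {
        // Another kind of rule: 'Net:!' splits into type 'Net' and match '!'.
        [$type, $match] = explode(':', $key, 2);
        $criteria = $key;
    }

    return [
        'criteria'  => $criteria,
        'type'      => $type,
        'match'     => $match,
        // Integers count requests; strings are relative times for strtotime().
        'kind'      => is_int($threshold) ? 'requests' : 'time',
        'threshold' => $threshold,
    ];
}

// The two rules from the controller example above.
$rules = ['User-Agent' => 20, 'Net:!' => '+30 minutes'];
$normalized = [];
foreach ($rules as $key => $threshold) {
    $normalized[] = normalizeRule($key, $threshold);
}
```

Because the sketch decides between a request count and a time interval with is_int(), a threshold written as the string '20' would be treated as a time, not as a number of requests.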

If either of these values changes after its threshold has been met, the Trending component simply blackholes the request, sending back an error page and preventing any further execution. For more graceful handling, such as redirecting to a login page, set the component's callback property to the name of a controller method; that method is then called instead.

For reference, the current implementation is provided in Listing 1.

Listing 1:

<?php
/* SVN FILE: $Id: session_history.php 5112 2007-05-15 20:01:44Z nate $ */
/**
 * Define session history rules to mitigate session attacks
 *
 * Concept by Chris Shiflett
 *
 * PHP versions 4 and 5
 *
 * CakePHP(tm) :  Rapid Development Framework <http://www.cakephp.org/>
 * Copyright 2005-2007, Cake Software Foundation, Inc.
 *                                1785 E. Sahara Avenue, Suite 490-204
 *                                Las Vegas, Nevada 89104
 *
 * Licensed under The MIT License
 * Redistributions of files must retain the above copyright notice.
 *
 * @filesource
 * @copyright        Copyright 2005-2007, Cake Software Foundation, Inc.
 * @link                http://www.cakefoundation.org/projects/info/cakephp CakePHP(tm) Project
 * @package            cake
 * @subpackage        cake.cake.libs.controller.components
 * @since            CakePHP(tm) v 0.10.0.1076
 * @version            $Revision: 5112 $
 * @modifiedby        $LastChangedBy: nate $
 * @lastmodified    $Date: 2007-05-15 16:01:44 -0400 (Tue, 15 May 2007) $
 * @license            http://www.opensource.org/licenses/mit-license.php The MIT License
 */
uses('security');
 
/**
 * Allows the user to define historical tracking thresholds for trending commonly-repeated HTTP
 * request headers.
 *
 * Also allows for defining handling rules for when trends are violated.
 *
 * @package        cake
 * @subpackage    cake.cake.libs.controller.components
 */
class TrendingComponent extends Object {
/**
 * Components used by this class
 *
 * @var array
 */
    var $components = array('Session', 'Security', 'RequestHandler');
/**
 * Contains the rules...
 *
 * @var array
 */
    var $rules = array();
/**
 * After checks have been run, this holds any rules which have met the threshold but failed validation
 *
 * @var array
 */
    var $failures = array();
/**
 * A controller method to call should the validation fail
 *
 * @var string
 */
    var $callback = null;
/**
 * Sets trending threshold / violation handling rules for trending
 * request data
 *
 * @param array $rules An array of rules, keyed by HTTP header name (or other criteria)
 * @return void
 */
    function track($rules) {
        $_rules = array();
 
        foreach ($rules as $criteria => $rule) {
            if (is_int($criteria)) {
                $criteria = $rule;
                $rule = 20;
            }
            $rule = is_array($rule) ? $rule : array('threshold' => $rule);
            $key = $criteria;
 
            if (strpos($criteria, ':') === false) {
                $criteria = 'HTTP_' . up(r(array('-', ' '), '_', $criteria));
            } elseif (strpos($criteria, 'Net:') === 0) {
                /*
                 * Network class rules:
                 * Net:domain
                 * Net:subnet
                 * Net:{3}.{3}.{~20}.* (Where * is a wildcard match, and ~20 represents a range [+/- 20])
                 *
                 * The only rule actually defined so far is "!", which means an exact match
                 * @todo Finish the parsing/matching code for network class rules
                 */
            } else {
                // Other types of rules
            }
            $rule['key'] = $key;
            $_rules[$criteria] = $rule;
        }
        $this->rules = $_rules;
    }
/**
 * Checks rule definitions setup in track(), and manages increments/removals for
 * rules below the threshold.
 *
 * @param object $controller
 * @return void
 */
    function startup(&$controller) {
        $this->failures = $this->incrementAndInvalidate();
 
        if (!empty($this->failures)) {
            if (!empty($this->callback)) {
                $controller->{$this->callback}();
            } else {
                $this->Security->blackHole($controller, 'login');
            }
        }
    }
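/**
 * Updates the tracking state for each rule, and collects the rules whose
 * threshold has been met but whose current value no longer matches
 *
 * @return array The names of the rules that failed validation
 */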
    function incrementAndInvalidate() {
        $failures = array();
 
        foreach ($this->rules as $header => $rule) {
            $cur = $this->_config($header);
 
            if (strpos($header, ':') === false) {
                $hashValue = Security::hash(Configure::read('Security.salt') . env($header));
 
                if (isset($cur['value']) && $this->thresholdMet($header)) {
                    if ($cur['value'] != $hashValue) {
                        $failures[] = $header;
                    }
                } else {
                    $cur = $this->increment($rule, $cur, $hashValue);
                }
            } else {
                list($type, $match) = explode(':', $header, 2);
 
                switch ($type) {
                    case 'Net':
                        $value = $this->RequestHandler->getClientIP();
 
                        if (!isset($cur['value'])) {
                            $cur['value'] = $this->_getNetworkAddressPattern($match, $value);
                        }
 
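                        if (preg_match($cur['value'], $value)) {
                            if (!$this->thresholdMet($header)) {
                                // Still inside the stored pattern: keep building the trend
                                $cur = $this->increment($rule, $cur, $cur['value']);
                            }
                        } else {
                            if (!$this->thresholdMet($header)) {
                                // Reset the count and the network pattern, based on the current address
                                $cur = $this->increment($rule, null, $this->_getNetworkAddressPattern($match, $value));
                            } else {
                                $failures[] = $header;
                            }
                        }
                    break;
                }
                // Implement other counter checks here
            }
            $this->_config($header, $cur);
        }
        return $failures;
    }
/**
 * Updates the state of a rule whose threshold has not yet been met. If there is
 * no state yet, or the value has changed, the trend starts over from this request.
 *
 * @param array $rule The rule definition, including its 'threshold'
 * @param mixed $state The current state of the rule, or null to start over
 * @param string $value The value observed in the current request
 * @return array The updated state
 */
    function increment($rule, $state, $value) {
        $state = is_array($state) ? $state : array();

        if (!isset($state['value']) || $state['value'] !== $value) {
            $state = array('value' => $value, 'count' => 0);
        }

        if (is_int($rule['threshold'])) {
            // Request-count threshold: count consecutive matching requests
            $state['count']++;
        } elseif (empty($state['count'])) {
            // Time-based threshold: store the time at which the trend is established
            $state['count'] = strtotime($rule['threshold']);
        }
        return $state;
    }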
/**
 * Returns true if the threshold for a rule has been met
 *
 * @param string $rule
 * @return boolean
 */
    function thresholdMet($rule) {
        if (!isset($this->rules[$rule])) {
            foreach ($this->rules as $key => $val) {
                if ($val['key'] == $rule) {
                    $rule = $key;
                    break;
                }
            }
        }
 
        if (!isset($this->rules[$rule])) {
            return false;
        }
        $config = $this->rules[$rule];
        $cur = $this->_config($rule);
 
        if (empty($cur['count'])) {
            return false;
        }

        if (is_int($config['threshold'])) {
            // This is a request-count-based threshold
            return ($cur['count'] >= $config['threshold']);
        }
        // This is a time-based threshold: 'count' holds the time at which the trend is established
        return ($cur['count'] < time());
    }
/**
 * Returns the stored session state for a given threshold rule
 *
 * @param string $key A rule name array key
 * @param array $state An optional state to save to the session
 * @return mixed
 */
    function _config($key, $state = null) {
        $key = r(array('.', '!', ':', '{', '}', '(', ')'), '_', $key);
 
        if ($state === null) {
            $state = $this->Session->read("Session.History.{$key}");
            return (empty($state) ? array('count' => 0) : $state);
        } else {
            return $this->Session->write("Session.History.{$key}", $state);
        }
    }
/**
 * Generates a regular expression from a network address and a match pattern
 *
 * @param string $pattern
 * @param string $base
 * @return string
 */
    function _getNetworkAddressPattern($pattern, $base) {
        $result = '.*';
 
        if ($pattern == '!') {
            $result = preg_quote($base, '/');
        }
        // Anchor the pattern, so that an exact match on 1.2.3.4 doesn't also match 11.2.3.45
        return '/^' . $result . '$/';
    }
}
 
?>

Future plans for the component include scoping by network address class and additional flagging options, but there are plenty of other things you can do, such as integrating an IP-to-coordinate lookup and making the threshold geography-based. There are numerous other possibilities as well, but the important thing is to start thinking more broadly about your security strategies. Find patterns in your data, and learn to use them to your advantage.
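As an illustration of scoping by network address, here is a minimal sketch that treats two IPv4 addresses as matching when they share a network prefix (for example, the same /24), rather than requiring the exact match of the Net:! rule. sameSubnet() is a hypothetical helper; it ignores IPv6, and it assumes a 64-bit PHP build so that ip2long() results and the mask are non-negative integers.

```php
<?php
/**
 * Hypothetical helper: do two IPv4 addresses share the first $prefixLength bits?
 * Assumes a 64-bit PHP build, where ip2long() never returns a negative number.
 * IPv6 addresses are not handled (ip2long() returns false for them).
 */
function sameSubnet(string $a, string $b, int $prefixLength): bool
{
    $longA = ip2long($a);
    $longB = ip2long($b);

    if ($longA === false || $longB === false || $prefixLength < 0 || $prefixLength > 32) {
        return false; // Invalid input: treat it as a mismatch
    }
    // Netmask with the top $prefixLength bits set, e.g. 24 => 255.255.255.0
    $mask = $prefixLength === 0 ? 0 : (~0 << (32 - $prefixLength)) & 0xFFFFFFFF;

    return ($longA & $mask) === ($longB & $mask);
}
```

A looser prefix tolerates users whose address moves around within the same block, at the cost of also tolerating attackers on that block, so it is one more knob to tune against your logs.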

Happy holidays!

About this post

PHP Advent Calendar Day 24 was posted on Mon, 24 Dec 2007.

6 comments

1. Nate Abele said:

You can tell this was edited. If I wanted to do a holiday-themed sign-off, I'd have at least said something politically-incorrect like "merry Christmas".

:-P

Tue, 25 Dec 2007 at 06:07:38 GMT


2. Kaloyan Tsvetkov said:

That's a very interesting concept. There are some areas I'd like to question and discuss.

Let's start with the fact that I have almost no CakePHP experience ;) I understand that the above article and code are more of a "proof-of-concept" type of thing. Anyway, looking at the sample code above, it seems to me that the AccountsController::beforeFilter() method will be called each time an action from this controller is called (given my lack of CakePHP knowledge, I hope I am right). This method calls TrendingComponent::track(), passing the threshold arguments. In it there is a piece of code that swaps the key and value of the rules array if the key (the criteria) is an integer value. I can see that it helps absent-minded developers, but I also see it as a waste, which slows down the execution of the script. Isn't it a better policy to raise an exception or throw some error in order to hint that the argument (or an element of it) is not correct?

Another thing that I think slows down the execution of TrendingComponent::track() is the way the rules are presented. They are strings that are parsed in order to extract the "real" criteria out of them, like transforming User-Agent into HTTP_USER_AGENT. This transformation seems "expensive" to me, because several functions are stacked to extract the final result, doing all sorts of string operations on the original string. Isn't it better just to pass 'HTTP_USER_AGENT' as the argument in the first place, and skip all the extra checks and transformations? This will surely make the tracking method run faster, and the overall performance of the AccountsController actions will be better. BTW, I'd say the same about parsing the "period" rules: it would be more suitable if the periods were presented in a PHP-native way, instead of as a human-friendly string that has to be parsed in order to extract its real value. You do not need another layer of "syntax" on top of PHP to pass the criteria, because in the end, this "syntax" is again transformed into some form of "PHP-native" data, which is used to apply the real tracking criteria.

Tue, 25 Dec 2007 at 10:41:14 GMT


3. Nate Abele said:

Hi Kaloyan, thanks for the questions; I'll do my best to address them. The "swap" that appears to take place actually has nothing to do with fixing parameters which are "incorrect". What it actually does is allow you to pair each rule with an array of options, rather than a single threshold value, making the API flexible enough for future use-cases and modifications.

As far as having to parse special syntaxes, in the case of the network rules, I started implementing a small DSL because it's the best and most flexible way to specify how strict you want to be in matching IP addresses or host subnets. Based on the rules used, a custom regular expression will be generated using the current remote host setting.

As for the other string transformations, the reason is that I prefer more readable, human-friendly code. It makes my development experience more enjoyable, and, as they say, processor cycles are cheaper than developer cycles.

Now, I know PHP-land is full of people who'd try to debate me on this point ad infinitum, and you're entitled to your opinion; I simply disagree. :-) If a case-change and a string replace seem "expensive" to you, then might I suggest implementing your next project in C?

;-)

Wed, 26 Dec 2007 at 22:06:03 GMT


4. Morgan Tocker said:

I think the way you describe having a certain threshold is just another example of a Bayesian filter.

I wonder if someone could write a library to factor in a bunch of these: geographic location[1], user agent, operating system. Maybe you could even factor in login time if you deemed it statistically relevant.

The key point is that it needs to be self-learning. In my case, my geographic location changes because I could be coming from home (Canada) or from either of the two VPN networks I log in from (California and Sweden). I think Gmail understands that these three locations "fit my profile", but when I've come from other locations, I have been required to re-authenticate.

[1] You'll notice I didn't say IP address, but that could be a factor as well. It's easy enough to convert an IP address to latitude/longitude with MaxMind GeoIP.

Fri, 28 Dec 2007 at 05:32:53 GMT


5. Joseph Crawford said:

Chris and I had this discussion a while back, and I wrote an article about it. It is not geared towards Cake, and in no way is the code as nice as the above, but it shows how to handle this outside of a framework.

You can see it here

http://josephcrawford.com/php-artic...e-php-sessions/

I just noticed all the html entities and have to fix them ;(

Wed, 09 Jan 2008 at 01:38:38 GMT


6. Shade said:

Morgan,

I've read recommendations to create a hash of IP address, USER_AGENT, and so forth, presumably to keep the storage requirements fixed; I didn't like those, though, because what if one value changed? The whole hash would. My solution was to attach a multidimensional array to each session, with the first index naming "IP", "USER_AGENT", and so on, and then, within those secondary arrays, allowing a certain number (perhaps more than one) of actual (string) values. If a new value showed up with the user's connection and couldn't be found in the array, a check would follow for whether there was room to fit another; and if not, Bad Things™ would happen.

It would be trivial to preload sessions of known users with approved values; slightly less trivial, but still not much work, to un-hardcode the approved number of different values and instead place that number in the beginning of each array. How you code the "self-learning" to discover those values is up to you :)

Mon, 13 Apr 2009 at 19:02:16 GMT
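Shade's per-attribute scheme can be sketched roughly as follows. recordAttribute() and the fixed $maxValues cap are illustrative assumptions, not code from the comment: each attribute name (such as IP or USER_AGENT) keeps a short list of approved values, and a new value is accepted only while there is room for it.

```php
<?php
/**
 * Hypothetical sketch of a per-attribute allowlist kept in the session.
 * Each attribute name (e.g. 'IP', 'USER_AGENT') maps to a short list of
 * values already seen; at most $maxValues distinct values are accepted.
 *
 * @return bool False when a new value arrives and there is no room left
 */
function recordAttribute(array &$history, string $name, string $value, int $maxValues): bool
{
    $seen = $history[$name] ?? [];

    if (in_array($value, $seen, true)) {
        return true; // Already approved for this session
    }
    if (count($seen) >= $maxValues) {
        return false; // No room for another value: treat as suspicious
    }
    // Room left: remember the new value as approved
    $seen[] = $value;
    $history[$name] = $seen;
    return true;
}
```

Preloading a known user's approved values is then just a matter of starting from an array such as ['IP' => ['203.0.113.5']] instead of an empty one.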

