
PHP Advent Calendar Day 13

Today's entry, provided by Terry Chay, is entitled Filter Input; Escape Output: Security Principles and Practice.

Terry Chay

Name: Terry Chay
Blog: terrychay.com/blog/
Biography: When Zend puts your face on a trading card, you've either arrived in the PHP world, or you're a terrorist. Terry Chay is a PHP terrorist. Being the software architect of Tagged pays the bills. When he isn't saying politically incorrect things about web development, he is in ur Web 2.0 event, eating ur lunch, taking ur photos, and fighting off ur Ruby developers with his mad ninja coding skillz. He also likes to "draw the line at yellow."
Location: San Francisco, California

One of the strangest things about living in the Bay Area is the total lack of PHP support groups. We probably have the largest density of PHP developers (skilled and unskilled) in the world, and yet finding 50 people willing to go in on a shipment of elePHPants turns you into a rock star out here.

So when there was a San Francisco PHP Meetup, I had to go. The topic this month was security. In light of that, I thought I'd use my Advent calendar entry to show that I'm not just a front-end Ajax developer guy or a PHP design patterns guy. I can also roll with the big boys and talk about web app security.

Terry and Security, the Real Oxymoron

Those who have heard my latest talk, The Internet is an Ogre, might find my choice of topic a bit ironic. After all, I am the one who said, "Web security is a luxury," and:

For any of you who think good coding, design aesthetic, or web security are important, I have only one word for you: MySpace.

If there is one thing I have learned from blogging, it's that when you say outrageous things, people listen, and a few actually believe you. And, if the claim has the added bonus of being possibly true, that's just gravy. (Even if you're full of shit, people will just say you're perceptive; nobody has the balls to call you out on it, except people on Slashdot, and everyone knows they're like a stopped clock, only right twice a day.)

Besides, conferences are not fun unless you can get a rise out of Chris Shiflett, Ed Finkler, and Ilia Alshanetsky at the same time.

The truth is that security questions are some of my favorite interview questions. I'm going to cover what I started ranting about at the San Francisco PHP Meetup: these interview questions, how I'd answer them, and how this applies to my understanding of PHP and web application security.

No candidate has ever answered all these questions correctly. A coworker observed one of my interviews and said afterward, "When you asked those security questions, I thought [the candidate] was actually going to cry." I have had many headhunters complain about my interview questions to my boss.

Practice Comes from Principles

A lot of you are asking, "What book should I buy?" I recommend Chris Shiflett’s book (Essential PHP Security). (If you are too poor to buy the book, just visit the old PHP Security Guide instead.) Why? Because it is impossibly small.

Web app security is both really simple and an infinite mass of shit. If you start with an ad hoc approach, it will seem to only be the latter; but, if you take the time to learn the building blocks which form the language of security principles, then it starts to all make sense and become the former.

Books like this, by being small, focus on the vocabulary and principles without drowning you in detail. I want you to take the time to learn this language. If you don't have the vocabulary, then you can't do web app security. Now, onto the interview questions.

Question: What is an SQL Injection Attack? Give Me an Example.

SQL injection is a vulnerability that allows input to manipulate the format of an SQL query, causing unwanted SQL to be executed by the database.

Most candidates get that part, but the second part trips up about half of all candidates.

Little Bobby Tables

The basic points I'm looking for are:

  • Have you ever thought like an attacker? If you can't think like an attacker, you can't think like a defender; if you can't create exploits, you can't defend against them.

  • Does your exploit include a basic escape sequence?

  • Does your exploit inject data?

Bonus points if you can explain why PHP's MySQL extension isn't vulnerable to the xkcd exploit.
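
To make those points concrete, here is a minimal sketch of the kind of example that would satisfy them. The function name, table, and column are hypothetical, and the query is only built as a string; nothing is sent to a database.

```php
<?php
// Hypothetical, deliberately vulnerable helper: input is pasted straight into the SQL.
function buildLoginQuery(string $username): string
{
    return "SELECT id FROM users WHERE username = '" . $username . "'";
}

// Normal use: the input stays inside the quoted string.
$benign = buildLoginQuery('terry');

// Exploit: the leading quote closes the string (the escape sequence),
// and everything after it is injected into the query as SQL.
$exploit = buildLoginQuery("' OR '1'='1");

echo $benign, "\n";
echo $exploit, "\n";
```

The second query ends in `WHERE username = '' OR '1'='1'`, a condition that is true for every row.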

Question: Name Three Safeguards Against SQL Injection. For Each, Explain Where You Use It.

The key to answering this question is to understand that the nature of the attack is often focused on a quote mark. So, the solutions are to remove the quote mark, escape the quote mark, or use a built-in feature to protect against the quote mark.

In other words:

  1. You can filter the quote mark on input.

  2. You can escape the quote mark, using something like mysql_real_escape_string(), just before output (to the database).

  3. You can use a prepared statement, if your database supports it. (PDO emulates prepared statements if the database doesn't support them. Both the database and the extension must support prepared statements, else what you are doing is just an abstracted version of the second answer.)
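
The three safeguards can be sketched side by side. This is an illustration under assumptions, not the code used at Tagged: it uses PDO with an in-memory SQLite database (so the pdo_sqlite extension must be available), the table and data are made up, and PDO::quote() stands in for mysql_real_escape_string(), which belongs to the old mysql extension that was removed in PHP 7.

```php
<?php
// Assumption: pdo_sqlite is installed. Table and rows are invented for the demo.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('terry')");

$input = "x' OR '1'='1";

// No safeguard: the injected OR clause matches every row.
$unsafeRows = $db->query("SELECT name FROM users WHERE name = '$input'")->fetchAll();

// 1. Filter on input: remove the quote mark.
$filtered = str_replace("'", '', $input);
$filteredRows = $db->query("SELECT name FROM users WHERE name = '$filtered'")->fetchAll();

// 2. Escape just before output to the database.
//    PDO::quote() adds the surrounding quotes and escapes the embedded ones.
$escaped = $db->quote($input);
$escapedRows = $db->query("SELECT name FROM users WHERE name = $escaped")->fetchAll();

// 3. Prepared statement: the data never becomes part of the SQL text.
$stmt = $db->prepare('SELECT name FROM users WHERE name = ?');
$stmt->execute([$input]);
$preparedRows = $stmt->fetchAll();

printf(
    "unsafe: %d, filtered: %d, escaped: %d, prepared: %d\n",
    count($unsafeRows),
    count($filteredRows),
    count($escapedRows),
    count($preparedRows)
);
```

Only the unguarded query returns a row; the other three treat the input as a (nonexistent) name.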

Almost every candidate can give at least one, although that's not quite fair, since they have their DB interview first; they can learn about prepared statements there. Many candidates get all three with a little guidance!

Bonus points if you mention mysql_real_escape_string() and more if you explain the difference between it and addslashes(). (This has never occurred, so I'm not too sure how I'd feel if someone pointed it out.)

Very few people know where to implement these safeguards. I'm hoping the new filter extension changes this. A number of people have argued with me about the correct place to filter input and escape output. Many are competent Perl developers. You'll better understand why they can't accept their mistake later in this entry.

Question: Which Safeguard Against SQL Injection Is Best?

This is a trick question. My answer is that I filter input and use prepared statements. (I escape output if prepared statements are not available.) There is no single best approach, although I give props if you assert that prepared statements are better than the alternatives.

Why? That's just good security!

Security is not an impenetrable wall. It is a decently-sized wall, with a moat in front of it, a mountain surrounding all parts of it but the entrance, and a good number of guards on the battlements.

I can tell stories for hours about people who haven't understood this principle and have paid the price. But, in this case, it's easier to ask you about the following cases:

  • What if the data is a person's name, and he's Tim O'Reilly?

  • What happens if, at a later date, you decide to use the data in a different context (such as the filesystem, memcache, or HTML) prior to or instead of storing it in the database?

  • What if you migrate from MySQL to SQL Server, which has a different method for escaping? (Using '' instead of \' to represent an escaped single quote.)

  • What if you migrate to a data store that doesn't support prepared statements?

Things change. The principle here is if the security protocol is inconvenient, always implement it as early as possible in the application flow. More on this later.
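
A small sketch of the first and third cases above. Here addslashes() and str_replace() are stand-ins chosen only to make the two quoting conventions visible; they are not recommended escaping functions for either database.

```php
<?php
$name = "Tim O'Reilly";

// Filtering out the quote mark keeps the query safe but damages legitimate data.
$filtered = str_replace("'", '', $name);

// Backslash convention, as in the \' form mentioned above.
$backslashEscaped = addslashes($name);

// Doubled-quote convention, as in the '' form mentioned above.
$doubledQuote = str_replace("'", "''", $name);

echo $filtered, "\n";          // Tim OReilly
echo $backslashEscaped, "\n";  // Tim O\'Reilly
echo $doubledQuote, "\n";      // Tim O''Reilly
```

The same name needs a different treatment for each destination, which is one reason no single safeguard covers every case.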

Question: Create a Single Audit Point for Injection Attacks.

People always miss the earlier questions, so I never ask this anymore. I'm tired of having headhunters talk shit about me behind my back to everyone in the Bay Area.

My answer would be a Data Access pattern. If you're a framework guy, you can use a persistence layer like ActiveRecord to abstract yourself entirely from the database, because, apparently, LEFT JOIN is just too damn hard for you.

Given how many people fail this series of questions in interviews, I'm inclined to agree.
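
One way such a single audit point might look is sketched below. The class name, table, and methods are hypothetical, and the demo assumes PDO with the pdo_sqlite extension; the idea is only that all SQL funnels through one private method that always binds parameters.

```php
<?php
// Hypothetical gateway class: the only place in the application that talks SQL.
final class UserGateway
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Every query goes through this method, so it is the single point to audit.
    private function run(string $sql, array $params): PDOStatement
    {
        $stmt = $this->db->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    public function add(string $name): void
    {
        $this->run('INSERT INTO users (name) VALUES (?)', [$name]);
    }

    public function findByName(string $name): array
    {
        return $this->run('SELECT name FROM users WHERE name = ?', [$name])
            ->fetchAll(PDO::FETCH_COLUMN);
    }
}

// Demo setup (assumption: in-memory SQLite is available).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');

$users = new UserGateway($db);
$users->add("Tim O'Reilly");

$found    = $users->findByName("Tim O'Reilly");
$injected = $users->findByName("x' OR '1'='1");
```

Because callers never see SQL, reviewing run() and the handful of query strings in the class covers the whole injection surface for this table.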

Question: What Is Cross-Site Scripting (XSS)? Cross-Site Request Forgery (CSRF)? Session Fixation? Give Me an Example of Each.

The reason I ask for examples is to give you the opportunity to apply this stuff in practice. To think like an attacker shows real, practical knowledge beyond the simple theory. Besides, the principles come from the practice.

Wikipedia has decent definitions of XSS, CSRF, and session fixation.

I'll confess that the only reason I ask about session fixation is because I occasionally meet a candidate who can regurgitate XSS and CSRF descriptions, and the sadistic side of me wants to see if I can break them. Asking about session fixation is like asking how to laugh in hexadecimal. (Answer: 48 41 48 41.) It's an obscure vulnerability known by us old-timers that is easily corrected and fun to lord over people.

CSRF is especially important because of the abundance of Ajaxified web sites. But, the one I'm really interested in is XSS, because I focus on it later in the questioning.
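
The questions ask for attacks, not defenses, but a defense often shows whether the attack was understood. As one hedged illustration, a common CSRF mitigation is a per-session token that a forging site cannot read. The helper names are hypothetical, and a plain array stands in for $_SESSION so the sketch runs outside a web request.

```php
<?php
// Hypothetical helpers; $session stands in for $_SESSION.
function issueCsrfToken(array &$session): string
{
    // 32 random bytes, hex-encoded into a 64-character token.
    $session['csrf_token'] = bin2hex(random_bytes(32));
    return $session['csrf_token'];
}

function isValidCsrfToken(array $session, string $submitted): bool
{
    // hash_equals() compares in constant time to avoid leaking the token.
    return isset($session['csrf_token'])
        && hash_equals($session['csrf_token'], $submitted);
}

$session = [];
$token = issueCsrfToken($session);   // embed this in a hidden form field

$legitimate = isValidCsrfToken($session, $token);     // form came from our own page
$forged     = isValidCsrfToken($session, 'guessed');  // request forged by another site
```

A request forged from another site carries the victim's cookies but not the token, so the check fails. An XSS hole on the same site defeats this, which is part of why XSS matters so much.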

Question: Explain How the MySpace Worm Works. Give Me an Example that Uses CSRF to Determine the Login State on a Remote Site.

I don't typically ask these questions, but some of the more belligerent candidates bitch about the previous series as being too pedantic and "just about terminology." This dismissal is the interview equivalent of "I'm not really into Pokémon." Do they think I ask these questions for fun?

You Better Be Into This Pokémon

If you can't answer the above two questions, figure them out on your own. Understanding how to answer these is how you'll develop a zen-like ability to quickly understand security vulnerabilities and be able to build security practices once you've taken the time to understand these simple security principles. (In this case, you must combine XSS, CSRF, and Javascript exceptions.)

Question: What Does "Filter Input; Escape Output" Mean? Give Me an Example of Each.

You can reference Wikipedia’s definition, but put simply, it is the principle that filtering should be done as soon as data enters, and escaping should be done just before it exits.

Even if people can intuit that or have already heard it (surprisingly few candidates have, but most can guess), many haven't contemplated what filtering and escaping really mean. That's why I ask the other questions.

You can apply this principle to SQL injection and XSS (and any other injection attack):

  • Filtering can help protect against SQL injection by removing all single quotes. (This isn't foolproof, because not all SQL injection attacks require a single quote.)

  • Filtering can help protect against XSS attacks by removing HTML tags with strip_tags(), ensuring the data adheres to a specific pattern with regular expressions, or using a combination of HTML normalization and a DOM walker that implements a whitelist or blacklist filter. These techniques remove the <script> tags as well as injections into CSS. (You may need a CSS parser unless you strip out all style attributes and <style> tags.)

  • Escaping can help protect against SQL injection by maintaining the distinction between the SQL query and the data. Use something like mysql_real_escape_string() or prepared statements.

  • Escaping can help protect against XSS attacks by maintaining the distinction between the HTML and the data. Use htmlspecialchars() or htmlentities(). Both will do things like replace < with &lt; but htmlentities() does a bit more if you know the output is HTML and not XML. (Be sure to match the character encoding of your Content-Type header.)
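
The difference between filtering and escaping for XSS can be seen on a single string. This is a minimal sketch using only strip_tags() and htmlspecialchars(); the sample input is made up.

```php
<?php
$input = '<script>alert("xss")</script>Hello';

// Filter: remove the tags. Note that the text between the tags survives.
$filtered = strip_tags($input);

// Escape: keep every character, but encode the ones HTML treats as markup.
// The charset argument should match the page's Content-Type header.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $filtered, "\n";  // alert("xss")Hello
echo $escaped, "\n";   // &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;Hello
```

Filtering changes the data; escaping preserves it and changes only how it is represented for one destination.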

It might have helped if we had called it encoding instead of escaping, but we didn't, so deal.

From this, the rest follows.

  • Filter on input to adhere to the practice of implementing as many security safeguards as possible as early as possible. When data enters the application, filter it first. (Some candidates misinterpret this as filtering in the client side code. That's easily avoided by the most inexperienced attackers. They forget that they're PHP developers, not front end developers; we are talking about input into the PHP application.)

  • Escape on output, because escaping functions are going to be different depending on where the data goes. If it goes to a MySQL database, you use mysql_real_escape_string(). When using it as an argument on the command line, use escapeshellarg(). When sending it back to the user in HTML, use htmlentities().
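
A short sketch of why escaping has to wait until the destination is known: the same value comes out differently for each one. The sample value is made up, rawurlencode() is an extra destination not mentioned in the text, and the database case is left out here because it depends on the driver.

```php
<?php
$data = "O'Reilly <b>";

// Destination: an HTML template.
$forHtml = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');

// Destination: a command-line argument (exact quoting depends on the platform).
$forShell = escapeshellarg($data);

// Destination: a URL component (an addition for illustration).
$forUrl = rawurlencode($data);

echo $forHtml, "\n";   // O&#039;Reilly &lt;b&gt;
echo $forShell, "\n";
echo $forUrl, "\n";    // O%27Reilly%20%3Cb%3E
```

Escaping on input would force a guess at one of these formats before the data has even been used.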

These principles apply to XSS just as they did to SQL injection:

  • What if you want the HTML to be output as HTML, because your MySpace-like site has HTML editing and customization? You can't escape, but you can still filter. You've already filtered on input, right?

  • When you know it isn't HTML, you should always escape for HTML on output to an HTML template. By doing so, you are protected against XSS. XSS forms the foundation of many attacks such as session hijacking and CSRF worms, so that's a good thing.

  • Therefore, you apply both security practices, because neither can offer complete coverage. And, because you don't want to rely on security being an impenetrable wall, right?

This leads to further understanding of the concept of input and output. Input doesn't necessarily mean input from the user; it means input into the application. Output doesn't necessarily mean output to the user as HTML; it means any output from the application to an external source.

PHP does what it does best but with the hard-won security principles tacked on. In the old days, this meant gluing the user to a database back end and back, but on a modern website, this means so much more. We've gone from a "3-tier" or even an "n-tier" architecture to a complicated bunch of highly-cohesive, external services, one of which is the user of the web site.

Aside: Principle to Practice

Here's an example of how I applied the above principle of "filter input; escape output" at Tagged. Because we're a MySpace-like social network, we have to base our input filtering of certain fields on a blacklist of illegal tags, properties, and URLs instead of a whitelist of allowed tags (which is more common among many libraries). I knew I could not "build an impenetrable wall" with a blacklist; the spec is always changing and there is an "infinite mass of shit" in security.

Instead, I did what I had time to do, and I applied the principle of "filter input; escape output" to the system architecture, not just with a kick-ass user input filter (filter input); user output (remember, I can't always escape the HTML); and escaping for the database; but also on input from the database and memcache stores.

I encoded the version number of the HTML input filter on output to the database and memcache, so on input, I could check to see if it was out of date and run the HTML filter again!

Why? Because, one day, we were hacked. Within hours, a rogue XSS worm, injected into the style tags of the widgets on our site, had infected 60,000 user profiles. Even if we stopped it on user input, we still would have the 60,000 infected user profiles to deal with, spanning across all the databases in the federation, containing 50 million user profiles. It would have taken days to clean the mess!

Instead, I asked for a copy of the exploit, figured out the nature of the attack, added a CSS parser (which I had lying around, because I knew about this attack but was too lazy to look up its exact nature), hooked it up to the HTML filtering object, and bumped the version number.

All new uploaded content was filtered against it. And, as users used the site, they were fixing it (in memcache). Then, at our leisure, we could test the new filter against regressions and slowly remove the attack permanently from the database, all using the same code, and all because of the FIEO principle.
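
The version-stamping idea can be sketched roughly as follows. This is not the Tagged code, which the post does not show: the function names, the record layout, and the array standing in for memcache or the database are all hypothetical, and the regex that drops style blocks is a toy stand-in for a real HTML filter.

```php
<?php
// Bump this whenever the filter learns about a new attack.
const FILTER_VERSION = 2;

// Toy stand-in for the HTML filter: version 2 also strips <style> blocks.
function filterHtml(string $html): string
{
    return (string) preg_replace('#<style\b[^>]*>.*?</style>#is', '', $html);
}

// Output to the store: save the filter version alongside the filtered content.
function storeProfile(array &$store, string $key, string $html): void
{
    $store[$key] = ['v' => FILTER_VERSION, 'html' => filterHtml($html)];
}

// Input from the store: re-filter anything stamped with an older version.
function loadProfile(array &$store, string $key): string
{
    $record = $store[$key];
    if ($record['v'] < FILTER_VERSION) {
        $record = ['v' => FILTER_VERSION, 'html' => filterHtml($record['html'])];
        $store[$key] = $record;  // the read repairs the stored copy
    }
    return $record['html'];
}

// A profile saved under the old filter, still carrying the injected style block.
$store = ['profile:1' => ['v' => 1, 'html' => '<style>evil</style><p>hi</p>']];

$clean = loadProfile($store, 'profile:1');
```

After the read, the stored record is stamped with the new version, so the cleanup happens once per record as users touch the site.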

Question: What Is Magic Quotes, and Why Is It Bad?

This is the gravy. Many of you know "magic quotes is bad." But, did you know that it's monotonically bad? (For instance, a case can be made for register_globals, but none can be made for magic_quotes_gpc.)

You say this simply:

Magic quotes is bad, because it escapes input, and you should filter input and escape output!

Hard experience has taught people developing large PHP sites that you should only escape on output. Anyone who has written PHP code that is deployed at scale or on a variety of hosted services knows about the nightmare that is magic_quotes_gpc. Magic quotes has the implicit assumption that the output of all input is a MySQL or PostgreSQL database, and the attacker is not very clever.
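
The nightmare is easy to reproduce. The sketch below simulates the effect of magic_quotes_gpc with addslashes(), since the setting itself was removed in PHP 5.4; the sample value is made up.

```php
<?php
$raw = "O'Reilly";

// Roughly what magic_quotes_gpc did to request data before your code ever saw it.
$magic = addslashes($raw);

// Destination 1: HTML. The backslash shows up as garbage on the page.
$html = htmlspecialchars($magic, ENT_QUOTES, 'UTF-8');

// Destination 2: a database layer that escapes on output, as it should.
// The value is now escaped twice, so a stray backslash ends up stored.
$twice = addslashes($magic);

echo $magic, "\n";  // O\'Reilly
echo $html, "\n";   // O\&#039;Reilly
echo $twice, "\n";  // O\\\'Reilly
```

Every destination other than the one magic quotes guessed at has to undo the escaping first, and every piece of code has to know whether it was on.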

This brings us full circle to the Perl developers who continue to argue that you can (and should) escape the input. Perl is "311 code" (chmod 311 *.pl); writer can write and execute, his team and the world can execute, nobody can read. TIMTOWTDI means the Perl developer simply isn't used to the concept that their code will be broken into parts and edited by a team of people.

PHP may have its roots in "Rasmus wants to build his personal home page, and he wants a template tool to do it," but it now has to power large-scale, complex web sites such as Yahoo! and Facebook (and Tagged).

Code has become complex. Magic quotes made sense when we were naïve about input and output; input meant from the user, and output meant to the database; magic quotes made sense when we didn't really understand the difference between filtering and escaping. Many PHP developers have since spent years trying to remove things like magic quotes from the wreckage of late-night debugging sessions. Are you going to learn from their experiences, or are you going to doom yourself to repeat them?

That is the nature of security; practice has created good principles. These principles make for good practice.

Now We're Done

Can you see now why I don't understand why I have headhunters talking shit about me behind my back? I guess they really want web sites to fall flat on their asses and be poorly-architected, vulnerable piles.

See that picture of me at the top of this entry? I'm the PHP Security Grinch, and I'm here to tell you that my heart isn't going to grow three sizes this season, and I won't be saying my questions are too hard. Bah! Humbug!

Why? Because, I actually want your site staying online; I want you able to enjoy this holiday season without the fear of being "on call" for a late-night security session.

Parting Shot

Returning to the "security is a luxury" statement that began this entry...

One candidate's resume listed web app security under skills. Of course, they got this battery of questions from me, and, as luck would have it, it was especially egregious. I was about to move on, but the candidate, clearly frustrated by this experience, said to me:

Look, if you give me a web site, I can make it secure.

I was crestfallen; the candidate didn't know what an SQL injection attack was! Then, I realized he was right. Shit. I can make a web site secure, too; disconnect it from the Internet.

So, I suppose this is as good an Advent tip as any:

If you absolutely have to make your web site secure this Advent, go to the colo and pull out all your network cables.

Happy Holidays!

About this post

PHP Advent Calendar Day 13 was posted on Thu, 13 Dec 2007.

11 comments

1.Kaloyan Tsvetkov said:

Terry's a Genius! I really enjoyed reading this :)

Fri, 14 Dec 2007 at 12:23:21 GMT


2.Terry Chay said:

Don't feed the egos. :-)

BTW, there is a "typo" chmod 511 should be chmod 311. ;-)

Fri, 14 Dec 2007 at 18:30:29 GMT


3.Keith Casey said:

The Data Access pattern is one of the more useful ones in my book. Within web2Project - the recent dotProject fork - we're implementing it as an audit tool for the project managers to know who is doing what to which projects. It was pretty much mandated by a handful of higher security-conscious organizations.

Fri, 14 Dec 2007 at 20:00:01 GMT


4.John Campbell said:

I don't think the terms "filter input" and "escape output" are the best way to explain the underlying concept. The fundamental problem is that when "data" is treated as "code" there is a security hole.

The concept of "code" includes: Html, http/email headers, regexs, sql, javascript, shell scripts, etc.

The rule is very simple: Whenever data is mixed with code, take the appropriate steps to guarantee the data is not interpreted as code.

An example:

$safe_arg = escapeshellarg($arg);
 
exec("ls '$safe_arg'");

A person who asks, "Does this prevent $arg from being treated as code?" is much more likely to spot the problem than the person who asks "Has $arg has been escaped?"

Fri, 14 Dec 2007 at 23:46:43 GMT


5.Lody Simps said:

Great article and a must read for any (php)-developer that wants to improve his skills in security. Awesome read. Thanks a lot, Terry

Sat, 15 Dec 2007 at 12:16:12 GMT


6.Terry Chay said:

@John Campbell

I see your point, but a problem is that data on one level is code on another level. Many PHP developers don't realize, for instance, how data on the level of application server output is treated as code on the level of the browser input (XSS). The front end developer and the app developer are not always the same person, or even on the same team; in fact, I believe it is better if they aren't the same person, as good Javascript seems to be more common among those who don't know C++/Java/PHP style object syntax.

Similarly, code on the level of output from a browser is often treated as data on the input into the application server. Not understanding that this data could have come from anywhere (CSRF) is the basis of many Ajax holes on Web 2.0 sites.

The chaining together of attacks forms the basis of most exploits.

Many people see the adding of CSRF protection into the latest Ruby framework as a good thing. I tend to see it as being overly abstracted from understanding simple security protocol. It is that same abstraction that causes people to miss the SQL injection questions (most people are far abstracted from it using persistence patterns in things like Active Record in Rails or Hibernate).

The costs are similarly hidden. This is why they end up eventually paying companies like OmniTI and Thoughtworks to come fix their shit if they've achieved a modicum of success.

@Lody

Thanks for the compliment. Though my friends tell me it would have been easier and shorter to just write a book. :-D

Sun, 16 Dec 2007 at 04:56:46 GMT


7.Terry Chay said:

By the way, read Paul's post today about output handling

http://shiflett.org/blog/2007/dec/p...calendar-day-15

Sun, 16 Dec 2007 at 04:58:48 GMT


8.Asanka Dewage said:

Thanks for the great article Terry. I enjoyed reading it and this will be a reference point for me to cover a lot of points... A brilliant mind with a funny face. You remind me of Jeff the wiggle. :-)

A big thanks to Chris as well for gathering all these great people to release their knowledge into this advent calendar.

Sun, 16 Dec 2007 at 11:03:24 GMT


9.Elizabeth Naramore said:

I meant to post this earlier, but that is one hawt shirt you're wearing.

Fri, 28 Dec 2007 at 01:49:53 GMT


10.Richard Lynch said:

I think your interview questions are perfectly reasonable for any serious applicant to a high-end PHP job!

If the head hunters are talking [bleep] then they need to do their job and send you better candidates.

I don't think filtering input is a very good way to prevent SQL injection attacks, however, as the pattern/rule for what should be valid input for some fields -- such as this one, would be invalid as input for an SQL attack. :-)

PS

I hate using myspace, and much of why I hate it is exactly what you reference. It may be successful, but that doesn't make it good...

ymmv

Sat, 29 Dec 2007 at 19:06:05 GMT


11.J Bruni said:

Thanks for writing.

Thanks for sharing.

Thanks a lot.

Tue, 08 Jan 2008 at 11:45:48 GMT

