PHP Advent Calendar Day 13, by Chris Shiflett

13 Dec 2007

Today's entry, provided by Terry Chay, is entitled Filter Input; Escape Output: Security Principles and Practice.

Terry Chay

Name: Terry Chay
Blog: terrychay.com/blog/
Biography: When Zend puts your face on a trading card, you've either arrived in the PHP world, or you're a terrorist. Terry Chay is a PHP terrorist. Being the software architect of Tagged pays the bills. When he isn't saying politically incorrect things about web development, he is in ur Web 2.0 event, eating ur lunch, taking ur photos, and fighting off ur Ruby developers with his mad ninja coding skillz. He also likes to "draw the line at yellow."
Location: San Francisco, California

One of the strangest things about living in the Bay Area is the total lack of PHP support groups. We probably have the largest density of PHP developers (skilled and unskilled) in the world, and yet finding 50 people willing to go in on a shipment of elePHPants turns you into a rock star out here.

So when there was a San Francisco PHP Meetup, I had to go. The topic this month was security. In light of that, I thought I'd use my Advent calendar entry to show that I'm not just a front-end Ajax developer guy or a PHP design patterns guy. I can also roll with the big boys and talk about web app security.

Terry and Security, the Real Oxymoron

Those who have heard my latest talk, The Internet is an Ogre, might find my choice of topic a bit ironic. After all, I am the one who said, "Web security is a luxury," and:

For any of you who think good coding, design aesthetic, or web security are important, I have only one word for you: MySpace.

If there is one thing I have learned from blogging, it's that when you say outrageous things, people listen, and a few actually believe you. And, if the claim has the added bonus of being possibly true, that's just gravy. (Even if you're full of shit, people will just say you're perceptive; nobody has the balls to call you out on it, except people on Slashdot, and everyone knows they're like a stopped clock, only right twice a day.)

Besides, conferences are not fun unless you can get a rise out of Chris Shiflett, Ed Finkler, and Ilia Alshanetsky at the same time.

The truth is that security questions are some of my favorite interview questions. I'm going to cover what I started ranting about at the San Francisco PHP Meetup: these interview questions, how I'd answer them, and how this applies to my understanding of PHP and web application security.

No candidate has ever answered all these questions correctly. A coworker observed one of my interviews and said afterward, "When you asked those security questions, I thought [the candidate] was actually going to cry." I have had many headhunters complain about my interview questions to my boss.

Practice Comes from Principles

A lot of you are asking, "What book should I buy?" I recommend Chris Shiflett’s book (Essential PHP Security). (If you are too poor to buy the book, just visit the old PHP Security Guide instead.) Why? Because it is impossibly small.

Web app security is both really simple and an infinite mass of shit. If you start with an ad hoc approach, it will seem to only be the latter; but, if you take the time to learn the building blocks which form the language of security principles, then it all starts to make sense and becomes the former.

Books like this, by being small, focus on the vocabulary and principles without drowning you in detail. I want you to take the time to learn this language. If you don't have the vocabulary, then you can't do web app security. Now, onto the interview questions.

Question: What is an SQL Injection Attack? Give Me an Example.

SQL injection is a vulnerability that allows input to manipulate the format of an SQL query, causing unwanted SQL to be executed by the database.

Most candidates get that part, but the second part trips about half of all candidates.

Little Bobby Tables

The basic points I'm looking for are:

Have you ever thought like an attacker? If you can't think like an attacker, you can't think like a defender; if you can't create exploits, you can't defend against them.
Does your exploit include a basic escape sequence?
Does your exploit inject data?

Bonus points if you can explain why PHP's MySQL extension isn't vulnerable to the xkcd exploit.
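For the record, here is roughly the shape of answer the checklist above is fishing for. It is a minimal sketch, not anything from a real code base: the table, the find_user_unsafe() function, and the use of an in-memory SQLite database through PDO (standing in for MySQL) are all assumptions made so that the example runs on its own.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (name TEXT, secret TEXT)");
$db->exec("INSERT INTO users VALUES ('alice', 'a-secret'), ('bob', 'b-secret')");

// VULNERABLE on purpose (hypothetical function): the input is pasted
// straight into the query text.
function find_user_unsafe(PDO $db, string $name): array
{
    $sql = "SELECT name, secret FROM users WHERE name = '$name'";
    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

$honest = find_user_unsafe($db, 'alice');

// The single quote closes the string literal (the escape sequence);
// everything after it is attacker-supplied SQL (the injection).
$attack = find_user_unsafe($db, "nobody' OR '1'='1");

printf("honest: %d row(s), attack: %d row(s)\n", count($honest), count($attack));
// prints "honest: 1 row(s), attack: 2 row(s)"
```

The attacker never needed a valid name; the query now reads WHERE name = 'nobody' OR '1'='1', which matches every row.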

Question: Name Three Safeguards Against SQL Injection. For Each, Explain Where You Use It.

The key to answering this question is to understand that the nature of the attack is often focused on a quote mark. So, the solutions are to remove the quote mark, escape the quote mark, or use a built-in feature to protect against the quote mark.

In other words:

You can filter the quote mark on input.
You can escape the quote mark, using something like mysql_real_escape_string(), just before output (to the database).
You can use a prepared statement, if your database supports it. (PDO emulates prepared statements if the database doesn't support them. Both the database and the extension must support prepared statements, else what you are doing is just an abstracted version of the second answer.)
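The three safeguards above can be sketched side by side. This is an illustration under stated assumptions: PDO with an in-memory SQLite database stands in for MySQL, and PDO::quote() stands in for mysql_real_escape_string() (the old mysql extension was removed in PHP 7). The table and variable names are made up.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (name TEXT)");
$db->exec("INSERT INTO users VALUES ('alice'), ('bob')");

$input = "nobody' OR '1'='1";

// 1. Filter on input: remove the quote mark.
//    Crude, and it would also mangle a name like O'Reilly.
$filtered = str_replace("'", '', $input);

// 2. Escape just before output to the database.
//    PDO::quote() escapes the value and adds the surrounding quotes.
$escapedSql  = 'SELECT name FROM users WHERE name = ' . $db->quote($input);
$escapedRows = $db->query($escapedSql)->fetchAll(PDO::FETCH_COLUMN);

// 3. Prepared statement: the data travels separately from the query text.
$stmt = $db->prepare('SELECT name FROM users WHERE name = ?');
$stmt->execute([$input]);
$preparedRows = $stmt->fetchAll(PDO::FETCH_COLUMN);

printf("escaped: %d row(s), prepared: %d row(s)\n",
    count($escapedRows), count($preparedRows));
```

With either the second or third safeguard, the hostile input is treated as an odd-looking name that matches nobody, rather than as SQL.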

Almost every candidate can give at least one, although that's not quite fair, since they have their DB interview first; they can learn about prepared statements there. Many candidates get all three with a little guidance!

Bonus points if you mention mysql_real_escape_string() and more if you explain the difference between it and addslashes(). (This has never occurred, so I'm not too sure how I'd feel if someone pointed it out.)

Very few people know where to implement these safeguards. I'm hoping the new filter extension changes this. A number of people have argued with me about the correct place to filter input and escape output. Many are competent Perl developers. You'll better understand why they can't accept their mistake later in this entry.

Question: Which Safeguard Against SQL Injection Is Best?

This is a trick question. My answer is that I filter input and use prepared statements. (I escape output if prepared statements are not available.) There is no single best approach, although I give props if you assert that prepared statements are better than the alternatives.

Why? That's just good security!

Security is not an impenetrable wall. It is a decently-sized wall, with a moat in front of it, a mountain surrounding it on all sides but the entrance, and a good number of guards on the battlements.

I can tell stories for hours about people who haven't understood this principle and have paid the price. But, in this case, it's easier to ask you about the following cases:

What if the data is a person's name, and he's Tim O'Reilly?
What happens if, at a later date, you decide to use the data in a different context (such as the filesystem, memcache, or HTML) prior to or instead of storing it in the database?
What if you migrate from MySQL to SQL Server, which has a different method for escaping? (Using '' instead of \' to represent an escaped single quote.)
What if you migrate to a data store that doesn't support prepared statements?

Things change. The principle here is if the security protocol is inconvenient, always implement it as early as possible in the application flow. More on this later.

Question: Create a Single Audit Point for Injection Attacks.

People always miss the earlier questions, so I never ask this anymore. I'm tired of having headhunters talk shit about me behind my back to everyone in the Bay Area.

My answer would be a Data Access pattern. If you're a framework guy, you can use a persistence layer like ActiveRecord to abstract yourself entirely from the database, because, apparently, LEFT JOIN is just too damn hard for you.
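A Data Access pattern can be as small as one class that owns every query for a given table, so there is exactly one place to audit. The sketch below is hypothetical: the UserGateway class, its methods, and the schema are invented for illustration, and in-memory SQLite via PDO is assumed so that it runs on its own.

```php
<?php
// Hypothetical gateway: the only place in the application that builds SQL
// for the users table, and therefore the single audit point.
final class UserGateway
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function add(string $name): void
    {
        $stmt = $this->db->prepare('INSERT INTO users (name) VALUES (?)');
        $stmt->execute([$name]);
    }

    public function findByName(string $name): ?array
    {
        $stmt = $this->db->prepare('SELECT id, name FROM users WHERE name = ?');
        $stmt->execute([$name]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }
}

// Assumption: in-memory SQLite stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

$users = new UserGateway($db);
$users->add("Tim O'Reilly");
$found   = $users->findByName("Tim O'Reilly");
$missing = $users->findByName("x' OR '1'='1");
```

The rest of the application never sees a query string, so a reviewer hunting for injection only has to read this one class.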

Given how many people fail this series of questions in interviews, I'm inclined to agree.

Question: What Is Cross-Site Scripting (XSS)? Cross-Site Request Forgery (CSRF)? Session Fixation? Give Me an Example of Each.

The reason I ask for examples is to give you the opportunity to apply this stuff in practice. To think like an attacker shows real, practical knowledge beyond the simple theory. Besides, the principles come from the practice.

Wikipedia has decent definitions of XSS, CSRF, and session fixation.

I'll confess that the only reason I ask about session fixation is because I occasionally meet a candidate who can regurgitate XSS and CSRF descriptions, and the sadistic side of me wants to see if I can break them. Asking about session fixation is like asking how to laugh in hexadecimal. (Answer: 48 41 48 41.) It's an obscure vulnerability known by us old-timers that is easily corrected and fun to lord over people.

CSRF is especially important because of the abundance of Ajaxified web sites. But, the one I'm really interested in is XSS, because I focus on it later in the questioning.
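Since the questions ask for examples, here is one possible shape of a CSRF defense: a per-session token that a forged request cannot know. It is a minimal sketch; the function names are hypothetical, and a plain array stands in for $_SESSION so that it runs from the command line.

```php
<?php
// Hypothetical helpers; $session stands in for $_SESSION.
function csrf_issue_token(array &$session): string
{
    // random_bytes() gives an unguessable value; bin2hex() makes it form-safe.
    $session['csrf_token'] = bin2hex(random_bytes(32));
    return $session['csrf_token'];
}

function csrf_check_token(array $session, ?string $submitted): bool
{
    // hash_equals() compares in constant time.
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

$session = [];
$token   = csrf_issue_token($session);           // goes into a hidden form field
$ok      = csrf_check_token($session, $token);   // genuine form post
$forged  = csrf_check_token($session, null);     // forged request has no token
```

Session fixation has an equally short fix in a real request: call session_regenerate_id(true) whenever the privilege level changes, such as at login.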

Question: Explain How the MySpace Worm Works. Give Me an Example that Uses CSRF to Determine the Login State on a Remote Site.

I don't typically ask these questions, but some of the more belligerent candidates bitch about the previous series as being too pedantic and "just about terminology." This dismissal is the interview equivalent of "I'm not really into Pokémon." Do they think I ask these questions for fun?

You Better Be Into This Pokémon

If you can't answer the above two questions, figure them out on your own. Working through them is how you'll develop a zen-like ability to quickly understand security vulnerabilities and to build security practices on top of these simple security principles. (In this case, you must combine XSS, CSRF, and JavaScript exceptions.)

Question: What Does "Filter Input; Escape Output" Mean? Give Me an Example of Each.

You can reference Wikipedia’s definition, but put simply, it is the principle that filtering should be done as soon as data enters, and escaping should be done just before it exits.

Even if people can intuit that or have already heard it (surprisingly few candidates have, but most can guess), many haven't contemplated what filtering and escaping really mean. That's why I ask the other questions.

You can apply this principle to SQL injection and XSS (and any other injection attack):

Filtering can help protect against SQL injection by removing all single quotes. (This isn't foolproof, because not all SQL injection attacks require a single quote.)
Filtering can help protect against XSS attacks by removing HTML tags with strip_tags(), ensuring the data adheres to a specific pattern with regular expressions, or using a combination of HTML normalization and a DOM walker that implements a whitelist or blacklist filter. These techniques remove the <script> tags as well as injections into CSS. (You may need a CSS parser unless you strip out all style attributes and <style> tags.)
Escaping can help protect against SQL injection by maintaining the distinction between the SQL query and the data. Use something like mysql_real_escape_string() or prepared statements.
Escaping can help protect against XSS attacks by maintaining the distinction between the HTML and the data. Use htmlspecialchars() or htmlentities(). Both will do things like replace < with &lt; but htmlentities() does a bit more if you know the output is HTML and not XML. (Be sure to match the character encoding of your Content-Type header.)
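For the XSS half of the list above, the difference between filtering and escaping is easiest to see on one hostile string. This is a minimal sketch; the sample input and the username pattern are assumptions chosen for illustration.

```php
<?php
$input = '<script>alert("xss")</script>Hello <b>world</b>';

// Filter on input: strip_tags() drops the tags but keeps the text inside them.
$filtered = strip_tags($input);

// Filter by pattern: accept only what the field is allowed to look like
// (hypothetical rule: 3 to 20 letters, digits, or underscores).
$isValidUsername = preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $input) === 1;

// Escape on output: markup characters become entities, so the browser
// renders them as text instead of interpreting them.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $filtered, "\n"; // prints: alert("xss")Hello world
echo $escaped, "\n";
// prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;Hello &lt;b&gt;world&lt;/b&gt;
```

Filtering changed or rejected the data; escaping kept every character and only changed how it is represented for HTML.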

It might have helped if we had called it encoding instead of escaping, but we don't, so deal.

From this, the rest follows.

Filter on input to adhere to the practice of implementing as many security safeguards as possible as early as possible. When data enters the application, filter it first. (Some candidates misinterpret this as filtering in the client side code. That's easily avoided by the most inexperienced attackers. They forget that they're PHP developers, not front end developers; we are talking about input into the PHP application.)
Escape on output, because escaping functions are going to be different depending on where the data goes. If it goes to a MySQL database, you use mysql_real_escape_string(). When using it as an argument on the command line, use escapeshellarg(). When sending it back to the user in HTML, use htmlentities().
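To make the "different function per destination" point concrete, here is one value sent to three outputs. It is a sketch under assumptions: the for_html(), for_shell(), and for_sql() helper names are hypothetical, htmlspecialchars() is used for the HTML case, and PDO::quote() on an in-memory SQLite connection stands in for mysql_real_escape_string().

```php
<?php
// Hypothetical helpers: each wraps the escaping function for one destination.
function for_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

function for_shell(string $value): string
{
    return escapeshellarg($value);
}

function for_sql(PDO $db, string $value): string
{
    // Escapes the value and wraps it in quotes for this database driver.
    return $db->quote($value);
}

// Assumption: in-memory SQLite stands in for the real database.
$db   = new PDO('sqlite::memory:');
$name = "Tim O'Reilly <tim@example.com>";

echo '<p>', for_html($name), "</p>\n";
echo 'grep -- ', for_shell($name), " guests.txt\n";
echo 'SELECT * FROM guests WHERE name = ', for_sql($db, $name), "\n";
```

The same string comes out three different ways, which is why escaping can only be done once the destination is known.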

These principles apply to XSS just as they did to SQL injection:

What if you want the HTML to be output as HTML, because your MySpace-like site has HTML editing and customization? You can't escape, but you can still filter. You've already filtered on input, right?
When you know it isn't HTML, you should always escape for HTML on output to an HTML template. By doing so, you are protected against XSS. XSS forms the foundation of many attacks such as session hijacking and CSRF worms, so that's a good thing.
Therefore, you apply both security practices, because neither can offer complete coverage. And, because you don't want to rely on security being an impenetrable wall, right?

This leads to further understanding of the concept of input and output. Input doesn't necessarily mean input from the user; it means input into the application. Output doesn't necessarily mean output to the user as HTML; it means any output from the application to an external source.

PHP does what it does best but with the hard-won security principles tacked on. In the old days, this meant gluing the user to a database back end and back, but on a modern website, this means so much more. We've gone from a "3-tier" or even an "n-tier" architecture to a complicated bunch of highly-cohesive, external services, one of which is the user of the web site.

Aside: Principle to Practice

Here's an example of how I applied the above principle of "filter input; escape output" at Tagged. Because we're a MySpace-like social network, we have to base our input filtering of certain fields on a blacklist of illegal tags, properties, and URLs instead of a whitelist of allowed tags (which is more common among many libraries). I knew I could not "build an impenetrable wall" with a blacklist; the spec is always changing and there is an "infinite mass of shit" in security.

Instead, I did what I had time to do, and I applied the principle of "filter input; escape output" to the system architecture, not just with a kick-ass user input filter (filter input); user output (remember, I can't always escape the HTML); and escaping for the database; but also on input from the database and memcache stores.

I encoded the version number of the HTML input filter on output to the database and memcache, so on input, I could check to see if it was out of date and run the HTML filter again!

Why? Because, one day, we were hacked. Within hours, a rogue XSS worm, injected into the style tags of the widgets on our site, had infected 60,000 user profiles. Even if we stopped it on user input, we still would have the 60,000 infected user profiles to deal with, spanning across all the databases in the federation, containing 50 million user profiles. It would have taken days to clean the mess!

Instead, I asked for a copy of the exploit, figured out the nature of the attack, added a CSS parser (which I had lying around, because I knew about this attack but was too lazy to look up its exact nature), hooked it up to the HTML filtering object, and bumped the version number.

All new uploaded content was filtered against it. And, as users used the site, they were fixing it (in memcache). Then, at our leisure, we could test the new filter against regressions and slowly remove the attack permanently from the database, all using the same code, and all because of the FIEO principle.
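The version-stamping trick from this aside can be sketched in a few lines. Everything here is hypothetical: the function names, the record layout, and the regex-based stand-in filter are assumptions for illustration, not the Tagged code, and a real HTML filter would do far more.

```php
<?php
// Hypothetical current version of the HTML input filter.
const HTML_FILTER_VERSION = 2;

function filter_html(string $html): string
{
    // Stand-in filter only: version 1 dropped <script> blocks,
    // version 2 also drops <style> blocks.
    $html = preg_replace('#<script\b.*?</script>#is', '', $html);
    return preg_replace('#<style\b.*?</style>#is', '', $html);
}

// Output to the database or memcache: record which filter version
// produced the stored HTML.
function pack_profile(string $rawHtml): array
{
    return ['v' => HTML_FILTER_VERSION, 'html' => filter_html($rawHtml)];
}

// Input from the database or memcache: re-run the filter if the record
// predates the current version.
function unpack_profile(array $record): array
{
    if (($record['v'] ?? 0) < HTML_FILTER_VERSION) {
        $record = pack_profile($record['html']);
    }
    return $record;
}

// A record stored under filter version 1 still carries the injected style.
$stale = ['v' => 1, 'html' => '<style>body{behavior:url(x)}</style><p>hi</p>'];
$fresh = unpack_profile($stale);
```

Bumping the constant is all it takes for every stale record to be cleaned the next time it is read.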

Question: What Is Magic Quotes, and Why Is It Bad?

This is the gravy. Many of you know "magic quotes is bad." But, did you know that it's monotonically bad? (For instance, a case can be made for register_globals, but none can be made for magic_quotes_gpc.)

You say this simply:

Magic quotes is bad, because it escapes input, and you should filter input and escape output!

Hard experience has taught people developing large PHP sites that you should only escape on output. Anyone who has written PHP code that is deployed at scale or on a variety of hosted services knows about the nightmare that is magic_quotes_gpc. Magic quotes has the implicit assumption that the output of all input is a MySQL or PostgreSQL database, and the attacker is not very clever.
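The nightmare is easy to reproduce. magic_quotes_gpc was removed in PHP 5.4, so this sketch uses addslashes() to simulate what it did to incoming data; the sample name is an assumption.

```php
<?php
// Simulate magic_quotes_gpc: the input is escaped the moment it arrives.
$submitted = "Tim O'Reilly";
$magic     = addslashes($submitted);   // Tim O\'Reilly

// The backslash now leaks into every output that is not a database query.
$html = htmlspecialchars($magic, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // prints: Tim O\&#039;Reilly

// So every other consumer has to remember to undo it first.
$restored = stripslashes($magic);
```

And the escaping is not even right for every database: SQL Server wants the quote doubled, not backslashed.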

This brings us full circle to the Perl developers who continue to argue that you can (and should) escape the input. Perl is "311 code" (chmod 311 *.pl); writer can write and execute, his team and the world can execute, nobody can read. TIMTOWTDI means the Perl developer simply isn't used to the concept that their code will be broken into parts and edited by a team of people.

PHP may have its roots in "Rasmus wants to build his personal home page, and he wants a template tool to do it," but it now has to power large-scale, complex web sites such as Yahoo! and Facebook (and Tagged).

Code has become complex. Magic quotes made sense when we were naïve about input and output; input meant from the user, and output meant to the database; magic quotes made sense when we didn't really understand the difference between filtering and escaping. Many PHP developers have since spent years trying to remove things like magic quotes from the wreckage of late-night debugging sessions. Are you going to learn from their experiences, or are you going to doom yourself to repeat them?

That is the nature of security; practice has created good principles. These principles make for good practice.

Now We're Done

Can you see now why I don't understand why I have headhunters talking shit about me behind my back? I guess they really want web sites to fall flat on their asses and be poorly-architected, vulnerable piles.

See that picture of me at the top of this entry? I'm the PHP Security Grinch, and I'm here to tell you that my heart isn't going to grow three sizes this season, and I won't be saying my questions are too hard. Bah! Humbug!

Why? Because, I actually want your site staying online; I want you able to enjoy this holiday season without the fear of being "on call" for a late-night security session.

Parting Shot

Returning to the "security is a luxury" statement that began this entry...

One candidate's resume listed web app security under skills. Of course, they got this battery of questions from me, and, as luck would have it, it was especially egregious. I was about to move on, but the candidate, clearly frustrated by this experience, said to me:

Look, if you give me a web site, I can make it secure.

I was crestfallen; the candidate didn't know what an SQL injection attack was! Then, I realized he was right. Shit. I can make a web site secure, too; disconnect it from the Internet.

So, I suppose this is as good of an Advent tip as any:

If you absolutely have to make your web site secure this Advent, go to the colo and pull out all your network cables.

Happy Holidays!
