Just a little reminder before you start: In part 1 we explored the infrastructure of the Internet and the make-up of a Web server. We left off at the stage where our Web server software is up and running again, and we’ve just double-checked this by telnetting an HTTP request and received the successful response code. It’s now time for…
Finding Your Website
The 200 code means that your home page is okay, and you should be able to visit it in your browser. However, it may not show what you expected, and your fabulous Widget 3000 page may still be absent.
Virtual Hosts and Streams
As mentioned above, many servers host multiple websites. One of these is the default website. It is the website you get when you visit the server by IP address http://80.72.139.101/ instead of by name, or when you leave off the Host: line in the HTTP request while telnetting. The rest of the websites are known as virtual hosts. Every one of these websites has a physical location on the server known as its document root. To further investigate your website woes, you need to discover its document root.
Fortunately and sensibly, most server management packages like Plesk store their virtually hosted websites according to their domain name, so you can usually just find directly on the domain name. The / in the command below tells find to search the whole file system, the -type d looks only for directories, and the -name part searches for any directories containing “smashingmagazine”. The asterisks are wild cards. You’ll need to either escape them *smashingmagazine* or put them in quotes “*smashingmagazine*”:
If you run this command as a normal unprivileged user, you will probably see lots of “Permission denied” as find tries to explore forbidden places. You are actually seeing two types of output here: stdout for “standard output” and stderr for “standard error”. They are called output streams and are confusingly mixed together.
You have already encountered the pipe symbol | for piping the output stream (stdout) of one command into the input stream (stdin) of another. The symbol > can redirect that output into a file. Try this command to send all the matches into a file called matches.txt:
In this case, all the stdout is redirected into the file matches.txt and only the error output stream stderr is displayed on the screen. By adding the number 2 you can instead redirect stderr into a file and just display stdout:
There is a special file on Linux, UNIX and Mac computers which is basically a black hole where stuff gets sent and disappears. It’s called /dev/null, so to only see stdout and ignore all errors:
The end result is that this find command tells you roughly where your document root is. In Plesk, all the virtual hosts are generally stored within the /var/www/vhosts directory, with the document roots in /var/www/vhosts/domain.com/httpdocs.
The Long Way
You can find the document root more accurately by looking through the configuration files. For Apache servers, you can find the default website’s document root by looking through the main configuration file which is usually /etc/apache2/apache2.conf or /etc/httpd/conf/httpd.conf.
Somewhere inside this conf file will also be an Include line which references other conf files, which may themselves include further conf files. To find the DocumentRoot for your virtual host, you’ll need to search through them all. You can do this using grep and find but its a long command, so we will build it up gradually.
First, we will find all the files (because of the -type f) on the whole server (the /) whose names end in “conf” or “include”. The -type f finds only files and the -o lets us look for files ending in “conf” or “include”, with surrounding escaped parentheses. As above, the errors are banished into the ether:
This is not quite complete as any files with spaces will confuse the grep command we are about to attempt. To fix that you can pipe the output of the find command through the sed command which allows you to specify a regular expression. Regular expressions are a huge topic in their own right. In the command below, the s/ /\ /g will replace all spaces with a slash followed by a space:
Now you can use a backtick to embed the results of that find command into a grep command. Using ` is different than | as it actually helps to build a command, rather than just manipulating its input. The -H option to grep tells it so show file names as well. So, now we will look for any reference to “smashingmagazine” in any conf files.
This may take a few seconds to run. It is finding every conf file on the server and searching through all of them for “smashingmagazine”. It may reveal the DocumentRoot directly. If not, it will at least reveal the file where the ServerName or VirtualHost is defined. You can then use grep or less to look through that file for the DocumentRoot.
You can also use the xargs command to do the same thing. It also allows the output from one command to be embedded into another:
The end result, hopefully, is that you’ve definitively found the document root for your website.
You can use a similar technique for nginx. It also has a main conf file, usually in /etc/nginx/nginx.conf, and it can also include other conf files, however its document root is just called “root”.
Apache Control Interface
With Apache, there is yet another way to find the right conf file, using the apachectl or newer apache2ctl command with the -S option.
If this whizzes by too fast, you can try piping the results through grep. It won’t work, however, because grep only operates on stdout and for some reason apachectl outputs its information to stderr. So, you have to first direct stderr into stdout and then send it through grep. This is done by redirecting the error stream 2 into the output stream 1 with 2>&1, like this:
This also reveals the conf file which contains the DocumentRoot for this website. As above further grep or less will reveal the DocumentRoot.
Checking the Document Root
Now that you’ve found the document root, you can snoop around to make sure it’s alright. Change to the directory with cd:
If you get the error message “No such file or directory”, that is bad news. Either the DocumentRoot has been incorrectly set or your whole website has been deleted. If it is there, you can list the files with ls. The -a also shows hidden files which start with a dot, and -l displays them in long format with permissions and dates:
Every folder will at least show these two entries. The single “.” is for the current directory and “..” is for the parent directory. If that’s all there is, then the directory is empty.
While you’re there, you can double-check you are in the correct place. Create a new file using echo and again using the > symbol to send the output to a file.
This will create a file called testfile.html containing a bit of HTML. You can use your browser or telnet or curl or wget to see if the file is where it should be.
If that worked, then well done, you have found your website! Remove that test file to clean up after yourself with rm testfile.html and keep going.
Back up and Restore
The tar and zip commands can be used to back up and restore. If your website is missing, then restore won’t help you much unless you have previously backed up. So go back in time and backup your data with one of the commands below. To go back a whole day:
Just kidding — but it would be nice! The tar command stands for tape archive and comes from the days when data was backed up on magnetic tapes. To create an archive of a directory, pass the cfz options to tar which will create a new archive in a file and then zip it in the gzip format.
All Mac and Linux computers support the tar command and most also have zip. To do the same with zip:
To see what an archive contains, run:
Or for zip format:
Both tar and zip strip the leading slashes when they backup. So when you restore the files, they will be restored within the current directory. To restore them in the same location they were backed up from, first cd to /.
The “v” above stands for verbose and causes tar to show what it’s doing. zip has a similar option:
Website Errors
Let’s assume your website hasn’t actually disappeared. The next place to look is the error log file.
Finding the Log File
When using a server management package like Plesk, each website probably has its own log file. You can find it by grepping for the word “log” in the conf file you identified above. The -i means case-insensitive.
There is also a server-wide log where any non-website-specific errors go. You can find this in the main conf file:
Htaccess Errors
It is very easy to screw up a website. You can quite readily bring down a very big website by removing a single character from the .htaccess file. Apache uses the file .htaccess to provide last-minute configuration options for a website. It is most often used for URL rewriting rules. They look like this:
This rule says to rewrite any URL in the form “products/widget-3000/123” to the actual URL “products/view.php?id=123”. The L means that this is the last rule to be applied and QSAattach any query string to the new URL. URL rewriting is often used for search engine optimization so that Web managers can get the name of the product into the URL without actually having to create a directory called “widget-3000”.
However, make a single typo and your whole website will give a 500 Internal Server Error.
The tail command will display the last 10 lines of a log file. Give it a -1 to display the single last line instead. An .htaccess problem will look like this:
You can grep for all of these types of errors:
PHP Parse and Runtime Errors
Many websites use the LAMP combination: Linux, Apache, MySQL and PHP. A common reason for Web pages not showing up is that they contain a PHP error. Fortunately, these are quite easy to discover and pinpoint.
There are two broad classes of PHP errors: parse errors and runtime errors. Parse errors are syntax errors and include leaving off a semicolon or forgetting the $ in front of a variable name. Running errors include undefined functions or referencing objects which don’t exist.
Like .htaccess errors, parse errors will cause an HTML response code 500 for Internal Server Error, often with a completely blank HTML page. Runtime errors will give a successful HTML response of 200 and will show as much HTML as they have processed (and flushed) before the error happened. You can use telnet or wget -S or curl -i to get only the headers from a URL. So now, copy and paste your erroneous page into a command:
PHP Error Settings
To find the exact error, you need to make sure errors are being reported in the log file.
There are several PHP settings which cover errors. display_errors determines if errors are shown to the website visitor or not, and log_errors says whether they will appear in log files. error_reporting specifies the types of errors that are reported: only fatal errors, for example, or warnings and notices as well. All of these can be set in a configuration file, in .htaccess or within the PHP script itself.
You can find out your current settings by running the PHP function phpinfo. Create a PHP file which calls the function and visit it in your browser:
phpinfo function showing configuration settings.
The two columns show the website and server-wide settings. This shows that display_errors is off, which is good, because it should be off on live websites. It means that no PHP errors will ever be seen by the casual visitor. log_errors on the other hand should be on. It is very handy for debugging PHP issues.
The error_reporting value is 30719. This number represents bit flags or bit fields. This is a technique for storing multiple yes/no values in a single number. In PHP there are a series of constants representing different types of errors. For example, the constant E_ERROR is for fatal errors and has the value 1; E_WARNING is for warnings and equals 2; E_PARSE is for parsing or syntax errors and has the value 4. These values are all powers of two and can be safely added together. So the number 7 means that all three types of errors should be reported, as E_ERROR + E_WARNING + E_PARSE = 7. A value of 5 will only report E_ERROR + E_PARSE.
In reality, there are 16 types of errors from 1 for E_ERROR to 16384 for E_USER_DEPRECATED. You can type “30719 in binary” into Google and it will give you the binary equivalent: 0b111011111111111. This means that all errors are switched on except the twelfth, which is E_STRICT. This particular setup has also been given a constant E_ALL = E_ERROR + E_WARNING + E_PARSE + etc = 30719. From PHP version 5.4.0, E_ALL is actually 32767 which includes all the errors include E_STRICT.
If your error_reporting setting is 0, then no errors will show up in the log file. You can change this setting in the file php.ini, but then you have to restart Apache to make it have an effect. An easier way to change this setting in Apache is to add a line in a file called .htaccess in your document root: php_value error_reporting 30719.
Or you can do that on the command line, using the double arrow which appends to an existing file or creates the file if it doesn’t exist:
Refresh your erroneous Web page. If there is a PHP error in your page it should now show up in the error log. You can grep the log for all PHP errors:
If you have referenced variables or array indices before assigning them values, you may see thousands of PHP notices like the one above. It happens when you do things like <? $total = $total + 1 ?> without initially setting $total to 0. They are useful for finding potential bugs, but they are not show stoppers. Your website should work anyway.
You may have so many notices and warnings like this that the real errors get lost. You can change your error_reporting to 5 to only show E_ERROR and E_PARSE or you can grep specifically for those types of errors. It is very common to chain grep commands together like this when you want to filter by multiple things. The -e option below tells the second grep to use a regular expression. This command finds all log entries containing “PHP” and either “Parse” or “Fatal”.
Seeing Errors in the Browser
If you are tracing a runtime error rather than a parse error, you can also change the error_reporting setting directly in PHP. And you can quickly turn display_errors on, so you will see the error directly in your browser. This makes debugging quicker, but means everyone else can see the error too. Add this line to the top of your PHP page:
These two functions change the two PHP settings. The | in the error_reporting call is a bit OR operator. It effectively does the same as the + above but operates on bits, so is the correct operator to use with bit flags.
Any fatal errors or warnings later in the PHP page will now be shown directly in the browser. This technique won’t work for parse errors as none of the page will run if there’s a parse error.
Bit Flags
Using bit flags for error_reporting avoids having 15 separate arguments to the function for each type of error. Bit flags can also be useful in your own code. To use them, you need to define some constants, use the bit OR operator | when calling the function and the bit AND operator & within the function.
Here’s a simple PHP example using bit flags to tell a function called showproduct which product properties to display:
This will display “Widget 3000: $10” in the browser.
Infinite Loops
PHP’s error reporting may struggle with one class of error: an infinite loop. A loop may just keep executing until it hits PHP’s time limit, which is usually 30 seconds (PHP’s max_execution_time setting), causing a fatal error. Or if the loop allocates new variables or calls functions, it may keep going until PHP runs out of workable memory (PHP’s memory_limit setting).
It may, however, cause the Apache child process to crash, which means nothing will get reported, and you’ll just see a blank or partial page. This type of error is increasingly rare, as PHP and Apache are now very mature and can detect and handle runaway problems like this. But if you are about to bang your head against the wall in frustration because none of the above has worked, then give it some consideration. Deep within your code, you may have a function which calls some other function, which calls the original function in an infinite recursion.
Debuggers
If you’ve gotten this far, and your page is still not showing up, then you’re entering more difficult territory. Your PHP may be executing validly and doing everything it should, but there’s some logical error in your programming. For quick debugging you can var_dump variables to the browser, perhaps wrapping them in an if statement so that only your IP address sees them:
This method will narrow down an error but it is ungraceful and error-prone, so you might consider a debugging tool such as Xdebug or FirePHP. They can provide masses of information, and can also run invisibly to the user, saving their output to a log file. Xdebug can be used like this:
This bit of code logs all function calls and arguments to the file /tmp/xdebugtrace.txt. It displays even more information when there is a PHP notice or error. However, the overhead may not be suitable for a live environment, and it needs to be installed on the server, so it’s probably not available in most hosting environments.
FirePHP, on the other hand, is a PHP library that interacts with an add-on to Firebug, a plugin for Firefox. You can output debugging information and stack traces from PHP to the Firebug console.
Security Issues
By this point, you should have some HTML reaching your browser. If it’s not what you expect, then there’s a chance that your website has been compromised. Don’t take it personally (at first). There are many types of hacks and most of them are automated. Someone clever but unscrupulous has written a program which detects vulnerabilities and exploits them. The purpose of the exploit may simply be to send spam, or to use your server as part of a larger attack on a more specific target (a DDoS).
Server Hacks
Operating systems are very complex pieces of software. They may be built from millions of lines of programming code. They are quite likely to have loopholes where sending the wrong message at just the wrong time will cause some kind of blip which allows someone or something to gain entry. That’s why Microsoft, Apple, Ubuntu and others are constantly releasing updates.
Similarly, Apache, nginx, IIS and all the other software on a typical server is complicated. The best thing you can do is keep it up to date with the latest patches. Most good hosts will do this for you.
A hacker can use these flaws to log in to your server and engineer themselves a terminal session. They may initially gain access as an unprivileged user and then try a further hack to become the root user. You should make this as hard as possible by using good passwords, restrictive permissions, and being careful to run software (like Apache) as an unprivileged user.
If someone does gain access, they may leave behind a bit of software which they can later use to take control of your server. This may be detectable by an antivirus scanner or something like the Rootkit Hunter, which looks for anomalies like unexpected hidden files. But there are also a few things you can do if you suspect an intrusion.
The w command shows who is currently logged in to a server and what they are doing:
The last command shows who has logged in recently in date order. Pipe it through head to show only the first 10 lines.
It tells you who has logged in and for how long, plus any terminal session they have open. down means until the server shut down. Look for unexpected entries and consult your host or a security expert if you are in doubt.
PHP Hacks
More common are hackers who gain entry though vulnerabilities in PHP scripts, especially popular content management systems like WordPress. Anybody can write a plugin for WordPress and, if it’s useful, people will install it. When writing a plugin, most developers think primarily about the functionality and little about security. And because WordPress allows file uploading, hackers who find vulnerabilities can use them to upload their own PHP scripts and later take control of a computer.
These PHP scripts can use the PHP mail function to send out spam on demand, but they can also try to execute commands in much the same way as you can via a terminal session. PHP can execute commands with its exec or system functions. If you do not need to use these functions, it is advisable to disable them. You can do this by adding the disable_functions directive to your server’s php.ini file (or php5.ini for PHP 5) or to the file php.ini within your document root. If you search for “php disable functions” in Google, you will find a whole list of functions which should be disabled in this way:
A quick check you can make for this type of hack is to look for all PHP files modified recently and make sure there are no anomalies. The -mtime -1 option tells find to only consider files modified within the last day. There’s also -mmin for minutes. This command searches all websites within /var/www/vhosts for recently modified files ending in “php” or “inc”:
PHP hacks are difficult to detect because they are designed to not stick out. One method hackers use is to gzip their PHP and then encode it as base64. In that case, you may have a PHP file on your system with something like this in it:
Another method is to encode text within variables and then combine them and evaluate them:
Both these methods use the PHP eval function, so you can use grep to look for eval. Using a regular expression with *eval* means that the word “eval” must have a word boundary before and after it, which prevents it being found in the middle of words. You can combine this with the find command above and pipe through less for easy reading:
If you do find this type of hack in your website, try to discover how they got in before completely removing all the tainted files.
Access Logs
Along with error logs, Apache also keeps access logs. You can browse these for suspicious activity. For example, if you found a PHP hack inside an innocuous looking file called test.php, you can look for all activity related to that file. The access log usually sits alongside the error log and is specified with the CustomLog directive in Apache configuration files. It contains the IP address, date and file requested. Search through it with grep:
This looks for GET and POST requests for the file test.php. It provides you with an IP address, so you can now look for all other access by this address, and also look for a specific date:
This kind of debugging can be very useful for normal website errors too. If you have a feedback form on your website, add the user’s IP address to the message. If someone reports an error, you can later look through the logs to see what they have been up to. This is far better than relying on vague second-hand information about reported problems.
It can also be useful for detecting SQL injection attacks, whereby hackers try to extract details from your database by fooling your database retrieval functions. This often involves a lot of trial and error. You could send yourself an email whenever a database query goes wrong and include the user’s IP address. You can then cross-reference with the logs to see what else they have tried.
Last Resorts
William Edward Hickson is credited with popularizing the saying:
“If at first you don’t succeed, try, try, try again.”
Hickson was a British educational writer living in early Victorian times. His advice is not appropriate for the modern Web developer, lying in bed on a Saturday morning, drowning in frustration, staring at a blank Web page, preparing to chuck an expensive laptop against a brick wall.
You’ve now been through all the advice above. You’ve checked that the world hasn’t ended, verified your broadband box, tested the Internet and reached your server. You’ve looked for hardware problems and software problems, and delved into the PHP code. But somehow or other, your Widget 3000 is still not there. The next thing to do is...
Have Breakfast
Get out of bed and take your mind off the problem for a little while. Have some toast, a bowl of cereal, something to drink. Maybe even indulge in a shower. Try that new lavender and citrus shampoo you bought by mistake. While you’re doing all this, your subconscious is busily working on the website issue, and may unexpectedly pop a solution into your thoughts. If so, give it a try. If not…
Ask for Help
Check the level of support that you are entitled to by your hosting company. If you are paying $10 per month, it’s probably not much. You may be able to get them to cast a vague glance in your direction within the next 72 hours. If it’s substantially more, they may log in and have a look within the next few minutes. They should be able to help with hardware or software issues. They won’t help with Web programming issues. Alternatively, ring a colleague or freelancer. If you are still stuck…
Prepare
…to release some nervous energy. Find one of those squidgy balls that you can squeeze mercilessly in your hands, or a couple pencils to use as drumsticks, or a pack of cigarettes and a pot full of coffee. And then try the last resort to any computing problem…
Reboot
When your laptop or desktop goes wrong, a common solution is to reboot it. You can try the same trick on your Web server. This is a quite risky. Firstly, it may not solve the problem. If it’s a PHP error, then nothing will change. If, however, your issue is caused by some obscure piece of software becoming unresponsive, then it may well help, though it may not fix the problem permanently. The same thing may happen next week.
Secondly, if the reboot fails then you will be really stuck. If the server shuts down but fails to start back up again, then someone may have to go and press the power button on the physical machine. That someone is an employee of your hosting company, and they may be enjoying their breakfast too, in a nice comfortable office somewhere. They may have left their jumper at home. They may not want to enter the air-conditioned bunker where all the servers are kept. You will be thoroughly dependent on their response time.
Given all the risks, the command is:
The reboot command is really just a wrapper for /sbin/shutdown -r now. It causes the server to shut down and then restart. That may take a few minutes. Soon after issuing the command above your SSH session will come to an abrupt end. You will then be left for a few nervous minutes wondering if it will come back up again. Use the tools you prepared above.
While you are waiting, you can issue a ping to see if and when your server comes back. On Windows use ping -t for an indefinite ping:
You can breathe a sigh of relief when ping finally responds. Wait a couple more minutes and you’ll be able to use ssh again and then try to view the Widget 3000 in your Web browser.
Conclusion
This has been an epic journey, from the end of the world to a single misplaced character in a file. Hopefully, it will help you through the initial few minutes of panic when you wake up one morning and the beautiful product page you created last night is gone.
Some of the reasons and solutions above are very rare. The most likely cause is simply a slight malfunction in your broadband box. Running out of disk space or getting hacked are the only other things that are in any way likely to happen in the middle of the night when nobody else is working on the website. But throw in other developers, server administrators and enthusiastic clients — and anything is possible. Good luck!
Footnotes
1. Oxford Dictionary of Quotations (3rd edition), Oxford University Press, 1979
Paul Tero's chapter has been reviewed by Ben Dowling and Sergey Chikuyonok.
Pre-Order Your Copy Today!
We hope you've enjoyed this sample chapter of the Smashing Book #4! We would sincerely appreciate your support with an occasional tweet, Facebook post or Google+ update, or just a word to your friends and colleagues! You can also learn more about the Smashing Book #4 first. Again, thank you for all your support!
(og) (il) (vf) (ea) (cm)
© The Smashing Editorial for Smashing Magazine, 2013.