Coding, is? Fun!

Thursday, March 13, 2008

Drupal - Common Do's, Don'ts

If you are starting up a Drupal project or you are analyzing one, here are a few Do's and Don'ts:
1. When you are developing with Drupal you will be required to create custom modules or extend existing modules. Please remember that Drupal is a developing product and therefore to manage upgrades they recommend a few standards. The first one is that any custom modules you develop should NOT be under the modules folder. Create them under the /sites/all folder. You can still install them and configure them from the administrative interface. This way custom modules are isolated from Drupal core modules.

2. It is tempting in Open Source frameworks to modify the core code. This is generally not necessary in Drupal. As an example, to create a custom block in a profile page with user information, you need not modify the User module. You can add a file called template.php in your theme's folder. You can then override the theme_user_profile method, by adding a _user_profile method.

3. Similarly, it is a common requirement to select a subset of content at some point. Let us say your content depends on an additional key such as department. So for a particular department you want to show content only for that department. You should NOT modify the node module to accomplish this. The node module and other core Drupal modules allow you to modify the queries they send to the database by using the db_rewrite_sql hook. You can add your own custom filters (such as the department id) to a query created by the node module by overriding that hook.

4. If you do modify core Drupal modules, keep track of which were modified and where, by using commenting patterns.

5. Use the Filtered HTML filter always in user content so that you can protect the site from Cross-site Scripting attacks.

6. Use the .install files in your custom modules to install custom tables in the database. Avoid creating your own MySql (or Postgres) scripts to run separately.

7. If you are using Ajax in a Drupal site use the JQuery functions instead of rolling your own. JQuery is packaged with Drupal and it is easier handling Ajax through it.

The key idea is that when you think of modifying core Drupal code, most of the time you can get away by extending existing modules, or adding hooks. The framework is built with the idea that it should be extensible, not mutable. If you spend some time figuring out the way to extend a module (instead of hacking into the core modules) you will avoid lots of headache later.

Labels: ,

Lessons from a Live Server Issue in .NET

(Updated Below)
A couple of months back my company was asked to upgrade the MCMS based public site of a client. We performed the upgrade for a subsite from .NET 1.1 to .NET 2.0. When we rolled out the upgraded code, we also changed the .NET runtime in their main rootsite and other virtual directories to .NET 2.0 runtime.
The load balanced site ran for 4 hours on both servers and then started giving "server very busy" (HTTP 503) errors. We were alerted and then worked off and on for 3 weeks and found the problem.
The problem had to do with HttpWebRequest-Response objects in .NET 1.1 and 2.0.
The 1.1 and 2.0 documentation for HttpWebResponse clearly states that not closing the response stream will cause a server to run out of connections. The behavior thereafter is not defined, but we found the behavior alright - subsequent threads that try to open the response stream just HANG. They block the entire request execution and the server starts throwing HTTP 503 errors.

Here is the offending code:
string sURL = "http://localhost/NR/...";
HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(sURL);
wReq.Method = "HEAD";
HttpWebResponse resp = (HttpWebResponse)(wReq).GetResponse();
string cntLen = resp.Headers["Content-length"];


This code does NOT close the response object. It should call “resp.Close();”.

Yet, this code worked fine in the client's live environment for 2 years. Why?
We found that .NET 1.1 actually recovers the response stream even though the documentation says otherwise. That is, there is a loophole in .NET 1.1 for people who do not close the response stream.
In .NET 2.0, that loophole is closed. Therefore, the above codebase hangs the server if upgraded to .NET 2.0.

Lessons we learnt
This was a contentious engagement with accusations and defenses about what went wrong. But we learnt a few lessons that I am documenting here:
1. When we upgrade or make a change in an environment and the server starts misbehaving ALWAYS suspect the upgrade or the change. With automatic windows updating and external system dependencies, it is easy to dismiss a problem as coincidental to an upgrade. Most of the time that is not the case. Most of the time it is a change that was consciously made.
In our case, the code that we actually upgraded to 2.0 was working fine. It was the legacy 1.1 code that was running in the 2.0 runtime that showed issues. Therefore, from the first, we thought the upgrade was NOT the cause. Of course, it turned out it was.

2. Always suspect code. If threads hang or we get weird errors in an environment with load, it is natural to suspect that the code is not the problem. We tend to suspect external system configurations, any change to configuration, or hardware issues. In my experience, things keep coming back to the code. Suspect the code first and do not be afraid of debugging in the server with trace statements, file logs - anything.

3. Always check connections to external systems. More specifically, suspect resources such as database connection pools, socket connections, web services. The above point (point 2) should be used in conjunction with this - instead of randomly checking the code or code reviews, check wherever external connections such as sockets or database readers are opened.

4. One thing that helps immensely is a simple stress test with a tool such as Web Application Stress Tool (WAST) from Microsoft. It is free and there is no reason not to record a session and run a test with 50 threads for 30 minutes every time you make major changes. I also suggest running such a test in every environment - not just staging or live. Run it in your own computer, if possible.

5. Do not hesitate to put trace statements or Response.Writes or writes to a file in code in the server - if time is of essence. This is an advantage with ASP.NET 2.0's dynamic compilation or even legacy ASP and PHP. I found the threads hanging by using System.IO.File.WriteAllText method. It takes a path+filename and a string and writes the string to that file. By putting debug statements in different parts of the code, I could find the last place where the code was stuck.


Update
The performance counters that were most useful during this issue were the ASP.NET 2.0 (or ASP.NET 1.1) object with the counters Requests Queued and Requests Current.
During stress testing the criteria that was most useful in WAST was Socket Timeouts. There were many timeouts and after fixing the issue there were none.

Labels: , , ,