Systasis
 
Coding conventions we are fond of
Tuesday, 15 July 2003 Symeon Charalabides (symeon@systasis.com)

Every developer has their own coding style, influenced by personal habits and preferences, their working environment, the software they use and the kind of projects they are geared towards, to name but a few. In this article, we wish to present the stronger habits we have developed after 2 years of developing PHP/MySQL applications and about 6 of writing HTML. No evangelism is intended, merely a description and explanation of specific choices. Another drip into the pool of cumulative experience, if you will.


[HTML]

[+] We always use hexadecimal values to describe colours and never their names. This way, an extended search/replace process will harm fewer irrelevant things, whether that is colour or text. The single notable exception to this rule is the silver quote sheets on the articles of Systasis: there the colour is named, because MySQL dumps comment out everything between a hash and the end of the line.

[+] We do not use double (and certainly not single) quotes unless we have to. In fact, I can't remember quoting any HTML except for ALT attributes, FACE attributes and javascript events. Is this practice W3C-compliant? Only partly: HTML 4 permits unquoted values solely when they consist of letters, digits, hyphens, periods, underscores and colons. Is it legal XHTML? No. Yet, there isn't a single web browser that cringes at the practice and, given the declining progress of the genre which shows all signs of continuing, neither is it likely to happen any time soon. If/when it does, though, I know I won't be answering any calls for a while...
The reason for this choice is that it saves a lot of space, both on the disk and on the screen, certainly much more than those puny 'compressors' do by slashing newlines and double spaces and making the code unreadable, while at the same time greatly enhancing the legibility of the code.

[+] We write HTML tags in uppercase only. Again, a case-sensitive search/replace process will sift through both code and text and only change the required parts this way.
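
As a small illustration of the search/replace argument behind both the hexadecimal colours and the uppercase tags, here is a sketch in PHP. The page fragment and the replacements are made up for the example; the only thing relied upon is that str_replace() is case-sensitive.

```php
<?php
// Hypothetical fragment: uppercase tag, hexadecimal colour, lowercase prose.
$page = '<FONT COLOR=#FF0000>Pick a red font.</FONT>';

// Only the uppercase tag name matches; the word "font" in the text is left alone.
$page = str_replace('FONT', 'SPAN', $page);

// Only the colour value matches; the word "red" in the text is left alone.
$page = str_replace('#FF0000', '#CC0000', $page);
```

Had the tag been written as "font" and the colour as "red", the same two replacements would have rewritten the sentence as well.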


[PHP]

[+] Text is output via print and never echo. Certainly, echo is faster, by an astonishing 3% in fact, a difference that drops with the number of variables contained in the output string. The difference is negligible in contrast to the fact that print, although a language construct like echo, behaves like a function: it always returns 1 and can therefore be used inside an expression. This means that the command

$aprilfools== 1 and print 'We demand more cream of celery soup.';

will not work with echo. Neither will, for that matter, the following

$chrystus== 1 or print 'Cry "havoc" and let slip the dogs of war!';
$prewarp== 1? print 'unitol': print 'sublith';

although the following variant works fine:

echo ($prewarp== 1)? 'unitol': 'sublith';


The reason why we don't use both commands is that we want to be able to collect all the possible output for a whole website with a single extended search. Probably greedy, especially since "print" is more likely to be found as text than "echo", but I find solace in uniformity. In addition, print just looks better. I'm sure!
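
To make the point about the return value concrete, here is a minimal sketch. It assumes a current PHP release and uses output buffering (ob_start() and friends) purely so that the result can be inspected; the variable names $shown and $returned are ours.

```php
<?php
$aprilfools = 1;

// print evaluates to 1, so it can be the right-hand operand of "and" / "or".
ob_start();                                   // capture the output
$aprilfools == 1 and print 'We demand more cream of celery soup.';
$aprilfools == 1 or  print 'This line is never printed.';
$shown = ob_get_clean();                      // what actually reached the output

// The value of print can even be stored; echo has no value to store.
ob_start();
$returned = print '';
ob_end_clean();
```

Replacing either print with echo in the lines above results in a parse error, since echo cannot appear where an expression is expected.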

[+] We always wrap strings in single quotes and never in double ones. The parser goes through them faster, since it doesn't look for variables to interpret, but that is not the reason. Variables should be appended to strings and not included anyway. Even with colour-coding, the difference between the two lines

print 'This '.$size.' ultra-rich '.$type.' must be deprived of its '.$monster.' pronto.';
print "This $size ultra-rich $type must be deprived of its $monster pronto.";

can equal the difference between spending Saturday morning on the promenade or at the optician's.
The other reason is, of course, that such strings often contain HTML code which sometimes contains double quotes but hardly ever single ones, so there is less need to backslash them out. There: visually cleaner code again. Story of my life.
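
A short sketch of both reasons, with made-up values: single quotes leave a variable name untouched, and HTML that carries double quotes needs no backslashes inside them.

```php
<?php
$size = 'oversized';   // sample value, made up for the example

$literal  = 'This $size sofa';        // single quotes: no variable parsing
$parsed   = "This $size sofa";        // double quotes: $size is interpolated
$appended = 'This '.$size.' sofa';    // our habit: append the variable

// The same tag written both ways; only the second needs backslashes.
$single = '<IMG SRC=sofa.gif ALT="An '.$size.' sofa">';
$double = "<IMG SRC=sofa.gif ALT=\"An $size sofa\">";
```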

[+] We use the .php extension for all our servable text files. Recently, we even had to change all the filenames of PHPMyChat on webmate.gr because they were all .php3 and wouldn't automatically be served any more. Needless to say, a single extended replace routine took care of their contents.
Uniformity again, but there is an unfortunate trend to use included and other help files with non-standard extensions (.inc comes to mind). This practice can easily lead to embarrassment, not to mention the ultimate security hole: you can find one of my most cherished includes here. The problem is obvious: a server that does not hand .inc files to PHP serves them as plain text, source code and passwords included. The simple step of renaming the file in question to "dummy1.php" would solve it for good.

[+] Initialising all variables in a script is not as cumbersome as it sounds, even when done after the script has gone live (the whole of Systasis was converted this way, and it only took a couple of hours). It makes good sense as a practice partly because of the potential security issue that is fixed and partly because register_globals has been off by default since PHP 4.2.0; as more hosts adopt that default, scripts that lean on the old behaviour simply stop working, which would effectively crash 10% of the internet...
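
The security issue can be sketched as follows. Since register_globals no longer exists in current PHP releases, extract() stands in for it here, turning request parameters into variables; both function names are hypothetical and the password check is reduced to a boolean.

```php
<?php
// Uninitialised: a request parameter named "authorised" slips straight through.
function authorised_unsafe(array $request, $password_ok)
{
    extract($request);            // stand-in for register_globals
    if ($password_ok) {
        $authorised = true;
    }
    return !empty($authorised);
}

// Initialised: whatever the request injected is overwritten before use.
function authorised_safe(array $request, $password_ok)
{
    extract($request);            // stand-in for register_globals
    $authorised = false;
    if ($password_ok) {
        $authorised = true;
    }
    return $authorised;
}
```

Calling authorised_unsafe(array('authorised' => 1), false) lets the visitor in without a password, whereas authorised_safe() with the same arguments does not.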

[+] We try to flush long variables after they are past their usefulness. This is actual advice given by Rasmus Lerdorf himself in the outstanding book "Programming PHP". By assigning a short string to a variable that previously contained a long one, memory is freed up. A related coding technique would be to keep reusing a few temporary variables.
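
A rough way to watch the effect, assuming a current PHP release where memory_get_usage() is always available; the variable names and the string length are arbitrary.

```php
<?php
$length = 1000000;                       // arbitrary size for the example

$before = memory_get_usage();
$report = str_repeat('x', $length);      // a long string we only need briefly
$during = memory_get_usage();

$report = '';                            // flush it with a short string
$after  = memory_get_usage();

printf("before: %d, during: %d, after: %d bytes\n", $before, $during, $after);
```

The exact figures depend on the PHP version and platform, so they are worth measuring rather than assuming.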

[+] mt_srand and mt_rand invariably run about 30% slower than srand and rand, on Windows at least. The latter duo also seems to be more random, but this might as well not be so. These are the ones we use in any case, since we have never been too concerned with the quality of our random number generators (and we get to save 6 bytes, too).

[+] Defining a word as... anything has proved to be a whopping 130% more time-consuming than assigning that anything to a new variable. Consequently, we never use define, only variables.
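
Timing figures like the ones in this section age quickly, so they are best re-measured on the system at hand. Below is a minimal sketch for the define versus variable comparison, assuming a current PHP release (closures and microtime(true) did not exist when the figures above were taken). The helper time_task() and the BENCH_/bench_ names are ours, and the numbers it prints will differ from machine to machine.

```php
<?php
// Hypothetical helper: run $task $rounds times and return the elapsed seconds.
function time_task($task, $rounds)
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $task($i);
    }
    return microtime(true) - $start;
}

$rounds = 10000;

// Each round defines a fresh constant...
$define_time = time_task(function ($i) { define('BENCH_'.$i, $i); }, $rounds);

// ...or assigns the same value to a fresh variable.
$assign_time = time_task(function ($i) { ${'bench_'.$i} = $i; }, $rounds);

printf("define: %.4f s, assignment: %.4f s\n", $define_time, $assign_time);
```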


[PHP / MySQL]

[+] We only use mysql_fetch_array and always with column names instead of numbers. The latter is quite obvious: you don't need to look at the table structure to realise which columns the command is handling, and you don't need to go back to your code and rewrite it if you ALTER a table. So why not use mysql_fetch_assoc instead? Because it wasn't implemented before PHP 4.0.3. Backwards compatibility is not a serious issue (I am not aware of any provider that habitually downgrades), but a few safeties like this one or fixing potential security issues can contribute to a healthier and altogether more relaxing sleep at night.
In fact, we even use it over mysql_result, which usually turns out to be slower (!) anyway. There is no problem, as far as the parser is concerned, with the command

$album= mysql_fetch_array(mysql_query('SELECT * FROM albums WHERE artist="Steve Stevens"'));

As with previous examples, it is practical to be able to have a single search/replace routine dish out all your database digs.
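
Why names survive an ALTER while numbers do not can be shown without a database. By default, mysql_fetch_array() returns each value under both its numeric position and its column name; the hypothetical function fake_fetch_array() below builds such a row by hand, and the column names and values are made up for the example.

```php
<?php
// Hypothetical stand-in for a fetched row: every value is stored under both
// its numeric position and its column name.
function fake_fetch_array(array $columns, array $values)
{
    $row = array();
    foreach ($columns as $position => $name) {
        $row[$position] = $values[$position];
        $row[$name]     = $values[$position];
    }
    return $row;
}

// The table as first designed...
$before = fake_fetch_array(array('artist', 'title'),
                           array('Steve Stevens', 'Some Album'));

// ...and after a column has been ALTERed in ahead of the others.
$after  = fake_fetch_array(array('id', 'artist', 'title'),
                           array(1057000000, 'Steve Stevens', 'Some Album'));
```

Code that reads $row['artist'] works on both rows; code that reads $row[0] gets the artist from the first and the ID from the second.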

[+] We usually use time() for the unique ID column of a table, especially for tables that are manually fed. Between 1 January 1970 and 19 January 2038, and as long as rows are not inserted faster than 1 per second, the value returned by time() is unique. It is also only 10 numerical digits, whereas uniqid() returns 13 alphanumeric ones, and thus the column can be declared as decimal (10,0) instead of varchar(13).
On top of this performance boost, no date column need be declared, as the unique ID itself carries the date of first insertion down to the second. An extra date column should be defined if such information as a last update is required, since it is seldom practical and never a good idea to change a unique ID.
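
The two properties claimed here, the shorter ID and the date it carries, can be checked with a few lines. The variable names are ours, and the date is formatted in whatever default timezone PHP is configured with.

```php
<?php
$id = time();                            // the value stored when the row is inserted

$id_length     = strlen((string) $id);   // 10 digits for a present-day timestamp
$uniqid_length = strlen(uniqid());       // 13 characters

// The moment of insertion is recovered from the ID itself, down to the second.
$inserted_on = date('Y-m-d H:i:s', $id);
```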

[+] Using a database is no panacea for performance issues. The connection overhead can be significant and a possible bottleneck in busy systems. Databases are great at handling complex operations, sorting output and moving data around. However, when single lists of one or even two entities are required, PHP is often faster at handling the associated issues itself, using arrays or hash tables respectively and simple file read/write commands.
To illustrate the point, we present the mailing list algorithm we've used on several websites. The prompt is the standard text field for an e-mail address, plus two radio buttons for subscription/unsubscription, all residing in the page "list.php". The code that, on submission, picks up the tale is as follows:

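<?php
// No address submitted: return to the form
$email= strtolower(trim($email)) or exit(header('Location:list.php'));

// Malformed address: return to the form with an error message
eregi("^[a-z0-9~!#$%&_-]+(\.[a-z0-9~!#$%&_-]+)*@[a-z0-9~!#$%&_-]+(\.[a-z0-9~!#$%&_-]+)*(\.[a-z]{2,4})$", $email)
    or exit(header('Location:list.php?msg=1'));

// Defines $mlnglst, the current array of addresses
include 'mailinglist.php';

// Copy every address except the submitted one
foreach ($mlnglst as $value)
    $email!= $value
        and $new[]= $value;

// Put the submitted address back if this is a subscription
$action== 'subscribe' and $new[]= $email;

// Rewrite mailinglist.php as a PHP script that defines the sorted array
$fp1= fopen('mailinglist.php', 'w');

fwrite($fp1, '<?php'."\n".'$mlnglst=array('."\n");

if (is_array($new)) {
    sort($new);
    foreach ($new as $value)
        fwrite($fp1, "'".$value."',\n");
}
fwrite($fp1, ');'."\n".'?>');

// Back to the form: message 2 for subscription, 3 for unsubscription
$msg= ($action== 'subscribe')? 2: 3;
header('Location:list.php?msg='.$msg);
?>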


The code reads in, and subsequently overwrites, the file "mailinglist.php" which defines $mlnglst as an array of e-mail addresses. The array is sorted before being written again. The code can exit back to "list.php" at various points with the relevant message. The pure process only takes up 12 lines of very sparse code and takes much less time to complete than it would on a similar MySQL table.
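
Readers on a current PHP release should note that eregi() was removed in PHP 7.0. The sketch below is one possible adaptation, not the code running on our sites: it checks the address with preg_match() using the same pattern, and isolates the list update in a function so that it can be tried without touching any file. The names valid_email() and update_list() are ours.

```php
<?php
// Same pattern as in the script, wrapped in / delimiters; the i flag takes
// over the case-insensitivity that eregi() used to provide.
function valid_email($email)
{
    $pattern = '/^[a-z0-9~!#$%&_-]+(\.[a-z0-9~!#$%&_-]+)*'
             . '@[a-z0-9~!#$%&_-]+(\.[a-z0-9~!#$%&_-]+)*(\.[a-z]{2,4})$/i';
    return preg_match($pattern, $email) === 1;
}

// Returns the new, sorted list after one subscription or unsubscription.
function update_list(array $list, $email, $action)
{
    $new = array();
    foreach ($list as $value) {
        $email != $value and $new[] = $value;   // drop any existing copy
    }
    $action == 'subscribe' and $new[] = $email; // re-add when subscribing
    sort($new);
    return $new;
}
```

For instance, update_list($mlnglst, $email, $action) would take the place of the foreach loop and the sort() call, leaving the file writing as it is.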

Thus is this article concluded. There are many more ways to optimize the execution time and memory requirements for both PHP and MySQL. The latter, especially, has a whole chapter of its manual dedicated to the subject, and it is well worth reading. Advice specific to PHP can be found in countless articles like this one. HTML is certainly less of an issue. Though such advice is good to know, it is not necessarily good to heed and we should only adopt habits when they suit our personal preferences and intentions.
