Teh Lack of Speed

From: Dave!! 5 Jul 2006 11:35
To: Drew (X3N0PH0N) 33 of 72
Or 4) Look around for new hosting.
From: Drew (X3N0PH0N) 5 Jul 2006 11:50
To: Dave!! 34 of 72
S'pose. Seems like overkill though. We shouldn't need a dedicated server for a web forum.
From: Peter (BOUGHTONP) 5 Jul 2006 12:00
To: Drew (X3N0PH0N) 35 of 72
It wouldn't need to be a cron job - it could be a manual admin function. This place has got, what, five years of posts? So even on busier places it'd probably not need to be run more than yearly, thus speed isn't a huge issue; people expect/accept that archive operations take a period of time.


The search page could have an additional option:
Search [current|archived|both]
Or maybe it could be automatic based on the from/to dates selected, with a message signifying whether archived messages would be searched.


As for the difficulty of implementing it, I don't agree with that either.
You could keep the existing queries as they are, and if relevant do a second SELECT on the archived bits and UNION it with the initial query.
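
A minimal sketch of the search-option and UNION ideas above, assuming a hypothetical POST_ARCHIVE table with the same columns as POST and a single archive cut-off timestamp. Neither the table name nor the columns come from Beehive's schema. The first function picks current/archived/both from the from/to dates; the second glues the two SELECTs together.

```php
<?php
// Hypothetical sketch: POST_ARCHIVE is an assumed table with the same columns
// as POST; neither name nor columns are taken from Beehive's actual schema.

// Pick which tables to search from the requested date range, assuming every
// post created before $archive_cutoff (a Unix timestamp) has been archived.
function search_scope_for_dates(int $from, int $to, int $archive_cutoff): string
{
    if ($from >= $archive_cutoff) {
        return 'current';
    }
    if ($to < $archive_cutoff) {
        return 'archived';
    }
    return 'both';
}

// Build the search SQL; $where must already use placeholders, never raw user input.
function build_search_sql(string $columns, string $where, string $scope): string
{
    $current  = "SELECT $columns FROM POST WHERE $where";
    $archived = "SELECT $columns FROM POST_ARCHIVE WHERE $where";

    switch ($scope) {
        case 'current':
            return $current;
        case 'archived':
            return $archived;
        case 'both':
            // UNION ALL skips duplicate removal: a post lives in exactly one table.
            return "$current UNION ALL $archived";
        default:
            throw new InvalidArgumentException("Unknown search scope: $scope");
    }
}

$scope = search_scope_for_dates(1104537600, 1152057600, 1136073600);
echo build_search_sql('TID, PID', 'CREATED BETWEEN ? AND ?', $scope), "\n";
```

Because the WHERE clause appears in both halves of the UNION, its placeholders would need binding twice when the scope is "both".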


And finally, regarding the "no-one will make it" bit: I will be working on integrating Beehive with my next site in (hopefully) a month or two, so if nobody else does it I may have a look at doing it.


:)
From: Dave!! 5 Jul 2006 12:10
To: Drew (X3N0PH0N) 36 of 72
True, but this server has always been slow. It's never been as fast as some of our previous hosting has been. Granted it's mostly been faster than recently, but never blazingly quick.
From: ian 5 Jul 2006 12:32
To: Drew (X3N0PH0N) 37 of 72
quote: Homer Simpson
If something's hard to do then it's not worth doing.
From: Drew (X3N0PH0N) 5 Jul 2006 12:37
To: Peter (BOUGHTONP) 38 of 72

Hmm. I wasn't really thinking of it as anything more than a one-off. If it would be a useful feature generally then, aye, that makes it worth it.

It'd still take ages to execute here, and that is a problem.

As for complexity: yeah, it could be done as you say, but... why bother? That offers no advantage over just making this forum read-only and starting again. Well, little advantage.

If it were to be done properly it would happen in a sort of rolling way, and that would be hard. By which I mean not having to run a script to do the archiving, but having mechanisms in place whereby older posts (beyond an arbitrary threshold) aren't involved in queries. But... I have no idea how that might be done.

But aye, if you're up for making it then that's cool by me, of course.

From: THERE IS NO GOD BUT (RENDLE) 5 Jul 2006 12:38
To: ALL 39 of 72

I'm just looking at the message SQL with my optimizing hat on, and there's quite a lot that could be done to reduce the load on the database there.

The SQL includes a join to the THREAD table from every post, which is redundant. The values from the THREAD table can be retrieved in a single SELECT before the message SQL is run.

There are four joins to the USER table. One of them (APPROVED_USER) is only relevant if posts have to be approved by an administrator, and another (EDIT_USER) only if the post has been edited by someone other than the original author. The APPROVED_USER join could be added conditionally, and the EDIT_USER lookup could be done using singleton SELECTs after the main SELECT has finished.
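
A rough sketch of the conditional-join suggestion above, trimmed to two of the USER joins. Only the POST and USER tables and the APPROVED_USER / EDIT_USER aliases come from the thread; the column names (FROM_UID, APPROVED_BY, EDITED_BY, NICKNAME, CONTENT) are assumptions, not Beehive's schema.

```php
<?php
// Hypothetical sketch of Rendle's suggestion. Only the POST and USER tables and
// the APPROVED_USER / EDIT_USER aliases come from the thread; the column names
// (FROM_UID, APPROVED_BY, EDITED_BY, NICKNAME, CONTENT) are assumptions.
function build_messages_sql(bool $approval_required): string
{
    $select = ['POST.PID', 'POST.CONTENT', 'FROM_USER.NICKNAME AS FROM_NICKNAME'];
    $joins  = ['LEFT JOIN USER FROM_USER ON (FROM_USER.UID = POST.FROM_UID)'];

    if ($approval_required) {
        // Only pay for this join when the forum actually requires approval.
        $select[] = 'APPROVED_USER.NICKNAME AS APPROVED_NICKNAME';
        $joins[]  = 'LEFT JOIN USER APPROVED_USER ON (APPROVED_USER.UID = POST.APPROVED_BY)';
    }

    return 'SELECT ' . implode(', ', $select)
        . ' FROM POST ' . implode(' ', $joins)
        . ' WHERE POST.TID = ? ORDER BY POST.PID';
}

// The EDIT_USER lookup moves out of the main query: a separate singleton SELECT
// is only worth running for posts edited by someone other than their author.
function needs_edit_user_lookup(array $post): bool
{
    $editor = (int) ($post['EDITED_BY'] ?? 0);
    return $editor > 0 && $editor !== (int) $post['FROM_UID'];
}
```

On a forum without approval the main query then carries a single USER join, and the editor's name is fetched only for the few posts that need it.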

This one might be a stretch: there are two joins to the USER_PEER table, for the relationship and for the new peer nickname thing. It might make more sense to do a single SELECT against the USER_PEER table for the logged-in user, cache the result in an array, and then get the values from that. It kind of depends on how many peers people have, on average. I've got 23, mostly ignoring sigs. Some testing might be advisable for this one.
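
A minimal sketch of the USER_PEER caching idea, assuming the rows come from one query for the logged-in user. The column names PEER_UID, RELATIONSHIP and PEER_NICKNAME are hypothetical. Each post then does an array lookup instead of two joins.

```php
<?php
// Hypothetical column names (PEER_UID, RELATIONSHIP, PEER_NICKNAME); the rows
// would come from one "SELECT ... FROM USER_PEER WHERE UID = ?" for the
// logged-in user, run once per page instead of joined onto every post.
function build_peer_cache(array $rows): array
{
    $cache = [];
    foreach ($rows as $row) {
        $cache[(int) $row['PEER_UID']] = [
            'relationship' => (int) $row['RELATIONSHIP'],
            'nickname'     => $row['PEER_NICKNAME'],
        ];
    }
    return $cache;
}

function peer_info(array $cache, int $from_uid): array
{
    // Authors with no USER_PEER row get neutral defaults, as a LEFT JOIN's NULLs would.
    return $cache[$from_uid] ?? ['relationship' => 0, 'nickname' => null];
}
```

Memory grows with the number of peers, so with a couple of dozen (as above) the cache is tiny; the testing Rendle mentions would show whether users with very large peer lists change the picture.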

From: THERE IS NO GOD BUT (RENDLE) 5 Jul 2006 14:16
To: ALL 40 of 72
Also, I reckon the slow performance in the threadlist could probably be improved just by deleting most of the records in the USER_THREAD table, e.g. ones for threads which haven't been updated in the last three(?) months.
From: Peter (BOUGHTONP) 5 Jul 2006 14:26
To: THERE IS NO GOD BUT (RENDLE) 41 of 72
Would that not make all the old threads show up as unread?
From: andy 5 Jul 2006 14:38
To: THERE IS NO GOD BUT (RENDLE) 42 of 72
That's what Delphi used to do, I remember. Although maybe their cut-off point was closer to 6 months. Anyway.

(Pete: You just modify the query to be "SELECT [unread] AND [most-recent-post-in-thread-date > NOW() - 3months]").
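
A sketch of andy's rule in PHP rather than SQL: a thread only counts as unread if its latest post falls inside the cut-off window, so USER_THREAD rows for older threads can be deleted without those threads reappearing as unread. The three-month window as 90 days and the parameter names are assumptions.

```php
<?php
// Hypothetical sketch: a missing USER_THREAD row is passed as $last_read_pid = null.
const UNREAD_CUTOFF_SECONDS = 90 * 24 * 60 * 60; // roughly three months (assumed)

function thread_is_unread(?int $last_read_pid, int $thread_length, int $last_post_time, int $now): bool
{
    if ($last_post_time <= $now - UNREAD_CUTOFF_SECONDS) {
        return false; // too old: treated as read whether or not a USER_THREAD row exists
    }
    if ($last_read_pid === null) {
        return $thread_length > 0; // no USER_THREAD row: never opened by this user
    }
    return $last_read_pid < $thread_length;
}
```

The same condition would sit in the WHERE clause of the thread-list query, which is what lets the old USER_THREAD rows go.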
From: THERE IS NO GOD BUT (RENDLE) 5 Jul 2006 14:56
To: Peter (BOUGHTONP) 43 of 72
Yes, yes it would. I submit that that doesn't really matter one tiny little bit.
From: Matt 5 Jul 2006 18:49
To: THERE IS NO GOD BUT (RENDLE) 44 of 72

Done all that (the joins, I mean). It's not in CVS yet, but it's live here. The messages_get function now only uses POST, USER_PEER (x2) and USER (x2). The join to the THREAD table now uses the result of the earlier call to thread_get() in messages.php, and the other joins to the USER table have been moved to separate queries that only run if the post needs approval or the edit message needs displaying.

I've been running a profiler against the code on my own machine (with a copy of the database from here), and the noticeable bottlenecks are the emoticons code, which takes up to 500ms, and most noticeably the threads_any_unread() function, which is taking at least 1 second to complete(!). By comparison, the next longest-running function is the RSS feed checker, which takes 7ms. The threads_any_unread function is usually the cause of the load here. I've tried fiddling with it, adding indexes to the tables and rewriting the query to make better use of said indexes, but it makes diddly squat difference. It even has a LIMIT clause on it now, which it never used to, but even that makes no difference, so if it's not the USER_THREAD table that's causing it, I'm stumped as to what it could be.

EDITED: 5 Jul 2006 18:53 by MATT
From: andy 5 Jul 2006 18:59
To: Matt 45 of 72
The emoticons code as in running that preg_replace against every post at runtime? There any other way to do it? (besides the way it was originally set, converting the emoticons at post-time)
From: Matt 5 Jul 2006 19:28
To: andy 46 of 72
It's the calls to file_exists() in Emoticons->Emoticons() that are the resource hog. Each call to file_exists() only takes 0.4ms, but it's being called quite often. I'm not sure if there is a way around it. Maybe initialize the emoticons class within emoticons.inc.php itself (like the gzipenc.inc.php script does), so that when another script includes it, it's done automatically. Also, maybe don't use file_exists() at all, but rather suppress the error from include() when it doesn't find the definitions.php file.
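
A minimal sketch combining both suggestions above, assuming a pack directory containing a definitions.php that fills an `$emoticon` array; the function name and layout are hypothetical. It swaps in a per-request static cache (memoization) rather than Matt's exact "initialize in the include file" approach, and drops the separate file_exists() check in favour of a suppressed include.

```php
<?php
// Hypothetical sketch: the directory layout and the $emoticon variable set by
// definitions.php are assumptions modelled on the thread, not Beehive's code.
function emoticon_definitions(string $pack_dir): array
{
    static $cache = []; // lives for the current request only

    if (!array_key_exists($pack_dir, $cache)) {
        $emoticon = [];
        // No separate file_exists() check: if definitions.php is missing, include
        // just returns false (its warning suppressed by @) and $emoticon stays empty.
        @include $pack_dir . '/definitions.php';
        $cache[$pack_dir] = $emoticon;
    }
    return $cache[$pack_dir];
}
```

PHP shares nothing between requests, so the cache only pays off when the definitions are needed more than once while building a single page, which is the situation the profiler numbers suggest.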

On my test page, I have 20 posts all with the same number of emoticons and convert() only takes ~35ms to process all 20 of them.
From: andy 5 Jul 2006 19:35
To: Matt 47 of 72
That link just takes me to your portfolio page.
From: Matt 5 Jul 2006 19:45
To: andy 48 of 72
From: Matt 5 Jul 2006 21:47
To: andy 49 of 72
Am I right in thinking this loop below is to weed out any conflicts between the emoticons:

code:
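for ($i = 0; $i < $e_keys_size; $i++) {
    for ($j = 0; $j < $e_keys_size; $j++) {
        if ($i != $j) {
            // Does emoticon text $e_keys[$i] occur inside $e_keys[$j]? (case-insensitive)
            if (($pos = strpos(strtolower($e_keys[$j]), strtolower($e_keys[$i]))) !== false) {
                $a = $e_keys[$j]; // the longer emoticon text
                $b = $e_keys[$i]; // the shorter emoticon text found inside it
                $v = $emoticon[$a];
                $a2 = urlencode($a);

                // Split $a into the part before $b, $b itself (URL-encoded, as it
                // appears once converted) and the part after $b.
                $a_f = preg_quote(substr($a, 0, $pos), "/");
                $a_m = preg_quote(urlencode(substr($a, $pos, strlen($b))), "/");
                $a_e = preg_quote(substr($a, $pos + strlen($b)), "/");

                // Match $a where its $b portion has already been turned into <span>
                // markup, and replace the whole thing with $a's own markup.
                $pattern_array[] = "/" . $a_f . "<span class=[^>]+><span[^>]*>" . $a_m . "<\/span><\/span>" . $a_e . "/";
                $replace_array[] = "<span class=\"e_$v\" title=\"$a2\"><span class=\"e__\">$a2</span></span>";
            }
        }
    }
}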


So it will find :o and :o), and ensure that :o) is matched before :o, so that :o doesn't replace part of :o) and leave us with a stray closing bracket and the wrong emoticon showing?

Right?

I'm wondering if the same could be accomplished by simply sorting the emoticon match text array by length, longest first? Could do that with a single function call, see.

Or am I way off base as to what that loop does?
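
A sketch of the sort-by-length idea, not Beehive's actual convert(): the emoticon texts are sorted longest first and joined into one regex alternation. PCRE tries alternatives left to right at each position, so ":o)" is tried before ":o" and no fix-up pass is needed. The function name is hypothetical; the span markup copies the quoted loop.

```php
<?php
// Hypothetical sketch: $emoticon maps emoticon text => CSS class suffix,
// mirroring the $emoticon array used in the quoted loop.
function emoticons_convert(string $text, array $emoticon): string
{
    if (count($emoticon) === 0) {
        return $text; // an empty alternation would match everywhere
    }

    $keys = array_map('strval', array_keys($emoticon));
    usort($keys, function ($a, $b) {
        return strlen($b) - strlen($a); // longest first
    });

    $lookup  = array_change_key_case($emoticon, CASE_LOWER);
    $quoted  = array_map(function ($k) { return preg_quote($k, '/'); }, $keys);
    $pattern = '/' . implode('|', $quoted) . '/i'; // case-insensitive, like the strtolower() calls

    return preg_replace_callback($pattern, function ($m) use ($lookup) {
        $v  = $lookup[strtolower($m[0])];
        $a2 = urlencode($m[0]);
        return "<span class=\"e_$v\" title=\"$a2\"><span class=\"e__\">$a2</span></span>";
    }, $text);
}
```

For a case-sensitive version, PHP's strtr() with an array argument already tries the longest keys first and never re-scans replaced text, so it is a genuine one-call alternative.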
From: andy 5 Jul 2006 22:25
To: Matt 50 of 72
See, this is why you're meant to comment your code, Matt. But from staring at it for a bit, I'm pretty sure you're right. Which is pretty dumbass, as, like you say, sorting by length should have the same effect.

And I can access your server through that link. What was I meant to be looking at again? Is that your modified version of the emoticons code or something?
From: Stoo 5 Jul 2006 22:31
To: ALL 51 of 72
On a minor note, it's lovely and fast right this very second.
From: Matt 5 Jul 2006 23:13
To: andy 52 of 72

Yes, it is the modified code.

Emoticons->Emoticons() now executes quicker than the convert function. Hooray.