Teh Lack of Speed

From: Kriv 3 Jul 2006 20:59
To: Matt 23 of 72

HTML wouls stop old posts being archived, therfore stopping the need for intense queries... I'd imagine.

 

While I couldn't help.. I can donate my PC and bandwidth for conversion purposes if needed in some way.

From: Drew (X3N0PH0N) 4 Jul 2006 05:05
To: Matt 24 of 72

Whenever this happens, I'm all for archiving it.

 

I know you've explained this, but I think I didn't understand the explanation. What's the problem (in terms of performance) in making this forum read-only and then starting afresh?

 

I mean, I realise the archived forum would still have the same performance issues but surely it would be used so little as to not matter? It would be used as much as people use the search function currently, at most. Which for me, is very very seldom. Regardless, the overall load should be much lower shouldn't it?

 

If for some reason that's not practical, how about dumping this database, getting someone to host this forum read-only somewhere and again, starting afresh here. The archive might fall over a bit but... better that than this place suffering. The slowness is not the fault of the software or whatever, just the size of this place. Butif it's keeping people from posting then we need to make sacrifices I think.

From: Manthorp 4 Jul 2006 08:00
To: ALL25 of 72
Incidentally, and I assume this serves to illustrate Matt's analysis, it loads better for me - or rather, less slowly - in the ungodly hours of the morning.
From: Dave!! 4 Jul 2006 09:09
To: Manthorp 26 of 72
It's faster for me in the mornings as well, but with onl 6 members online compared with some afternoons and evenings when there's 20-30 online and there you go.
From: Peter (BOUGHTONP) 4 Jul 2006 20:41
To: Matt 27 of 72
What about using bind variables?

Bah, looks like it would mean restricting Beehive to PHP5+MySQL 4.1 (or later),
but it should give a significant boost to query performance; maybe it could be added as a branch/mod?



Also (this may already be done, but I couldn't see anything on a quick glance), what about having shortcut thingies - um, by which I mean something similar to this:
Pseudo-PHP code:
if (looking_for_unread_messages && $_SERVER['LAST_BH_MESSAGE'] > $_SESSION['USER_LAST_THREADLIST'])
{
// perform query as normal
}
else
{
// do cut-down query or cached query or whatever
}

ie: Store a server-wide variable of the last message post time, and use it to determine if there's any point even running a [complete] query.
From: steve 4 Jul 2006 23:29
To: Peter (BOUGHTONP) 28 of 72
I thought about cached stuff to cut down little queries. Although I guess there's not many like that on here. On my bh101 forum all the user details are cached, so if they are needing in a later post on the page they are there ^_^
From: milko 4 Jul 2006 23:39
To: ALL29 of 72

Is there any way of compromising somehow? This feels like wishful thinking but I do not know. Anyway - somehow semiarchiving old stuff so it isn't actually affecting anything except on the rare occassion someone isn't going through old threads. Hmm, no, this doesn't sound likely. Hm.

 

having it all separate and readonly feels wrong somehow.

From: Peter (BOUGHTONP) 5 Jul 2006 10:41
To: milko 30 of 72
We could convert old stuff to static HTML files, and then use a search engine to index both the database and files together.

Lucene is an Apache open-source one of those, which can be integrated with PHP via Zend, although again it appears to be a PHP5 solution.
From: ian 5 Jul 2006 11:06
To: ALL31 of 72

I'm not entirely sure if I know what I'm talking about but would it not be possible to periodically move posts/threads etc. that are older than a certain amount of time (say a year) to a seperate table (or tables) which are then only ever searched when absolutely neccessary?

 

So regular thread list 'stuff' would only look at the 'current' table(s) wheras the Search might look at both the 'current' and the 'archive' table(s) using whatever SQL magic that requires.

 

Any time a post is retrieved it would compare its number to the 'last-archived-thread' and based on that decide which table(s) to get it from. For the most part it would only require recent posts so only ever do stuff with the leaner 'current' table(s), thereby making everything faster, finding a cure for AIDS and achieving world peace.

 

Does that make sense? Am I repeating people? Can I go home yet?!

 

Hmm. But then posting in a thread from 3 years ago might cause problems, or would it? I don't know.

From: Drew (X3N0PH0N) 5 Jul 2006 11:28
To: ian 32 of 72

I think the main problem with that is it would be hard as fuck to make. And would pretty much be a one-off. So no one will make it.

 

And also the script which does the splitting would have to be run as a cron job or something like that. And would be, I reckon, slow as fuck.

 

(That's not to say it's a bad idea, it would be the ideal solution, just I don't see it happening)

 

I think we have three choices:

 

1. something like you suggested
2. carry on as we are
3. start again and archive the rest somewhere

 

I think 1 won't happen and 2 shouldn't happen so I like 3.

From: Dave!! 5 Jul 2006 11:35
To: Drew (X3N0PH0N) 33 of 72
Or 4) Look around for new hosting.
From: Drew (X3N0PH0N) 5 Jul 2006 11:50
To: Dave!! 34 of 72
S'pose. Seems like overkill though. We shouldn't need a dedicated server for a web forum.
From: Peter (BOUGHTONP) 5 Jul 2006 12:00
To: Drew (X3N0PH0N) 35 of 72
It wouldn't need to be a cron job - it could be a manual admin function. This place has got, what, five years of posts? So even on busier places it'd probably not need to be run more than yearly, thus speed isn't a huge issue; people expect/accept that archive operations take a period of time.


The search page could have an additional option:
Search [current|archived|both]
Or maybe it could be automatic based on the from/to dates selected, with a message signifying whether archived messages would be searched.


As for the difficultly of implementing it, I don't agree with that either.
You could keep the existing queries as they are, and if relevant do a second SELECT on the archived bits and UNION it to the initial query.


And finally, regarding the no-one will make it bit, I will be working on integrating Beehive with my next site in (hopefully) a month or two, so if nobody else does it I may have a look at doing it.


:)
From: Dave!! 5 Jul 2006 12:10
To: Drew (X3N0PH0N) 36 of 72
True, but this server has always been slow. It's never been as fast as some of our previous hosting has been. Granted it's mostly been faster than recently, but never blazingly quick.
From: ian 5 Jul 2006 12:32
To: Drew (X3N0PH0N) 37 of 72
quote: Homer Simpson
If something's hard to do then it's not worth doing.
From: Drew (X3N0PH0N) 5 Jul 2006 12:37
To: Peter (BOUGHTONP) 38 of 72

Hmm. I wasn't really thinking of it as anything more than a one-off. If it would be a useful feature generally then, aye, makes it worth it.

 

It'd still take ages to execute here and that is a problem.

 

As for complexity - yeah, it could be done as you say but... why bother? That offers no advantage over just making this forum read-only and starting again. Well, little advantage.

 

If it were to be done properly it would happen in a sort of rolling way. And that would be hard. By which I mean not having to run a script to do the archiving but mechanisms put in place whereby older posts (beyond an arbitrary threshhold) aren't involved in queries. But... I have no idea how that might be done.

 

But aye, if you're up for making it then that's cool by me, of course.

From: THERE IS NO GOD BUT (RENDLE) 5 Jul 2006 12:38
To: ALL39 of 72

I'm just looking at the message SQL with my optimizing hat on, and there's quite a lot that could be done to reduce the load on the database there.

 

The SQL includes a join to the THREAD table from every post, which is redundant. The values from the THREAD table can be retrieved in a single SELECT before the message SQL is run.

 

There are four joins to the user table, one of which (APPROVED_USER) is only relevant if posts have to be approved by an administrator, and another (EDIT_USER) only if the post has been edited by someone other than the original author. The APPROVED_USER join could be added in conditionally, and the EDIT_USER could be done using singleton SELECTs after the main select is finished.

 

This one might be a stretch: there are two joins to the USER_PEER table for the relationship and the new peer nickname thing. It might make more sense to do a single select against the USER_PEER table for the logged-in user and cache the result in an array, then get the values from that. It kind of depends on how many peers people have, on average. I've got 23, mostly ignoring sigs. Some testing might be advisable for this one.

From: THERE IS NO GOD BUT (RENDLE) 5 Jul 2006 14:16
To: ALL40 of 72
Also, I reckon the slow performance in the threadlist could probably be improved just by deleting most of the records in the USER_THREAD table, e.g. ones for threads which haven't been updated in the last three(?) months.
From: Peter (BOUGHTONP) 5 Jul 2006 14:26
To: THERE IS NO GOD BUT (RENDLE) 41 of 72
Would that not make all the old threads show up as unread?
From: andy 5 Jul 2006 14:38
To: THERE IS NO GOD BUT (RENDLE) 42 of 72
That's what Delphi used to do, I remember. Although maybe their cut-off point was closer to 6 months. Anyway.

(Pete: You just modify the query to be "SELECT [unread] AND [most-recent-post-in-thread-date > NOW() - 3months]").