SQL Index types - Teh Forum

From:	Peter (BOUGHTONP)	25 May 2008 22:09
To:	ALL	1 of 5

34429.1
Can anyone provide a good explanation of the differences between index types. Primarily bothered with MySQL, but a general one would be fine. Have been searching, but Google seems determined to return nothing but worthless shit. :( `3.1415P265E589T932E846R64338`

0/0

Reply Quote

More

From:	Matt	25 May 2008 22:55
To:	Peter (BOUGHTONP)	2 of 5

34429.2 In reply to 34429.1

(I have a feeling I'm missing something here ... )

I would have thought they're pretty self explanatory?

KEY is your basic column index. It's a quick reference to values stored in rows for the column the index is applied to. Indexes speed up SELECTS by allowing MySQL to scan the index for matches rows rather than having to load and scan the entire table which it would do in the case of no [applicable] indexes.

UNIQUE as it's name implies requires rows to contain unique values for the indexed column. As far as I'm aware, it alone cannot aid MYSQL in optimising SELECT statements like a KEY index can.

PRIMARY KEY is both UNIQUE and INDEX.

FULLTEXT indexes individual words in a CHAR, VARCHAR or TEXT column.

There isn't much more to it than that.

Unless you mean the storage methods for indexes. Which can be HASH (only available to MEMORY / HEAP tables), BTREE or RTREE, but they're just the file format that the index is stored in. Never bothered to find out if there is any advantage of using one over the other.

doohicky

0/0

Reply Quote

More

From:	Peter (BOUGHTONP)	25 May 2008 23:33
To:	Matt	3 of 5

34429.3 In reply to 34429.2
Heh, sorry, was too vague in my original post. :S quote: Unless you mean the storage methods for indexes. Which can be HASH (only available to MEMORY / HEAP tables), BTREE or RTREE, but they're just the file format that the index is stored in. Never bothered to find out if there is any advantage of using one over the other. That's the bit I'm after. I know there are differences, but not how significant they are. I think HASH is better for direct lookups (eg: find nickname from uid), and in general has less insert/update overhead than BTREE, however BTREE is self-organising, whereas a HEAP/HASH needs to be regularly optimized else performance goes down the pan. I don't even know if that is properly accurate - I think I may be mixing up stuff about table types. :S During my searches I found a PostgreSQL page saying that the pgSQL implementation of HASH is shit and everyone should always use BTREE. No idea if that applies to MySQL or not. `3.1415P265E589T932E846R64338`

0/0

Reply Quote

More

From:	Matt	25 May 2008 23:47
To:	Peter (BOUGHTONP)	4 of 5

34429.4 In reply to 34429.3
According to the MySQL Manual RTREE is used only for an INDEX defined with SPATIAL, otherwise BTREE is the default. I can't find much on BTREE, but this page contains lots of information on MySQL's implementation of "Spatial Extensions" / RTREE. RTREE seems to be a Oracle thing originally if that's any help.

0/0

Reply Quote

More

From:	Mal (BAD)	1 Jun 2008 02:12
To:	Peter (BOUGHTONP)	5 of 5

34429.5 In reply to 34429.3

From a general programming perspective:

An R-tree is a data structure for indexing rectangles, which I guess would tally with Matt's comment about a having key which represents spatial data, ie. a point, line or area.

A B-tree is a similar but much simpler construct for indexing scalar values.

Your summary sounds reasonable though. Hash lookup efficiency is very much dependant on the suitability of the hash function with respect to your dataset. If your data clusters around a few hash values rather than being spread out over many then you are likely to get collisions where two or more key values generate the same hash value. This would be the point where your performance plummets as you can't go directly from the hashed value to the record you desire because you have to then search through all the records with that hash. I guess this is the point you have to optimise your table.

B/R-Trees are not subject to such collision problems but require more work when constructing the index (ie adding/removing records) because they optimise as they go along: Trees are most efficient when they are balanced - ie. at each decision point when traversing the tree there are roughly equal numbers of records in all directions. Most decent tree implementations are self-balancing to a degree, and this is where most of the work needs to be done when adding/removing records. However, it is usually possible to design a pathological dataset that would result in a severely unbalanced tree and if your dataset were unfortunate enough to conform to this then you would need to regenerate the tree to regain efficiency (this process would be largely analogous to optimising your hash table).

In summary, your summary would sound plausible :)

back in a flash! and as dangerous as ever...

0/0

Reply Quote

More