Yesterday I attended a good talk at ZendCon 2007 by Eli White. One of the better pieces of take away was his explanation of the three different types of partitioning for your data at the database layer. These are just my notes from the talk. This is not great literature.
The three types mentioned were:
- Horizontal
- Vertical
- Application Level
Horizontal Partitioning:
The example given would be something like splitting large tables in two (like 50% rows in 1DB, 50% in a second)
Several types of Horizontal Partitioning schemes were mentioned.
Range Based - partition on some range of items
Date Based - partition based on date
Interlaced - hash them out based on some modulus (lots of buckets)
User Based - someusers on one partition others on another
Another example called out was to possibly keep a last 30 days table, and everything table somewhere for when you need it.
Vertical Partitioning - Moving Various columns of your table into different tables
Various types:
- move rarely used columns into aux tables
- move often empty columns into aux tables
- move columns that are not used in where clauses
Application Level Partitioning (I also call this functional partitioning sometimes)
- move single tables to specific servers
- move groups of related tables together to allow joining
- move entire functions (like search) to their own tables/databases
Making choices on where to partition has a lot to do with how and where you want to break joins. So, consider this carefully.
Automate the understanding of your partitions(shards) or you'll be killing yourself w/ complexity. Write code to keep up with it or even create a specilized database to store the relationship data.