Partitioning with Dates in MySQL 5.1
Although I am still busy with writing my thesis paper, I do need a distraction from time to time. Since I started getting phpOpenTracker questions again recently, I decided to (slowly) start working on phpOpenTracker 2.0 again. phpOpenTracker 2.0 will be a complete rewrite, utilizing the features of PHP 5.2 and MySQL 5.1.
The biggest problem with phpOpenTracker 1.x is that its database schema is normalized. Even simple phpOpenTracker API calls result in (possibly multiple) SQL queries that involve multiple tables. These (more often than not rather complicated) JOINs degrade in performance the more data is stored in the database. The database schema for phpOpenTracker 2.0 is denormalized and only consists of two tables (current draft).
While the denormalization should eliminate the JOIN-related performance problems, temporal partitioning of the data can further increase the performance. The backend for MySQL in phpOpenTracker 1.x optionally uses MyISAM Merge Tables for this: Rows for table X that are INSERTed on day D of month M of year Y are not INSERTed into table X directly, but into the MyISAM table X_yyyymmdd. Table X_yyyymm is a MyISAM Merge Table that aggregates all X_yyyymmdd tables of that month, X_yyyy is a MyISAM Merge Table that aggregates all X_yyyymmdd tables of that year. X finally aggregates all X_yyyymmdd tables. SELECT queries that operate on this table X and contain BETWEEN expressions that operate on the timestamp column of the table are transparently rewritten to use the smallest table possible (X_yyyymmdd, X_yyyymm, X_yyyy, X), thus reducing the amount of data that needs to be looked at to process the query.
The Partitioning feature of MySQL 5.1 should make this application-level partitioning with dates obsolete.
- [Partitioning] enables distributing portions of individual tables across a filesystem, according to rules which can be set when the table is created. In effect, different portions of a table are stored as separate tables in different locations, but from the user point of view, the partitioned table is still a single table. [...] Queries against partitioned tables can take advantage of partition pruning. In some cases, this can result in query execution that is an order of magnitude faster than the same query against a non-partitioned version of the same table.
Another issue I have with the partitioning feature in MySQL 5.1 is the rather verbose syntax for defining such a partition:
mysql> CREATE TABLE part_date3 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (to_days(c3)) -> (PARTITION p0 VALUES LESS THAN (to_days('1995-01-01')), -> PARTITION p1 VALUES LESS THAN (to_days('1996-01-01')) , -> PARTITION p2 VALUES LESS THAN (to_days('1997-01-01')) , -> PARTITION p3 VALUES LESS THAN (to_days('1998-01-01')) , -> PARTITION p4 VALUES LESS THAN (to_days('1999-01-01')) , -> PARTITION p5 VALUES LESS THAN (to_days('2000-01-01')) , -> PARTITION p6 VALUES LESS THAN (to_days('2001-01-01')) , -> PARTITION p7 VALUES LESS THAN (to_days('2002-01-01')) , -> PARTITION p8 VALUES LESS THAN (to_days('2003-01-01')) , -> PARTITION p9 VALUES LESS THAN (to_days('2004-01-01')) , -> PARTITION p10 VALUES LESS THAN (to_days('2010-01-01')), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.00 sec)