Partitioning with Dates in MySQL 5.1

Sebastian Bergmann » 31 January 2007 » in MySQL » 8 Comments

Although I am still busy with writing my thesis paper, I do need a distraction from time to time. Since I started getting phpOpenTracker questions again recently, I decided to (slowly) start working on phpOpenTracker 2.0 again. phpOpenTracker 2.0 will be a complete rewrite, utilizing the features of PHP 5.2 and MySQL 5.1.

The biggest problem with phpOpenTracker 1.x is that its database schema is normalized. Even simple phpOpenTracker API calls result in (possibly multiple) SQL queries that involve multiple tables. These (more often than not rather complicated) JOINs degrade in performance the more data is stored in the database. The database schema for phpOpenTracker 2.0 is denormalized and only consists of two tables (current draft).

While the denormalization should eliminate the JOIN-related performance problems, temporal partitioning of the data can further increase the performance. The backend for MySQL in phpOpenTracker 1.x optionally uses MyISAM Merge Tables for this: Rows for table X that are INSERTed on day D of month M of year Y are not INSERTed into table X directly, but into the MyISAM table X_yyyymmdd. Table X_yyyymm is a MyISAM Merge Table that aggregates all X_yyyymmdd tables of that month, X_yyyy is a MyISAM Merge Table that aggregates all X_yyyymmdd tables of that year. X finally aggregates all X_yyyymmdd tables. SELECT queries that operate on this table X and contain BETWEEN expressions that operate on the timestamp column of the table are transparently rewritten to use the smallest table possible (X_yyyymmdd, X_yyyymm, X_yyyy, X), thus reducing the amount of data that needs to be looked at to process the query.

The Partitioning feature of MySQL 5.1 should make this application-level partitioning with dates obsolete.
[Partitioning] enables distributing portions of individual tables across a filesystem, according to rules which can be set when the table is created. In effect, different portions of a table are stored as separate tables in different locations, but from the user point of view, the partitioned table is still a single table. [...] Queries against partitioned tables can take advantage of partition pruning. In some cases, this can result in query execution that is an order of magnitude faster than the same query against a non-partitioned version of the same table.
Unfortunately, the implementation of partitioning in MySQL 5.1 has one limitation that makes me doubt its usefullness for phpOpenTracker: the maximum number of partitions possible for a given table is 1024. This means that a table that is partitioned into one partition per day can only hold data for less than three years (1024 / 365 = 2.8).

Another issue I have with the partitioning feature in MySQL 5.1 is the rather verbose syntax for defining such a partition:
mysql> CREATE TABLE part_date3
    ->      (  c1 int default NULL,
    ->  c2 varchar(30) default NULL,
    ->  c3 date default NULL) engine=myisam
    ->      partition by range (to_days(c3))
    -> (PARTITION p0 VALUES LESS THAN (to_days('1995-01-01')),
    -> PARTITION p1 VALUES LESS THAN (to_days('1996-01-01')) ,
    -> PARTITION p2 VALUES LESS THAN (to_days('1997-01-01')) ,
    -> PARTITION p3 VALUES LESS THAN (to_days('1998-01-01')) ,
    -> PARTITION p4 VALUES LESS THAN (to_days('1999-01-01')) ,
    -> PARTITION p5 VALUES LESS THAN (to_days('2000-01-01')) ,
    -> PARTITION p6 VALUES LESS THAN (to_days('2001-01-01')) ,
    -> PARTITION p7 VALUES LESS THAN (to_days('2002-01-01')) ,
    -> PARTITION p8 VALUES LESS THAN (to_days('2003-01-01')) ,
    -> PARTITION p9 VALUES LESS THAN (to_days('2004-01-01')) ,
    -> PARTITION p10 VALUES LESS THAN (to_days('2010-01-01')),
    -> PARTITION p11 VALUES LESS THAN MAXVALUE );
Query OK, 0 rows affected (0.00 sec)
Why am I not able to just say: "Partition this table over column t into one partition per day"?
Defined tags for this entry: , , ,

Trackback specific URI for this entry

8 Comments to "Partitioning with Dates in MySQL 5.1"

Display comments as (Linear | Threaded)
  1. Chuck Burgess
    31/01/2007 at 16:59 Permalink
    Using some BACK button judo, I found my initial comment text:
    --------------------------------------------------------------------

    I'm not following how that 1024 limitation translates into less than a year... looks like your calculation implies it would cover 2.8 years. I'm obviously missing something, which is why I'm asking.

    On another note, I'd have to have really hard-hit table, getting lots of inserts every day, to need to partition it out on a daily level. It seems to me this partitioning just saves time on how big a full table scan ultimately is, based on how many partitions are lumped together to make the "full table". I'd lean more towards setting a partition scheme based on how long on average it took my table to get several hundred new rows.

    Then again, I'm used to Oracle rather than MySQL, so maybe full table scans over a few hundred rows are more of a performance hit than I'm used to?

    Reply

  2. veridicus
    02/02/2007 at 16:57 Permalink
    "Then again, I'm used to Oracle rather than MySQL, so maybe full table scans over a few hundred rows are more of a performance hit than I'm used to?"

    Oracle optimization is often quite different than MySQL optimization. Although this temporal partitioning makes some issues similar.

    I agree that for Oracle it would be better to know how many rows are inserted over a period of time before decided how to split up the data. I haven't tried this feature in MySQL.

    Reply

  3. Chuck Burgess
    01/02/2007 at 16:52 Permalink
    Hey, did I completely misread that "years" thing, or did you correct it?

    Reply

  4. Sebastian Bergmann
    01/02/2007 at 16:55 Permalink
    No, I corrected it :-)

    Reply

  5. Chuck Burgess
    01/02/2007 at 17:14 Permalink
    OK, then let me add back the five points I deducted from my Sanity Rating...

    Reply

  6. Sam Stevens
    06/02/2007 at 03:02 Permalink
    Thanks for taking some of that ample free time you have and dedicating it to POT2! ;) I quite like POT but I do have some performance issues with it when I'm tracking medium traffic websites. POT2 however should present a noticeable improvement. I look forward to hearing more!

    Reply

  7. Christian Matthies
    25/02/2007 at 21:13 Permalink
    Well, with this db schema your tables will become incredibly big, don't you think so?

    By the way: Since we've talked last time I've hade some thoughts about how to implement a well performing logging engine. Sebastian Nohn suggested to log at first into files and then store that data into the database, which is the best solution I think.

    Nevertheless this doesn't solve your reading problem, therefore a denormalized schema is in fact the correct approach, however I think yours isn't the best yet. I'm going to test a different version, which I could mail you afterwards if you want.

    Reply

  8. Greg
    22/11/2010 at 18:19 Permalink
    Let me share with you this great post about time/date based partitioning and subpartitioning widh huge data flows:
    http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1

    It demonstrate how partitioning can be a good solution for scalable architectures.

    Reply

0 Trackbacks to "Partitioning with Dates in MySQL 5.1"

  1. No Trackbacks

Add Comment


To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

Submitted comments will be subject to moderation before being displayed.